Use the EMRFS S3-optimized commit protocol - Amazon EMR

The EMRFS S3-optimized commit protocol is an alternative implementation of Spark's FileCommitProtocol. It is optimized for writing files to Amazon S3 through EMRFS when Spark dynamic partition overwrite is used. The protocol improves application performance by avoiding rename operations in Amazon S3 during the job commit phase of a dynamic partition overwrite. Amazon S3 has no native rename operation, so each rename is carried out as a copy followed by a delete, which is slow for large outputs.
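To make "dynamic partition overwrite" concrete, the following toy model in plain Python shows the table-level outcome that the protocol targets. In Spark, dynamic mode is selected by setting `spark.sql.sources.partitionOverwriteMode` to `dynamic`; in that mode, an overwrite replaces only the partitions that receive new data, while static mode drops all matching partitions first. This is a minimal sketch of the semantics only, under the assumption that a table can be modeled as a dictionary of partitions. The `overwrite` function is hypothetical and does not model EMRFS, S3, or the commit protocol's internals.

```python
from typing import Dict, List

# Toy model (hypothetical): a "table" maps a partition value, such as a date
# string, to the rows stored in that partition.
Table = Dict[str, List[str]]


def overwrite(existing: Table, new_data: Table, mode: str) -> Table:
    """Return the table contents after an overwrite write.

    mode="static":  every existing partition is dropped, then new_data is written.
    mode="dynamic": only partitions that appear in new_data are replaced;
                    all other existing partitions are left untouched.
    """
    if mode == "static":
        result: Table = {}
    elif mode == "dynamic":
        # Copy every existing partition; the loop below replaces only the ones
        # that the new data touches.
        result = {part: list(rows) for part, rows in existing.items()}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    for part, rows in new_data.items():
        result[part] = list(rows)
    return result


if __name__ == "__main__":
    existing = {"2024-01-01": ["a", "b"], "2024-01-02": ["c"]}
    new_data = {"2024-01-02": ["c2"], "2024-01-03": ["d"]}
    # Static: partition 2024-01-01 is lost because it is absent from new_data.
    print(overwrite(existing, new_data, "static"))
    # Dynamic: 2024-01-01 survives, 2024-01-02 is replaced, 2024-01-03 is added.
    print(overwrite(existing, new_data, "dynamic"))
```

Because dynamic mode must replace individual partitions at commit time rather than clearing the whole output up front, a rename-based commit touches every written file a second time, which is the cost the commit protocol is designed to avoid.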

Note that the EMRFS S3-optimized committer also improves performance by avoiding rename operations. However, the committer doesn't work for dynamic partition overwrite cases, whereas the commit protocol's improvements target only dynamic partition overwrite cases.

The commit protocol is available with Amazon EMR releases 5.30.0 and later and 6.2.0 and later, and it is enabled by default. Starting with release 5.31.0, Amazon EMR added a parallelism improvement. The protocol is used for Spark jobs that use Spark SQL, DataFrames, or Datasets. In some circumstances, the commit protocol is not used. For more information, see Requirements for the EMRFS S3-optimized commit protocol.
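The version rule above has a gap that is easy to miss: releases 6.0.0 and 6.1.0 are not covered, even though they are newer than 5.30.0. The following sketch encodes the rule for a release label such as `emr-6.2.0`. The function name and the strict `emr-X.Y.Z` parsing are hypothetical illustrations, not part of any AWS SDK, and the sketch assumes that "6.2.0 and later" also covers later major versions.

```python
import re

# Hypothetical helper: accepts release labels of the form "emr-X.Y.Z".
_LABEL = re.compile(r"^emr-(\d+)\.(\d+)\.(\d+)$")


def commit_protocol_available(release_label: str) -> bool:
    """Return True if the release is 5.30.0 or later in the 5.x line,
    or 6.2.0 or later (assumed to include later major versions)."""
    match = _LABEL.match(release_label)
    if match is None:
        raise ValueError(f"unrecognized release label: {release_label!r}")
    # Compare versions as integer tuples so that 5.31.0 > 5.4.0 numerically.
    version = tuple(int(part) for part in match.groups())
    if version[0] == 5:
        return version >= (5, 30, 0)
    # 6.0.0 and 6.1.0 fall below this threshold; 4.x and earlier do too.
    return version >= (6, 2, 0)
```

For example, `commit_protocol_available("emr-5.30.0")` and `commit_protocol_available("emr-6.2.0")` return `True`, while `commit_protocol_available("emr-6.1.0")` returns `False`. Even when the release qualifies, the protocol is used only if the requirements referenced above are met.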