Enable the EMRFS S3-optimized committer for Amazon EMR 5.19.0

If you are using Amazon EMR 5.19.0, you can manually set the spark.sql.parquet.fs.optimized.committer.optimization-enabled property to true, either when you create a cluster or from within Spark.

Enabling the EMRFS S3-optimized committer when creating a cluster

Use the spark-defaults configuration classification to set the spark.sql.parquet.fs.optimized.committer.optimization-enabled property to true. For more information, see Configure applications.
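As a minimal sketch of the step above, the classification can be written to a JSON file that you supply when you create the cluster. The file name spark-committer.json is an assumption chosen for illustration; only the classification name and the property come from this page.

```shell
# Write a spark-defaults classification that enables the committer.
# The file name spark-committer.json is a hypothetical choice for this sketch.
cat > spark-committer.json <<'EOF'
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.sql.parquet.fs.optimized.committer.optimization-enabled": "true"
    }
  }
]
EOF
```

You could then pass the file to aws emr create-cluster with --configurations file://spark-committer.json, together with --release-label emr-5.19.0 and --applications Name=Spark. The remaining cluster options (instances, roles, networking) depend on your environment and are not shown here.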

Enabling the EMRFS S3-optimized committer from Spark

You can set spark.sql.parquet.fs.optimized.committer.optimization-enabled to true by hard-coding it in a SparkConf, by passing it as a --conf parameter to the Spark shell, spark-submit, or spark-sql, or by adding it to conf/spark-defaults.conf. For more information, see Spark configuration in the Apache Spark documentation.
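For the conf/spark-defaults.conf option, a sketch along the following lines might be used to add the property to the file. The ./conf fallback directory and the variable names are assumptions made for this example; set SPARK_CONF_DIR to the configuration directory that your Spark installation actually reads.

```shell
# Directory holding spark-defaults.conf. The ./conf fallback is an assumption
# for this sketch; point SPARK_CONF_DIR at your real Spark configuration.
conf_dir="${SPARK_CONF_DIR:-./conf}"
conf_file="$conf_dir/spark-defaults.conf"
prop="spark.sql.parquet.fs.optimized.committer.optimization-enabled"

# Make sure the directory and file exist before reading or appending.
mkdir -p "$conf_dir"
touch "$conf_file"

# Append the property only if no line already starts with it,
# so that rerunning the snippet does not create duplicate entries.
if ! grep -q "^$prop" "$conf_file"; then
  printf '%s %s\n' "$prop" "true" >> "$conf_file"
fi
```

Entries in spark-defaults.conf use a key followed by whitespace and the value, one property per line. The snippet does not change an existing entry, so if the property is already present with a different value, edit that line by hand.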

The following example shows how to enable the committer while running a spark-sql command.

spark-sql \
  --conf spark.sql.parquet.fs.optimized.committer.optimization-enabled=true \
  -e "INSERT OVERWRITE TABLE target_table SELECT * FROM source_table;"