Enable the EMRFS S3-optimized committer for Amazon EMR 5.19.0
If you are using Amazon EMR 5.19.0, you can manually set the spark.sql.parquet.fs.optimized.committer.optimization-enabled property to true, either when you create a cluster or from within Spark.
Enabling the EMRFS S3-optimized committer when creating a cluster
Use the spark-defaults configuration classification to set the spark.sql.parquet.fs.optimized.committer.optimization-enabled property to true. For more information, see Configure applications.
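As a minimal sketch of what such a classification might look like, the following writes a spark-defaults configuration object to a local JSON file. The file name emrfs-committer.json is an assumption chosen for illustration; only the classification name and the property come from the text above.

```shell
# Write a spark-defaults classification that enables the committer.
# The file name emrfs-committer.json is hypothetical; use any name you like.
cat > emrfs-committer.json <<'EOF'
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.sql.parquet.fs.optimized.committer.optimization-enabled": "true"
    }
  }
]
EOF
```

A file like this can typically be supplied at cluster creation, for example through the --configurations option of aws emr create-cluster (as file://emrfs-committer.json), alongside whatever other options your cluster needs.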
Enabling the EMRFS S3-optimized committer from Spark
You can set spark.sql.parquet.fs.optimized.committer.optimization-enabled to true by hard-coding it in a SparkConf, by passing it as a --conf parameter to the Spark shell, spark-submit, or spark-sql tools, or by setting it in conf/spark-defaults.conf. For more information, see Spark configuration.
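For the conf/spark-defaults.conf option, a sketch along these lines appends the property to the file if it is not already there. The fallback to ./conf when SPARK_CONF_DIR is unset is an assumption for illustration; on a real cluster, point it at the directory your Spark installation actually reads.

```shell
# Locate spark-defaults.conf. Falling back to ./conf is an assumption;
# set SPARK_CONF_DIR to the directory your Spark installation uses.
CONF_DIR="${SPARK_CONF_DIR:-./conf}"
CONF_FILE="$CONF_DIR/spark-defaults.conf"
KEY="spark.sql.parquet.fs.optimized.committer.optimization-enabled"

mkdir -p "$CONF_DIR"
touch "$CONF_FILE"

# Append the setting only if the key is absent, so re-running is harmless.
if ! grep -q "^$KEY" "$CONF_FILE"; then
  printf '%s %s\n' "$KEY" "true" >> "$CONF_FILE"
fi
```

Entries in spark-defaults.conf are whitespace-separated key and value pairs, one per line. Values passed with --conf or set in a SparkConf take precedence over this file.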
The following example shows how to enable the committer while running a spark-sql command.
spark-sql \
  --conf spark.sql.parquet.fs.optimized.committer.optimization-enabled=true \
  -e "INSERT OVERWRITE TABLE target_table SELECT * FROM source_table;"