Amazon EMR on EKS 6.8.0 releases
The following Amazon EMR 6.8.0 releases are available for Amazon EMR on EKS. Select a specific emr-6.8.0-XXXX release to view more details such as the related container image tag.
Release notes for Amazon EMR 6.8.0
- Supported applications ‐ Amazon SDK for Java 1.12.170, Spark 3.3.0-amzn-0, Hudi 0.11.1-amzn-0, Iceberg 0.14.0-amzn-0.
- Supported components ‐ aws-sagemaker-spark-sdk, emr-ddb, emr-goodies, emr-s3-select, emrfs, hadoop-client, hudi, hudi-spark, iceberg, spark-kubernetes.
- Supported configuration classifications:
  | Classifications | Descriptions |
  | --- | --- |
  | core-site | Change values in Hadoop's core-site.xml file. |
  | emrfs-site | Change EMRFS settings. |
  | spark-metrics | Change values in Spark's metrics.properties file. |
  | spark-defaults | Change values in Spark's spark-defaults.conf file. |
  | spark-env | Change values in the Spark environment. |
  | spark-hive-site | Change values in Spark's hive-site.xml file. |
  | spark-log4j | Change values in Spark's log4j.properties file. |

  Configuration classifications allow you to customize applications. These often correspond to a configuration XML file for the application, such as spark-hive-site.xml. For more information, see Configure Applications.
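A classification is applied by passing it in the configuration overrides of a job request. The sketch below shows the general shape of such a payload, assuming the spark-defaults classification; the property values and log group name are illustrative assumptions, not values from this release.

```python
# Hypothetical sketch: a configuration-overrides payload that applies the
# spark-defaults classification to a job. Property values and the log group
# name are assumed examples for illustration.
configuration_overrides = {
    "applicationConfiguration": [
        {
            "classification": "spark-defaults",
            "properties": {
                "spark.executor.memory": "4g",    # assumed example value
                "spark.executor.instances": "2",  # assumed example value
            },
        }
    ],
    "monitoringConfiguration": {
        "cloudWatchMonitoringConfiguration": {
            # Hypothetical log group name.
            "logGroupName": "/emr-containers/example",
        }
    },
}

print(configuration_overrides["applicationConfiguration"][0]["classification"])
```

Each entry in applicationConfiguration pairs one classification from the table above with the properties to set in the corresponding file.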
Notable features
- Spark 3.3.0 ‐ Amazon EMR on EKS 6.8 includes Spark 3.3.0, which supports using separate node selector labels for Spark driver and executor pods. These new labels enable you to define the node types for the driver and executor pods separately in the StartJobRun API, without using pod templates.
  - Driver node selector property: spark.kubernetes.driver.node.selector.[labelKey]
  - Executor node selector property: spark.kubernetes.executor.node.selector.[labelKey]
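The two properties above can be passed as Spark submit parameters in a StartJobRun request. A minimal sketch, assuming a hypothetical label key node.type with values on-demand and spot, and an illustrative S3 entry point:

```python
# Hypothetical sketch: building the sparkSubmitParameters string so the
# driver and executor pods are scheduled on different node types.
# The label key "node.type", its values, and the S3 path are assumptions.
spark_submit_parameters = " ".join([
    "--conf spark.kubernetes.driver.node.selector.node.type=on-demand",
    "--conf spark.kubernetes.executor.node.selector.node.type=spot",
    "--conf spark.executor.instances=2",
])

job_driver = {
    "sparkSubmitJobDriver": {
        "entryPoint": "s3://example-bucket/app.py",  # hypothetical script
        "sparkSubmitParameters": spark_submit_parameters,
    }
}

print(spark_submit_parameters)
```

Because the labels are plain Spark configuration, no driver or executor pod template file is needed for this placement decision.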
- Enhanced job failure message ‐ This release introduces the configurations spark.stage.extraDetailsOnFetchFailures.enabled and spark.stage.extraDetailsOnFetchFailures.maxFailuresToInclude to track task failures caused by user code. These details are used to enhance the failure message displayed in the driver log when a stage is aborted due to a shuffle fetch failure.

  | Property name | Default value | Meaning | Since version |
  | --- | --- | --- | --- |
  | spark.stage.extraDetailsOnFetchFailures.enabled | false | If set to true, this property is used to enhance the job failure message displayed in the driver log when a stage is aborted due to shuffle fetch failures. By default, the last 5 task failures caused by user code are tracked, and the failure error message is appended in the driver logs. To increase the number of task failures with user exceptions to track, see the config spark.stage.extraDetailsOnFetchFailures.maxFailuresToInclude. | emr-6.8 |
  | spark.stage.extraDetailsOnFetchFailures.maxFailuresToInclude | 5 | Number of task failures to track per stage and attempt. This property is used to enhance the job failure message with user exceptions displayed in the driver log when a stage is aborted due to shuffle fetch failures. This property works only if spark.stage.extraDetailsOnFetchFailures.enabled is set to true. | emr-6.8 |
For more information, see the Apache Spark configuration documentation.
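Both properties can be set through the spark-defaults classification described earlier. A minimal sketch, assuming you want to raise the tracked-failure count from the default 5 to an illustrative value of 10:

```python
# Hypothetical sketch: enabling the enhanced shuffle-fetch-failure message
# via the spark-defaults classification. The value "10" is an assumed
# example; the default is 5.
fetch_failure_config = {
    "classification": "spark-defaults",
    "properties": {
        "spark.stage.extraDetailsOnFetchFailures.enabled": "true",
        # Only honored when the property above is set to true.
        "spark.stage.extraDetailsOnFetchFailures.maxFailuresToInclude": "10",
    },
}

print(fetch_failure_config["properties"])
```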
Known issue
- Amazon EMR on EKS 6.8.0 incorrectly populates the build hash in Parquet file metadata generated using Apache Spark. This issue may cause tools that parse the metadata version string from Parquet files generated by Amazon EMR on EKS 6.8.0 to fail. Customers who parse the version string from Parquet metadata and depend on the build hash should switch to a different Amazon EMR version and rewrite the file.
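To illustrate the kind of parsing this issue can break: the Parquet format records a created_by metadata string of the form "&lt;application&gt; version &lt;version&gt; (build &lt;build hash&gt;)". The sketch below is a hypothetical strict parser for that string; the sample application name and hash are assumptions, and a malformed build hash would make such a parser fail at the match step.

```python
import re

# Hypothetical sketch of a strict parser for the Parquet created_by
# metadata string. The sample input is an assumed example; an incorrectly
# populated build hash can cause parsers like this to reject the string.
CREATED_BY_RE = re.compile(
    r"^(?P<app>.+?) version (?P<version>\S+) \(build (?P<build>\S+)\)$"
)

def parse_created_by(created_by: str) -> dict:
    match = CREATED_BY_RE.match(created_by)
    if match is None:
        raise ValueError(f"unparseable created_by: {created_by!r}")
    return match.groupdict()

info = parse_created_by("parquet-mr version 1.12.2 (build abc123)")
print(info)
```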
Resolved issue
- Interrupt Kernel capability for PySpark kernels ‐ In-progress interactive workloads that are triggered by executing cells in a notebook can be stopped by using the Interrupt Kernel capability. A fix has been introduced so that this functionality works for PySpark kernels. This fix is also available in open source at Changes for handling interrupts for PySpark Kubernetes Kernel #1115.