Using Apache Iceberg with EMR Serverless - Amazon EMR

To use Apache Iceberg with EMR Serverless applications:

  1. Set the required Spark properties in the corresponding Spark job run.

  2. Either designate the Amazon Glue Data Catalog as your metastore or configure an external metastore. To learn more about setting up your metastore, see Metastore configuration.

    Configure the metastore properties that you want to use for Iceberg. For example, if you want to use the Amazon Glue Data Catalog, set the following properties in the application configuration.

        spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
        spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory

    When you use the Amazon Glue Data Catalog as your metastore, you can specify the following configuration properties for your Iceberg job.

        --conf spark.jars=/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar
        --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
        --conf spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
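The two configuration styles in the steps above can be generated from one place. The following is a minimal sketch, not an official tool: the helper names `to_spark_defaults` and `to_spark_submit_parameters` are hypothetical. It also assumes that the application configuration is expressed as a `spark-defaults` classification and that the job-level settings are passed as a single space-separated string of `--conf key=value` pairs. Verify both shapes against the EMR Serverless API reference before relying on them.

```python
import json

# Application-level properties, copied from the Glue Data Catalog example above.
APPLICATION_PROPERTIES = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
}

# Job-level properties: the same two settings plus the Iceberg runtime JAR path.
JOB_PROPERTIES = {
    "spark.jars": "/usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar",
    **APPLICATION_PROPERTIES,
}


def to_spark_defaults(properties):
    """Wrap properties in a classification block (assumed shape: spark-defaults)."""
    return {"classification": "spark-defaults", "properties": dict(properties)}


def to_spark_submit_parameters(properties):
    """Render properties as space-separated `--conf key=value` pairs."""
    return " ".join(f"--conf {key}={value}" for key, value in properties.items())


if __name__ == "__main__":
    print(json.dumps(to_spark_defaults(APPLICATION_PROPERTIES), indent=2))
    print(to_spark_submit_parameters(JOB_PROPERTIES))
```

Keeping the properties in dictionaries avoids stray separators between the `--conf` flags and makes it easy to add catalog-specific settings later.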

To learn which Apache Iceberg version is included in each EMR release, see Iceberg release history.