

# Using Apache Iceberg with Amazon EMR on EKS
<a name="tutorial-iceberg"></a>

The Iceberg runtime JAR contains the Iceberg classes required for Spark runtime support. The following procedure shows how to start a job run that uses the Iceberg Spark runtime.

**To use Apache Iceberg with Amazon EMR on EKS applications**

1. When you start a job run to submit a Spark job, include the Iceberg Spark runtime JAR file in the Spark submit parameters:

   ```
   --job-driver '{"sparkSubmitJobDriver" : {"sparkSubmitParameters" : "--jars local:///usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar"}}'
   ```

1. Include the additional Iceberg configuration:

   ```
   --configuration-overrides '{
       "applicationConfiguration": [
           {
               "classification": "spark-defaults",
               "properties": {
                   "spark.sql.catalog.dev.warehouse": "s3://amzn-s3-demo-bucket/EXAMPLE-PREFIX/",
                   "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
                   "spark.sql.catalog.dev": "org.apache.iceberg.spark.SparkCatalog",
                   "spark.sql.catalog.dev.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
                   "spark.sql.catalog.dev.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
               }
           }
       ]
   }'
   ```
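
Hand-written JSON inside a shell-quoted argument is easy to get wrong (a missing brace or a stray space inside a key). As a minimal sketch, the two payloads from the procedure can be generated with Python's standard `json` module and pasted into the `--job-driver` and `--configuration-overrides` arguments. The function names `build_job_driver` and `build_configuration_overrides` are hypothetical helpers for illustration, and the values are the placeholders from the steps above.

```python
import json

# Placeholder values taken from the procedure; replace them with your own.
WAREHOUSE = "s3://amzn-s3-demo-bucket/EXAMPLE-PREFIX/"
ICEBERG_JAR = "local:///usr/share/aws/iceberg/lib/iceberg-spark3-runtime.jar"


def build_job_driver(jar_path: str) -> dict:
    """Hypothetical helper: job-driver payload that adds the Iceberg runtime JAR."""
    return {
        "sparkSubmitJobDriver": {
            "sparkSubmitParameters": f"--jars {jar_path}",
        }
    }


def build_configuration_overrides(warehouse: str, catalog: str = "dev") -> dict:
    """Hypothetical helper: spark-defaults overrides for an Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "applicationConfiguration": [
            {
                "classification": "spark-defaults",
                "properties": {
                    # strip() guards against stray whitespace in the value.
                    f"{prefix}.warehouse": warehouse.strip(),
                    "spark.sql.extensions": (
                        "org.apache.iceberg.spark.extensions."
                        "IcebergSparkSessionExtensions"
                    ),
                    prefix: "org.apache.iceberg.spark.SparkCatalog",
                    f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
                    f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
                },
            }
        ]
    }


if __name__ == "__main__":
    # Print both payloads as JSON, ready to wrap in single quotes on the CLI.
    print(json.dumps(build_job_driver(ICEBERG_JAR)))
    print(json.dumps(build_configuration_overrides(WAREHOUSE), indent=2))
```

Serializing with `json.dumps` means the output is always well-formed JSON, so a quoting or bracket mistake shows up before the job run is submitted.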

To learn more about the Apache Iceberg versions included in Amazon EMR releases, see [Iceberg release history](https://docs.amazonaws.cn/emr/latest/ReleaseGuide/Iceberg-release-history.html).

## Spark session configurations for catalog integration
<a name="iceberg-with-lake-formation-spark-catalog-integration-lf-eks"></a>

### Spark session configurations for Iceberg Amazon Glue catalog integration
<a name="iceberg-with-lake-formation-spark-catalog-integration-lf-glue"></a>

This sample shows how to integrate Iceberg with the Amazon Glue Data Catalog:

```
spark-sql \
  --conf spark.sql.catalog.rms=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.rms.type=glue \
  --conf spark.sql.catalog.rms.glue.id={{glue RMS catalog ID}} \
  --conf spark.sql.catalog.rms.glue.account-id={{Amazon account ID}} \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
```

The following shows a sample query:

```
SELECT * FROM rms.rmsdb.table1
```
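
Each `--conf` value must be a single `key=value` token with no whitespace around the equals sign, or the shell splits it into separate arguments. The following sketch shows one way to generate the flags from a dictionary so that this formatting is applied consistently. The helper name `to_conf_args` is hypothetical, and the `{{...}}` values are the unfilled placeholders from the sample above.

```python
import shlex


def to_conf_args(settings: dict) -> list:
    """Hypothetical helper: turn Spark settings into ['--conf', 'key=value', ...]."""
    args = []
    for key, value in settings.items():
        # Strip stray whitespace so each pair is exactly one key=value token.
        args += ["--conf", f"{key.strip()}={value.strip()}"]
    return args


# Settings from the Glue catalog sample; placeholders are left unfilled.
glue_catalog = {
    "spark.sql.catalog.rms": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.rms.type": "glue",
    "spark.sql.catalog.rms.glue.id": "{{glue RMS catalog ID}}",
    "spark.sql.catalog.rms.glue.account-id": "{{Amazon account ID}}",
    "spark.sql.extensions": (
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
    ),
}

if __name__ == "__main__":
    # shlex.join quotes any token that contains spaces or shell metacharacters.
    print(shlex.join(["spark-sql", *to_conf_args(glue_catalog)]))
```

Because `shlex.join` quotes tokens as needed, a value that contains spaces still reaches `spark-sql` as one argument.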

### Spark session configurations for Iceberg REST Amazon Glue catalog integration
<a name="iceberg-with-lake-formation-spark-catalog-integration-lf-rest"></a>

This sample shows how to integrate Iceberg REST with the Amazon Glue Data Catalog:

```
spark-sql \
  --conf spark.sql.catalog.rms=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.rms.type=rest \
  --conf spark.sql.catalog.rms.warehouse={{glue RMS catalog ID}} \
  --conf spark.sql.catalog.rms.uri={{glue endpoint URI}}/iceberg \
  --conf spark.sql.catalog.rms.rest.sigv4-enabled=true \
  --conf spark.sql.catalog.rms.rest.signing-name=glue \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
```

The following shows a sample query:

```
SELECT * FROM rms.rmsdb.table1
```

This configuration works only with Redshift Managed Storage. Fine-grained access control (FGAC) for Amazon S3 isn't supported.
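
The REST catalog settings are ordinary Spark properties, so they can presumably also be supplied through a `spark-defaults` classification when you submit a job run instead of on the `spark-sql` command line. The sketch below illustrates that mapping under this assumption. The helper names `rest_catalog_properties` and `as_spark_defaults` and the endpoint `https://glue.example.invalid` are hypothetical and stand in for your own values.

```python
import json


def rest_catalog_properties(catalog_id: str, glue_endpoint: str,
                            catalog: str = "rms") -> dict:
    """Hypothetical helper: Spark properties for the Iceberg REST catalog sample."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.warehouse": catalog_id,
        # Avoid a double slash when the endpoint already ends with "/".
        f"{prefix}.uri": glue_endpoint.rstrip("/") + "/iceberg",
        f"{prefix}.rest.sigv4-enabled": "true",
        f"{prefix}.rest.signing-name": "glue",
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
    }


def as_spark_defaults(properties: dict) -> dict:
    """Hypothetical helper: wrap properties in a spark-defaults classification."""
    return {
        "applicationConfiguration": [
            {"classification": "spark-defaults", "properties": properties}
        ]
    }


if __name__ == "__main__":
    # Both argument values are placeholders, not real identifiers.
    props = rest_catalog_properties(
        "{{glue RMS catalog ID}}", "https://glue.example.invalid"
    )
    print(json.dumps(as_spark_defaults(props), indent=2))
```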