Using non-Hive table formats in Amazon Athena for Apache Spark

When you work with sessions and notebooks in Athena for Spark, you can use Linux Foundation Delta Lake, Apache Hudi, and Apache Iceberg tables, in addition to Apache Hive tables.

Considerations and limitations

When you use table formats other than Apache Hive with Athena for Spark, consider the following points:

  • Each notebook supports only one table format in addition to Apache Hive. To use multiple table formats in Athena for Spark, create a separate notebook for each table format. For information about creating notebooks in Athena for Spark, see Creating your own notebook.

  • The Delta Lake, Hudi, and Iceberg table formats have been tested on Athena for Spark by using Amazon Glue as the metastore. You might be able to use other metastores, but such usage is not currently supported.

  • To use the additional table formats, override the default spark_catalog property, as indicated in the Athena console and in this documentation. These non-Hive catalogs can read Hive tables, in addition to their own table formats.
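As a minimal sketch of what overriding spark_catalog can look like, the following Python snippet maps each table format to a set of Spark properties. The catalog and extension class names are the ones published by the Iceberg, Hudi, and Delta Lake projects; they are assumptions here and should be checked against the values shown in the Athena console. The CATALOG_PROPERTIES dictionary and the spark_properties_for helper are hypothetical names used only for illustration.

```python
# Hypothetical lookup of Spark properties that replace the default
# spark_catalog. Class names are assumptions taken from each project's
# public documentation; verify them in the Athena console.
CATALOG_PROPERTIES = {
    "iceberg": {
        "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
        "spark.sql.catalog.spark_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        "spark.sql.catalog.spark_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    },
    "hudi": {
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    },
    "delta": {
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    },
}


def spark_properties_for(table_format: str) -> dict:
    """Return a copy of the Spark properties for exactly one table format.

    Only one non-Hive format can be active per notebook, so the helper
    accepts a single format name rather than a list.
    """
    key = table_format.strip().lower()
    if key not in CATALOG_PROPERTIES:
        supported = ", ".join(sorted(CATALOG_PROPERTIES))
        raise ValueError(f"Unsupported table format {table_format!r}; expected one of: {supported}")
    return dict(CATALOG_PROPERTIES[key])


if __name__ == "__main__":
    # Print the properties for one format so that they can be compared
    # with the values shown in the console.
    for name, value in spark_properties_for("iceberg").items():
        print(f"{name} = {value}")
```

Because the helper returns properties for a single format, it mirrors the one-format-per-notebook limit described above. A second format would need its own notebook with its own property set.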

Table versions

The following table shows the supported version of each non-Hive table format in Amazon Athena for Apache Spark.

Table format                   Supported version
Apache Iceberg                 1.2.1
Apache Hudi                    0.13
Linux Foundation Delta Lake    2.0.2

In Athena for Spark, these table format .jar files and their dependencies are loaded onto the classpath for Spark drivers and executors.
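Because the .jar files are already on the classpath, a notebook needs only the catalog property to select a format. The following Python sketch shows one way to check which catalog class a session is using. It assumes a SparkSession object is passed in, such as the spark variable in a notebook; the active_catalog_class helper and the "<default>" marker are hypothetical names used for illustration.

```python
def active_catalog_class(spark, fallback: str = "<default>") -> str:
    """Return the class configured for spark_catalog in this session.

    `spark` is assumed to be a SparkSession. If the property has not
    been overridden, the fallback marker is returned instead.
    """
    return spark.conf.get("spark.sql.catalog.spark_catalog", fallback)
```

In a notebook, calling active_catalog_class(spark) before reading or writing tables is a quick way to confirm that the intended override is in effect.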