Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China.

Amazon Glue versions

The Amazon Glue version parameter is configured when you add or update a job. The Amazon Glue version determines the versions of Apache Spark and Python that Amazon Glue supports. The Python version indicates the version supported for jobs of type Spark. The following table lists the available Amazon Glue versions, the corresponding Spark and Python versions, and other changes in functionality.
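As a rough illustration of where the version parameter goes, the sketch below builds the keyword arguments that boto3's Glue `create_job` call accepts, with `GlueVersion` set explicitly. The job name, role ARN, and script location are hypothetical placeholders, and the helper `build_job_definition` is not part of any Amazon library; it exists only for this example.

```python
from typing import Any, Dict


def build_job_definition(name: str, role_arn: str, script_location: str,
                         glue_version: str = "3.0") -> Dict[str, Any]:
    """Return keyword arguments suitable for boto3's Glue create_job call."""
    return {
        "Name": name,
        "Role": role_arn,
        # The Amazon Glue version selects the Spark and Python runtimes.
        "GlueVersion": glue_version,
        "Command": {
            "Name": "glueetl",  # a job of type Spark
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
    }


# All three values below are hypothetical placeholders.
job_args = build_job_definition(
    name="example-etl-job",
    role_arn="arn:aws-cn:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/job.py",
)

# Submitting the job needs boto3 and valid credentials, so it is left as a comment:
# import boto3
# boto3.client("glue").create_job(**job_args)
```

Keeping the argument construction separate from the API call makes it easy to review or test the chosen version before anything is created.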

Amazon Glue versions

Amazon Glue version | Supported Spark and Python versions | Changes in functionality
Amazon Glue 3.0
  • Spark 3.1.1

  • Python 3.7

Amazon Glue 3.0 is the new version of Amazon Glue. In addition to the Spark engine upgrade to the 3.x line (Spark 3.1.1), this release includes optimizations and upgrades such as:

  • Builds the Amazon Glue ETL Library against Spark 3.0, which is a major release for Spark.

  • Streaming jobs are supported on Amazon Glue 3.0.

  • Includes new Amazon Glue Spark runtime optimizations for performance and reliability:

    • Faster in-memory columnar processing based on Apache Arrow for reading CSV data.

    • SIMD based execution for vectorized reads with CSV data.

    • The Spark upgrade also includes additional optimizations developed on Amazon EMR.

    • Upgraded EMRFS from 2.38 to 2.46 enabling new features and bug fixes for Amazon S3 access.

  • Upgraded several dependencies that were required for the new Spark version. See Appendix A: notable dependency upgrades.

  • Upgraded JDBC drivers for our natively supported data sources. See Appendix B: JDBC driver upgrades.

Limitations

The following are limitations with Amazon Glue 3.0:

  • Amazon Glue machine learning transforms are not yet available in Amazon Glue 3.0.

  • Some custom Spark connectors do not work with Amazon Glue 3.0 if they depend on Spark 2.4 and do not have compatibility with Spark 3.1.

For more information about migrating to Amazon Glue version 3.0, see Migrating Amazon Glue jobs to Amazon Glue version 3.0 and Actions to migrate to Amazon Glue 3.0.

Amazon Glue 2.0
  • Spark 2.4.3

  • Python 3.7

In addition to the features provided in Amazon Glue version 1.0, Amazon Glue version 2.0 also provides:

  • An upgraded infrastructure for running Apache Spark ETL jobs in Amazon Glue with reduced startup times.

  • Real-time default logging, with separate streams for drivers and executors, and for outputs and errors.

  • Support for specifying additional Python modules or different versions at the job level.

Note

Amazon Glue version 2.0 differs from Amazon Glue version 1.0 in some dependencies and versions because of underlying architectural changes. Validate your Amazon Glue jobs before migrating across major Amazon Glue version releases.

For more information about Amazon Glue version 2.0 features and limitations, see Running Spark ETL jobs with reduced startup times.
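The job-level Python module support mentioned above can be sketched as follows. This assumes the `--additional-python-modules` job parameter, which takes a comma-separated list of module specifiers; confirm the exact behavior against the job parameters reference. The helper name and the module names are hypothetical and used only for illustration.

```python
from typing import Dict, Iterable


def python_module_arguments(modules: Iterable[str]) -> Dict[str, str]:
    """Build the default-arguments entry that lists extra Python modules.

    Each item is a pip-style specifier, optionally pinned with '=='.
    """
    return {"--additional-python-modules": ",".join(modules)}


# Hypothetical module list: one pinned version, one unpinned.
default_arguments = python_module_arguments(["pyarrow==2.0.0", "scikit-learn"])
```

The resulting dictionary would typically be passed as the job's default arguments when the job is created or updated.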

Amazon Glue 1.0
  • Spark 2.4.3

  • Python 2.7

  • Python 3.6

You can maintain job bookmarks for Parquet and ORC formats in Amazon Glue ETL jobs (using Amazon Glue version 1.0). Previously, you were only able to bookmark common Amazon S3 source formats such as JSON, CSV, Apache Avro and XML in Amazon Glue ETL jobs.

When setting format options for ETL inputs and outputs, you can specify Apache Avro reader/writer format 1.8 to support Avro logical type reading and writing (using Amazon Glue version 1.0). Previously, only the version 1.7 Avro reader/writer format was supported.
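A minimal sketch of the Avro setting described above, assuming the `format_options` key is named `version` as in the Amazon Glue ETL format options. The helper `avro_format_settings` is hypothetical; it only assembles the arguments and does not call Amazon Glue.

```python
from typing import Any, Dict

# The two Avro reader/writer format versions named in the text.
SUPPORTED_AVRO_VERSIONS = ("1.7", "1.8")


def avro_format_settings(version: str = "1.8") -> Dict[str, Any]:
    """Return format arguments selecting an Avro reader/writer format version."""
    if version not in SUPPORTED_AVRO_VERSIONS:
        raise ValueError(f"Unsupported Avro format version: {version!r}")
    return {"format": "avro", "format_options": {"version": version}}


settings = avro_format_settings()

# Inside an Amazon Glue job script, these settings could be passed along
# the lines of (not runnable outside Amazon Glue):
# glueContext.create_dynamic_frame.from_options(
#     connection_type="s3",
#     connection_options={"paths": ["s3://example-bucket/input/"]},  # hypothetical path
#     **settings,
# )
```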

The DynamoDB connection type supports a writer option (using Amazon Glue version 1.0).
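As a hedged sketch of what a DynamoDB write might look like, the helper below assembles connection options for a DynamoDB sink. The option keys `dynamodb.output.tableName` and `dynamodb.throughput.write.percent` are assumptions based on the Amazon Glue connection options and should be checked against the connection types reference; the table name and helper name are hypothetical.

```python
from typing import Dict


def dynamodb_sink_options(table_name: str,
                          write_percent: float = 0.5) -> Dict[str, str]:
    """Build connection options for writing to a DynamoDB table."""
    if not 0.0 < write_percent <= 1.5:
        raise ValueError("write_percent is expected to be in the range (0, 1.5]")
    return {
        "dynamodb.output.tableName": table_name,
        # Share of the table's write capacity the job may use (assumed key).
        "dynamodb.throughput.write.percent": str(write_percent),
    }


sink_options = dynamodb_sink_options("example-table")  # hypothetical table name

# Inside an Amazon Glue job script (not runnable outside Amazon Glue):
# glueContext.write_dynamic_frame_from_options(
#     frame=dynamic_frame,
#     connection_type="dynamodb",
#     connection_options=sink_options,
# )
```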

Amazon Glue 0.9
  • Spark 2.2.1

  • Python 2.7

Jobs that were created without specifying an Amazon Glue version default to Amazon Glue 0.9.
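The version table and the default rule above can be captured in a small lookup, which may be handy when validating job definitions. This is an illustrative sketch only: the names `GLUE_RUNTIMES`, `Runtime`, and `resolve_runtime` are hypothetical, and the data is copied from the table on this page.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Runtime(NamedTuple):
    spark: str
    python: Tuple[str, ...]


# Spark and Python versions per Amazon Glue version, as listed in the table.
GLUE_RUNTIMES: Dict[str, Runtime] = {
    "3.0": Runtime(spark="3.1.1", python=("3.7",)),
    "2.0": Runtime(spark="2.4.3", python=("3.7",)),
    "1.0": Runtime(spark="2.4.3", python=("2.7", "3.6")),
    "0.9": Runtime(spark="2.2.1", python=("2.7",)),
}

# Jobs created without a version default to Amazon Glue 0.9.
DEFAULT_GLUE_VERSION = "0.9"


def resolve_runtime(glue_version: Optional[str] = None) -> Runtime:
    """Return the runtime for a job, applying the 0.9 default when unset."""
    version = glue_version or DEFAULT_GLUE_VERSION
    if version not in GLUE_RUNTIMES:
        raise ValueError(f"Unknown Amazon Glue version: {version!r}")
    return GLUE_RUNTIMES[version]
```

For example, `resolve_runtime()` returns the Spark 2.2.1 and Python 2.7 runtime, which is a reminder to set the version explicitly on new jobs.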