
Documentation history for Amazon Glue

Change Description Date

Support for improved shuffle management of your Spark applications

Support is now available for a new Cloud Shuffle Storage Plugin for Apache Spark. For more information, see Amazon Glue Spark shuffle plugin with Amazon S3 and Cloud Shuffle Storage Plugin for Apache Spark.

November 15, 2022

Added support for Data Catalog targets when accelerating crawls using Amazon S3 event notifications

In addition to the existing support for Amazon S3 targets, support is now available for accelerating crawls for Data Catalog targets using Amazon S3 event notifications. For more information, see Accelerating Crawls Using Amazon S3 Event Notifications.

October 13, 2022

Support for specifying the maximum number of tables a crawler can create

Support is now available for specifying the maximum number of tables the crawler is allowed to create. For more information, see How to specify the maximum number of tables the crawler is allowed to create.

September 6, 2022

Support for Python 3.9 in Python shell jobs in Amazon Glue

Support is now available for running scripts compatible with Python 3.9 in Python shell jobs in Amazon Glue, and for choosing to use pre-packaged library sets. For more information, see Python shell jobs in Amazon Glue.

August 11, 2022

Support for running non-urgent or non-time sensitive Amazon Glue jobs on spare capacity

Support is now available for the configuration of flexible job runs for non-urgent jobs such as pre-production jobs, testing, and one-time data loads. For more information, see Adding jobs in Amazon Glue.

August 9, 2022

Support for a new worker type for streaming jobs

Support is now available for use of the G.025X worker type for low volume streaming jobs. For more information, see Adding jobs in Amazon Glue.

July 14, 2022

Support for the use of Kafka SASL in Amazon Glue connections

Support is now available for use of Kafka SASL in Amazon Glue connections. For more information, see Amazon Glue Kafka connection properties for client authentication.

July 5, 2022

Support for Apache Kafka connector for Protobuf schemas

Support is now available for Apache Kafka Connector for Protobuf schemas. For more information, see Amazon Glue Schema Registry.

June 9, 2022

Support for Auto Scaling for Amazon Glue jobs (GA)

Added information on using Auto Scaling for jobs in Amazon Glue version 3.0 to dynamically scale compute resources. For more information, see Using Auto Scaling for Amazon Glue.

April 14, 2022

Update to the documentation for Amazon Glue developing and testing Amazon Glue job scripts

Reorganized and added information on the available development and testing methods for Amazon Glue, including instructions for developing with Docker. For more information, see Developing and testing Amazon Glue job scripts.

March 14, 2022

Addition of protocol buffers (protobuf) as a supported data format for the Amazon Glue schema registry

Added information about Protobuf as a supported data format (in addition to AVRO and JSON). For more information, see Amazon Glue Schema Registry.

February 25, 2022

Support for crawling Delta Lake tables

Added information about using Amazon Glue to crawl Delta Lake tables. For more information, see How to specify configuration options for a Delta Lake data store.

February 24, 2022

Support for Amazon Glue job insights

Added information about using Amazon Glue job insights to simplify job debugging and optimization for your Amazon Glue jobs. For more information, see Monitoring with Amazon Glue job insights.

February 8, 2022

Support for crawling Amazon S3 backed Data Catalog tables using a VPC endpoint

In addition to Amazon S3 data stores, you can configure your Amazon S3 backed Data Catalog tables to be accessed only by an Amazon Virtual Private Cloud environment (Amazon VPC), for security, auditing, or control purposes. For more information, see Crawling an Amazon S3 Data Store or Amazon S3 backed Data Catalog tables using a VPC Endpoint.

February 3, 2022

Support for Lake Formation governed tables

Added information about Amazon Glue support for Lake Formation governed tables, which support ACID transactions, automatic data compaction, and time-travel queries. For more information, see Amazon Glue API and the Amazon Lake Formation developer guide.

November 30, 2021

New Amazon managed policies added for interactive sessions and notebooks

New managed policies for IAM provide enhanced security for using Amazon Glue with interactive sessions and notebooks. For more information, see Amazon Managed (Predefined) Policies for Amazon Glue.

November 30, 2021

Documentation for public preview features

Described features available in preview release for Amazon Glue and Amazon Glue Studio. For more information, see Amazon Glue and Amazon Glue Studio preview features.

November 23, 2021

Glue schema registry now supported with streaming jobs

You can create streaming jobs that access tables that are part of the Glue Schema Registry. For more information, see Amazon Glue Schema Registry and Adding Streaming ETL Jobs in Amazon Glue.

November 15, 2021

Support for new machine learning features

Added information about new features for the Find matches machine learning transform, including incremental matching and match scoring. For more information, see Finding Incremental Matches and Estimating the Quality of Matches using Match Confidence Scores.

October 31, 2021

(Private preview) Support for Amazon Glue flex jobs

Added information about configuring Amazon Glue Spark jobs with a flexible execution class, appropriate for time-insensitive jobs whose start and completion times may vary. For more information, see Adding Jobs in Amazon Glue.

October 29, 2021

Support for accelerating crawls using Amazon S3 event notifications

Added information about accelerating crawls using Amazon S3 event notifications. For more information, see Accelerating Crawls Using Amazon S3 Event Notifications.

October 15, 2021

Additional security configuration options related to access-control and VPCs

Added information about how you can configure new access control permissions on Amazon Glue and configuration of VPCs. For more information, see Amazon Tags in Amazon Glue, Identity-Based Policies (IAM Policies) that Control Settings Using Condition Keys or Context Keys, and Configuring all Amazon calls to go through your VPC.

October 13, 2021

Support for VPC endpoint policies

Added information about support for Virtual Private Cloud (VPC) endpoint policies in Amazon Glue. For more information, see Amazon Glue and interface VPC endpoints (Amazon PrivateLink).

October 11, 2021

Documented the Amazon Glue version support policy

Added information about the Amazon Glue version support policy and the end of life phases for certain Amazon Glue versions. For more information, see Amazon Glue version support policy.

September 24, 2021

Support for Amazon Glue interactive sessions (private preview)

(Private preview) Added information about using Amazon Glue interactive sessions to run Spark workloads in the cloud from any Jupyter Notebook. Interactive sessions are the preferred method for developing your Amazon Glue extract, transform, and load (ETL) code when you use Amazon Glue 2.0 or later. For more information, see Setting Up and Running Amazon Glue Interactive Sessions for Jupyter Notebook.

August 24, 2021

Support for creating workflows from blueprints (GA)

Added information about coding common extract, transform, and load (ETL) use cases in blueprints and then creating workflows from blueprints. This enables data analysts to easily create and run complex ETL processes. For more information, see Performing Complex ETL Activities Using blueprints and Workflows in Amazon Glue.

August 23, 2021

Support for Amazon Glue version 3.0.

Added information about support for Amazon Glue version 3.0, which supports the Apache Spark 3.0 engine upgrade for running Apache Spark ETL jobs, and other optimizations and upgrades. For more information, see Amazon Glue Release Notes and Migrating Amazon Glue jobs to Amazon Glue version 3.0. Other features in this release include the Amazon Glue shuffle manager, a SIMD vectorized CSV reader, and catalog partition predicates. For more information, see Amazon Glue Spark shuffle manager with Amazon S3, Format Options for ETL Inputs and Outputs in Amazon Glue, and Server-side filtering using catalog partition predicates.

August 18, 2021

Support for starting a workflow with an Amazon EventBridge event

Added information about how Amazon Glue can be an event consumer in an event-driven architecture. For more information, see Starting an Amazon Glue Workflow with an Amazon EventBridge Event and Viewing the EventBridge Events That Started a Workflow.

July 14, 2021

Addition of JSON as a supported data format for the Amazon Glue schema registry

Added information about JSON as a supported data format (in addition to AVRO). For more information, see Amazon Glue Schema Registry.

June 30, 2021

Create Amazon Glue streaming jobs without a Data Catalog table

The create_data_frame_from_options function for Python scripts and the getSource method for Scala scripts support creating streaming ETL jobs that reference data streams directly instead of requiring a Data Catalog table.

June 15, 2021

Amazon Glue machine learning transforms now support Amazon Key Management Service keys

You can specify a security configuration or Amazon KMS key when configuring Amazon Glue Machine Learning transforms with the console, the CLI, or the Amazon Glue APIs. For more information, see Using Data Encryption with Machine Learning Transforms and Amazon Glue Machine Learning API.

June 15, 2021

Update to the AWSGlueConsoleFullAccess Amazon managed policy

Added information about a minor update to the AWSGlueConsoleFullAccess Amazon managed policy. For more information, see Amazon Glue Updates to Amazon Managed Policies.

June 10, 2021

Support for specifying a value that indicates the table location for the crawler output.

Added information about specifying a value that indicates the table location when configuring the crawler's output. For more information, see How to specify the table location.

June 4, 2021

Support for crawling a sample of files in a dataset when crawling an Amazon S3 data store

Added information about how to crawl a sample of files when crawling Amazon S3. For more information, see Crawler Properties.

May 10, 2021

Support for the Amazon Glue optimized parquet writer

Added information about using the Amazon Glue optimized parquet writer for DynamicFrames to create or update tables with the parquet classification. For more information, see Creating Tables, Updating Schema, and Adding New Partitions in the Data Catalog from Amazon Glue ETL Jobs and Format Options for ETL Inputs and Outputs in Amazon Glue.

May 4, 2021

Support for Kafka client authentication passwords

Added information about how streaming ETL jobs in Amazon Glue support SSL client certificate authentication with Apache Kafka stream producers. You can now provide a custom certificate while defining an Amazon Glue connection to an Apache Kafka cluster, which Amazon Glue will use when authenticating with it. For more information, see Amazon Glue Connection Properties and Connection API.

April 28, 2021

Support for consuming data from Amazon Kinesis Data Streams in another account in streaming ETL jobs

Added information about how to create a streaming ETL job to consume data from Amazon Kinesis Data Streams in another account. For more information, see Adding Streaming ETL Jobs in Amazon Glue.

March 30, 2021

Support for creating workflows from blueprints (public preview)

(Public preview) Added information about coding common extract, transform, and load (ETL) use cases in blueprints and then creating workflows from blueprints. This enables data analysts to easily create and run complex ETL processes. For more information, see Performing Complex ETL Activities Using blueprints and Workflows in Amazon Glue.

March 22, 2021

Support for column importance metrics for Amazon Glue machine learning transforms

Added information about viewing column importance metrics when working with Amazon Glue machine learning transforms. For more information, see Working with Machine Learning Transforms on the Amazon Glue Console.

February 5, 2021

Support for running streaming ETL jobs in Glue version 2.0

Added information about support for running streaming ETL jobs in Glue version 2.0. For more information, see Adding Streaming ETL Jobs in Amazon Glue.

December 18, 2020

Support for workload partitioning with bounded execution

Added information about enabling workload partitioning to configure the upper bounds on the dataset size, or the number of files processed on ETL job runs. For more information, see Workload Partitioning with Bounded Execution.

November 23, 2020

Support for enhanced partition management

Added information about how to use new APIs to add or delete a partition index to/from an existing table. For more information, see Working with Partition Indexes.

November 23, 2020

Support for the Amazon Glue schema registry

Added information about using the Amazon Glue Schema Registry to centrally discover, control, and evolve schemas. For more information, see Amazon Glue Schema Registry.

November 19, 2020

Support for the grok input format in streaming ETL jobs

Added information about applying Grok patterns to streaming sources such as log files. For more information, see Applying Grok Patterns to Streaming Sources.

November 17, 2020

Support for adding tags to workflows on the Amazon Glue console

Added information about adding tags when creating a workflow using the Amazon Glue console. For more information, see Creating and Building Out a Workflow Using the Amazon Glue Console.

October 27, 2020

Support for incremental crawler runs

Added information about support for incremental crawler runs, which crawl only Amazon S3 folders added since the last run. For more information, see Incremental Crawls.

October 21, 2020

Support for schema detection for streaming ETL data sources, Avro streaming ETL data sources, and self-managed Kafka

Streaming extract, transform, and load (ETL) jobs in Amazon Glue can now automatically detect the schema of incoming records and handle schema changes on a per-record basis. Self-managed Kafka data sources are now supported. Streaming ETL jobs now support the Avro format in data sources. For more information, see Streaming ETL in Amazon Glue, Defining Job Properties for a Streaming ETL Job, and Notes and Restrictions for Avro Streaming Sources.

October 7, 2020

Support for crawling MongoDB and DocumentDB data sources

Added information about support for crawling MongoDB and Amazon DocumentDB (with MongoDB compatibility) data sources. For more information, see Defining Crawlers.

October 5, 2020

Support for FIPS compliance

Added information about FIPS endpoints for customers who require FIPS 140-2 validated cryptographic modules when accessing data using Amazon Glue. For more information, see FIPS Compliance.

September 23, 2020

Amazon Glue Studio provides an easy to use visual interface for creating and monitoring jobs

You can now use a simple graph-based interface to compose jobs that move and transform data and run them on Amazon Glue. You can then use the job run dashboard in Amazon Glue Studio to monitor ETL execution and ensure that your jobs are operating as intended. For more information, see Amazon Glue Studio User Guide.

September 23, 2020

Support for creating table indexes to improve query performance

Added information about creating table indexes to allow you to retrieve a subset of the partitions from a table. For more information, see Working with Partition Indexes.

September 9, 2020

Support for reduced startup times when running Apache Spark ETL jobs in Amazon Glue version 2.0.

Added information about support for Amazon Glue version 2.0 which provides an upgraded infrastructure for running Apache Spark ETL jobs with reduced startup times, changes in logging, and support for specifying additional Python modules at the job level. For more information, see Amazon Glue Release Notes and Running Spark ETL Jobs with Reduced Startup Times.

August 10, 2020

Support for limiting the number of concurrent workflow runs.

Added information about how to limit the number of concurrent workflow runs for a particular workflow. For more information, see Creating and Building Out a Workflow Using the Amazon Glue Console.

August 10, 2020

Support for crawling an Amazon S3 data store using a VPC endpoint

Added information about configuring your Amazon S3 data store to be accessed only by an Amazon Virtual Private Cloud environment (Amazon VPC), for security, auditing, or control purposes. For more information, see Crawling an Amazon S3 Data Store using a VPC Endpoint.

August 7, 2020

Support for resuming workflow runs

Added information about how to resume workflow runs that only partially completed because one or more nodes (jobs or crawlers) did not complete successfully. For more information, see Repairing and Resuming a Workflow Run.

July 27, 2020

Support for enabling private CA certificates in Kafka connections in Amazon Glue.

Added information about new connection options that support enabling private CA certificates for Kafka connections in Amazon Glue. For more information, see Connection Types and Options for ETL in Amazon Glue and Special Parameters Used by Amazon Glue.

July 20, 2020

Support for reading DynamoDB data in another account

Added information about Amazon Glue support for reading data from another Amazon account's DynamoDB table. For more information, see Reading from DynamoDB Data in Another Account.

July 17, 2020

Support for a DynamoDB writer connection in Amazon Glue version 1.0 or later

Added information about support for DynamoDB writer, and new or updated connection options for DynamoDB to read or write. For more information, see Connection Types and Options for ETL in Amazon Glue.

July 17, 2020

Support for resource links and for cross-account access control using both Amazon Glue and Lake Formation

Added content about new Data Catalog objects called resource links, and about how to manage sharing Data Catalog resources across accounts with both Amazon Glue and Amazon Lake Formation. For more information, see Granting Cross-Account Access and Table Resource Links.

July 7, 2020

Support for sampling records when crawling DynamoDB data stores

Added information about new properties that you can configure when crawling a DynamoDB data store. For more information, see Crawler Properties.

June 12, 2020

Support for stopping a workflow run.

Added information about how to stop a workflow run for a particular workflow. For more information, see Stopping a Workflow Run.

May 14, 2020

Support for Spark streaming ETL jobs

Added information about creating extract, transform, and load (ETL) jobs with streaming data sources. For more information, see Adding Streaming ETL Jobs in Amazon Glue.

April 27, 2020

Support for creating tables, updating the schema, and adding new partitions in the Data Catalog after running an ETL job

Added information about how you can enable creating tables, updating the schema, and adding new partitions to see the results of your ETL job in the Data Catalog. For more information, see Creating Tables, Updating Schema, and Adding New Partitions in the Data Catalog from Amazon Glue ETL Jobs.

April 2, 2020

Support for specifying a version for the Apache Avro data format as an ETL input and output in Amazon Glue

Added information about specifying a version for the Apache Avro data format as an ETL input and output in Amazon Glue. The default version is 1.7. You can use the version format option to specify Avro version 1.8 to enable Avro logical type reading and writing. For more information, see Format Options for ETL Inputs and Outputs in Amazon Glue.

March 31, 2020
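
As a minimal sketch of the Avro version entry above, the following Python helper builds the arguments for an Avro write with the version format option set. The helper name avro_write_options, the restriction to versions 1.7 and 1.8, and the S3 path are assumptions made for illustration; only the version format option and its 1.7 default come from the entry itself.

```python
from typing import Any, Dict

# Versions named in the entry above; limiting the helper to these two
# is an assumption of this sketch, not a documented restriction.
SUPPORTED_AVRO_VERSIONS = {"1.7", "1.8"}


def avro_write_options(path: str, avro_version: str = "1.7") -> Dict[str, Any]:
    """Build keyword arguments for writing Avro output (hypothetical helper)."""
    if avro_version not in SUPPORTED_AVRO_VERSIONS:
        raise ValueError(f"unsupported Avro version: {avro_version}")
    return {
        "connection_type": "s3",
        "connection_options": {"path": path},
        "format": "avro",
        # The "version" format option selects the Avro version to use.
        "format_options": {"version": avro_version},
    }


# "s3://example-bucket/output/" is a placeholder path.
options = avro_write_options("s3://example-bucket/output/", avro_version="1.8")
print(options["format_options"])
```

In a Glue job script, a dictionary like this could be unpacked into a call such as glueContext.write_dynamic_frame.from_options(frame=dyf, **options), where glueContext and dyf are assumed to exist already.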

Support for the EMRFS S3-optimized committer for writing Parquet data into Amazon S3

Added information about how to set a new flag to enable the EMRFS S3-optimized committer for writing Parquet data into Amazon S3 when creating or updating an Amazon Glue job. For more information, see Special Parameters Used by Amazon Glue.

March 30, 2020

Support for machine learning transforms as a resource managed by Amazon resource tags

Added information about using Amazon resource tags to manage and control access to your machine learning transforms in Amazon Glue. You can assign Amazon resource tags to jobs, triggers, endpoints, crawlers, and machine learning transforms in Amazon Glue. For more information, see Amazon Tags in Amazon Glue.

March 2, 2020

Support for non-overrideable job arguments

Added information about support for special job parameters that cannot be overridden in triggers or when you run the job. For more information, see Adding Jobs in Amazon Glue.

February 12, 2020

Support for new transforms to work with datasets in Amazon S3

Added information about new transforms (Merge, Purge, and Transition) and Amazon S3 storage class exclusions for Apache Spark applications to work with datasets in Amazon S3. For more information on support for these transforms for Python, see mergeDynamicFrame and Working with Datasets in Amazon S3. For Scala, see mergeDynamicFrames and Amazon Glue Scala GlueContext APIs.

January 16, 2020

Support for updating the Data Catalog with new partition information from an ETL job

Added information about how to code an extract, transform, and load (ETL) script to update the Amazon Glue Data Catalog with new partition information. With this capability, you no longer have to rerun the crawler after job completion to view the new partitions. For more information, see Updating the Data Catalog with New Partitions.

January 15, 2020

New tutorial: Using a SageMaker notebook

Added a tutorial that demonstrates how to use an Amazon SageMaker notebook to help develop your ETL and machine learning scripts. See Tutorial: Use an Amazon SageMaker Notebook with Your Development Endpoint.

January 3, 2020

Support for reading from MongoDB and Amazon DocumentDB (with MongoDB compatibility)

Added information about new connection types and connection options for reading from and writing to MongoDB and Amazon DocumentDB (with MongoDB Compatibility). For more information, see Connection Types and Options for ETL in Amazon Glue.

December 17, 2019

Various corrections and clarifications

Added corrections and clarifications throughout. Removed entries from the Known Issues chapter. Added warnings that Amazon Glue supports only symmetrical customer master keys (CMKs) when specifying Data Catalog encryption settings and creating security configurations. Added a note that Amazon Glue does not support writing to Amazon DynamoDB.

December 9, 2019

Support for custom JDBC drivers

Added information about connecting to data sources and targets with JDBC drivers that Amazon Glue does not natively support, such as MySQL version 8 and Oracle Database version 18. For more information, see JDBC connectionType Values.

November 25, 2019

Support for connecting SageMaker notebooks to different development endpoints

Added information about how you can connect a SageMaker notebook to different development endpoints. Updated the documentation to describe the new console action for switching to a new development endpoint, and the new SageMaker IAM policy. For more information, see Working with Notebooks on the Amazon Glue Console and Create an IAM Policy for Amazon SageMaker Notebooks.

November 21, 2019

Support for Amazon Glue version in machine learning transforms

Added information about defining the Amazon Glue version in a machine learning transform to indicate which version of Amazon Glue the transform is compatible with. For more information, see Working with Machine Learning Transforms on the Amazon Glue Console.

November 21, 2019

Support for rewinding your job bookmarks

Added information about rewinding your job bookmarks to any previous job run, resulting in the subsequent job run reprocessing data only from the bookmarked job run. Described two new sub-options for the job-bookmark-pause option that allow you to run a job between two bookmarks. For more information, see Tracking Processed Data Using Job Bookmarks and Special Parameters Used by Amazon Glue.

October 22, 2019

Support for custom JDBC certificates for connecting to a data store

Added information about Amazon Glue support of custom JDBC certificates for SSL connections to Amazon Glue data sources or targets. For more information, see Working with Connections on the Amazon Glue Console.

October 10, 2019

Support for Python wheel

Added information about Amazon Glue support of wheel files (along with egg files) as dependencies for Python shell jobs. For more information, see Providing Your Own Python Library.

September 26, 2019

Support for versioning of development endpoints in Amazon Glue

Added information about defining the Glue version in development endpoints. Glue version determines the versions of Apache Spark and Python that Amazon Glue supports. For more information, see Adding a Development Endpoint.

September 19, 2019

Support for monitoring Amazon Glue using Spark UI

Added information about using Apache Spark UI to monitor and debug Amazon Glue ETL jobs running on the Amazon Glue job system, and Spark applications on Amazon Glue development endpoints. For more information, see Monitoring Amazon Glue Using Spark UI.

September 19, 2019

Enhancement of support for local ETL script development using the public Amazon Glue ETL library

Updated the Amazon Glue ETL library content to reflect that Amazon Glue version 1.0 is now supported. For more information, see Developing and Testing ETL Scripts Locally Using the Amazon Glue ETL Library.

September 18, 2019

Support for excluding Amazon S3 storage classes when running jobs

Added information about excluding Amazon S3 storage classes when running Amazon Glue ETL jobs that read files or partitions from Amazon S3. For more information, see Excluding Amazon S3 Storage Classes.

August 29, 2019

Support for local ETL script development using the public Amazon Glue ETL library

Added information about how to develop and test Python and Scala ETL scripts locally without the need for a network connection. For more information, see Developing and Testing ETL Scripts Locally Using the Amazon Glue ETL Library.

August 28, 2019

Known issues

Added information about known issues in Amazon Glue. For more information, see Known Issues for Amazon Glue.

August 28, 2019

Support for machine learning transforms in Amazon Glue

Added information about machine learning capabilities provided by Amazon Glue to create custom transforms. You can create these transforms when you create a job. For more information, see Machine Learning Transforms in Amazon Glue.

August 8, 2019

Support for shared Amazon Virtual Private Cloud

Added information about Amazon Glue support for shared Amazon Virtual Private Cloud. For more information, see Shared Amazon VPCs.

August 6, 2019

Support for versioning in Amazon Glue

Added information about defining the Glue version in job properties. Amazon Glue version determines the versions of Apache Spark and Python that Amazon Glue supports. For more information, see Adding Jobs in Amazon Glue.

July 24, 2019

Support for additional configuration options for development endpoints

Added information about configuration options for development endpoints that have memory-intensive workloads. You can choose from two new configurations that provide more memory per executor. For more information, see Working with Development Endpoints on the Amazon Glue Console.

July 24, 2019

Support for performing extract, transform, and load (ETL) activities using workflows

Added information about using a new construct called a workflow to design a complex multi-job extract, transform, and load (ETL) activity that Amazon Glue can run and track as a single entity. For more information, see Performing Complex ETL Activities Using Workflows in Amazon Glue.

June 20, 2019

Support for Python 3.6 in Python shell jobs

Added information about support for Python 3.6 in Python shell jobs. You can specify either Python 2.7 or Python 3.6 as a job property. For more information, see Adding Python Shell Jobs in Amazon Glue.

June 5, 2019

Support for virtual private cloud (VPC) endpoints

Added information about connecting directly to Amazon Glue through an interface endpoint in your VPC. When you use a VPC interface endpoint, communication between your VPC and Amazon Glue is conducted entirely and securely within the Amazon network. For more information, see Using Amazon Glue with VPC Endpoints.

June 4, 2019

Support for real-time, continuous logging for Amazon Glue jobs.

Added information about enabling and viewing real-time Apache Spark job logs in CloudWatch including the driver logs, each of the executor logs, and a Spark job progress bar. For more information, see Continuous Logging for Amazon Glue Jobs.

May 28, 2019

Support for existing Data Catalog tables as crawler sources

Added information about specifying a list of existing Data Catalog tables as crawler sources. Crawlers can then detect changes to table schemas, update table definitions, and register new partitions as new data becomes available. For more information, see Crawler Properties.

May 10, 2019

Support for additional configuration options for memory-intensive jobs

Added information about configuration options for Apache Spark jobs with memory-intensive workloads. You can choose from two new configurations that provide more memory per executor. For more information, see Adding Jobs in Amazon Glue.

April 5, 2019

Support for CSV custom classifiers

Added information about using a custom CSV classifier to infer the schema of various types of CSV data. For more information, see Writing Custom Classifiers.

March 26, 2019

Support for Amazon resource tags

Added information about using Amazon resource tags to help you manage and control access to your Amazon Glue resources. You can assign Amazon resource tags to jobs, triggers, endpoints, and crawlers in Amazon Glue. For more information, see Amazon Tags in Amazon Glue.

March 20, 2019

Support of Data Catalog for Spark SQL jobs

Added information about configuring your Amazon Glue jobs and development endpoints to use the Amazon Glue Data Catalog as an external Apache Hive Metastore. This allows jobs and development endpoints to directly run Apache Spark SQL queries against the tables stored in the Amazon Glue Data Catalog. For more information, see Amazon Glue Data Catalog Support for Spark SQL Jobs.

March 14, 2019

Support for Python shell jobs

Added information about Python shell jobs and the new field Maximum capacity. For more information, see Adding Python Shell Jobs in Amazon Glue.

January 18, 2019

Support for notifications when there are changes to databases and tables

Added information about events that are generated for changes to database, table, and partition API calls. You can configure actions in CloudWatch Events to respond to these events. For more information, see Automating Amazon Glue with CloudWatch Events.

January 16, 2019

Support for encrypting connection passwords

Added information about encrypting passwords used in connection objects. For more information, see Encrypting Connection Passwords.

December 11, 2018

Support for resource-level permission and resource-based policies

Added information about using resource-level permissions and resource-based policies with Amazon Glue. For more information, see the topics within Security in Amazon Glue.

October 15, 2018

Support for SageMaker notebooks

Added information about using SageMaker notebooks with Amazon Glue development endpoints. For more information, see Managing Notebooks.

October 5, 2018

Support for encryption

Added information about using encryption with Amazon Glue. For more information, see Encryption at Rest, Encryption in Transit, and Setting Up Encryption in Amazon Glue.

August 24, 2018

Support for Apache Spark job metrics

Added information about the use of Apache Spark metrics for better debugging and profiling of ETL jobs. You can easily track runtime metrics such as bytes read and written, memory usage and CPU load of the driver and executors, and data shuffles among executors from the Amazon Glue console. For more information, see Monitoring Amazon Glue Using CloudWatch Metrics, Job Monitoring and Debugging, and Working with Jobs on the Amazon Glue Console.

July 13, 2018

Support of DynamoDB as a data source

Added information about crawling DynamoDB and using it as a data source of ETL jobs. For more information, see Cataloging Tables with a Crawler and Connection Parameters.

July 10, 2018

Updates to create notebook server procedure

Updated information about how to create a notebook server on an Amazon EC2 instance associated with a development endpoint. For more information, see Creating a Notebook Server Associated with a Development Endpoint.

July 9, 2018

Updates now available over RSS

You can now subscribe to an RSS feed to receive notifications about updates to the Amazon Glue Developer Guide.

June 25, 2018

Support delay notifications for jobs

Added information about configuring a delay threshold when a job runs. For more information, see Adding Jobs in Amazon Glue.

May 25, 2018

Configure a crawler to append new columns

Added information about a new configuration option for crawlers, MergeNewColumns. For more information, see Configuring a Crawler.

May 7, 2018
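
To make the MergeNewColumns entry above more concrete, here is a small Python sketch that builds a crawler configuration string requesting that behavior. The JSON layout (Version, CrawlerOutput, Tables, AddOrUpdateBehavior) reflects the crawler configuration format as we understand it from the Configuring a Crawler topic; the function name is hypothetical, and the exact structure should be checked against that topic before use.

```python
import json


def merge_new_columns_configuration() -> str:
    """Return a crawler configuration JSON string (hypothetical helper).

    The key names below are assumed from the crawler configuration format;
    verify them against the crawler documentation.
    """
    config = {
        "Version": 1.0,
        "CrawlerOutput": {
            # Ask the crawler to append newly detected columns to existing tables.
            "Tables": {"AddOrUpdateBehavior": "MergeNewColumns"},
        },
    }
    return json.dumps(config)


configuration = merge_new_columns_configuration()
print(configuration)
```

The resulting string is the kind of value that would be supplied as the crawler's configuration when creating or updating a crawler through the API or CLI.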

Support timeout of jobs

Added information about setting a timeout threshold when a job runs. For more information, see Adding Jobs in Amazon Glue.

April 10, 2018

Support Scala ETL script and trigger jobs based on additional run states

Added information about using Scala as the ETL programming language. In addition, the trigger API now supports firing when any conditions are met (in addition to all conditions). Also, jobs can be triggered based on a "failed" or "stopped" job run (in addition to a "succeeded" job run).

January 12, 2018
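
The trigger change described above (firing when any condition is met rather than all, and reacting to failed or stopped runs) can be sketched in plain Python. The dictionary shape (Logical, Conditions, JobName, State) is modeled on the trigger predicate structure as we understand it; the function predicate_fires, the job names, and the job_states map are hypothetical, and this illustrates the semantics only, not how the service evaluates triggers.

```python
from typing import Any, Dict


def predicate_fires(predicate: Dict[str, Any], job_states: Dict[str, str]) -> bool:
    """Return True if the predicate is satisfied by the latest job run states.

    predicate:  {"Logical": "AND" or "ANY", "Conditions": [{"JobName": ..., "State": ...}]}
    job_states: hypothetical map of job name to its latest run state.
    """
    results = [
        job_states.get(cond["JobName"]) == cond["State"]
        for cond in predicate["Conditions"]
    ]
    if not results:
        return False
    # "ANY" fires when at least one condition is met; "AND" requires all of them.
    if predicate.get("Logical", "AND") == "ANY":
        return any(results)
    return all(results)


# Hypothetical jobs: fire if either job ended in a non-success state.
predicate = {
    "Logical": "ANY",
    "Conditions": [
        {"JobName": "load-orders", "State": "FAILED"},
        {"JobName": "load-customers", "State": "STOPPED"},
    ],
}
states = {"load-orders": "SUCCEEDED", "load-customers": "STOPPED"}
print(predicate_fires(predicate, states))
```

With "Logical" set to "AND" instead, the same inputs would not fire, because only one of the two conditions holds.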

Earlier updates

The following table describes the important changes in each release of the Amazon Glue Developer Guide before January 2018.

Change Description Date

Support XML data sources and new crawler configuration option

Added information about classifying XML data sources and new crawler option for partition changes.

November 16, 2017

New transforms, support for additional Amazon RDS database engines, and development endpoint enhancements

Added information about the map and filter transforms, support for Amazon RDS Microsoft SQL Server, and Amazon RDS Oracle, and new features for development endpoints.

September 29, 2017

Amazon Glue initial release

This is the initial release of the Amazon Glue Developer Guide.

August 14, 2017