Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Document history for Amazon Glue DataBrew Developer Guide

Current API version: databrew-2017-07-25

The following table describes the documentation for this release of Amazon Glue DataBrew. If you want to be notified when the Amazon Glue DataBrew Developer Guide is updated, you can subscribe to the RSS feed.

Change | Description | Date

glue:GetCustomEntityType added to Amazon managed policies

This permission is required to execute Amazon Glue DataBrew profile jobs with PII-identification enabled. For more information, see Amazon Glue DataBrew updates to Amazon managed policies.

March 20, 2024

Support for multiple hashing algorithms in the CRYPTOGRAPHIC_HASH transformation

You can now specify a hashing algorithm when hashing values in a column. For more information, see CRYPTOGRAPHIC_HASH.

August 11, 2023

glue:BatchGetCustomEntityTypes added to Amazon managed policies

This permission is required to execute Amazon Glue DataBrew profile jobs with PII-identification enabled. For more information, see Amazon Glue DataBrew updates to Amazon managed policies.

May 9, 2022

Support for Apache ORC file format

DataBrew now supports Apache ORC as a file format for DataBrew data sources and outputs. For more information, see Supported file types for data sources.

March 31, 2022

Support for cross-account Amazon Glue Data Catalog Amazon S3 access

You can now access Amazon Glue Data Catalog S3 tables that belong to other Amazon Web Services accounts, provided that an appropriate resource policy is created in the Amazon Glue console. After the policy is in place, you can select the relevant Data Catalog S3 tables as input sources when you create a DataBrew dataset. For more information, see Supported connections for data sources and outputs.

March 11, 2022

Support for native console integration with Amazon AppFlow

DataBrew now has native console integration with Amazon AppFlow. This integration means that you can connect to data from Salesforce, Zendesk, Slack, ServiceNow, and other software-as-a-service (SaaS) applications. You can also connect to data from Amazon Web Services services such as Amazon S3 and Amazon Redshift. For more information, see Supported connections for data sources and outputs.

November 18, 2021

Support for data quality rules

DataBrew now supports the creation of data quality rules, which are customizable validation checks that define business requirements for specific data. For more information, see Validating data quality in Amazon Glue DataBrew.

November 18, 2021

Support for custom SQL statements

DataBrew now supports custom SQL statements for retrieving data from Amazon Redshift and Snowflake. This support means that you can use a purpose-built query to select and limit the data returned from large tables. For more information, see Supported connections for data sources and outputs.

November 18, 2021

Support for PII detection

DataBrew now supports detection of personally identifiable information (PII). This gives you the option of masking PII during data preparation. For more information, see Identifying and handling personally identifiable information (PII).

November 18, 2021

Support for additional Amazon Web Services Regions

DataBrew now supports additional Amazon Web Services Regions. For a list of supported Regions, see Amazon Glue DataBrew endpoints and quotas.

October 5, 2021

Support for writing data to Lake Formation-based Amazon S3 tables

DataBrew now supports writing data into Amazon Glue Data Catalog S3 tables based on Amazon Lake Formation. DataBrew also now supports writing data into Tableau Hyper format. For more information, see Creating and working with Amazon Glue DataBrew recipe jobs.

August 13, 2021

Support for writing data into JDBC destinations

DataBrew now supports writing data directly into JDBC-supported databases and data warehouses. These include Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Creating and working with Amazon Glue DataBrew recipe jobs.

July 23, 2021

Support for specifying which data quality statistics are generated for a profile job

DataBrew now supports specifying which data quality statistics are autogenerated for datasets in a profile job. For more information, see Creating and working with Amazon Glue DataBrew recipe jobs.

July 23, 2021

Support for writing datasets into the Amazon Glue Data Catalog

DataBrew now supports writing datasets directly into the Amazon Glue Data Catalog. You can store the datasets produced by jobs that run your data preparation recipes in Data Catalog tables for Amazon S3, Amazon Redshift, and Amazon RDS. Supported RDS tables include those for Amazon Aurora, RDS for Oracle, RDS for Microsoft SQL Server, RDS for MySQL, and RDS for PostgreSQL.

June 30, 2021

Support for identifying advanced data types

DataBrew can now automatically identify and mark advanced data types for columns, which makes it easier to normalize columns that contain certain types of data. These data types include Social Security number, email address, phone number, gender, credit card, URL, IP address, date and time, currency, ZIP code, country, region, state, and city.

June 30, 2021

Support for using Amazon AppFlow to transfer data from SaaS applications

DataBrew now supports using Amazon AppFlow to transfer data into Amazon S3 from third-party software-as-a-service (SaaS) applications such as Salesforce, Zendesk, Slack, and ServiceNow. For more information, see Supported connections for data sources and outputs.

April 29, 2021

Support for creating DataBrew datasets with input from JDBC databases

DataBrew now supports creating datasets from data in JDBC-supported databases and data warehouses, including Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Supported connections for data sources and outputs.

April 2, 2021

Support for additional Amazon Web Services Regions

DataBrew now supports additional Amazon Web Services Regions. For a list of supported Regions, see Amazon Glue DataBrew endpoints and quotas.

January 28, 2021

New transforms for handling duplication

Four new transforms for handling duplication have been added to the DataBrew console and API. For more information, see DELETE_DUPLICATE_ROWS, FLAG_DUPLICATE_ROWS, FLAG_DUPLICATES_IN_COLUMN, and REMOVE_DUPLICATES in Data quality recipe steps.

January 28, 2021

Additional CSV delimiters

DataBrew now supports additional delimiters besides commas in comma-separated value (CSV) files used to create DataBrew datasets. For more information, see Creating and using Amazon Glue DataBrew datasets.

January 28, 2021

DataBrew extension for JupyterLab

Now you can use Amazon Glue DataBrew as an extension in JupyterLab. For more information, see Using DataBrew as an extension in JupyterLab.

November 20, 2020

New data preparation tool: Amazon Glue DataBrew

This is the first release of the Amazon Glue DataBrew Developer Guide.

November 11, 2020