Document history for Amazon Glue DataBrew Developer Guide

Current API version: databrew-2017-07-25

The following table describes the documentation for this release of Amazon Glue DataBrew. If you want to be notified when the Amazon Glue DataBrew Developer Guide is updated, you can subscribe to the RSS feed.

Change	Description	Date
glue:GetCustomEntityType added to Amazon managed policies	This permission is required to execute Amazon Glue DataBrew profile jobs with PII-identification enabled. For more information, see Amazon Glue DataBrew updates to Amazon managed policies.	March 20, 2024
Support for multiple hashing algorithms in the CRYPTOGRAPHIC_HASH transformation	You can now specify a hashing algorithm when hashing values in a column. For more information, see CRYPTOGRAPHIC_HASH.	August 11, 2023
glue:BatchGetCustomEntityTypes added to Amazon managed policies	This permission is required to execute Amazon Glue DataBrew profile jobs with PII-identification enabled. For more information, see Amazon Glue DataBrew updates to Amazon managed policies.	May 9, 2022
Support for Apache ORC file format	DataBrew now supports Apache ORC as a file format for DataBrew data sources and outputs. For more information, see Supported file types for data sources.	March 31, 2022
Support for cross-account Amazon Glue Data Catalog Amazon S3 access	You can now access Amazon Glue Data Catalog S3 tables from other Amazon Web Services accounts if an appropriate resource policy is created in the Amazon Glue console. After you create the policy, you can select the relevant Data Catalog S3 tables as input sources when creating a DataBrew dataset. For more information, see Supported connections for data sources and outputs.	March 11, 2022
Support for native console integration with Amazon AppFlow	DataBrew now has native console integration with Amazon AppFlow. This integration means that you can connect to data from Salesforce, Zendesk, Slack, ServiceNow, and other software-as-a-service (SaaS) applications. You can also connect to data from Amazon Web Services services such as Amazon S3 and Amazon Redshift. For more information, see Supported connections for data sources and outputs.	November 18, 2021
Support for data quality rules	DataBrew now supports the creation of data quality rules, which are customizable validation checks that define business requirements for specific data. For more information, see Validating data quality in Amazon Glue DataBrew.	November 18, 2021
Support for custom SQL statements	DataBrew now supports custom SQL statements for retrieving data from Amazon Redshift and Snowflake. This support means that you can use a purpose-built query to select and limit the data returned from large tables. For more information, see Supported connections for data sources and outputs.	November 18, 2021
Support for PII detection	DataBrew now supports detection of personally identifiable information (PII). This gives you the option of masking PII during data preparation. For more information, see Identifying and handling personally identifiable information (PII).	November 18, 2021
Support for additional Amazon Web Services Regions	DataBrew now supports additional Amazon Web Services Regions. For a list of supported Regions, see Amazon Glue DataBrew endpoints and quotas.	October 5, 2021
Support for writing data to Lake Formation-based Amazon S3 tables	DataBrew now supports writing data into Amazon Glue Data Catalog S3 tables based on Amazon Lake Formation. DataBrew also now supports writing data into Tableau Hyper format. For more information, see Creating and working with Amazon Glue DataBrew recipe jobs.	August 13, 2021
Support for writing data into JDBC destinations	DataBrew now supports writing data directly into JDBC-supported databases and data warehouses. These include Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Creating and working with Amazon Glue DataBrew recipe jobs.	July 23, 2021
Support for specifying which data quality statistics are generated for a profile job	DataBrew now supports specifying which data quality statistics are autogenerated for datasets in a profile job. For more information, see Creating and working with Amazon Glue DataBrew recipe jobs.	July 23, 2021
Support for writing datasets into the Amazon Glue Data Catalog	DataBrew now includes support for writing datasets directly into the Amazon Glue Data Catalog. You can choose to store datasets created from jobs that run your data preparation recipes in Amazon S3, Amazon Redshift, and Amazon RDS tables in the Data Catalog. The RDS tables supported include those for Amazon Aurora, RDS for Oracle, RDS for Microsoft SQL Server, RDS for MySQL, and RDS for PostgreSQL.	June 30, 2021
Support for identifying advanced data types	DataBrew can now automatically identify and mark advanced data types for columns, which makes it easier to normalize columns that contain certain types of data. These types of data include Social Security number, email address, phone number, gender, credit card, URL, IP address, date and time, currency, ZIP code, country, region, state, and city.	June 30, 2021
Support for using Amazon AppFlow to transfer data from SaaS applications	DataBrew now supports using Amazon AppFlow to transfer data into Amazon S3 from third-party software-as-a-service (SaaS) applications such as Salesforce, Zendesk, Slack, and ServiceNow. For more information, see Supported connections for data sources and outputs.	April 29, 2021
Support for creating DataBrew datasets with input from JDBC databases	DataBrew now supports creating datasets from data in JDBC-supported databases and data warehouses, including Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Supported connections for data sources and outputs.	April 2, 2021
Support for additional Amazon Web Services Regions	DataBrew now supports additional Amazon Web Services Regions. For a list of supported Regions, see Amazon Glue DataBrew endpoints and quotas.	January 28, 2021
New transforms for handling duplication	Four new transforms for handling duplication have been added to the DataBrew console and API. For more information, see DELETE_DUPLICATE_ROWS, FLAG_DUPLICATE_ROWS, FLAG_DUPLICATES_IN_COLUMN, and REMOVE_DUPLICATES in Data quality recipe steps.	January 28, 2021
Additional CSV delimiters	DataBrew now supports additional delimiters besides commas in comma-separated value (CSV) files used to create DataBrew datasets. For more information, see Creating and using Amazon Glue DataBrew datasets.	January 28, 2021
DataBrew extension for JupyterLab	You can now use Amazon Glue DataBrew as an extension in JupyterLab. For more information, see Using DataBrew as an extension in JupyterLab.	November 20, 2020
New data preparation tool: Amazon Glue DataBrew	This is the first release of the Amazon Glue DataBrew Developer Guide.	November 11, 2020
