Document history for Amazon Glue DataBrew Developer Guide
Current API version: databrew-2017-07-25
The following table describes the documentation for this release of Amazon Glue DataBrew. If you want to be notified when the Amazon Glue DataBrew Developer Guide is updated, you can subscribe to the RSS feed.
Change | Description | Date |
---|---|---|
glue:GetCustomEntityType added to Amazon managed policies | This permission is required to execute Amazon Glue DataBrew profile jobs with PII-identification enabled. For more information, see Amazon Glue DataBrew updates to Amazon managed policies. | March 20, 2024 |
Support for multiple hashing algorithms in the CRYPTOGRAPHIC_HASH transformation | You can now specify a hashing algorithm when hashing values in a column. For more information, see CRYPTOGRAPHIC_HASH. | August 11, 2023 |
glue:BatchGetCustomEntityTypes added to Amazon managed policies | This permission is required to execute Amazon Glue DataBrew profile jobs with PII-identification enabled. For more information, see Amazon Glue DataBrew updates to Amazon managed policies. | May 9, 2022 |
Support for Apache ORC file format | DataBrew now supports Apache ORC as a file format for DataBrew data sources and outputs. For more information, see Supported file types for data sources. | March 31, 2022 |
Support for cross-account Amazon Glue Data Catalog Amazon S3 access | You can now access Amazon Glue Data Catalog S3 tables from other Amazon Web Services accounts if an appropriate resource policy is created in the Amazon Glue console. After you create the policy, you can select the relevant Data Catalog S3 tables as input sources when creating a DataBrew dataset. For more information, see Supported connections for data sources and outputs. | March 11, 2022 |
Support for native console integration with Amazon AppFlow | DataBrew now has native console integration with Amazon AppFlow. This integration means that you can connect to data from Salesforce, Zendesk, Slack, ServiceNow, and other software-as-a-service (SaaS) applications. You can also connect to data from Amazon Web Services such as Amazon S3 and Amazon Redshift. For more information, see Supported connections for data sources and outputs. | November 18, 2021 |
Support for data quality rules | DataBrew now supports the creation of data quality rules, which are customizable validation checks that define business requirements for specific data. For more information, see Validating data quality in Amazon Glue DataBrew. | November 18, 2021 |
Support for custom SQL statements | DataBrew now supports custom SQL statements for retrieving data from Amazon Redshift and Snowflake. This support means that you can use a purpose-built query to select and limit the data returned from large tables. For more information, see Supported connections for data sources and outputs. | November 18, 2021 |
Support for PII detection | DataBrew now supports detection of personally identifiable information (PII). This gives you the option of masking PII during data preparation. For more information, see Identifying and handling personally identifiable information (PII). | November 18, 2021 |
Support for additional Amazon Regions | DataBrew now supports additional Amazon Regions. For a list of supported Regions, see Amazon Glue DataBrew endpoints and quotas. | October 5, 2021 |
Support for writing data to Lake Formation-based Amazon S3 tables | DataBrew now supports writing data into Amazon Glue Data Catalog S3 tables based on Amazon Lake Formation. DataBrew also now supports writing data into Tableau Hyper format. For more information, see Creating and working with Amazon Glue DataBrew recipe jobs. | August 13, 2021 |
Support for writing data into JDBC destinations | DataBrew now supports writing data directly into JDBC-supported databases and data warehouses. These include Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Creating and working with Amazon Glue DataBrew recipe jobs. | July 23, 2021 |
Support for specifying which data quality statistics are generated for a profile job | DataBrew now supports specifying which data quality statistics are autogenerated for datasets in a profile job. For more information, see Creating and working with Amazon Glue DataBrew recipe jobs. | July 23, 2021 |
Support for writing datasets into the Amazon Glue Data Catalog | DataBrew now includes support for writing datasets directly into the Amazon Glue Data Catalog. You can choose to store datasets created from jobs that run your data preparation recipes in Amazon S3, Amazon Redshift, and Amazon RDS tables in the Data Catalog. The RDS tables supported include those for Amazon Aurora, RDS for Oracle, RDS for Microsoft SQL Server, RDS for MySQL, and RDS for PostgreSQL. | June 30, 2021 |
Support for identifying advanced data types | DataBrew now includes support to automatically identify and mark advanced data types for columns, which makes it easier to normalize columns that contain certain types of data. These types of data include Social Security number, email address, phone number, gender, credit card, URL, IP address, date and time, currency, ZIP code, country, region, state, and city. | June 30, 2021 |
Support for using Amazon AppFlow to transfer data from SaaS applications | DataBrew now supports using Amazon AppFlow to transfer data into Amazon S3 from third-party software-as-a-service (SaaS) applications such as Salesforce, Zendesk, Slack, and ServiceNow. For more information, see Supported connections for data sources and outputs. | April 29, 2021 |
Support for creating DataBrew datasets with input from JDBC databases | DataBrew now supports creating datasets from data in JDBC-supported databases and data warehouses, including Amazon Redshift, Snowflake, Microsoft SQL Server, MySQL, Oracle Database, and PostgreSQL. For more information, see Supported connections for data sources and outputs. | April 2, 2021 |
Support for additional Amazon Web Services Regions | DataBrew now supports additional Amazon Web Services Regions. For a list of supported Regions, see Amazon Glue DataBrew endpoints and quotas. | January 28, 2021 |
New transforms for handling duplication | Four new transforms for handling duplication have been added to the DataBrew console and API. For more information, see DELETE_DUPLICATE_ROWS. | January 28, 2021 |
Additional CSV delimiters | DataBrew now supports delimiters other than commas in comma-separated values (CSV) files used to create DataBrew datasets. For more information, see Creating and using Amazon Glue DataBrew datasets. | January 28, 2021 |
DataBrew extension for JupyterLab | Now you can use Amazon Glue DataBrew as an extension in JupyterLab. For more information, see Using DataBrew as an extension in JupyterLab. | November 20, 2020 |
New data preparation tool: Amazon Glue DataBrew | This is the first release of the Amazon Glue DataBrew Developer Guide. | November 11, 2020 |