Partition specification and schema unnesting guide

When working with NoSQL data sources like DynamoDB and SaaS applications, data often presents unique challenges for analytics:

Records within the same table may have different schemas
Nested records within the same table can be represented differently
Complex nested structures like maps and arrays require transformation for efficient querying
Optimal data organization is needed to ensure query performance at scale

Amazon Glue Zero-ETL integrations address these challenges through two powerful capabilities:

Schema Unnesting: Automatically flattens complex nested data structures into analytics-friendly formats, with configurable levels of unnesting to balance between preserving data structure and optimizing for query simplicity.
Data Partitioning: Organizes data into logical partitions based on specified columns or time-based dimensions, improving query performance and reducing costs by enabling partition pruning during query execution.
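
To make the idea of configurable unnesting levels concrete, here is a minimal Python sketch of flattening a nested record to a chosen depth. It is an illustration only, not the service's implementation: the function name `unnest`, the `max_depth` parameter, the `.` column-name separator, and the choice to leave arrays untouched are all assumptions made for this example.

```python
from typing import Any, Dict, Optional


def unnest(record: Dict[str, Any],
           max_depth: Optional[int] = None,
           sep: str = ".") -> Dict[str, Any]:
    """Flatten nested maps in `record` into top-level columns.

    max_depth=None flattens every level (a "full" unnest),
    max_depth=1 flattens only the first level of nesting,
    max_depth=0 leaves the record as it is.
    Hypothetical helper: the naming scheme and array handling are assumptions.
    """
    flat: Dict[str, Any] = {}

    def walk(name: str, value: Any, depth: int) -> None:
        can_descend = max_depth is None or depth < max_depth
        if isinstance(value, dict) and value and can_descend:
            # Promote each nested key to its own column.
            for key, child in value.items():
                walk(f"{name}{sep}{key}", child, depth + 1)
        else:
            # Scalars, arrays, and maps beyond max_depth are kept as-is.
            flat[name] = value

    for key, value in record.items():
        walk(key, value, 0)
    return flat


record = {
    "id": 1,
    "address": {"city": "Oslo", "geo": {"lat": 59.9, "lon": 10.7}},
    "tags": ["a", "b"],
}

full = unnest(record)                  # every nested map becomes columns
top_level = unnest(record, max_depth=1)  # "address.geo" stays a nested map
untouched = unnest(record, max_depth=0)  # structure preserved
```

The trade-off shows up in the results: the fully unnested record can be queried with plain column references, while the shallower variants keep more of the original structure at the cost of nested-field access in queries.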

To make such data sources easy to query, Amazon Glue Zero-ETL provides out-of-the-box schema handling and partitioning schemes for source data replicated to the target Amazon Glue database. You can configure schema unnesting and partitioning settings for each table through the CreateIntegrationTableProperty API, which gives you fine-grained control over how data is structured and organized for analytics workloads.

Default unnesting & partitioning behavior

Amazon Glue Zero-ETL defaults to FULL unnesting when no unnesting option is provided for the target table
Amazon Glue Zero-ETL defaults to bucket partitioning when no PartitionSpec is provided for the target table
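
The per-table settings and defaults described above can be sketched as a small request builder. This is a hedged illustration, not the documented request shape: the helper `build_table_properties_request`, the field names (`ResourceArn`, `TableName`, `TargetTableConfig`, `UnnestSpec`, `PartitionSpec`, `FieldName`, `FunctionSpec`), the `TOPLEVEL` value, and the sample ARN are assumptions, so check them against the API reference before use. The sketch only builds a dictionary and does not call any service.

```python
from typing import Any, Dict, List, Optional


def build_table_properties_request(
    resource_arn: str,
    table_name: str,
    unnest_spec: Optional[str] = None,
    partition_spec: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Build a hypothetical per-table configuration payload.

    Settings left as None are omitted, so the defaults described in
    this section (FULL unnesting, bucket partitioning) would apply.
    """
    target_config: Dict[str, Any] = {}
    if unnest_spec is not None:
        target_config["UnnestSpec"] = unnest_spec        # assumed field name
    if partition_spec is not None:
        target_config["PartitionSpec"] = partition_spec  # assumed field name
    return {
        "ResourceArn": resource_arn,
        "TableName": table_name,
        "TargetTableConfig": target_config,
    }


# Placeholder ARN and table name, for illustration only.
ARN = "arn:aws:glue:us-east-1:123456789012:integration/example"

# No overrides: the payload carries no unnesting or partition settings.
defaults_request = build_table_properties_request(ARN, "orders")

# Explicit overrides: a shallower unnest and a time-based partition column.
custom_request = build_table_properties_request(
    ARN,
    "orders",
    unnest_spec="TOPLEVEL",  # assumed enum value
    partition_spec=[{"FieldName": "created_at", "FunctionSpec": "day"}],
)
```

Omitting a setting rather than sending an explicit default keeps the request minimal and lets the service-side default apply.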
