Specifying the table location and partitioning level

By default, when a crawler defines tables for data stored in Amazon S3, it attempts to merge schemas together and create top-level tables (for example, year=2019). In some cases, you might expect the crawler to create a table for the folder month=Jan, but the crawler instead creates a partition because a sibling folder (month=Mar) was merged into the same table.

The table level crawler option gives you the flexibility to tell the crawler where the tables are located and how you want partitions created. When you specify a Table level, the table is created at that absolute level from the Amazon S3 bucket.

Crawler grouping with table level specified as level 2.

When configuring the crawler on the console, you can specify a value for the Table level crawler option. The value must be a positive integer that indicates the table location (the absolute level in the dataset). The level of the top-level folder is 1. For example, for the path mydataset/year/month/day/hour, if the level is set to 3, the table is created at the location mydataset/year/month.
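The level arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch only, not the crawler's actual implementation: the helper names table_location and grouping_configuration are hypothetical, and raising an error when the level is deeper than the path is a choice made for this sketch. The second helper assumes that, outside the console, the same option is passed in the crawler's configuration JSON under the Grouping key as TableLevelConfiguration; check that key name against the crawler configuration reference before relying on it.

```python
import json


def table_location(path: str, level: int) -> str:
    """Return the prefix where a table would sit for an absolute table level.

    Hypothetical helper for illustration; level 1 is the top-level folder.
    """
    if not isinstance(level, int) or level < 1:
        raise ValueError("level must be a positive integer")
    # Split the path into folder names, ignoring leading/trailing slashes.
    parts = [p for p in path.strip("/").split("/") if p]
    if level > len(parts):
        # Sketch-only choice: reject levels deeper than the path itself.
        raise ValueError("level is deeper than the path")
    return "/".join(parts[:level])


def grouping_configuration(level: int) -> str:
    """Build a crawler configuration JSON string carrying the table level.

    Assumption: the option is expressed as Grouping.TableLevelConfiguration.
    """
    return json.dumps(
        {"Version": 1.0, "Grouping": {"TableLevelConfiguration": level}}
    )


if __name__ == "__main__":
    # Example from the text: level 3 of mydataset/year/month/day/hour.
    print(table_location("mydataset/year/month/day/hour", 3))
    print(grouping_configuration(3))
```

With level 3, the first call yields mydataset/year/month, matching the example in the text; the folders below that level (day and hour) would then be treated as partitions of that table.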

