Schedule a crawler to keep the Amazon Glue Data Catalog and Amazon S3 in sync

Amazon Glue crawlers can be set up to run on a schedule or on demand. For more information, see Time-based schedules for jobs and crawlers in the Amazon Glue Developer Guide.
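As a minimal sketch of both modes, the following Python helpers build a daily cron expression in the six-field format that Glue schedules use (minutes, hours, day of month, month, day of week, year, evaluated in UTC) and apply it to an existing crawler. The helper names and the crawler name are hypothetical. The `glue_client` argument is assumed to be a boto3 Glue client, for example one created with `boto3.client("glue")`.

```python
def daily_cron(hour: int, minute: int) -> str:
    """Return a Glue cron expression that fires once a day at hour:minute UTC."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour} * * ? *)"


def schedule_crawler(glue_client, crawler_name: str, hour: int, minute: int) -> str:
    """Attach a daily schedule to an existing crawler (hypothetical helper)."""
    expression = daily_cron(hour, minute)
    # update_crawler accepts the cron expression in its Schedule parameter.
    glue_client.update_crawler(Name=crawler_name, Schedule=expression)
    return expression


def run_crawler_now(glue_client, crawler_name: str) -> None:
    """Start an on-demand run of the crawler (hypothetical helper)."""
    glue_client.start_crawler(Name=crawler_name)
```

For example, `schedule_crawler(glue_client, "my-crawler", 12, 15)` would set the hypothetical crawler `my-crawler` to run every day at 12:15 UTC.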

If data for a partitioned table arrives at a fixed time, you can set up an Amazon Glue crawler to run on a schedule to detect new partitions and update the table. This can eliminate the need to run a potentially long and expensive MSCK REPAIR TABLE command or to manually run an ALTER TABLE ADD PARTITION command. For more information, see Table partitions in the Amazon Glue Developer Guide.
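One way to line the schedule up with the data is to run the crawler a fixed delay after the known arrival time. The following Python sketch computes such a daily cron expression. The function name and the delay are illustrative assumptions, and all times are taken to be UTC because Glue evaluates cron schedules in UTC.

```python
def cron_after_arrival(arrival_hour: int, arrival_minute: int, delay_minutes: int) -> str:
    """Return a daily Glue cron expression that fires delay_minutes after arrival."""
    if not (0 <= arrival_hour <= 23 and 0 <= arrival_minute <= 59):
        raise ValueError("arrival time must be a valid 24-hour clock time")
    if delay_minutes < 0:
        raise ValueError("delay_minutes must not be negative")
    # Work in minutes since midnight and wrap past midnight if needed.
    total = (arrival_hour * 60 + arrival_minute + delay_minutes) % (24 * 60)
    hour, minute = divmod(total, 60)
    # Field order: minutes hours day-of-month month day-of-week year
    return f"cron({minute} {hour} * * ? *)"


# Data lands at 02:00 UTC, so crawl at 02:15 UTC.
print(cron_after_arrival(2, 0, 15))    # cron(15 2 * * ? *)
# Data lands at 23:50 UTC, so the crawl wraps to 00:20 UTC.
print(cron_after_arrival(23, 50, 30))  # cron(20 0 * * ? *)
```

The resulting string can be passed as the `Schedule` value when creating or updating the crawler.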

