How to prevent the crawler from changing an existing schema - Amazon Glue

If you don't want a crawler to overwrite updates you made to existing fields in an Amazon S3 table definition, choose the console option Add new columns only, or set Tables.AddOrUpdateBehavior to MergeNewColumns in the crawler configuration. This setting applies to both tables and partitions, unless Partitions.AddOrUpdateBehavior is overridden to InheritFromTable.
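
As a minimal sketch of the MergeNewColumns option, the following Python builds the JSON string that a crawler's Configuration field would carry. The function name merge_new_columns_configuration is hypothetical and used only for illustration; the key names follow the crawler configuration format shown at the end of this topic.

```python
import json


def merge_new_columns_configuration() -> str:
    """Return a crawler Configuration string that only adds new columns.

    Hypothetical helper for illustration; it is not part of any AWS SDK.
    """
    configuration = {
        "Version": 1.0,
        "CrawlerOutput": {
            # Keep existing column definitions and merge in newly found columns.
            "Tables": {"AddOrUpdateBehavior": "MergeNewColumns"}
        },
    }
    # The Configuration field expects a string, so serialize the object.
    return json.dumps(configuration)


if __name__ == "__main__":
    print(merge_new_columns_configuration())
```

The resulting string can be supplied as the Configuration value when the crawler is created or updated.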

If you don't want a table schema to change at all when a crawler runs, set the schema change policy to LOG. You can also set a configuration option that sets partition schemas to inherit from the table.

If you are configuring the crawler on the console, you can choose the following actions:

  • Ignore the change and don't update the table in the Data Catalog

  • Update all new and existing partitions with metadata from the table

When you configure the crawler using the API, set the following parameters:

  • Set the UpdateBehavior field in SchemaChangePolicy structure to LOG.

  • Set the Configuration field in the crawler API to a string representation of a JSON object; for example:

    { "Version": 1.0, "CrawlerOutput": { "Partitions": { "AddOrUpdateBehavior": "InheritFromTable" } } }
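
The two API settings above can be sketched together in Python. This is an illustration under stated assumptions, not a definitive implementation: it assumes the boto3 library is installed and credentials are configured, and the helper names build_update_crawler_args and apply_to_crawler, as well as any crawler name you pass in, are hypothetical.

```python
import json


def build_update_crawler_args(crawler_name: str) -> dict:
    """Build keyword arguments for the Glue UpdateCrawler call.

    Hypothetical helper: it sets UpdateBehavior to LOG and makes
    partitions inherit their schema from the table.
    """
    configuration = {
        "Version": 1.0,
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }
    return {
        "Name": crawler_name,
        # LOG records schema changes without updating the table definition.
        "SchemaChangePolicy": {"UpdateBehavior": "LOG"},
        # The Configuration field takes the JSON object as a string.
        "Configuration": json.dumps(configuration),
    }


def apply_to_crawler(crawler_name: str) -> None:
    """Send the settings to an existing crawler (assumes boto3 and credentials)."""
    import boto3  # imported here so the builder above runs without boto3

    glue = boto3.client("glue")
    glue.update_crawler(**build_update_crawler_args(crawler_name))
```

Keeping the argument construction separate from the network call makes it possible to inspect or log the exact payload before it is sent to the crawler.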