Managing the Data Catalog - Amazon Glue
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Managing the Data Catalog

The Amazon Glue Data Catalog is a central metadata repository that stores structural and operational metadata for your Amazon S3 data sets. Managing the Data Catalog effectively is crucial for maintaining data quality, performance, security, and governance.

Applying the Data Catalog management practices described in this section helps keep your metadata accurate, performant, secure, and well-governed as your data landscape evolves.

This section covers the following aspects of Data Catalog management:

  • Updating table schema and partitions: As your data evolves, you may need to update the table schema or partition structure defined in the Data Catalog. For more information on how to make these updates programmatically from Amazon Glue ETL jobs, see Updating the schema, and adding new partitions in the Data Catalog using Amazon Glue ETL jobs.

  • Managing column statistics: Accurate column statistics help optimize query plans and improve performance. For more information on how to generate, update, and manage column statistics, see Optimizing query performance using column statistics.

  • Encrypting the Data Catalog: To protect sensitive metadata, you can encrypt your Data Catalog using Amazon Key Management Service (Amazon KMS). This section explains how to enable and manage encryption for your Data Catalog.

  • Securing the Data Catalog with Amazon Lake Formation: Lake Formation provides a comprehensive approach to data lake security and access control. You can use Lake Formation to secure and govern access to your Data Catalog and underlying data.
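
As an illustration of the encryption item in the list above, the following is a minimal Python sketch of how Data Catalog encryption might be enabled with the SDK for Python (boto3). It is not taken from this guide. The helper names build_encryption_settings and enable_catalog_encryption, and the key identifier in the usage note, are hypothetical placeholders. The sketch assumes boto3 is installed and that your credentials allow the Glue PutDataCatalogEncryptionSettings action and use of the KMS key.

```python
def build_encryption_settings(kms_key_id, encrypt_passwords=True):
    """Build a DataCatalogEncryptionSettings payload that uses SSE-KMS.

    kms_key_id is a placeholder for the ID or ARN of your own KMS key.
    """
    settings = {
        "EncryptionAtRest": {
            "CatalogEncryptionMode": "SSE-KMS",
            "SseAwsKmsKeyId": kms_key_id,
        }
    }
    if encrypt_passwords:
        # Optionally also encrypt connection passwords stored in the catalog.
        settings["ConnectionPasswordEncryption"] = {
            "ReturnConnectionPasswordEncrypted": True,
            "AwsKmsKeyId": kms_key_id,
        }
    return settings


def enable_catalog_encryption(kms_key_id):
    """Apply the settings to the Data Catalog of the calling account."""
    # Imported lazily so the payload builder can be used without the SDK.
    import boto3

    glue = boto3.client("glue")
    glue.put_data_catalog_encryption_settings(
        DataCatalogEncryptionSettings=build_encryption_settings(kms_key_id)
    )
```

Calling enable_catalog_encryption("your-kms-key-id") with your own key identifier would submit the request. Building the payload in a separate function makes it possible to inspect or log the settings before any change is made to the catalog.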