
Populating the Amazon Glue Data Catalog

You can populate the Amazon Glue Data Catalog using the following methods:

  • Amazon Glue crawler – An Amazon Glue crawler can automatically discover and catalog data sources such as databases, data lakes, and streaming data. Crawlers are the most common and recommended way to populate the Data Catalog because they infer metadata for a wide variety of data sources.

  • Manually adding metadata – You can manually define databases, tables, and connection details and add them to the Data Catalog using the Amazon Glue console, Lake Formation console, Amazon CLI, or Amazon Glue APIs. Manual entry is useful when you want to catalog data sources that cannot be crawled.

  • Integrating with other Amazon services – You can populate the Data Catalog with metadata from services like Amazon Lake Formation and Amazon Athena. These services can discover and register data sources in the Data Catalog.

  • Populating from an existing metadata repository – If you have an existing metadata store like Apache Hive Metastore, you can use Amazon Glue to import that metadata into the Data Catalog. For more information, see Migration between the Hive Metastore and the Amazon Glue Data Catalog on GitHub.
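
The first two methods in the list above can be sketched in Python. This is a minimal illustration, not a definitive implementation: it assumes the boto3 SDK with configured credentials, and every resource name in it (sales_db, orders, sales-crawler, the bucket path, and the role ARN) is a hypothetical placeholder. The two builder functions only assemble the keyword arguments for the Glue CreateCrawler and CreateTable operations, so they can be inspected without contacting the service. The table definition assumes comma-delimited text files.

```python
from typing import Any


def crawler_request(name: str, role_arn: str, database: str,
                    s3_path: str) -> dict[str, Any]:
    """Build keyword arguments for Glue CreateCrawler with one S3 target."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def table_request(database: str, table: str, s3_path: str,
                  columns: list[tuple[str, str]]) -> dict[str, Any]:
    """Build keyword arguments for Glue CreateTable (CSV files in S3)."""
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": table,
            "TableType": "EXTERNAL_TABLE",
            "Parameters": {"classification": "csv"},
            "StorageDescriptor": {
                "Columns": [{"Name": n, "Type": t} for n, t in columns],
                "Location": s3_path,
                "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
                "OutputFormat":
                    "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
                "SerdeInfo": {
                    "SerializationLibrary":
                        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                    "Parameters": {"field.delim": ","},
                },
            },
        },
    }


def populate(use_crawler: bool) -> None:
    """Send the requests to Glue. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builders above work without it

    glue = boto3.client("glue")
    # Hypothetical database name; the call fails if the database already exists.
    glue.create_database(DatabaseInput={"Name": "sales_db"})

    if use_crawler:
        # Method 1: let a crawler infer the schema from the data.
        req = crawler_request(
            name="sales-crawler",
            role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # placeholder
            database="sales_db",
            s3_path="s3://example-bucket/sales/orders/",
        )
        glue.create_crawler(**req)
        glue.start_crawler(Name=req["Name"])
    else:
        # Method 2: define the table metadata manually.
        glue.create_table(**table_request(
            database="sales_db",
            table="orders",
            s3_path="s3://example-bucket/sales/orders/",
            columns=[("order_id", "bigint"), ("amount", "double")],
        ))
```

Calling populate(use_crawler=True) lets the crawler infer the schema, while populate(use_crawler=False) registers a table whose columns you specify yourself, which fits data sources that cannot be crawled.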