Integrating with other Amazon services
While you can use Amazon Glue crawlers to populate the Amazon Glue Data Catalog, several Amazon services can also integrate with the catalog and populate it for you. The following sections describe the use cases that each of these services supports.
Amazon Lake Formation
Amazon Lake Formation is a service that makes it easier to set up a secure data lake in Amazon. Lake Formation is built on Amazon Glue, and the two services share the same Amazon Glue Data Catalog. You can register your Amazon S3 data locations with Lake Formation and use the Lake Formation console to create databases and tables in the Data Catalog, define data access policies, and audit data access across your data lake from a central place. You can also use Lake Formation fine-grained access control to manage your existing Data Catalog resources and Amazon S3 data locations.
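Registering an Amazon S3 location can also be done programmatically. The following is a minimal sketch, not an official implementation, assuming the Amazon SDK for Python (Boto3) Lake Formation client, whose register_resource operation takes the location's ARN. The helper names s3_uri_to_arn and register_location and the bucket example-bucket are hypothetical. The client is passed in as a parameter so the sketch can run without credentials.

```python
from typing import Any


def s3_uri_to_arn(uri: str, partition: str = "aws") -> str:
    """Convert s3://bucket/prefix/ to arn:<partition>:s3:::bucket/prefix."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    path = uri[len(scheme):].rstrip("/")
    if not path:
        raise ValueError(f"S3 URI has no bucket: {uri!r}")
    # China Regions use the "aws-cn" partition instead of "aws".
    return f"arn:{partition}:s3:::{path}"


def register_location(lakeformation_client: Any, uri: str, partition: str = "aws") -> str:
    """Register an S3 location with Lake Formation and return its ARN.

    lakeformation_client is assumed to be boto3.client("lakeformation").
    UseServiceLinkedRole=True asks Lake Formation to access the data with
    its service-linked role rather than a custom IAM role.
    """
    arn = s3_uri_to_arn(uri, partition)
    lakeformation_client.register_resource(ResourceArn=arn, UseServiceLinkedRole=True)
    return arn
```

For example, register_location(boto3.client("lakeformation"), "s3://example-bucket/sales/") would register the hypothetical sales/ prefix; pass partition="aws-cn" when working in the China Regions.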
After you register data with Lake Formation, you can securely share Data Catalog resources with IAM principals, Amazon accounts, Amazon organizations, and organizational units.
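Sharing works by granting Lake Formation permissions on a catalog resource to a principal. The following minimal sketch assumes the Boto3 Lake Formation client and its grant_permissions operation; grant_select, the sales_db database, the orders table, and the account ID 111122223333 are hypothetical. Passing column names produces a column-level (fine-grained) grant. The principal string is passed through unchanged: an IAM role or user ARN for a principal in your own account, or, for cross-account sharing, the target account ID (or an organization or organizational unit ARN).

```python
from typing import Any, Optional, Sequence


def grant_select(
    lakeformation_client: Any,
    principal: str,
    database: str,
    table: str,
    columns: Optional[Sequence[str]] = None,
) -> dict:
    """Grant SELECT on a Data Catalog table, optionally limited to some columns.

    lakeformation_client is assumed to be boto3.client("lakeformation").
    Returns the request that was sent, which is handy for logging.
    """
    if columns:
        # Column-level grant: only the listed columns become readable.
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        # Table-level grant: every column is readable.
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    request = {
        "Principal": {"DataLakePrincipalIdentifier": principal},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }
    lakeformation_client.grant_permissions(**request)
    return request
```

A design note: building the request as a dictionary first keeps the table-level and column-level cases in one code path and makes the grant easy to inspect before it is sent.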
For more information about creating Data Catalog resources using Lake Formation, see Creating Data Catalog tables and databases in the Amazon Lake Formation Developer Guide.
Amazon Athena
Amazon Athena uses the Data Catalog to store and retrieve table metadata for the Amazon S3 data in your Amazon account. The table metadata lets the Athena query engine know how to find, read, and process the data that you want to query.
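To see the kind of metadata a query engine relies on, you can read a table definition from the Data Catalog yourself. The following minimal sketch assumes the Boto3 Amazon Glue client's get_table operation; describe_for_query is a hypothetical helper that extracts the S3 location, column schema, partition keys, and input format and SerDe from the response.

```python
from typing import Any, Dict


def describe_for_query(glue_client: Any, database: str, table: str) -> Dict[str, Any]:
    """Extract the parts of a Data Catalog table definition a query engine needs.

    glue_client is assumed to be boto3.client("glue").
    """
    tbl = glue_client.get_table(DatabaseName=database, Name=table)["Table"]
    sd = tbl.get("StorageDescriptor", {})
    return {
        # Where to find the data files in Amazon S3.
        "location": sd.get("Location"),
        # How to interpret each row: column names and types.
        "columns": [(c["Name"], c.get("Type")) for c in sd.get("Columns", [])],
        # Partition keys, typically mapped to S3 prefixes such as year=2024/.
        "partition_keys": [k["Name"] for k in tbl.get("PartitionKeys", [])],
        # How to read and parse the files.
        "input_format": sd.get("InputFormat"),
        "serde": sd.get("SerdeInfo", {}).get("SerializationLibrary"),
    }
```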
You can populate the Amazon Glue Data Catalog directly by using Athena CREATE TABLE statements. This lets you manually define and populate the schema and partition metadata in the Data Catalog without running a crawler.
To populate the Data Catalog this way:
1. In the Athena console, create a database that will store the table metadata in the Data Catalog.
2. Use the CREATE EXTERNAL TABLE statement to define the schema of your data source.
3. If your data is partitioned, use the PARTITIONED BY clause to define the partition keys.
4. Use the LOCATION clause to specify the Amazon S3 path where your data files are stored.
5. Run the statement. It creates the table metadata in the Data Catalog based on the schema and partition keys you defined, without crawling the data.
You can then query the table in Athena, which uses the metadata in the Data Catalog to locate and read your data files in Amazon S3.
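Queries, including the DDL statements described above, can also be submitted outside the console. The following minimal sketch assumes the Boto3 Athena client and its start_query_execution and get_query_execution operations; run_query, the sales_db database, and the results location s3://example-bucket/athena-results/ are hypothetical, and an explicit output location is assumed to be required (a workgroup can also supply one). The helper polls until the query reaches a terminal state.

```python
import time
from typing import Any

# States after which an Athena query will not change again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(
    athena_client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
) -> str:
    """Start an Athena query, wait for it to finish, and return its execution ID.

    athena_client is assumed to be boto3.client("athena").
    """
    qid = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=qid)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return qid
        if state in TERMINAL_STATES:
            raise RuntimeError(f"query {qid} {state}: {status.get('StateChangeReason', '')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query {qid} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

With a real client, you would then fetch the rows with get_query_results(QueryExecutionId=qid).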
For more information, see Creating databases and tables in the Amazon Athena User Guide.