Onboarding to Lake Formation permissions
Amazon Lake Formation uses the Amazon Glue Data Catalog to store metadata for the Amazon S3 data in the form of databases and tables. Tables store information about the underlying data, including schema information, partition information, and data location. Databases are collections of tables. The Data Catalog also contains resource links, which point to shared databases and tables in external accounts and are used for cross-account access to data in the data lake. Each Amazon account has one Data Catalog per Amazon Region.
Lake Formation provides a permissions model in the style of a relational database management system (RDBMS): you grant or revoke access to databases, tables, and columns in the Data Catalog, whose underlying data resides in Amazon S3.
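As a rough illustration of this grant and revoke model, the following sketch builds request arguments in the shape accepted by the `grant_permissions` and `revoke_permissions` calls of the boto3 Lake Formation client. The helper name `build_permission_request`, the role ARN, and the database, table, and column names are hypothetical and chosen only for this example. The sketch constructs the payload and does not call the service.

```python
from typing import Optional, Sequence


def build_permission_request(
    principal_arn: str,
    permissions: Sequence[str],
    database: str,
    table: Optional[str] = None,
    columns: Optional[Sequence[str]] = None,
) -> dict:
    """Build keyword arguments for a Lake Formation grant or revoke call.

    The resource is scoped to a database, a table, or a set of columns,
    depending on which optional arguments are supplied.
    """
    if columns and table is None:
        raise ValueError("columns can only be specified together with a table")

    if table is None:
        # Database-level resource
        resource = {"Database": {"Name": database}}
    elif columns:
        # Column-level resource
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        # Table-level resource
        resource = {"Table": {"DatabaseName": database, "Name": table}}

    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": list(permissions),
    }


# Hypothetical principal and resource names, for illustration only.
request = build_permission_request(
    principal_arn="arn:aws:iam::111122223333:role/ExampleAnalyst",
    permissions=["SELECT"],
    database="sales_db",
    table="orders",
    columns=["order_id", "order_date"],
)

# With boto3 installed and credentials configured (not executed here):
#   import boto3
#   client = boto3.client("lakeformation")
#   client.grant_permissions(**request)   # grant access
#   client.revoke_permissions(**request)  # revoke the same access
```

The same payload serves both calls, which mirrors the symmetry of granting and revoking in an RDBMS.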
Before you learn about the details of the Lake Formation permissions model, it is helpful to review the following background information:
- Data lakes managed by Lake Formation reside in designated locations in Amazon Simple Storage Service (Amazon S3).
- Lake Formation maintains a Data Catalog that contains metadata about source data to be imported into your data lakes, such as data in logs and relational databases, and about data in your data lakes in Amazon S3. The metadata is organized as databases and tables. Metadata tables contain schema, location, partitioning, and other information about the data that they represent. Metadata databases are collections of tables.
- The Lake Formation Data Catalog is the same Data Catalog used by Amazon Glue. You can use Amazon Glue crawlers to create Data Catalog tables, and you can use Amazon Glue extract, transform, and load (ETL) jobs to populate the underlying data in your data lakes.
- The databases and tables in the Data Catalog are referred to as Data Catalog resources. Tables in the Data Catalog are referred to as metadata tables to distinguish them from tables in data sources or tabular data in Amazon S3. The data that the metadata tables point to in Amazon S3 or in data sources is referred to as underlying data.
- A principal is a user or role, an Amazon QuickSight user or group, a user or group that authenticates with Lake Formation through a SAML provider, or for cross-account access control, an Amazon account ID, organization ID, or organizational unit ID.
- Amazon Glue crawlers create metadata tables, but you can also manually create metadata tables with the Lake Formation console, the API, or the Amazon Command Line Interface (Amazon CLI). When creating a metadata table, you must specify a location. When you create a database, the location is optional. Table locations can be Amazon S3 locations or data source locations such as an Amazon Relational Database Service (Amazon RDS) database. Database locations are always Amazon S3 locations.
- Services that integrate with Lake Formation, such as Amazon Athena and Amazon Redshift, can access the Data Catalog to obtain metadata and to check authorization for running queries. For a complete list of integrated services, see Amazon service integrations with Lake Formation.
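The location rules for manually created metadata tables and databases can be sketched as follows. This is a minimal example that builds dictionaries in the shape of the `TableInput` and `DatabaseInput` arguments of the boto3 Glue client's `create_table` and `create_database` calls. The helper names, bucket name, and table and database names are hypothetical. The validation reflects the rules described in the list above, and is an assumption of this sketch rather than a description of what the API itself enforces.

```python
from typing import Dict, Optional


def build_table_input(name: str, location: str, columns: Dict[str, str]) -> dict:
    """Build a TableInput-shaped dict; this sketch requires a location."""
    if not location:
        raise ValueError("a metadata table must specify a location")
    return {
        "Name": name,
        "StorageDescriptor": {
            # May be an Amazon S3 location or a data source location
            "Location": location,
            "Columns": [
                {"Name": col_name, "Type": col_type}
                for col_name, col_type in columns.items()
            ],
        },
    }


def build_database_input(name: str, location: Optional[str] = None) -> dict:
    """Build a DatabaseInput-shaped dict; the location is optional."""
    database_input = {"Name": name}
    if location is not None:
        # Database locations are always Amazon S3 locations
        if not location.startswith("s3://"):
            raise ValueError("a database location must be an Amazon S3 location")
        database_input["LocationUri"] = location
    return database_input


# Hypothetical names, for illustration only.
database_input = build_database_input("sales_db")
table_input = build_table_input(
    name="orders",
    location="s3://example-bucket/sales/orders/",
    columns={"order_id": "bigint", "order_date": "date"},
)

# With boto3 installed and credentials configured (not executed here):
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_database(DatabaseInput=database_input)
#   glue.create_table(DatabaseName="sales_db", TableInput=table_input)
```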
Topics
- Overview of Lake Formation permissions
- Lake Formation personas and IAM permissions reference
- Changing the default settings for your data lake
- Implicit Lake Formation permissions
- Lake Formation permissions reference
- Integrating IAM Identity Center
- Adding an Amazon S3 location to your data lake
- Hybrid access mode
- Creating Data Catalog tables and databases
- Importing data using workflows in Lake Formation