Prerequisites for connecting the Data Catalog to external data sources - Amazon Lake Formation

Prerequisites for connecting the Data Catalog to external data sources

Before you connect the Amazon Glue Data Catalog to external data sources, register the connection with Lake Formation, and set up federated catalogs, complete the following prerequisites:

Note

We recommend that a Lake Formation data lake administrator create both the Amazon Glue connections to the external data sources and the federated catalogs.

  1. Create IAM roles.
    • Create a role that has the necessary permissions to deploy resources (Lambda function, Amazon S3 spill bucket, IAM role, and the Amazon Glue connection) required to create a connection to the external data source.

    • Create a role that has the necessary minimum permissions to access the Amazon Glue connection properties (the Lambda function and the Amazon S3 spill bucket). This is the role that you'll include when you register the connection with Lake Formation.

      To use Lake Formation to manage and secure the data in your data lake, you must register the Amazon Glue connection with Lake Formation. By doing so, Lake Formation can vend credentials to Amazon Athena for querying the federated data sources.

      The role must have the following permissions on the Amazon S3 bucket and the Lambda function:

      • s3:ListBucket

      • s3:GetObject

      • lambda:InvokeFunction

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "s3:*"
            ],
            "Resource": [
              "arn:aws:s3:::<your-bucket-name>/<your-spill-prefix>/*",
              "arn:aws:s3:::<your-bucket-name>/<your-spill-prefix>"
            ]
          },
          {
            "Sid": "lambdainvoke",
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "<lambda-function-arn>"
          },
          {
            "Sid": "gluepolicy",
            "Effect": "Allow",
            "Action": "glue:*",
            "Resource": "*"
          }
        ]
      }
    • Add the following trust policy to the IAM role that is used in registering the connection:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Principal": {
              "Service": [
                "lakeformation.amazonaws.com",
                "glue.amazonaws.com"
              ]
            },
            "Action": "sts:AssumeRole"
          }
        ]
      }
    • The data lake administrator who registers the connection must have the iam:PassRole permission on the role.

      The following is an inline policy that grants this permission. Replace <account-id> with a valid Amazon account number, and replace <role-name> with the name of the role.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "PassRolePermissions",
            "Effect": "Allow",
            "Action": [
              "iam:PassRole"
            ],
            "Resource": [
              "arn:aws:iam::<account-id>:role/<role-name>"
            ]
          }
        ]
      }
    • To create federated catalogs in Data Catalog, make sure the IAM role you’re using is a Lake Formation data lake administrator by checking the data lake settings (aws lakeformation get-data-lake-settings).

      If you're not a data lake administrator, you need the Lake Formation CREATE_CATALOG permission to create a catalog. The following example shows how to grant the required permissions to create catalogs.

      aws lakeformation grant-permissions \
        --cli-input-json \
        '{
          "Principal": {
            "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/non-admin"
          },
          "Resource": {
            "Catalog": {}
          },
          "Permissions": [
            "CREATE_CATALOG",
            "DESCRIBE"
          ]
        }'
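The administrator check and the CREATE_CATALOG grant described above can be sketched in Python. This is a minimal illustration, not an official tool. The function names `is_data_lake_admin` and `create_catalog_grant` are hypothetical. The sample response is made-up data that follows the shape returned by `aws lakeformation get-data-lake-settings`.

```python
import json


def is_data_lake_admin(settings_response, principal_arn):
    """Return True if principal_arn is listed under DataLakeAdmins.

    settings_response is the parsed JSON output of
    `aws lakeformation get-data-lake-settings`.
    """
    settings = settings_response.get("DataLakeSettings", {})
    admins = settings.get("DataLakeAdmins", [])
    return any(
        admin.get("DataLakePrincipalIdentifier") == principal_arn
        for admin in admins
    )


def create_catalog_grant(principal_arn):
    """Build the request body used with grant-permissions --cli-input-json."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Catalog": {}},
        "Permissions": ["CREATE_CATALOG", "DESCRIBE"],
    }


if __name__ == "__main__":
    # Hypothetical response, for illustration only.
    sample_response = {
        "DataLakeSettings": {
            "DataLakeAdmins": [
                {"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/admin"}
            ]
        }
    }
    role_arn = "arn:aws:iam::123456789012:role/non-admin"
    if not is_data_lake_admin(sample_response, role_arn):
        # Print the grant request that a data lake administrator would submit.
        print(json.dumps(create_catalog_grant(role_arn), indent=2))
```

A data lake administrator could pass the printed JSON to `aws lakeformation grant-permissions --cli-input-json`. The comparison is an exact string match on the ARN, so supply the principal in the same form in which it appears in the data lake settings.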
  2. If you use a customer managed key to encrypt the data in the data source, add the following key policy to the Amazon KMS key. Replace the account number with a valid Amazon account number, and specify the role name. By default, the data is encrypted using a KMS key. Lake Formation also provides an option to create your own custom KMS key for encryption.

    For more information about managing the permissions of a customer managed key, see Customer managed keys.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::123456789012:role/<role-name>"
          },
          "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey"
          ],
          "Resource": "arn:aws:kms:us-east-1:123456789012:key/key-1"
        }
      ]
    }
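The policy documents on this page can be generated from a few parameters instead of being edited by hand. The following Python sketch shows one way to do that. It rests on several assumptions. All function names are hypothetical. The permissions policy grants only the three actions listed in step 1 (`s3:ListBucket`, `s3:GetObject`, `lambda:InvokeFunction`) and omits the broader `s3:*` and `glue:*` grants from the example policy, so check it against your own setup. The KMS statement names the role as the principal, which is inferred from the instruction to specify a role name. The `partition` argument (for example `aws-cn` in the China Regions) is also an addition.

```python
import json


def s3_arn(bucket, key="", partition="aws"):
    """Return the ARN of a bucket, or of a key pattern inside that bucket."""
    arn = f"arn:{partition}:s3:::{bucket}"
    return f"{arn}/{key}" if key else arn


def registration_role_policy(bucket, spill_prefix, lambda_arn, partition="aws"):
    """Permissions policy for the role used to register the connection."""
    prefix = spill_prefix.strip("/")
    object_pattern = f"{prefix}/*" if prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # s3:ListBucket applies to the bucket itself.
                "Sid": "listspillbucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": s3_arn(bucket, partition=partition),
            },
            {   # s3:GetObject applies to objects under the spill prefix.
                "Sid": "readspillobjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": s3_arn(bucket, object_pattern, partition),
            },
            {
                "Sid": "lambdainvoke",
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": lambda_arn,
            },
        ],
    }


def registration_trust_policy():
    """Trust policy that lets Lake Formation and Amazon Glue assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": ["lakeformation.amazonaws.com", "glue.amazonaws.com"]
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


def kms_key_policy_statement(account_id, role_name, key_arn, partition="aws"):
    """Key policy statement for a customer managed key (principal is assumed)."""
    return {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:{partition}:iam::{account_id}:role/{role_name}"},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        "Resource": key_arn,
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    policy = registration_role_policy(
        "my-spill-bucket",
        "athena-spill",
        "arn:aws:lambda:us-east-1:123456789012:function:my-connector",
    )
    print(json.dumps(policy, indent=2))
    print(json.dumps(registration_trust_policy(), indent=2))
```

Generating the documents from parameters keeps the bucket name, spill prefix, and partition consistent across all of the policies. You can then supply the printed JSON wherever you normally create the role and its policies.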