Crawler prerequisites - Amazon Glue

Crawler prerequisites

The crawler assumes the permissions of the Amazon Identity and Access Management (IAM) role that you specify when you define the crawler. This IAM role must have permissions to extract data from your data store and write to the Data Catalog. The Amazon Glue console lists only IAM roles that have a trust policy attached for the Amazon Glue service principal. From the console, you can also create an IAM role with an IAM policy that grants access to the Amazon S3 data stores that the crawler reads. For more information about providing roles for Amazon Glue, see Identity-based policies (IAM policies) for access control.
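As a rough illustration of the trust policy mentioned above, the following Python sketch builds a trust policy document that allows a service principal to assume the role. The function name `build_glue_trust_policy` is hypothetical, and the principal value `glue.amazonaws.com` is an assumption that is not stated on this page; confirm the correct principal for your partition before relying on it.

```python
import json

# Assumption: "glue.amazonaws.com" is the service principal for Amazon Glue.
# This value is not given in the text above; verify it for your partition.
GLUE_SERVICE_PRINCIPAL = "glue.amazonaws.com"


def build_glue_trust_policy(service_principal: str = GLUE_SERVICE_PRINCIPAL) -> dict:
    """Return a trust policy document that lets the given service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy = build_glue_trust_policy()
print(json.dumps(trust_policy, indent=2))
```

The resulting JSON is the kind of document you would supply as the role's trust relationship when creating the role through the IAM console or API.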

Note

When crawling a Delta Lake data store, you must have Read/Write permissions to the Amazon S3 location.

For your crawler, you can create a role and attach the following policies:

  • The AWSGlueServiceRole Amazon managed policy, which grants the required permissions on the Data Catalog.

  • An inline policy that grants permissions on the data source.

A quicker approach is to let the Amazon Glue console crawler wizard create a role for you. The role that it creates is specifically for the crawler, and includes the AWSGlueServiceRole Amazon managed policy plus the required inline policy for the specified data source.

If you specify an existing role for a crawler, ensure that it includes the AWSGlueServiceRole policy or equivalent (or a scoped-down version of this policy), plus the required inline policies. For example, for an Amazon S3 data store, the inline policy would, at a minimum, be the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws-cn:s3:::bucket/object*"
            ]
        }
    ]
}
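If you generate such policies for several buckets, a small helper can fill in the resource ARN. The sketch below is one possible way to do that in Python. The function name `build_s3_crawler_policy`, the bucket `example-bucket`, and the prefix `sales/` are hypothetical placeholders. The default partition `aws-cn` is taken from the ARN in the policy above.

```python
import json


def build_s3_crawler_policy(bucket: str, prefix: str = "", partition: str = "aws-cn") -> dict:
    """Build the minimal inline S3 policy shown above for one bucket and key prefix."""
    # Mirrors the "bucket/object*" pattern: every key that starts with the prefix.
    resource = f"arn:{partition}:s3:::{bucket}/{prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [resource],
            }
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
s3_policy = build_s3_crawler_policy("example-bucket", "sales/")
print(json.dumps(s3_policy, indent=2))
```

Passing a narrower prefix limits the crawler to the objects under that prefix, which keeps the inline policy closer to least privilege.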

For an Amazon DynamoDB data store, the policy would at a minimum be the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeTable",
                "dynamodb:Scan"
            ],
            "Resource": [
                "arn:aws-cn:dynamodb:region:account-id:table/table-name*"
            ]
        }
    ]
}
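The placeholders region, account-id, and table-name in the ARN above need to be replaced with real values. As a minimal sketch, the following Python helper substitutes them. The function name `build_dynamodb_crawler_policy`, the account ID `123456789012`, and the table name `orders` are hypothetical. The trailing `*` after the table name is kept as it appears in the policy above.

```python
import json


def build_dynamodb_crawler_policy(
    region: str, account_id: str, table_name: str, partition: str = "aws-cn"
) -> dict:
    """Build the minimal inline DynamoDB policy shown above for one table."""
    # The trailing "*" follows the resource pattern used in the policy above.
    resource = f"arn:{partition}:dynamodb:{region}:{account_id}:table/{table_name}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:DescribeTable", "dynamodb:Scan"],
                "Resource": [resource],
            }
        ],
    }


# Hypothetical account ID and table name, for illustration only.
dynamodb_policy = build_dynamodb_crawler_policy("cn-north-1", "123456789012", "orders")
print(json.dumps(dynamodb_policy, indent=2))
```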

In addition, if the crawler reads Amazon Key Management Service (Amazon KMS) encrypted Amazon S3 data, then the IAM role must have decrypt permission on the Amazon KMS key. For more information, see Step 2: Create an IAM role for Amazon Glue.
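One way to grant the decrypt permission is to add a statement that allows the `kms:Decrypt` action on the key to the role's inline policy. The Python sketch below illustrates this under stated assumptions. The helper name `add_kms_decrypt` and the key ARN are hypothetical, and whether you place the statement in the same inline policy or in a separate one is a design choice that this page does not prescribe.

```python
import copy
import json


def add_kms_decrypt(policy: dict, key_arn: str) -> dict:
    """Return a copy of the policy with an added statement allowing kms:Decrypt on one key."""
    updated = copy.deepcopy(policy)  # leave the caller's policy unchanged
    updated["Statement"].append(
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": [key_arn],
        }
    )
    return updated


# The minimal Amazon S3 policy from earlier on this page.
base_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws-cn:s3:::bucket/object*"],
        }
    ],
}

# Hypothetical key ARN, for illustration only.
key_arn = "arn:aws-cn:kms:cn-north-1:123456789012:key/example-key-id"
policy_with_kms = add_kms_decrypt(base_policy, key_arn)
print(json.dumps(policy_with_kms, indent=2))
```

Scoping the statement to a single key ARN, rather than to all keys, limits the role to decrypting only the data that the crawler needs to read.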