Crawler prerequisites

The crawler assumes the permissions of the Amazon Identity and Access Management (IAM) role that you specify when you define it. This IAM role must have permissions to extract data from your data store and write to the Data Catalog. The Amazon Glue console lists only IAM roles that have a trust policy attached for the Amazon Glue service principal. From the console, you can also create an IAM role with an IAM policy that grants access to the Amazon S3 data stores that the crawler reads. For more information about providing roles for Amazon Glue, see Identity-based policies for Amazon Glue.
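
To illustrate the trust policy requirement described above, the following is a minimal sketch of a trust relationship that lets Amazon Glue assume a role. The service principal name glue.amazonaws.com is an assumption that this page does not state, so verify it for your partition before relying on it.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

A role without a statement of this kind would not be expected to appear in the console's role list.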

Note

When crawling a Delta Lake data store, you must have Read/Write permissions to the Amazon S3 location.

For your crawler, you can create a role and attach the following policies:

The AWSGlueServiceRole Amazon managed policy, which grants the required permissions on the Data Catalog.
An inline policy that grants permissions on the data source.
An inline policy that grants iam:PassRole permission on the role.
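
The iam:PassRole item in the list above might look like the following inline policy. This is a sketch only: account-id and role-name are placeholders, and the Condition block that limits the pass to glue.amazonaws.com is an optional restriction assumed here, not a requirement stated on this page.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws-cn:iam::account-id:role/role-name",
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "glue.amazonaws.com"
        }
      }
    }
  ]
}
```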

A quicker approach is to let the Amazon Glue console crawler wizard create a role for you. The role that it creates is specifically for the crawler, and includes the AWSGlueServiceRole Amazon managed policy plus the required inline policy for the specified data source.

If you specify an existing role for a crawler, ensure that it includes the AWSGlueServiceRole policy or an equivalent (or a scoped-down version of this policy), plus the required inline policies. For example, for an Amazon S3 data store, the inline policy would at a minimum be the following:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws-cn:s3:::bucket/object*"
      ]
    }
  ]
}

For an Amazon DynamoDB data store, the policy would at a minimum be the following:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:Scan"
      ],
      "Resource": [
        "arn:aws-cn:dynamodb:region:account-id:table/table-name*"
      ]
    }
  ]
}

In addition, if the crawler reads Amazon S3 data that is encrypted with Amazon Key Management Service (Amazon KMS), the IAM role must have decrypt permission on the Amazon KMS key. For more information, see Step 2: Create an IAM role for Amazon Glue.
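
As a sketch of the decrypt permission described above, an inline policy along the following lines could be attached to the crawler role. The region, account-id, and key-id segments are placeholders, and the assumption is that kms:Decrypt alone is sufficient for read-only crawling; your key policy may impose further requirements.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": [
        "arn:aws-cn:kms:region:account-id:key/key-id"
      ]
    }
  ]
}
```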

