Prerequisites for generating column statistics - Amazon Glue

Prerequisites for generating column statistics

To generate or update column statistics, the statistics generation task assumes an Amazon Identity and Access Management (IAM) role on your behalf. Based on the permissions granted to the role, the column statistics generation task can read the data from the Amazon S3 data store.

Note

To generate statistics for tables managed by Lake Formation, the IAM role used to generate statistics requires full table access.

To use role-based access control, you must create an IAM role with the permissions listed in the policy below, and add that role to the column statistics generation task.

To create an IAM role for generating column statistics
  1. To create an IAM role, see Create an IAM role for Amazon Glue.

  2. To update an existing role, in the IAM console, go to the IAM role that the column statistics generation task uses.

  3. In the Add permissions section, choose Attach policies. In the newly opened browser window, choose the AWSGlueServiceRole Amazon managed policy.

  4. You also need to include permissions to read data from the Amazon S3 data location.

    In the Add permissions section, choose Create policy. In the newly opened browser window, create a new policy to use with your role.

  5. In the Create policy page, choose the JSON tab. Copy the following JSON code into the policy editor field.

    Note

    In the following policies, replace the account ID 111122223333 with your Amazon Web Services account ID, <region> with the Region of the table, and <bucket-name> with the Amazon S3 bucket name.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3BucketAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::<bucket-name>/*",
                    "arn:aws:s3:::<bucket-name>"
                ]
            }
        ]
    }

  6. (Optional) If you're using Lake Formation permissions to provide access to your data, the IAM role requires lakeformation:GetDataAccess permissions.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LakeFormationDataAccess",
                "Effect": "Allow",
                "Action": "lakeformation:GetDataAccess",
                "Resource": [
                    "*"
                ]
            }
        ]
    }

    If the Amazon S3 data location is registered with Lake Formation, and the IAM role assumed by the column statistics generation task doesn't have IAM_ALLOWED_PRINCIPALS group permissions granted on the table, the role requires Lake Formation ALTER and DESCRIBE permissions on the table. The role used for registering the Amazon S3 bucket requires Lake Formation INSERT and DELETE permissions on the table.

    If the Amazon S3 data location is not registered with Lake Formation, and the IAM role doesn't have IAM_ALLOWED_PRINCIPALS group permissions granted on the table, the role requires Lake Formation ALTER, DESCRIBE, INSERT and DELETE permissions on the table.

  7. (Optional) The column statistics generation task that writes encrypted Amazon CloudWatch Logs requires the following permissions in the key policy.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CWLogsKmsPermissions",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                    "logs:AssociateKmsKey"
                ],
                "Resource": [
                    "arn:aws:logs:<region>:111122223333:log-group:/aws-glue:*"
                ]
            },
            {
                "Sid": "KmsPermissions",
                "Effect": "Allow",
                "Action": [
                    "kms:GenerateDataKey",
                    "kms:Decrypt",
                    "kms:Encrypt"
                ],
                "Resource": [
                    "arn:aws:kms:<region>:111122223333:key/<id-of-key-used-for-ETL-cloudwatch-encryption>"
                ],
                "Condition": {
                    "StringEquals": {
                        "kms:ViaService": ["glue.<region>.amazonaws.com"]
                    }
                }
            }
        ]
    }

  8. The role you use to run column statistics must have the iam:PassRole permission on the column statistics role (<columnstats-role-name> in the following policy).

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "iam:PassRole"
                ],
                "Resource": [
                    "arn:aws:iam::111122223333:role/<columnstats-role-name>"
                ]
            }
        ]
    }

  9. When you create an IAM role for generating column statistics, that role must also have the following trust policy that enables the service to assume the role.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TrustPolicy",
                "Effect": "Allow",
                "Principal": {
                    "Service": "glue.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
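
The console steps above could also be scripted. The following is a minimal sketch, not an official procedure: it builds the S3 read policy from step 5 and the trust policy from step 9 as Python dictionaries, then applies them through an IAM client. The function names, the policy name ColumnStatsS3Read, and the parameters are hypothetical. The iam argument is assumed to be a boto3 IAM client, and the managed policy ARN for AWSGlueServiceRole is passed in by the caller rather than hard-coded. Note that the sketch adds the S3 permissions as an inline policy with put_role_policy, which differs from the Create policy flow in the console.

```python
import json


def s3_read_policy(bucket_name):
    """S3 read policy from step 5: list the bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3BucketAccess",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }


def glue_trust_policy():
    """Trust policy from step 9: lets the Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TrustPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_column_stats_role(iam, role_name, bucket_name, managed_policy_arn):
    """Create the role, attach the managed policy, and add S3 read access.

    Assumption: `iam` behaves like a boto3 IAM client, for example
    boto3.client("iam"). Policy documents are passed as JSON strings.
    """
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(glue_trust_policy()),
    )
    # Step 3: attach the AWSGlueServiceRole managed policy (ARN from caller).
    iam.attach_role_policy(RoleName=role_name, PolicyArn=managed_policy_arn)
    # Steps 4 and 5: grant read access to the data location (inline policy).
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="ColumnStatsS3Read",  # hypothetical policy name
        PolicyDocument=json.dumps(s3_read_policy(bucket_name)),
    )
```

With boto3 installed and credentials configured, a call might look like create_column_stats_role(boto3.client("iam"), "<columnstats-role-name>", "<bucket-name>", "<managed-policy-arn>"). The optional Lake Formation, CloudWatch Logs, and iam:PassRole policies from steps 6 through 8 are not covered by this sketch.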