Prerequisites for generating column statistics
To generate or update column statistics, the statistics generation task assumes an Amazon Identity and Access Management (IAM) role on your behalf. Based on the permissions granted to the role, the column statistics generation task can read the data from the Amazon S3 data store.
When you configure the column statistics generation task, Amazon Glue allows you to create a
role that includes the AWSGlueServiceRole
Amazon managed policy plus the
required inline policy for the specified data source.
If you specify an existing role for generating column statistics, ensure that it
includes the AWSGlueServiceRole
policy or equivalent (or a scoped-down
version of this policy), plus the required inline policies. Follow these steps to create
a new IAM role:
Note
To generate statistics for tables managed by Lake Formation, the IAM role used to generate statistics requires full table access.
You can also create a role yourself, attach the permissions listed in the policies below, and add that role
to the column statistics generation task.
To create an IAM role for generating column statistics
-
To create an IAM role, see Create an IAM role for Amazon Glue.
-
To update an existing role, in the IAM console, go to the IAM role that is being used by the generate column statistics process.
-
In the Add permissions section, choose Attach policies. In the newly opened browser window, choose the
AWSGlueServiceRole
Amazon managed policy.
-
You also need to include permissions to read data from the Amazon S3 data location.
In the Add permissions section, choose Create policy. In the newly opened browser window, create a new policy to use with your role.
-
On the Create policy page, choose the JSON tab. Copy the following JSON code into the policy editor field.
Note
In the following policies, replace the account ID with a valid Amazon Web Services account ID, replace <region> with the Region of the table, and replace <bucket-name> with the Amazon S3 bucket name.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3BucketAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>/*",
        "arn:aws:s3:::<bucket-name>"
      ]
    }
  ]
}
(Optional) If you're using Lake Formation permissions to provide access to your data, the IAM role requires lakeformation:GetDataAccess permissions.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LakeFormationDataAccess",
      "Effect": "Allow",
      "Action": "lakeformation:GetDataAccess",
      "Resource": [
        "*"
      ]
    }
  ]
}
If the Amazon S3 data location is registered with Lake Formation, and the IAM role assumed by the column statistics generation task doesn't have IAM_ALLOWED_PRINCIPALS group permissions granted on the table, the role requires Lake Formation ALTER and DESCRIBE permissions on the table. The role used for registering the Amazon S3 bucket requires Lake Formation INSERT and DELETE permissions on the table.
If the Amazon S3 data location is not registered with Lake Formation, and the IAM role doesn't have IAM_ALLOWED_PRINCIPALS group permissions granted on the table, the role requires Lake Formation ALTER, DESCRIBE, INSERT, and DELETE permissions on the table.
-
If you've enabled the catalog-level Automatic statistics generation option, the IAM role must have the glue:UpdateCatalog permission or the Lake Formation ALTER CATALOG permission on the default Data Catalog. You can use the GetCatalog operation to verify the catalog properties.
-
(Optional) If the column statistics generation task writes encrypted Amazon CloudWatch Logs, it requires the following permissions in the key policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CWLogsKmsPermissions",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:AssociateKmsKey"
      ],
      "Resource": [
        "arn:aws:logs:<region>:111122223333:log-group:/aws-glue:*"
      ]
    },
    {
      "Sid": "KmsPermissions",
      "Effect": "Allow",
      "Action": [
        "kms:GenerateDataKey",
        "kms:Decrypt",
        "kms:Encrypt"
      ],
      "Resource": [
        "arn:aws:kms:<region>:111122223333:key/<arn of key used for ETL cloudwatch encryption>"
      ],
      "Condition": {
        "StringEquals": {
          "kms:ViaService": ["glue.<region>.amazonaws.com"]
        }
      }
    }
  ]
}
-
The role you use to run column statistics must have the iam:PassRole permission on the role.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::111122223333:role/<columnstats-role-name>"
      ]
    }
  ]
}
-
When you create an IAM role for generating column statistics, that role must also have the following trust policy, which enables the service to assume the role.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TrustPolicy",
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
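If you script the role setup instead of pasting JSON into the console, the placeholder substitution described in the steps above can be done in code. The following is a minimal sketch, not an official tool: it uses only the Python standard library, and the function names (s3_read_policy, pass_role_policy, glue_trust_policy) as well as the example bucket and role names are hypothetical. It only builds the policy documents; attaching them to a role is left to whatever IAM tooling you already use.

```python
import json


def s3_read_policy(bucket_name: str) -> dict:
    """Inline policy that allows listing the bucket and reading its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3BucketAccess",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                ],
            }
        ],
    }


def pass_role_policy(account_id: str, role_name: str) -> dict:
    """Policy that grants iam:PassRole on the column statistics role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": [f"arn:aws:iam::{account_id}:role/{role_name}"],
            }
        ],
    }


def glue_trust_policy() -> dict:
    """Trust policy that lets the Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TrustPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values; replace them with your own bucket, account, and role.
    documents = (
        s3_read_policy("example-bucket"),
        pass_role_policy("111122223333", "example-columnstats-role"),
        glue_trust_policy(),
    )
    for document in documents:
        # json.dumps always emits valid JSON, so stray commas cannot slip in.
        print(json.dumps(document, indent=2))
```

Generating the documents from dictionaries rather than editing JSON text by hand keeps the bucket ARN pair (bucket and bucket/*) consistent and guarantees that the output parses.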