Table optimization prerequisites - Amazon Glue
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Table optimization prerequisites

The table optimizer assumes the permissions of the Amazon Identity and Access Management (IAM) role that you specify when you enable optimization options (compaction, snapshot retention, and orphan file delettion) for a table. You can either create s single role for all optimizers or create separate roles for each optimizer. The IAM role must have the permissions to read data and update metadata in the Data Catalog. You can create an IAM role and attach the following inline policies:

  • Add the following inline policy that grants Amazon S3 read/write permissions on the location for data that is not registered with Lake Formation. This policy also includes permissions to update the table in the Data Catalog, and to permit Amazon Glue to add logs in Amazon CloudWatch logs and publish metrics. For source data in Amazon S3 that isn't registered with Lake Formation, access is determined by IAM permissions policies for Amazon S3 and Amazon Glue actions.

    In the following inline policies, replace bucket-name with your Amazon S3 bucket name, aws-account-id and region with a valid Amazon account number and Region of the Data Catalog, database_name with the name of your database, and table_name with the name of the table.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject" ], "Resource": [ "arn:aws:s3:::<bucket-name>/*" ] }, { "Effect": "Allow", "Action": [ "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::<bucket-name>" ] }, { "Effect": "Allow", "Action": [ "glue:UpdateTable", "glue:GetTable" ], "Resource": [ "arn:aws:glue:<region>:<aws-account-id>:table/<database-name>/<table-name>", "arn:aws:glue:<region>:<aws-account-id>:database/<database-name>", "arn:aws:glue:<region>:<aws-account-id>:catalog" ] }, { "Effect": "Allow", "Action": [ "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents" ], "Resource": [ "arn:aws:logs:<region>:<aws-account-id>:log-group:/aws-glue/iceberg-compaction/logs:*", "arn:aws:logs:<region>:<aws-account-id>:log-group:/aws-glue/iceberg-retention/logs:*", "arn:aws:logs:<region>:<aws-account-id>:log-group:/aws-glue/iceberg-orphan-file-deletion/logs:*" } ] }
  • Use the following policy to enable compaction for data registered with Lake Formation.

    If the optimization role doesn't have IAM_ALLOWED_PRINCIPALS group permissions granted on the table, the role requires Lake Formation ALTER, DESCRIBE, INSERT and DELETE permissions on the table.

    For more information on registering an Amazon S3 bucket with Lake Formation, see Adding an Amazon S3 location to your data lake.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "lakeformation:GetDataAccess" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "glue:UpdateTable", "glue:GetTable" ], "Resource": [ "arn:aws:glue:<region>:<aws-account-id>:table/<databaseName>/<tableName>", "arn:aws:glue:<region>:<aws-account-id>:database/<database-name>", "arn:aws:glue:<region>:<aws-account-id>:catalog" ] }, { "Effect": "Allow", "Action": [ "logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents" ], "Resource": [ "arn:aws:logs:<region>:<aws-account-id>:log-group:/aws-glue/iceberg-compaction/logs:*", "arn:aws:logs:<region>:<aws-account-id>:log-group:/aws-glue/iceberg-retention/logs:*", "arn:aws:logs:<region>:<aws-account-id>:log-group:/aws-glue/iceberg-orphan-file-deletion/logs:*" } ] }
  • (Optional) To optimize Iceberg tables with data in Amazon S3 buckets encrypted using Server-side encryption, the compaction role requires permissions to decrypt Amazon S3 objects and generate a new data key to write objects to the encrypted buckets. Add the following policy to the desired Amazon KMS key. We support only bucket-level encryption.

    { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::<aws-account-id>:role/<optimizer-role-name>" }, "Action": [ "kms:Decrypt", "kms:GenerateDataKey" ], "Resource": "*" }
  • (Optional) For data location registered with Lake Formation, the role used to register the location requires permissions to decrypt Amazon S3 objects and generate a new data key to write objects to the encrypted buckets. For more information, see Registering an encrypted Amazon S3 location.

  • (Optional) If the Amazon KMS key is stored in a different Amazon account, you need to include the following permissions to the compaction role.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "kms:Decrypt", "kms:GenerateDataKey" ], "Resource": ["arn:aws:kms:<REGION>:<KEY_OWNER_ACCOUNT_ID>:key/<KEY_ID>" ] } ] }
  • The role you use to run compaction must have the iam:PassRole permission on the role.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "iam:PassRole" ], "Resource": [ "arn:aws:iam::<account-id>:role/<optimizer-role-name>" ] } ] }
  • Add the following trust policy to the role for Amazon Glue service to assume the IAM role to run the compaction process.

    { "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Principal": { "Service": "" }, "Action": "sts:AssumeRole" } ] }