
Encrypting data written by crawlers, jobs, and development endpoints

A security configuration is a set of security properties that can be used by Amazon Glue. You can use a security configuration to encrypt data at rest. The following scenarios show some of the ways that you can use a security configuration.

  • Attach a security configuration to an Amazon Glue crawler to write encrypted Amazon CloudWatch Logs.

  • Attach a security configuration to an extract, transform, and load (ETL) job to write encrypted Amazon Simple Storage Service (Amazon S3) targets and encrypted CloudWatch Logs.

  • Attach a security configuration to an ETL job to write its job bookmarks as encrypted Amazon S3 data.

  • Attach a security configuration to a development endpoint to write encrypted Amazon S3 targets.
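The scenarios above can be sketched as boto3 request parameters. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names are hypothetical, the example uses the same KMS key for all three encryption targets, and the pure functions only build request dictionaries. The single function that calls Amazon Glue imports boto3 lazily and needs valid credentials.

```python
from typing import Any, Dict


def build_encryption_configuration(kms_key_arn: str) -> Dict[str, Any]:
    """Build the EncryptionConfiguration argument for
    glue.create_security_configuration, using one KMS key (an assumption)
    for S3 targets, CloudWatch Logs, and job bookmarks."""
    return {
        "S3Encryption": [
            {"S3EncryptionMode": "SSE-KMS", "KmsKeyArn": kms_key_arn}
        ],
        "CloudWatchEncryption": {
            "CloudWatchEncryptionMode": "SSE-KMS",
            "KmsKeyArn": kms_key_arn,
        },
        "JobBookmarksEncryption": {
            "JobBookmarksEncryptionMode": "CSE-KMS",
            "KmsKeyArn": kms_key_arn,
        },
    }


def attachment_parameter(resource_type: str, configuration_name: str) -> Dict[str, str]:
    """Return the keyword argument that attaches a security configuration
    when creating a crawler, a job, or a development endpoint."""
    if resource_type == "crawler":
        # glue.create_crawler uses a crawler-specific parameter name.
        return {"CrawlerSecurityConfiguration": configuration_name}
    if resource_type in ("job", "dev_endpoint"):
        # glue.create_job and glue.create_dev_endpoint share this name.
        return {"SecurityConfiguration": configuration_name}
    raise ValueError(f"unsupported resource type: {resource_type}")


def create_security_configuration(name: str, kms_key_arn: str) -> Dict[str, Any]:
    """Create the security configuration in Amazon Glue (calls AWS)."""
    import boto3  # imported lazily so the builders above work without it

    glue = boto3.client("glue")
    return glue.create_security_configuration(
        Name=name,
        EncryptionConfiguration=build_encryption_configuration(kms_key_arn),
    )
```

When creating a job, for example, the result of `attachment_parameter("job", name)` could be merged into the other `create_job` keyword arguments.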

Important

Currently, a security configuration overrides any server-side encryption (SSE-S3) setting that is passed as an ETL job parameter. Thus, if both a security configuration and an SSE-S3 parameter are associated with a job, the SSE-S3 parameter is ignored.
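The precedence rule in this note can be sketched as a small check. The function name is hypothetical, and the example assumes that the SSE-S3 job parameter is passed as `--encryption-type` with the value `sse-s3`; verify the parameter name against your own job definition.

```python
from typing import Dict, Optional


def effective_s3_encryption(
    job_arguments: Dict[str, str],
    security_configuration: Optional[str],
) -> str:
    """Report which setting governs S3 encryption for a job.

    A security configuration, when present, takes precedence and the
    SSE-S3 job parameter is ignored.
    """
    if security_configuration:
        return "security-configuration"
    # Assumed parameter name and value for the SSE-S3 job parameter.
    if job_arguments.get("--encryption-type") == "sse-s3":
        return "sse-s3-parameter"
    return "none"
```

A check like this could run before a job is deployed to flag definitions in which the SSE-S3 parameter would have no effect.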

For more information about security configurations, see Working with security configurations on the Amazon Glue console.

Setting up Amazon Glue to use security configurations

Follow these steps to set up your Amazon Glue environment to use security configurations.

  1. To encrypt CloudWatch Logs, create or update your Amazon Key Management Service (Amazon KMS) keys so that their key policies grant Amazon KMS permissions to the IAM roles that are passed to Amazon Glue crawlers and jobs. For more information, see Encrypt Log Data in CloudWatch Logs Using Amazon KMS in the Amazon CloudWatch Logs User Guide.

    In the following example, "role1", "role2", and "role3" are IAM roles that are passed to crawlers and jobs.

    {
      "Effect": "Allow",
      "Principal": {
        "Service": "logs.region.amazonaws.com",
        "AWS": [
          "role1",
          "role2",
          "role3"
        ]
      },
      "Action": [
        "kms:Encrypt*",
        "kms:Decrypt*",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:Describe*"
      ],
      "Resource": "*"
    }

    The Service statement, shown as "Service": "logs.region.amazonaws.com", is required if you use the key to encrypt CloudWatch Logs.

  2. Ensure that the Amazon KMS key is ENABLED before it is used.

  3. Ensure that your Amazon Glue job script includes the following code at the beginning so that the security setting takes effect.

    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)
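Steps 1 and 2 above can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the statement mirrors the example policy in step 1, and the key-state check expects a dictionary shaped like the response of the KMS `describe_key` call (for example, from `boto3.client("kms").describe_key(KeyId=...)`). Nothing here calls AWS.

```python
import json
from typing import Any, Dict, List


def build_key_policy_statement(region: str, role_arns: List[str]) -> Dict[str, Any]:
    """Build a key policy statement that lets CloudWatch Logs and the
    given IAM roles use the key, following the example in step 1."""
    return {
        "Effect": "Allow",
        "Principal": {
            # Required if the key is used to encrypt CloudWatch Logs.
            "Service": f"logs.{region}.amazonaws.com",
            "AWS": list(role_arns),
        },
        "Action": [
            "kms:Encrypt*",
            "kms:Decrypt*",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:Describe*",
        ],
        "Resource": "*",
    }


def is_key_enabled(describe_key_response: Dict[str, Any]) -> bool:
    """Return True if a describe_key response reports the key as Enabled."""
    metadata = describe_key_response.get("KeyMetadata", {})
    return metadata.get("KeyState") == "Enabled"


if __name__ == "__main__":
    # "role1" and "role2" stand in for real IAM role ARNs.
    statement = build_key_policy_statement("region", ["role1", "role2"])
    print(json.dumps(statement, indent=2))
```

The generated statement would be added to the `Statement` array of the key policy, alongside the statements that the policy already contains.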

Creating a route to Amazon KMS for VPC jobs and crawlers

You can connect directly to Amazon KMS through a private endpoint in your virtual private cloud (VPC) instead of connecting over the internet. When you use a VPC endpoint, communication between your VPC and Amazon KMS is conducted entirely within the Amazon network.

You can create an Amazon KMS VPC endpoint within a VPC. Without this step, your jobs might fail with an Amazon KMS timeout, and your crawlers might fail with an internal service exception. For detailed instructions, see Connecting to Amazon KMS Through a VPC Endpoint in the Amazon Key Management Service Developer Guide.

As you follow these instructions, on the VPC console, you must do the following:

  • Select Enable Private DNS name.

  • Choose the Security group (with self-referencing rule) that you use for your job or crawler that accesses Java Database Connectivity (JDBC) data stores. For more information about Amazon Glue connections, see Defining connections in the Amazon Glue Data Catalog.

When you add a security configuration to a crawler or job that accesses JDBC data stores, Amazon Glue must have a route to the Amazon KMS endpoint. You can provide the route with a network address translation (NAT) gateway or with an Amazon KMS VPC endpoint. To create a NAT gateway, see NAT Gateways in the Amazon VPC User Guide.
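The VPC endpoint option can also be sketched programmatically. This is a minimal sketch, assuming the request is sent with the boto3 EC2 client's `create_vpc_endpoint` call. The helper name is hypothetical, and the service name pattern `com.amazonaws.<region>.kms` is an assumption that might differ in some partitions, so confirm it for your Region before use. The function only builds the request and does not call AWS.

```python
from typing import Any, Dict, List


def build_kms_endpoint_request(
    vpc_id: str,
    subnet_ids: List[str],
    security_group_id: str,
    region: str,
) -> Dict[str, Any]:
    """Build keyword arguments for ec2.create_vpc_endpoint that create an
    interface endpoint for Amazon KMS with private DNS enabled."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Assumed naming pattern; verify the service name for your Region.
        "ServiceName": f"com.amazonaws.{region}.kms",
        "SubnetIds": list(subnet_ids),
        # The self-referencing security group used by the job or crawler.
        "SecurityGroupIds": [security_group_id],
        # Corresponds to selecting "Enable Private DNS name" on the console.
        "PrivateDnsEnabled": True,
    }
```

With credentials configured, the request could be sent as `boto3.client("ec2").create_vpc_endpoint(**request)`, where `request` is the dictionary returned above.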