Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China.

Known issues for Amazon Glue

Note the following known issues for Amazon Glue.

Preventing cross-job data access

Consider the situation where you have two Amazon Glue Spark jobs in a single Amazon Web Services Account, each running in a separate Amazon Glue Spark cluster. The jobs are using Amazon Glue connections to access resources in the same virtual private cloud (VPC). In this situation, a job running in one cluster might be able to access the data from the job running in the other cluster.

The following diagram illustrates an example of this situation.


[Figure: Amazon Glue Job-1 in Cluster-1 and Job-2 in Cluster-2 communicate with an Amazon Redshift instance in Subnet-1 within a VPC. Data is transferred from Amazon S3 Bucket-1 and Bucket-2 to Amazon Redshift.]

In the diagram, Amazon Glue Job-1 is running in Cluster-1, and Job-2 is running in Cluster-2. Both jobs are working with the same instance of Amazon Redshift, which resides in Subnet-1 of a VPC. Subnet-1 could be a public or private subnet.

Job-1 is transforming data from Amazon Simple Storage Service (Amazon S3) Bucket-1 and writing the data to Amazon Redshift. Job-2 is doing the same with data in Bucket-2. Job-1 uses the Amazon Identity and Access Management (IAM) role Role-1 (not shown), which gives access to Bucket-1. Job-2 uses Role-2 (not shown), which gives access to Bucket-2.

These jobs have network paths that enable them to communicate with each other's clusters and thus access each other's data. For example, Job-2 could access data in Bucket-1. In the diagram, this is shown as the path in red.

To prevent this situation, we recommend that you attach a different security configuration to each job: one to Job-1 and another to Job-2. When a security configuration is attached, Amazon Glue creates certificates that block cross-job access to data. The security configurations can be dummy configurations. That is, you can create them without enabling encryption of Amazon S3 data, Amazon CloudWatch data, or job bookmarks. All three encryption options can remain disabled.
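As a minimal sketch of the recommendation above, the following Python code builds the arguments for the Glue `CreateSecurityConfiguration` API with all three encryption options set to `DISABLED`, and creates one configuration per job. The configuration names, the helper function names, and the use of boto3 with default credentials and Region are assumptions for illustration; they are not part of this page.

```python
from typing import Any, Dict, Iterable, Optional


def disabled_security_configuration(name: str) -> Dict[str, Any]:
    """Build create_security_configuration arguments with all encryption disabled.

    The helper name is hypothetical; the dictionary keys follow the Glue
    CreateSecurityConfiguration request shape.
    """
    return {
        "Name": name,
        "EncryptionConfiguration": {
            "S3Encryption": [{"S3EncryptionMode": "DISABLED"}],
            "CloudWatchEncryption": {"CloudWatchEncryptionMode": "DISABLED"},
            "JobBookmarksEncryption": {"JobBookmarksEncryptionMode": "DISABLED"},
        },
    }


def create_configurations(names: Iterable[str], glue_client: Optional[Any] = None) -> None:
    """Create one dummy security configuration per name."""
    if glue_client is None:
        # Assumption: boto3 is installed and credentials/Region are configured.
        import boto3

        glue_client = boto3.client("glue")
    for name in names:
        glue_client.create_security_configuration(
            **disabled_security_configuration(name)
        )


if __name__ == "__main__":
    # Hypothetical names: one distinct configuration for each job.
    create_configurations(["job-1-isolation", "job-2-isolation"])
```

Using a separate name for each job matters here, because the recommendation is that Job-1 and Job-2 use different security configurations.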

For information about security configurations, see Encrypting data written by crawlers, jobs, and development endpoints.

To attach a security configuration to a job

  1. Open the Amazon Glue console at https://console.amazonaws.cn/glue/.

  2. On the Configure the job properties page for the job, expand the Security configuration, script libraries, and job parameters section.

  3. Select a security configuration in the list.
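The console steps above can also be approximated through the API. The sketch below reads an existing job with `GetJob`, copies its definition into a `JobUpdate` structure, sets `SecurityConfiguration`, and calls `UpdateJob`. Because `UpdateJob` replaces the job definition, the existing settings are carried over rather than sending only the new field. The helper names and the example job and configuration names are hypothetical, and the list of fields dropped from the `GetJob` response is an assumption that may need adjusting for your job type and API version.

```python
import copy
from typing import Any, Dict, Optional

# Fields returned by GetJob that are not part of JobUpdate.
_READ_ONLY_FIELDS = ("Name", "CreatedOn", "LastModifiedOn")


def job_update_with_security_configuration(
    job: Dict[str, Any], security_configuration: str
) -> Dict[str, Any]:
    """Return a JobUpdate dict based on `job` with the security configuration set."""
    update = copy.deepcopy(job)  # leave the caller's dictionary untouched
    for field in _READ_ONLY_FIELDS:
        update.pop(field, None)
    # AllocatedCapacity is deprecated in favor of MaxCapacity.
    update.pop("AllocatedCapacity", None)
    # MaxCapacity should not be sent together with WorkerType/NumberOfWorkers.
    if "WorkerType" in update or "NumberOfWorkers" in update:
        update.pop("MaxCapacity", None)
    update["SecurityConfiguration"] = security_configuration
    return update


def attach_security_configuration(
    job_name: str, security_configuration: str, glue_client: Optional[Any] = None
) -> None:
    """Attach a security configuration to an existing job."""
    if glue_client is None:
        # Assumption: boto3 is installed and credentials/Region are configured.
        import boto3

        glue_client = boto3.client("glue")
    job = glue_client.get_job(JobName=job_name)["Job"]
    glue_client.update_job(
        JobName=job_name,
        JobUpdate=job_update_with_security_configuration(job, security_configuration),
    )


if __name__ == "__main__":
    # Hypothetical job and configuration names.
    attach_security_configuration("Job-1", "job-1-isolation")
    attach_security_configuration("Job-2", "job-2-isolation")
```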