
Configure cross-account Amazon Glue access in Athena for Spark

This topic shows how consumer account 666666666666 and owner account 999999999999 can be configured for cross-account Amazon Glue access. When the accounts are configured, the consumer account can run queries from Athena for Spark on the owner's Amazon Glue databases and tables.

Step 1: In Amazon Glue, provide access to consumer roles

In Amazon Glue, the owner creates a policy that provides the consumer's roles access to the owner's Amazon Glue data catalog.

To add an Amazon Glue policy that allows a consumer role access to the owner's data catalog

Using the catalog owner's account, sign in to the Amazon Web Services Management Console.
Open the Amazon Glue console at https://console.amazonaws.cn/glue/.
In the navigation pane, expand Data Catalog, and then choose Catalog settings.

On the Data catalog settings page, in the Permissions section, add a policy like the following. This policy gives the Admin and AWSAthenaSparkExecutionRole roles in consumer account 666666666666 access to the data catalog in owner account 999999999999.


{ 
  "Version": "2012-10-17", 
  "Statement": [ 
    { 
      "Sid": "Cataloguers", 
      "Effect": "Allow", 
      "Principal": { 
        "AWS": [ 
          "arn:aws:iam::666666666666:role/Admin", 
          "arn:aws:iam::666666666666:role/AWSAthenaSparkExecutionRole" 
        ] 
      }, 
      "Action": "glue:*", 
      "Resource": [ 
        "arn:aws:glue:us-west-2:999999999999:catalog", 
        "arn:aws:glue:us-west-2:999999999999:database/*", 
        "arn:aws:glue:us-west-2:999999999999:table/*" 
      ] 
    } 
  ] 
}
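If you prefer to script this step instead of using the console, the following minimal sketch builds the same resource policy in Python and, optionally, applies it with the boto3 Glue client's `put_resource_policy` call using the owner account's credentials. The account IDs, role names, and Region are the example values from this topic, and the function names are hypothetical. Because this call sets the whole Data Catalog resource policy, merge in any statements you already have before applying it. In the China Regions, ARNs use the `aws-cn` partition, so the sketch takes the partition as a parameter.

```python
import json


def build_owner_catalog_policy(owner_account, consumer_account, role_names,
                               region, partition="aws"):
    """Return the Data Catalog resource policy shown above as a dict."""
    glue_prefix = f"arn:{partition}:glue:{region}:{owner_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Cataloguers",
                "Effect": "Allow",
                "Principal": {
                    # One IAM role ARN per consumer role that needs access.
                    "AWS": [
                        f"arn:{partition}:iam::{consumer_account}:role/{name}"
                        for name in role_names
                    ]
                },
                "Action": "glue:*",
                "Resource": [
                    f"{glue_prefix}:catalog",
                    f"{glue_prefix}:database/*",
                    f"{glue_prefix}:table/*",
                ],
            }
        ],
    }


def apply_owner_catalog_policy(policy, region):
    """Set the policy on the owner's Data Catalog (run with owner credentials)."""
    import boto3  # imported here so that building the policy does not need boto3

    glue = boto3.client("glue", region_name=region)
    # Sets the Data Catalog resource policy for the calling account.
    return glue.put_resource_policy(PolicyInJson=json.dumps(policy))


policy = build_owner_catalog_policy(
    owner_account="999999999999",
    consumer_account="666666666666",
    role_names=["Admin", "AWSAthenaSparkExecutionRole"],
    region="us-west-2",
)
print(json.dumps(policy, indent=2))
```

Generating the principals from a list of role names keeps the policy in step with the roles you later attach the consumer policy to.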

Step 2: Configure the consumer account for access

In the consumer account, create a policy to allow access to the owner's Amazon Glue Data Catalog, databases, and tables, and attach the policy to a role. The following example uses consumer account 666666666666.

To create an Amazon Glue policy for access to the owner's Amazon Glue Data Catalog

Using the consumer account, sign in to the Amazon Web Services Management Console.
Open the IAM console at https://console.amazonaws.cn/iam/.
In the navigation pane, expand Access management, and then choose Policies.
Choose Create policy.
On the Specify permissions page, choose JSON.

In the Policy editor, enter a policy like the following that allows Amazon Glue actions on the owner account's data catalog. The Region and account ID in the resource ARNs must match those of the owner's catalog (us-west-2 and 999999999999 in this example).


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "glue:*",
            "Resource": [
                "arn:aws:glue:us-west-2:999999999999:catalog",
                "arn:aws:glue:us-west-2:999999999999:database/*",
                "arn:aws:glue:us-west-2:999999999999:table/*"
            ]
        }
    ]
}
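Access works only when the consumer policy and the owner's resource policy cover the same Glue resources, so a Region or account mismatch between the two documents is an easy mistake to make. The following sketch builds the consumer policy and reports any consumer resource ARN that is missing from the owner's list. It is a simple exact-string comparison, not full IAM wildcard evaluation, and the function names are hypothetical.

```python
import json


def build_consumer_glue_policy(owner_account, region, partition="aws"):
    """Return an identity policy that allows Glue actions on the owner's catalog."""
    glue_prefix = f"arn:{partition}:glue:{region}:{owner_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "glue:*",
                "Resource": [
                    f"{glue_prefix}:catalog",
                    f"{glue_prefix}:database/*",
                    f"{glue_prefix}:table/*",
                ],
            }
        ],
    }


def glue_resources(policy):
    """Collect every Resource ARN in a policy document into a set."""
    found = set()
    for statement in policy["Statement"]:
        resources = statement["Resource"]
        # Resource may be a single string or a list of strings.
        if isinstance(resources, str):
            resources = [resources]
        found.update(resources)
    return found


def mismatched_resources(consumer_policy, owner_resource_arns):
    """Return consumer ARNs that do not appear verbatim in the owner's list."""
    return sorted(glue_resources(consumer_policy) - set(owner_resource_arns))


owner_arns = [
    "arn:aws:glue:us-west-2:999999999999:catalog",
    "arn:aws:glue:us-west-2:999999999999:database/*",
    "arn:aws:glue:us-west-2:999999999999:table/*",
]
consumer = build_consumer_glue_policy("999999999999", "us-west-2")
print(mismatched_resources(consumer, owner_arns))  # [] because Region and account match
print(json.dumps(consumer, indent=2))
```

Building the consumer policy with `region="us-east-1"` instead would report all three ARNs, which is the signal to align the Regions before saving the policy.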

Choose Next.
On the Review and create page, for Policy name, enter a name for the policy.
Choose Create policy.

Next, use the IAM console in the consumer account to attach the policy that you just created to the IAM role or roles that the consumer account will use to access the owner's data catalog.

To attach the Amazon Glue policy to the roles in the consumer account

In the consumer account IAM console navigation pane, choose Roles.
On the Roles page, find the role that you want to attach the policy to.
Choose Add permissions, and then choose Attach policies.
Find the policy that you just created.
Select the check box for the policy, and then choose Add permissions.
Repeat the steps to add the policy to other roles that you want to use.
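The two consumer-account procedures above correspond to the IAM `CreatePolicy` and `AttachRolePolicy` API operations. The following sketch shows them with boto3. It takes the client as a parameter so that in practice you can pass `boto3.client("iam")` created with consumer account credentials; the policy name `CrossAccountGlueAccess`, the function name, and the stand-in client are hypothetical and only illustrate the calls.

```python
import json


def create_and_attach(iam, policy_name, policy_document, role_names):
    """Create a customer managed policy and attach it to each named role.

    `iam` is expected to be a boto3 IAM client, for example boto3.client("iam").
    """
    response = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
    policy_arn = response["Policy"]["Arn"]
    for role_name in role_names:
        # Equivalent to Add permissions > Attach policies for one role.
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return policy_arn


class _RecordingIamClient:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.attached = []

    def create_policy(self, PolicyName, PolicyDocument):
        return {"Policy": {"Arn": f"arn:aws:iam::666666666666:policy/{PolicyName}"}}

    def attach_role_policy(self, RoleName, PolicyArn):
        self.attached.append((RoleName, PolicyArn))


document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "glue:*",
        "Resource": ["arn:aws:glue:us-west-2:999999999999:catalog"],
    }],
}
fake_iam = _RecordingIamClient()
arn = create_and_attach(fake_iam, "CrossAccountGlueAccess", document,
                        ["Admin", "AWSAthenaSparkExecutionRole"])
print(arn)
print(fake_iam.attached)
```

Looping over role names replaces the "repeat the steps" instruction with a single call.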

Step 3: Configure a session and create a query

In Athena for Spark, in the consumer account, use the role that you attached the policy to and create a session to test access. You can create a notebook or edit the properties of an existing session. When you configure the session properties, specify one of the following:

The Amazon Glue catalog separator – With this approach, you include the owner account ID in your queries. Use this method if you are going to use the session to query data catalogs from different owners.
The Amazon Glue catalog ID – With this approach, you query the database directly. This method is more convenient if you are going to use the session to query only a single owner's data catalog.
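Each approach comes down to a single Spark property. The following helper, whose name is hypothetical, returns the property for either approach as a Python dict, holding the same key-value pair that you enter as JSON in the session properties.

```python
def cross_account_spark_properties(owner_account=None, use_separator=True):
    """Return the Spark session property for one of the two approaches."""
    if use_separator:
        # Queries name the owner's catalog explicitly, e.g. `999999999999/mydatabase`.
        return {"spark.hadoop.aws.glue.catalog.separator": "/"}
    if owner_account is None:
        raise ValueError("owner_account is required for the catalog ID approach")
    # Unqualified database names resolve against the owner's catalog.
    return {"spark.hadoop.hive.metastore.glue.catalogid": owner_account}


print(cross_account_spark_properties())
print(cross_account_spark_properties("999999999999", use_separator=False))
```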

To use the Amazon Glue catalog separator, add the following when you edit the session properties:


{ 
    "spark.hadoop.aws.glue.catalog.separator": "/" 
}

When you run a query in a cell, use syntax like that in the following example. In the FROM clause, the owner's catalog ID and the separator must precede the database name, and the combined catalog ID and database name are enclosed in backticks.


df = spark.sql('SELECT requestip, uri, method, status FROM `999999999999/mydatabase`.cloudfront_logs LIMIT 5') 
df.show()
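When one session queries several owners' catalogs, building the qualified name in one place helps avoid mistakes with the backticks and separator. The following sketch uses a hypothetical helper to produce the same query string as the example above; in a notebook cell, you would pass the string to `spark.sql`.

```python
def qualified_table(catalog_id, database, table, separator="/"):
    """Build a table reference for the catalog separator approach.

    The catalog ID and database are joined with the separator and wrapped in
    backticks so that Spark SQL reads them as a single identifier.
    """
    return f"`{catalog_id}{separator}{database}`.{table}"


query = (
    "SELECT requestip, uri, method, status FROM "
    + qualified_table("999999999999", "mydatabase", "cloudfront_logs")
    + " LIMIT 5"
)
print(query)
# In a notebook cell: df = spark.sql(query); df.show()
```

The `separator` parameter must match the value of `spark.hadoop.aws.glue.catalog.separator` in the session properties.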

To use the Amazon Glue catalog ID, enter the following property when you edit the session properties. Replace 999999999999 with the owner account ID.


{ 
    "spark.hadoop.hive.metastore.glue.catalogid": "999999999999" 
}

When you run a query in a cell, use syntax like the following. Note that in the FROM clause, the catalog ID and separator are not required before the database name.


df = spark.sql('SELECT * FROM mydatabase.cloudfront_logs LIMIT 10') 
df.show()
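The session and query steps can also be scripted with the Athena `StartSession`, `GetSessionStatus`, and `StartCalculationExecution` API operations. The following sketch uses boto3 and the catalog ID approach. It assumes that your SDK version supports the `SparkProperties` field of `EngineConfiguration` (check the Athena API reference), and `my-spark-workgroup` is a hypothetical Spark-enabled workgroup whose execution role has the policy from Step 2. The client is passed in, so the stand-in client at the end exercises the control flow without calling AWS.

```python
import time

# Example value from this topic; replace with the owner account ID.
OWNER_ACCOUNT = "999999999999"

TEST_QUERY = (
    "df = spark.sql('SELECT * FROM mydatabase.cloudfront_logs LIMIT 10')\n"
    "df.show()"
)


def start_cross_account_session(athena, workgroup, owner_account,
                                poll_seconds=5, sleep=time.sleep):
    """Start a Spark session that uses the catalog ID approach and wait for IDLE.

    `athena` is expected to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    response = athena.start_session(
        WorkGroup=workgroup,
        EngineConfiguration={
            "MaxConcurrentDpus": 20,  # example sizing only
            "SparkProperties": {
                "spark.hadoop.hive.metastore.glue.catalogid": owner_account,
            },
        },
    )
    session_id = response["SessionId"]
    # Calculations are submitted only after the session reaches IDLE.
    while True:
        state = athena.get_session_status(SessionId=session_id)["Status"]["State"]
        if state == "IDLE":
            return session_id
        if state in ("FAILED", "TERMINATED", "DEGRADED"):
            raise RuntimeError(f"session {session_id} is {state}")
        sleep(poll_seconds)


def run_test_query(athena, session_id, code=TEST_QUERY):
    """Submit the notebook code as a calculation and return its ID."""
    response = athena.start_calculation_execution(
        SessionId=session_id, CodeBlock=code
    )
    return response["CalculationExecutionId"]


class _FakeAthena:
    """Stand-in client that returns canned responses instead of calling AWS."""

    def __init__(self):
        self.states = ["CREATING", "IDLE"]
        self.engine_configuration = None
        self.code = None

    def start_session(self, WorkGroup, EngineConfiguration):
        self.engine_configuration = EngineConfiguration
        return {"SessionId": "example-session", "State": "CREATING"}

    def get_session_status(self, SessionId):
        return {"Status": {"State": self.states.pop(0)}}

    def start_calculation_execution(self, SessionId, CodeBlock):
        self.code = CodeBlock
        return {"CalculationExecutionId": "example-calculation"}


fake = _FakeAthena()
session = start_cross_account_session(fake, "my-spark-workgroup", OWNER_ACCOUNT,
                                      sleep=lambda seconds: None)
print(session, run_test_query(fake, session))
```

Injecting `sleep` keeps the polling loop testable; with a real client, leave the default so the loop waits between status checks.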

Additional resources

Configure cross-account access to Amazon Glue data catalogs

Managing cross-account permissions using both Amazon Glue and Lake Formation in the Amazon Lake Formation Developer Guide.

Configure cross-account access to a shared Amazon Glue Data Catalog using Amazon Athena in Amazon Prescriptive Guidance Patterns.
