Configure cross-account Amazon Glue access in Athena for Spark
This topic shows how consumer account 666666666666 and owner account 999999999999 can be configured for cross-account Amazon Glue access. When the accounts are configured, the consumer account can run queries from Athena for Spark on the owner's Amazon Glue databases and tables.
Step 1: In Amazon Glue, provide access to consumer roles
In Amazon Glue, the owner creates a policy that provides the consumer's roles access to the owner's Amazon Glue data catalog.
To add an Amazon Glue policy that allows a consumer role access to the owner's data catalog
-
Using the catalog owner's account, sign in to the Amazon Web Services Management Console.
Open the Amazon Glue console at https://console.amazonaws.cn/glue/.
-
In the navigation pane, expand Data Catalog, and then choose Catalog settings.
-
On the Data catalog settings page, in the Permissions section, add a policy like the following. This policy gives roles in consumer account 666666666666 access to the data catalog in owner account 999999999999.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Cataloguers",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::666666666666:role/Admin",
                    "arn:aws:iam::666666666666:role/AWSAthenaSparkExecutionRole"
                ]
            },
            "Action": "glue:*",
            "Resource": [
                "arn:aws:glue:us-west-2:999999999999:catalog",
                "arn:aws:glue:us-west-2:999999999999:database/*",
                "arn:aws:glue:us-west-2:999999999999:table/*"
            ]
        }
    ]
}
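The resource policy above follows a fixed pattern, so it can be generated rather than hand-edited when the account IDs, roles, or Region change. The following is a minimal sketch under that assumption; the function name `build_catalog_resource_policy` is hypothetical, and the values passed in are the example values from this topic.

```python
import json


def build_catalog_resource_policy(owner_account_id, consumer_role_arns, region):
    """Build a Glue resource policy that lets the given roles use the owner's catalog."""
    prefix = f"arn:aws:glue:{region}:{owner_account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Cataloguers",
                "Effect": "Allow",
                "Principal": {"AWS": list(consumer_role_arns)},
                "Action": "glue:*",
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/*",
                    f"{prefix}:table/*",
                ],
            }
        ],
    }


policy = build_catalog_resource_policy(
    owner_account_id="999999999999",
    consumer_role_arns=[
        "arn:aws:iam::666666666666:role/Admin",
        "arn:aws:iam::666666666666:role/AWSAthenaSparkExecutionRole",
    ],
    region="us-west-2",
)

# Print JSON that can be pasted into the Permissions section of the console.
print(json.dumps(policy, indent=2))
```

If you prefer to apply the policy programmatically instead of pasting it into the console, the boto3 Glue client's `put_resource_policy` call accepts the JSON string as its `PolicyInJson` argument. Note that it replaces any resource policy already set on the catalog.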
Step 2: Configure the consumer account for access
In the consumer account, create a policy to allow access to the owner's Amazon Glue Data Catalog, databases, and tables, and attach the policy to a role. The following example uses consumer account 666666666666.
To create an Amazon Glue policy for access to the owner's Amazon Glue Data Catalog
-
Using the consumer account, sign in to the Amazon Web Services Management Console.
Open the IAM console at https://console.amazonaws.cn/iam/.
-
In the navigation pane, expand Access management, and then choose Policies.
-
Choose Create policy.
-
On the Specify permissions page, choose JSON.
-
In the Policy editor, enter a JSON statement like the following that allows Amazon Glue actions on the owner account's data catalog. The Region in each ARN must be the Region of the owner's data catalog (us-west-2 in the Step 1 example).
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "glue:*",
            "Resource": [
                "arn:aws:glue:us-west-2:999999999999:catalog",
                "arn:aws:glue:us-west-2:999999999999:database/*",
                "arn:aws:glue:us-west-2:999999999999:table/*"
            ]
        }
    ]
}
-
Choose Next.
-
On the Review and create page, for Policy name, enter a name for the policy.
-
Choose Create policy.
Next, use the IAM console in the consumer account to attach the policy that you just created to the IAM role or roles that the consumer account will use to access the owner's data catalog.
To attach the Amazon Glue policy to the roles in the consumer account
-
In the consumer account IAM console navigation pane, choose Roles.
-
On the Roles page, find and choose the role that you want to attach the policy to.
-
Choose Add permissions, and then choose Attach policies.
-
Find the policy that you just created.
-
Select the check box for the policy, and then choose Add permissions.
-
Repeat the steps to add the policy to other roles that you want to use.
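The console steps in Step 2 (create the policy, then attach it to each role) can also be scripted. The following is a rough sketch, not the documented procedure: `create_policy` and `attach_role_policy` are the boto3 IAM client calls, while `build_consumer_policy`, `create_and_attach`, the `RecordingIam` stand-in, and the policy name `GlueCrossAccountCatalogAccess` are hypothetical names introduced for illustration. The Region is a parameter because it must be the Region of the owner's catalog; the call below passes us-west-2 to match the Step 1 example.

```python
import json


def build_consumer_policy(owner_account_id, region):
    """Identity policy allowing Glue actions on the owner's catalog, databases, and tables."""
    prefix = f"arn:aws:glue:{region}:{owner_account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "glue:*",
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/*",
                    f"{prefix}:table/*",
                ],
            }
        ],
    }


def create_and_attach(iam, policy_name, policy_document, role_names):
    """Create a managed policy and attach it to each role; return the policy ARN."""
    response = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_document),
    )
    policy_arn = response["Policy"]["Arn"]
    for role_name in role_names:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return policy_arn


class RecordingIam:
    """Hypothetical stand-in for boto3.client("iam") so the sketch runs without AWS access."""

    def __init__(self, account_id):
        self.account_id = account_id
        self.calls = []

    def create_policy(self, PolicyName, PolicyDocument):
        self.calls.append(("create_policy", PolicyName))
        return {"Policy": {"Arn": f"arn:aws:iam::{self.account_id}:policy/{PolicyName}"}}

    def attach_role_policy(self, RoleName, PolicyArn):
        self.calls.append(("attach_role_policy", RoleName))


# To run against AWS, replace RecordingIam with boto3.client("iam")
# created with consumer-account credentials.
iam = RecordingIam("666666666666")
document = build_consumer_policy("999999999999", "us-west-2")
policy_arn = create_and_attach(
    iam,
    "GlueCrossAccountCatalogAccess",  # hypothetical policy name
    document,
    ["Admin", "AWSAthenaSparkExecutionRole"],  # role names from the Step 1 example
)
print(policy_arn)
for call in iam.calls:
    print(call)
```

Passing the client in as an argument keeps the logic testable without credentials and makes it explicit which account the calls run in.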
Step 3: Configure a session and create a query
In the consumer account, open Athena for Spark and, using a role that has the policy attached, create a session to test access by creating a notebook or editing a current session. When you configure the session properties, specify one of the following:
-
The Amazon Glue catalog separator – With this approach, you include the owner account ID in your queries. Use this method if you are going to use the session to query data catalogs from different owners.
-
The Amazon Glue catalog ID – With this approach, you query the database directly. This method is more convenient if you are going to use the session to query only a single owner's data catalog.
To use the catalog separator, add the following when you edit the session properties:
{ "spark.hadoop.aws.glue.catalog.separator": "/" }
When you run a query in a cell, use syntax like that in the following example. Note that in the FROM clause, the catalog ID and separator are required before the database name.
df = spark.sql('SELECT requestip, uri, method, status FROM `999999999999/mydatabase`.cloudfront_logs LIMIT 5')
df.show()
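When a session queries catalogs from several owners, the backquoted `catalog-id/database` prefix is easy to mistype. The following is a small sketch of one way to build it; the helper name `qualified_table` is hypothetical, and the 12-digit check is an added safeguard rather than something Athena requires you to do.

```python
import re


def qualified_table(owner_account_id, database, table, separator="/"):
    """Return `<catalog-id><separator><database>`.<table> for use in a FROM clause."""
    if not re.fullmatch(r"\d{12}", owner_account_id):
        raise ValueError("expected a 12-digit account ID")
    return f"`{owner_account_id}{separator}{database}`.{table}"


target = qualified_table("999999999999", "mydatabase", "cloudfront_logs")
query = f"SELECT requestip, uri, method, status FROM {target} LIMIT 5"
print(query)

# In an Athena for Spark notebook cell, where `spark` is already defined:
#   df = spark.sql(query)
#   df.show()
```

The separator argument must match the value set in `spark.hadoop.aws.glue.catalog.separator` for the session.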
To use the catalog ID instead, enter the following property when you edit the session properties. Replace 999999999999 with the owner account ID.
{ "spark.hadoop.hive.metastore.glue.catalogid": "999999999999" }
When you run a query in a cell, use syntax like the following. Note that in the FROM clause, the catalog ID and separator are not required before the database name.
df = spark.sql('SELECT * FROM mydatabase.cloudfront_logs LIMIT 10')
df.show()
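The two session configurations in this step differ only in which property is set. The following sketch captures that choice in one place; the helper name `session_properties` is hypothetical, while the property names are the ones shown in this step.

```python
import json


def session_properties(owner_account_id=None, separator="/"):
    """Return the session property for one of the two approaches in this step.

    With an owner account ID, pin the session to that owner's catalog.
    Without one, set the separator so each query names the catalog itself.
    """
    if owner_account_id is not None:
        return {"spark.hadoop.hive.metastore.glue.catalogid": owner_account_id}
    return {"spark.hadoop.aws.glue.catalog.separator": separator}


single_owner = session_properties(owner_account_id="999999999999")
multi_owner = session_properties()

# Print JSON that can be pasted into the session properties editor.
print(json.dumps(single_owner))
print(json.dumps(multi_owner))
```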
Additional resources
Configure cross-account access to Amazon Glue data catalogs
Managing cross-account permissions using both Amazon Glue and Lake Formation in the Amazon Lake Formation Developer Guide.
Configure cross-account access to a shared Amazon Glue Data Catalog using Amazon Athena in Amazon Prescriptive Guidance Patterns.