
Trusted Identity Propagation with Amazon Glue ETL

With IAM Identity Center, you can connect to identity providers (IdPs) and centrally manage access for users and groups across Amazon analytics services. You can integrate identity providers such as Okta, Ping, and Microsoft Entra ID (formerly Azure Active Directory) with IAM Identity Center so that users in your organization can access data through a single sign-on experience. IAM Identity Center also supports connecting additional third-party identity providers.

With Amazon Glue 5.0 and higher, you can propagate user identities from IAM Identity Center to Amazon Glue interactive sessions. The interactive sessions then propagate the supplied identity to downstream services such as Amazon S3 Access Grants, Amazon Lake Formation, and Amazon Redshift, so that these services can authorize data access based on the user's identity.

Overview

Identity Center is the recommended approach for workforce authentication and authorization on Amazon for organizations of any size and type. With Identity Center, you can create and manage user identities in Amazon, or connect your existing identity source, including Microsoft Active Directory, Okta, Ping Identity, JumpCloud, Google Workspace, and Microsoft Entra ID (formerly Azure AD).

Trusted identity propagation is an IAM Identity Center feature that administrators of connected Amazon services can use to grant and audit access to service data. Access to this data is based on user attributes such as group associations. Setting up trusted identity propagation requires collaboration between the administrators of connected Amazon services and the IAM Identity Center administrators.

Features and benefits

The integration of Amazon Glue interactive sessions with IAM Identity Center trusted identity propagation provides the following benefits:

  • The ability to enforce table-level authorization and fine-grained access control with Identity Center identities on Lake Formation managed Amazon Glue Data Catalog tables.

  • The ability to enforce authorization with Identity Center identities on Amazon Redshift clusters.

  • The ability to track user actions end to end for auditing.

  • The ability to enforce Amazon S3 prefix-level authorization with Identity Center identities on Amazon S3 Access Grants-managed Amazon S3 prefixes.
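As an illustration of the first benefit, the following is a minimal sketch of how an administrator might prepare a Lake Formation grant for an Identity Center user or group. It only builds the keyword arguments for the boto3 `lakeformation.grant_permissions` call; the helper names, the example IDs, and the database, table, and column names are hypothetical. The principal ARN format (`arn:aws:identitystore:::user/<UserId>` or `.../group/<GroupId>`) and the default `aws` partition are assumptions to verify against the Lake Formation documentation for your Region.

```python
from typing import Optional, Sequence


def identity_center_principal(kind: str, principal_id: str, partition: str = "aws") -> str:
    """Build a principal ARN for an Identity Center user or group.

    Assumption: Lake Formation accepts ARNs of this shape for Identity
    Center identities; adjust the partition for your Region if needed.
    """
    if kind not in ("user", "group"):
        raise ValueError("kind must be 'user' or 'group'")
    return f"arn:{partition}:identitystore:::{kind}/{principal_id}"


def build_table_grant(
    principal_arn: str,
    database: str,
    table: str,
    columns: Optional[Sequence[str]] = None,
) -> dict:
    """Return keyword arguments for lakeformation.grant_permissions.

    With `columns`, the grant is limited to those columns (fine-grained);
    without it, the grant covers the whole table (table-level).
    """
    if columns:
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        resource = {"Table": {"DatabaseName": database, "Name": table}}
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    # Hypothetical group ID and catalog names, for illustration only.
    group = identity_center_principal("group", "example-group-id")
    request = build_table_grant(group, "sales_db", "orders", ["order_id", "total"])
    print(request)
    # To apply the grant (requires boto3 and Lake Formation admin rights):
    #   import boto3
    #   boto3.client("lakeformation").grant_permissions(**request)
```

Keeping the request construction separate from the API call makes the grant easy to review or log before it is applied.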

Use cases

Interactive Data Exploration and Analysis

Data engineers use their corporate identities to seamlessly access and analyze data across multiple Amazon accounts. Through SageMaker Studio, they launch interactive Spark sessions via Amazon Glue ETL, connecting to various data sources including Amazon S3 and the Amazon Glue Data Catalog. As engineers explore datasets, Spark enforces fine-grained access controls defined in Lake Formation based on their identities, ensuring they can only view authorized data. All queries and data transformations are logged with the user's identity, creating a clear audit trail. This streamlined approach enables rapid prototyping of new analytics products while maintaining strict data governance across client environments.

Data Preparation and Feature Engineering

Data scientists from multiple research teams collaborate on complex projects using a unified data platform. They log into SageMaker Studio with their corporate credentials, immediately accessing a vast, shared data lake that spans multiple Amazon accounts. As they begin feature engineering for new machine learning models, Spark sessions launched through Amazon Glue ETL enforce Lake Formation's column-level and row-level security policies based on their propagated identities. Scientists can efficiently prepare data and engineer features using familiar tools, while compliance teams have assurance that every data interaction is automatically tracked and audited. This secure, collaborative environment accelerates research pipelines while maintaining the strict data protection standards required in regulated industries.

How it works

Architecture diagram showing the Amazon Glue interactive sessions workflow. A user logs into client-facing applications (SageMaker Unified Studio, or custom applications) through IAM Identity Center. The user's identity is propagated to Amazon Glue interactive sessions, which connect to access control services including IAM Identity Center, Amazon Lake Formation, Amazon Glue Data Catalog, and Amazon S3 Access Grants, before finally accessing data in Amazon S3.

A user logs into client-facing applications (SageMaker AI, or custom applications) using their corporate identity through IAM Identity Center. This identity is then propagated through the entire data access pipeline.

The authenticated user launches Amazon Glue interactive sessions, which serve as the compute engine for data processing. These sessions maintain the user's identity context throughout the workflow.

Amazon Lake Formation and the Amazon Glue Data Catalog work together to enforce fine-grained access controls. Lake Formation applies security policies based on the user's propagated identity, while Amazon S3 Access Grants provides an additional permission layer, ensuring users can only access data they're authorized to view.

Finally, the system connects to Amazon S3, where the actual data resides. All access is governed by the combined security policies, maintaining data governance while enabling interactive data exploration and analysis. This architecture enables secure, identity-based data access across multiple Amazon services while maintaining a seamless user experience for data scientists and engineers working with large datasets.

Integrations

Amazon managed development environment

The following Amazon managed client-facing applications support trusted identity propagation with Amazon Glue interactive sessions:

SageMaker Unified Studio

To use trusted identity propagation with SageMaker Unified Studio:

  1. Set up a SageMaker Unified Studio project with trusted identity propagation enabled as the client-facing development environment.

  2. Set up Lake Formation to enable fine-grained access control for Amazon Glue tables based on the user or group in IAM Identity Center.

  3. Set up Amazon S3 Access Grants to enable temporary access to the underlying data locations in Amazon S3.

  4. Open the SageMaker Unified Studio JupyterLab IDE space and select Amazon Glue as the compute for notebook execution.
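Step 3 above could be sketched as follows. This is a minimal example, not a complete setup: it builds the keyword arguments for the boto3 `s3control.create_access_grant` call, which grants an Identity Center user or group access to an S3 prefix. The helper name, account ID, location ID, and prefix are hypothetical, and the sketch assumes an S3 Access Grants instance that is already associated with your IAM Identity Center instance and has a registered location.

```python
GRANTEE_TYPES = ("DIRECTORY_USER", "DIRECTORY_GROUP", "IAM")
PERMISSIONS = ("READ", "WRITE", "READWRITE")


def build_access_grant(
    account_id: str,
    location_id: str,
    sub_prefix: str,
    grantee_type: str,
    grantee_id: str,
    permission: str = "READ",
) -> dict:
    """Return keyword arguments for s3control.create_access_grant.

    For DIRECTORY_USER and DIRECTORY_GROUP grantees, `grantee_id` is the
    Identity Center user or group ID rather than an IAM ARN.
    """
    if grantee_type not in GRANTEE_TYPES:
        raise ValueError(f"grantee_type must be one of {GRANTEE_TYPES}")
    if permission not in PERMISSIONS:
        raise ValueError(f"permission must be one of {PERMISSIONS}")
    return {
        "AccountId": account_id,
        "AccessGrantsLocationId": location_id,
        "AccessGrantsLocationConfiguration": {"S3SubPrefix": sub_prefix},
        "Grantee": {
            "GranteeType": grantee_type,
            "GranteeIdentifier": grantee_id,
        },
        "Permission": permission,
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    request = build_access_grant(
        account_id="111122223333",
        location_id="example-location-id",
        sub_prefix="curated/orders/*",
        grantee_type="DIRECTORY_GROUP",
        grantee_id="example-group-id",
    )
    print(request)
    # To create the grant (requires boto3 and suitable permissions):
    #   import boto3
    #   boto3.client("s3control").create_access_grant(**request)
```

Granting to a group rather than to individual users generally keeps the number of grants small as team membership changes in the identity source.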

Customer managed self-hosted notebook environment

To enable trusted identity propagation for users of custom-developed applications, see Access Amazon services programmatically using trusted identity propagation in the Amazon Security Blog.
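As a rough orientation for that flow, the sketch below shows how a custom application might exchange a token from its identity provider for identity-enhanced credentials. It assumes the pattern of calling the IAM Identity Center OIDC `CreateTokenWithIAM` API with a JWT bearer grant, reading the `sts:identity_context` claim from the returned ID token, and passing that claim to `sts:AssumeRole` through `ProvidedContexts`. The helper names are hypothetical and the functions only build request arguments; treat the blog post as the authoritative reference. How the resulting credentials are handed to Amazon Glue interactive sessions is not shown here.

```python
import base64
import json

IDENTITY_CENTER_CONTEXT_PROVIDER = "arn:aws:iam::aws:contextProvider/IdentityCenter"


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature.

    The token here comes straight from the Identity Center API response,
    so the sketch only reads it; it does not authenticate anything.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def build_token_exchange(application_arn: str, idp_jwt: str) -> dict:
    """Keyword arguments for the sso-oidc create_token_with_iam call."""
    return {
        "clientId": application_arn,
        "grantType": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": idp_jwt,
    }


def build_assume_role(role_arn: str, session_name: str, id_token: str) -> dict:
    """Keyword arguments for sts assume_role carrying the user's identity context."""
    context = jwt_claims(id_token)["sts:identity_context"]
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ProvidedContexts": [
            {
                "ProviderArn": IDENTITY_CENTER_CONTEXT_PROVIDER,
                "ContextAssertion": context,
            }
        ],
    }


# Usage outline (requires boto3, a customer managed Identity Center
# application, and a trusted token issuer configured for your IdP):
#   import boto3
#   oidc = boto3.client("sso-oidc")
#   tokens = oidc.create_token_with_iam(**build_token_exchange(app_arn, idp_jwt))
#   sts = boto3.client("sts")
#   creds = sts.assume_role(**build_assume_role(role_arn, "tip-session", tokens["idToken"]))
```

The role being assumed would also need a trust policy that allows `sts:SetContext` in addition to `sts:AssumeRole`, since the call attaches a provided context.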