

# Amazon Lake Formation access control models


Amazon Glue 5.0 and above supports two models for accessing data through Amazon Lake Formation:

**Topics**
+ [Using Amazon Glue with Amazon Lake Formation for Full Table Access](security-access-control-fta.md)
+ [Using Amazon Glue with Amazon Lake Formation for fine-grained access control](security-lf-enable.md)

# Using Amazon Glue with Amazon Lake Formation for Full Table Access


## Introduction to Full Table Access


Amazon Glue 5.0 and above supports Full Table Access (FTA) control in Apache Spark based on your policies defined in Amazon Lake Formation. This feature enables read and write operations from your Amazon Glue Spark jobs on Amazon Lake Formation registered tables when the job role has full table access. FTA is ideal for use cases that need to comply with security regulations at the table level and supports Spark capabilities including Resilient Distributed Datasets (RDDs), custom libraries, and User Defined Functions (UDFs) with Amazon Lake Formation tables.

When an Amazon Glue Spark job is configured for Full Table Access (FTA), Amazon Lake Formation credentials are used to read and write Amazon S3 data for Amazon Lake Formation registered tables, while the job's runtime role credentials are used to read and write tables not registered with Amazon Lake Formation. This capability enables Data Definition Language (DDL) and Data Manipulation Language (DML) operations, including CREATE, ALTER, DELETE, UPDATE, and MERGE INTO statements, on Apache Hive and Iceberg tables.

**Note**  
Review your requirements to determine whether Fine-Grained Access Control (FGAC) or Full Table Access (FTA) suits your needs. Only one Amazon Lake Formation permission method can be enabled for a given Amazon Glue job; a job cannot use Full Table Access (FTA) and Fine-Grained Access Control (FGAC) at the same time.

## How Full-Table Access (FTA) works on Amazon Glue


Amazon Lake Formation offers two approaches for data access control: Fine-Grained Access Control (FGAC) and Full Table Access (FTA). FGAC provides enhanced security through column-, row-, and cell-level filtering, ideal for scenarios requiring granular permissions. FTA is ideal for straightforward access control scenarios where you need table-level permissions. It simplifies implementation by eliminating the need to enable fine-grained access mode, improves performance and reduces cost by avoiding the system driver and system executors, and supports both read and write operations (including CREATE, ALTER, DELETE, UPDATE, and MERGE INTO commands).

In Amazon Glue 4.0, Amazon Lake Formation based data access worked through the GlueContext class, the utility class provided by Amazon Glue. In Amazon Glue 5.0, Amazon Lake Formation based data access is available through native Spark SQL and Spark DataFrames, and continues to be supported through the GlueContext class.

## Implementing Full Table Access


### Step 1: Enable Full Table Access in Amazon Lake Formation


To use Full Table Access (FTA) mode, you need to allow third-party query engines to access data without the IAM session tag validation in Amazon Lake Formation. To enable this, follow the steps in [Application integration for full table access](https://docs.aws.amazon.com/lake-formation/latest/dg/full-table-credential-vending.html).

### Step 2: Setup IAM permissions for job runtime role


For read or write access to underlying data, in addition to Amazon Lake Formation permissions, a job runtime role needs the `lakeformation:GetDataAccess` IAM permission. With this permission, Amazon Lake Formation grants the request for temporary credentials to access the data.

The following is an example policy of how to provide IAM permissions to access a script in Amazon S3, uploading logs to Amazon S3, Amazon Glue API permissions, and permission to access Amazon Lake Formation.

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ScriptAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws-cn:s3:::amzn-s3-demo-bucket/scripts/*"
      ]
    },
    {
      "Sid": "LoggingAccess",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws-cn:s3:::amzn-s3-demo-bucket/logs/*"
      ]
    },
    {
      "Sid": "GlueCatalogAccess",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:CreateTable",
        "glue:UpdateTable"
      ],
      "Resource": [
        "arn:aws-cn:glue:us-east-1:111122223333:catalog",
        "arn:aws-cn:glue:us-east-1:111122223333:database/default",
        "arn:aws-cn:glue:us-east-1:111122223333:table/default/*"
      ]
    },
    {
      "Sid": "LakeFormationAccess",
      "Effect": "Allow",
      "Action": [
        "lakeformation:GetDataAccess"
      ],
      "Resource": "*"
    }
  ]
}
```

------
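As a quick sanity check, you can verify programmatically that a runtime-role policy document includes the required actions before attaching it. The following is an illustrative sketch, not part of any AWS SDK; `missing_actions` is a hypothetical helper name:

```python
import json

# Hypothetical helper: reports which required IAM actions are NOT allowed
# anywhere in a policy document's Allow statements.
def missing_actions(policy_doc, required):
    policy = json.loads(policy_doc)
    allowed = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed.update(actions)
    return set(required) - allowed

policy = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "LakeFormationAccess",
     "Effect": "Allow",
     "Action": ["lakeformation:GetDataAccess"],
     "Resource": "*"}
  ]
}
"""

print(missing_actions(policy, {"lakeformation:GetDataAccess"}))  # set() -> nothing missing
print(missing_actions(policy, {"glue:GetTable"}))                # {'glue:GetTable'}
```

Note that this is a plain string check; it does not evaluate wildcards or Deny statements the way IAM does.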

#### Step 2.1 Configure Amazon Lake Formation permissions


Amazon Glue Spark jobs that read data from Amazon S3 require Amazon Lake Formation SELECT permission.

Amazon Glue Spark jobs that write/delete data in Amazon S3 require Amazon Lake Formation ALL permission.

Amazon Glue Spark jobs that interact with the Amazon Glue Data Catalog require DESCRIBE, ALTER, and DROP permissions as appropriate.

### Step 3: Initialize a Spark session for Full Table Access using Amazon Lake Formation


To access tables registered with Amazon Lake Formation, you must explicitly configure your Spark session to use Amazon Lake Formation credentials. Add the following configurations when initializing your Spark session:

```
from pyspark.sql import SparkSession
        
# Initialize Spark session with Lake Formation configurations
spark = SparkSession.builder \
    .appName("Lake Formation Full Table Access") \
    .config("spark.sql.catalog.glue_catalog", "org.apache.spark.sql.catalog.hive.GlueCatalog") \
    .config("spark.sql.catalog.glue_catalog.glue.lakeformation-enabled", "true") \
    .config("spark.sql.defaultCatalog", "glue_catalog") \
    .getOrCreate()
```

Key configurations:
+ `spark.sql.catalog.glue_catalog`: Registers a catalog named "glue_catalog" that uses the GlueCatalog implementation
+ `spark.sql.catalog.glue_catalog.glue.lakeformation-enabled`: Explicitly enables Amazon Lake Formation integration for this catalog
+ The catalog name ("glue_catalog" in this example) can be customized, but must be consistent across both configuration settings

#### Hive


```
--conf spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver
--conf spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true
--conf spark.hadoop.fs.s3.folderObject.autoAction.disabled=true
--conf spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true
--conf spark.sql.catalog.createDirectoryAfterTable.enabled=true
--conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true
```

#### Iceberg


```
--conf spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver
--conf spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true
--conf spark.hadoop.fs.s3.folderObject.autoAction.disabled=true
--conf spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true
--conf spark.sql.catalog.createDirectoryAfterTable.enabled=true
--conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true
--conf spark.sql.catalog.<catalog>.glue.lakeformation-enabled=true
```
+ `spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver`: Configures EMR Filesystem (EMRFS) to use Amazon Lake Formation S3 credentials for Amazon Lake Formation registered tables. If the table is not registered, the job's runtime role credentials are used.
+ `spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true` and `spark.hadoop.fs.s3.folderObject.autoAction.disabled=true`: Configure EMRFS to use the content type header `application/x-directory` instead of the `$folder$` suffix when creating S3 folders. This is required when reading Amazon Lake Formation tables, as Amazon Lake Formation credentials do not allow reading table folders with the `$folder$` suffix.
+ `spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true`: Configures Spark to skip validating that the table location is empty before creation. This is necessary for Amazon Lake Formation registered tables, as the Amazon Lake Formation credentials needed to verify the empty location are available only after the Amazon Glue Data Catalog table is created. Without this configuration, the job's runtime role credentials validate the empty table location.
+ `spark.sql.catalog.createDirectoryAfterTable.enabled=true`: Configures Spark to create the Amazon S3 folder after table creation in the Hive metastore. This is required for Amazon Lake Formation registered tables, as the Amazon Lake Formation credentials needed to create the Amazon S3 folder are available only after the Amazon Glue Data Catalog table is created.
+ `spark.sql.catalog.dropDirectoryBeforeTable.enabled=true`: Configures Spark to drop the Amazon S3 folder before table deletion in the Hive metastore. This is necessary for Amazon Lake Formation registered tables, as the Amazon Lake Formation credentials needed to drop the S3 folder are not available after the table is deleted from the Amazon Glue Data Catalog.
+ `spark.sql.catalog.<catalog>.glue.lakeformation-enabled=true`: Configures the Iceberg catalog to use Amazon Lake Formation Amazon S3 credentials for Amazon Lake Formation registered tables. If the table is not registered, default environment credentials are used.
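The flag lists above differ only in the final Iceberg-specific setting, so they can be assembled programmatically. The following sketch is illustrative (the helper name and structure are not part of Amazon Glue); the configuration keys and values are the ones listed above:

```python
# Common FTA Spark configurations, as documented above.
COMMON_FTA_CONFS = {
    "spark.hadoop.fs.s3.credentialsResolverClass":
        "com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver",
    "spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject": "true",
    "spark.hadoop.fs.s3.folderObject.autoAction.disabled": "true",
    "spark.sql.catalog.skipLocationValidationOnCreateTable.enabled": "true",
    "spark.sql.catalog.createDirectoryAfterTable.enabled": "true",
    "spark.sql.catalog.dropDirectoryBeforeTable.enabled": "true",
}

# Hypothetical helper: renders the submit-time --conf flags for a table format.
def fta_conf_flags(table_format, catalog="glue_catalog"):
    confs = dict(COMMON_FTA_CONFS)
    if table_format == "iceberg":
        # Iceberg additionally needs Lake Formation enabled on the catalog.
        confs[f"spark.sql.catalog.{catalog}.glue.lakeformation-enabled"] = "true"
    return [f"--conf {key}={value}" for key, value in confs.items()]

print(len(fta_conf_flags("hive")))     # 6 common flags
print(len(fta_conf_flags("iceberg")))  # 6 common + 1 Iceberg-specific
```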

## Usage Patterns


### Using FTA with DataFrames


For users familiar with Spark, DataFrames can be used with Amazon Lake Formation Full Table Access.

Amazon Glue 5.0 adds native Spark support for Lake Formation Full Table Access, simplifying how you work with protected tables. This feature enables Amazon Glue 5.0 Spark jobs to directly read and write data when full table access is granted, removing limitations that previously restricted certain Extract, Transform, and Load (ETL) operations. You can now leverage advanced Spark capabilities including Resilient Distributed Datasets (RDDs), custom libraries, and User Defined Functions (UDFs) with Amazon Lake Formation tables.

#### Native Spark FTA in Amazon Glue 5.0


Amazon Glue 5.0 supports full-table access (FTA) control in Apache Spark based on your policies defined in Amazon Lake Formation. This level of control is ideal for use cases that need to comply with security regulations at the table level.

#### Apache Iceberg Table Example


```
from pyspark.sql import SparkSession

catalog_name = "spark_catalog"
aws_region = "us-east-1"
aws_account_id = "123456789012"
warehouse_path = "s3://amzn-s3-demo-bucket/warehouse/"

spark = SparkSession.builder \
    .config("spark.sql.extensions","org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config(f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkSessionCatalog") \
    .config(f"spark.sql.catalog.{catalog_name}.warehouse", f"{warehouse_path}") \
    .config(f"spark.sql.catalog.{catalog_name}.client.region",f"{aws_region}") \
    .config(f"spark.sql.catalog.{catalog_name}.glue.account-id",f"{aws_account_id}") \
    .config(f"spark.sql.catalog.{catalog_name}.glue.lakeformation-enabled","true") \
    .config("spark.sql.catalog.dropDirectoryBeforeTable.enabled", "true") \
    .config(f"spark.sql.catalog.{catalog_name}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config(f"spark.sql.catalog.{catalog_name}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.defaultCatalog", catalog_name) \
    .getOrCreate()

database_name = "your_database"
table_name = "your_table"

df = spark.sql(f"select * from {database_name}.{table_name}")
df.show()
```

#### Required IAM Permissions


Your Amazon Glue job execution role must have:

```
{
    "Action": "lakeformation:GetDataAccess",
    "Resource": "*",
    "Effect": "Allow"
}
```

Plus appropriate S3 access permissions for your data locations.

#### Lake Formation Configuration


Before using native Spark FTA in Amazon Glue 5.0:

1. Allow third-party query engines to access data without IAM session tag validation in Amazon Lake Formation

1. Grant appropriate table permissions to your Amazon Glue job execution role through Amazon Lake Formation console

1. Configure your Spark session with the required parameters shown in the example above

### Using FTA with DynamicFrames


 Amazon Glue's native DynamicFrames can be used with Amazon Lake Formation Full Table Access for optimized ETL operations. Full Table Access (FTA) provides a security model that grants permissions at the table level, allowing for faster data processing compared to Fine-Grained Access Control (FGAC) since it bypasses the overhead of row and column-level permission checks. This approach is useful when you need to process entire tables and table-level permissions meet your security requirements. 

 In Amazon Glue 4.0, DynamicFrames with FTA required specific GlueContext configuration. While existing Amazon Glue 4.0 DynamicFrame code with FTA will continue to work in Amazon Glue 5.0, the newer version also offers native Spark FTA support with greater flexibility. For new development, consider using the native Spark approach described in the DataFrames section, especially if you need additional capabilities such as Resilient Distributed Datasets (RDDs), custom libraries, and User Defined Functions (UDFs) with Amazon Lake Formation tables. 

#### Required Permissions


The IAM role executing your Glue job must have:
+ `lakeformation:GetDataAccess` permission
+ Appropriate Lake Formation table permissions granted through the Lake Formation console

#### Example DynamicFrame Implementation in Amazon Glue 5.0


```
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Initialize Glue context
sc = SparkContext()
glueContext = GlueContext(sc)

# Configure catalog for Iceberg tables
catalog_name = "glue_catalog"
aws_region = "us-east-1"
aws_account_id = "123456789012"
warehouse_path = "s3://amzn-s3-demo-bucket/warehouse/"

spark = glueContext.spark_session
spark.conf.set(f"spark.sql.catalog.{catalog_name}", "org.apache.iceberg.spark.SparkCatalog")
spark.conf.set(f"spark.sql.catalog.{catalog_name}.warehouse", f"{warehouse_path}")
spark.conf.set(f"spark.sql.catalog.{catalog_name}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
spark.conf.set(f"spark.sql.catalog.{catalog_name}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
spark.conf.set(f"spark.sql.catalog.{catalog_name}.glue.lakeformation-enabled","true")
spark.conf.set(f"spark.sql.catalog.{catalog_name}.client.region",f"{aws_region}")
spark.conf.set(f"spark.sql.catalog.{catalog_name}.glue.id", f"{aws_account_id}")

# Read Lake Formation-protected table with DynamicFrame
df = glueContext.create_data_frame.from_catalog(
    database="your_database",
    table_name="your_table"
)
```
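The repeated `spark.conf.set` calls in the example above can be generated from a single dictionary, which keeps the catalog name consistent across all keys. This is an illustrative refactoring (the helper name is hypothetical), using the same keys and values as the example:

```python
# Hypothetical helper: build the Iceberg FTA catalog settings as a dict so
# they can be applied in one loop and unit-tested without a Spark session.
def iceberg_fta_confs(catalog_name, warehouse_path, aws_region, aws_account_id):
    return {
        f"spark.sql.catalog.{catalog_name}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog_name}.warehouse": warehouse_path,
        f"spark.sql.catalog.{catalog_name}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"spark.sql.catalog.{catalog_name}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"spark.sql.catalog.{catalog_name}.glue.lakeformation-enabled": "true",
        f"spark.sql.catalog.{catalog_name}.client.region": aws_region,
        f"spark.sql.catalog.{catalog_name}.glue.id": aws_account_id,
    }

confs = iceberg_fta_confs("glue_catalog", "s3://amzn-s3-demo-bucket/warehouse/",
                          "us-east-1", "123456789012")

# In a real job, apply them to the session obtained from GlueContext:
# for key, value in confs.items():
#     spark.conf.set(key, value)
```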

## Additional Configuration


### Configure full table access mode in Amazon Glue Studio notebooks


To access Amazon Lake Formation registered tables from interactive Spark sessions in Amazon Glue Studio notebooks, you must use compatibility permission mode. Use the `%%configure` magic command to set up your Spark configuration before starting your interactive session. This configuration must be the first command in your notebook, as it cannot be applied after the session has started. Choose the configuration based on your table type:

#### For Hive tables


```
%%configure
--conf spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver
--conf spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true
--conf spark.hadoop.fs.s3.folderObject.autoAction.disabled=true
--conf spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true
--conf spark.sql.catalog.createDirectoryAfterTable.enabled=true
--conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true
```

#### For Iceberg tables


```
%%configure
--conf spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver
--conf spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true
--conf spark.hadoop.fs.s3.folderObject.autoAction.disabled=true
--conf spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true
--conf spark.sql.catalog.createDirectoryAfterTable.enabled=true
--conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true
--conf spark.sql.catalog.glue_catalog.glue.lakeformation-enabled=true
--conf spark.sql.catalog.glue_catalog.warehouse=S3_DATA_LOCATION
--conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
--conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
--conf spark.sql.catalog.glue_catalog.glue.account-id=ACCOUNT_ID
--conf spark.sql.catalog.glue_catalog.glue.region=REGION
```

Replace the placeholders:
+ S3_DATA_LOCATION: *s3://amzn-s3-demo-bucket*
+ REGION: *Amazon Region (for example, us-east-1)*
+ ACCOUNT_ID: *Your Amazon account ID*

**Note**  
You must set these configurations before executing any Spark operations in your notebook.
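If you generate these notebook configurations for several environments, the placeholder substitution can be scripted. The sketch below is illustrative (the template excerpt covers only the placeholder-bearing lines, and `fill_placeholders` is a hypothetical helper):

```python
# Excerpt of the Iceberg %%configure template lines that contain placeholders.
TEMPLATE = (
    "--conf spark.sql.catalog.glue_catalog.warehouse=S3_DATA_LOCATION\n"
    "--conf spark.sql.catalog.glue_catalog.glue.account-id=ACCOUNT_ID\n"
    "--conf spark.sql.catalog.glue_catalog.glue.region=REGION"
)

# Hypothetical helper: replace each named placeholder with its value.
def fill_placeholders(template, **values):
    for name, value in values.items():
        template = template.replace(name, value)
    return template

rendered = fill_placeholders(
    TEMPLATE,
    S3_DATA_LOCATION="s3://amzn-s3-demo-bucket",
    ACCOUNT_ID="123456789012",
    REGION="us-east-1",
)
print(rendered)
```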

### Supported Operations


These operations will use Amazon Lake Formation credentials to access the table data.

**Note**  
When enabling Amazon Lake Formation for FTA, enable the Spark configuration `spark.sql.catalog.{catalog_name}.glue.lakeformation-enabled`.
+ CREATE TABLE
+ ALTER TABLE
+ INSERT INTO
+  INSERT OVERWRITE 
+ SELECT
+ UPDATE
+ MERGE INTO
+ DELETE FROM
+ ANALYZE TABLE
+ REPAIR TABLE
+ DROP TABLE
+ Spark datasource queries
+ Spark datasource writes

**Note**  
Operations not listed above will continue to use IAM permissions to access table data.

## Migrating from Amazon Glue 4.0 to Amazon Glue 5.0 FTA


When migrating from Amazon Glue 4.0 GlueContext FTA to Amazon Glue 5.0 native Spark FTA:

1. Allow third-party query engines to access data without the IAM session tag validation in Amazon Lake Formation. Follow [Step 1: Enable Full Table Access in Amazon Lake Formation](#security-access-control-fta-step-1). 

1. You do not need to change the job runtime role. However, verify that the Amazon Glue job execution role has the `lakeformation:GetDataAccess` IAM permission.

1. Modify the Spark session configurations in the script. Ensure the following Spark configurations are present:

   ```
   --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
   --conf spark.sql.catalog.spark_catalog.warehouse=s3://<bucket-name>/warehouse/
   --conf spark.sql.catalog.spark_catalog.client.region=<REGION>
   --conf spark.sql.catalog.spark_catalog.glue.account-id=ACCOUNT_ID
   --conf spark.sql.catalog.spark_catalog.glue.lakeformation-enabled=true
   --conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true
   ```

1. Update the script so that GlueContext DataFrames are changed to native Spark DataFrames.

1. Update your Amazon Glue job to use Amazon Glue 5.0.

## Considerations and Limitations

+ If a Hive table is created using a job that doesn't have full table access enabled, and no records are inserted, subsequent reads or writes from a job with full table access will fail. This is because Amazon Glue Spark without full table access adds the `$folder$` suffix to the table folder name. To resolve this, you can either:
  + Insert at least one row into the table from a job that does not have FTA enabled.
  + Configure the job that does not have FTA enabled to not use the `$folder$` suffix in folder names in S3. This can be achieved by setting the Spark configuration `spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true`.
  + Create an Amazon S3 folder at the table location `s3://path/to/table/table_name` using the Amazon S3 console or the Amazon S3 CLI.
+ Full Table Access works exclusively with EMR Filesystem (EMRFS). S3A filesystem is not compatible.
+  Full Table Access is supported for Hive and Iceberg tables. Support for Hudi and Delta tables has not yet been added. 
+ Jobs referencing tables with Amazon Lake Formation Fine-Grained Access Control (FGAC) rules or Amazon Glue Data Catalog views will fail. To query a table with FGAC rules or an Amazon Glue Data Catalog view, you need to use FGAC mode. You can enable FGAC mode by following the steps outlined in Using Amazon Glue with Amazon Lake Formation for fine-grained access control.
+  Full table access does not support Spark Streaming. 
+ Cannot be used simultaneously with FGAC.

# Using Amazon Glue with Amazon Lake Formation for fine-grained access control

## Overview


With Amazon Glue version 5.0 and higher, you can leverage Amazon Lake Formation to apply fine-grained access controls on Data Catalog tables that are backed by S3. This capability lets you configure table, row, column, and cell level access controls for read queries within your Amazon Glue for Apache Spark jobs. See the following sections to learn more about Lake Formation and how to use it with Amazon Glue.

`GlueContext`-based table-level access control with Amazon Lake Formation permissions, supported in Glue 4.0 and earlier, is not supported in Glue 5.0. Use the new Spark-native fine-grained access control (FGAC) in Glue 5.0. Note the following details:
+ If you need fine-grained access control (FGAC) for row-, column-, or cell-level access control, you will need to migrate from `GlueContext`/Glue DynamicFrame in Glue 4.0 and prior to Spark DataFrame in Glue 5.0. For examples, see [Migrating from GlueContext/Glue DynamicFrame to Spark DataFrame](security-lf-migration-spark-dataframes.md)
+  If you need Full Table Access control (FTA), you can leverage FTA with DynamicFrames in Amazon Glue 5.0. You can also migrate to native Spark approach for additional capabilities such as Resilient Distributed Datasets (RDDs), custom libraries, and User Defined Functions (UDFs) with Amazon Lake Formation tables. For examples, see [ Migrating from Amazon Glue 4.0 to Amazon Glue 5.0](https://docs.amazonaws.cn/glue/latest/dg/migrating-version-50.html). 
+ If you don't need FGAC, then no migration to Spark DataFrame is necessary, and `GlueContext` features like job bookmarks and push-down predicates will continue to work.
+ Jobs with FGAC require a minimum of 4 workers: one user driver, one system driver, one system executor, and one standby user executor.

Using Amazon Glue with Amazon Lake Formation incurs additional charges.

## How Amazon Glue works with Amazon Lake Formation

Using Amazon Glue with Lake Formation lets you enforce a layer of permissions on each Spark job to apply Lake Formation permissions control when Amazon Glue executes jobs. Amazon Glue uses [ Spark resource profiles](https://spark.apache.org/docs/latest/api/java/org/apache/spark/resource/ResourceProfile.html) to create two profiles to effectively execute jobs. The user profile executes user-supplied code, while the system profile enforces Lake Formation policies. For more information, see [What is Amazon Lake Formation](https://docs.amazonaws.cn/lake-formation/latest/dg/what-is-lake-formation.html) and [Considerations and limitations](https://docs.amazonaws.cn/glue/latest/dg/security-lf-enable-considerations.html).

The following is a high-level overview of how Amazon Glue gets access to data protected by Lake Formation security policies.

![\[The diagram shows how fine-grained access control works with the Amazon Glue StartJobRun API.\]](http://docs.amazonaws.cn/en_us/glue/latest/dg/images/glue-50-fgac-start-job-run-api-diagram.png)


1. A user calls the `StartJobRun` API on an Amazon Lake Formation-enabled Amazon Glue job.

1. Amazon Glue sends the job to a user driver and runs the job in the user profile. The user driver runs a lean version of Spark that has no ability to launch tasks, request executors, or access S3 or the Glue Data Catalog. It builds a job plan.

1. Amazon Glue sets up a second driver called the system driver and runs it in the system profile (with a privileged identity). Amazon Glue sets up an encrypted TLS channel between the two drivers for communication. The user driver uses the channel to send the job plan to the system driver. The system driver does not run user-submitted code. It runs full Spark and communicates with S3 and the Data Catalog for data access. It requests executors and compiles the job plan into a sequence of execution stages.

1. Amazon Glue then runs the stages on executors with the user driver or system driver. User code in any stage is run exclusively on user profile executors.

1. Stages that read data from Data Catalog tables protected by Amazon Lake Formation or those that apply security filters are delegated to system executors.

## Minimum worker requirement

A Lake Formation-enabled job in Amazon Glue requires a minimum of 4 workers: one user driver, one system driver, one system executor, and one standby user executor. This is up from the minimum of 2 workers required for standard Amazon Glue jobs.

A Lake Formation-enabled job in Amazon Glue utilizes two Spark drivers—one for the system profile and another for the user profile. Similarly, the executors are also divided into two profiles:
+ System executors: handle tasks where Lake Formation data filters are applied.
+ User executors: are requested by the system driver as needed.

As Spark jobs are lazy in nature, Amazon Glue reserves 10% of the total workers (minimum of 1), after deducting the two drivers, for user executors.
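The reservation rule above can be made concrete with a small calculation. The sketch below assumes the 10% is taken from the workers remaining after the two drivers are deducted; that exact rounding interpretation is an assumption, not documented behavior:

```python
import math

# Illustrative arithmetic for the user-executor reservation rule described
# above. ASSUMPTION: 10% of the post-driver worker count, floored, minimum 1.
def reserved_user_executors(total_workers):
    if total_workers < 4:
        raise ValueError("Lake Formation-enabled jobs need at least 4 workers")
    after_drivers = total_workers - 2  # user driver + system driver
    return max(1, math.floor(after_drivers * 0.10))

for total in (4, 10, 50):
    print(total, reserved_user_executors(total))  # 4 -> 1, 10 -> 1, 50 -> 4
```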

All Lake Formation-enabled jobs have auto-scaling enabled, meaning the user executors will only start when needed.

For an example configuration, see [Considerations and limitations](https://docs.amazonaws.cn/glue/latest/dg/security-lf-enable-considerations.html).

## Job runtime role IAM permissions

Lake Formation permissions control access to Amazon Glue Data Catalog resources, Amazon S3 locations, and the underlying data at those locations. IAM permissions control access to the Lake Formation and Amazon Glue APIs and resources. Although you might have the Lake Formation permission to access a table in the Data Catalog (SELECT), your operation fails if you don’t have the IAM permission on the `glue:Get*` API operation. 

The following is an example policy of how to provide IAM permissions to access a script in S3, uploading logs to S3, Amazon Glue API permissions, and permission to access Lake Formation.

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ScriptAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws-cn:s3:::*.amzn-s3-demo-bucket/scripts",
        "arn:aws-cn:s3:::*.amzn-s3-demo-bucket/*"
      ]
    },
    {
      "Sid": "LoggingAccess",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws-cn:s3:::amzn-s3-demo-bucket/logs/*"
      ]
    },
    {
      "Sid": "GlueCatalogAccess",
      "Effect": "Allow",
      "Action": [
        "glue:Get*",
        "glue:Create*",
        "glue:Update*"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "LakeFormationAccess",
      "Effect": "Allow",
      "Action": [
        "lakeformation:GetDataAccess"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------

## Setting up Lake Formation permissions for job runtime role

First, register the location of your Hive table with Lake Formation. Then create permissions for your job runtime role on your desired table. For more details about Lake Formation, see [ What is Amazon Lake Formation?](https://docs.amazonaws.cn/lake-formation/latest/dg/what-is-lake-formation.html) in the *Amazon Lake Formation Developer Guide*.

After you set up the Lake Formation permissions, you can submit Spark jobs on Amazon Glue.

## Submitting a job run


After you finish setting up the Lake Formation grants, you can submit Spark jobs on Amazon Glue. To run Iceberg jobs, you must provide the following Spark configurations. To configure them through Amazon Glue job parameters, set the following parameter:
+ Key:

  ```
  --conf
  ```
+ Value:

  ```
  spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
  --conf spark.sql.catalog.spark_catalog.warehouse=<S3_DATA_LOCATION>
  --conf spark.sql.catalog.spark_catalog.glue.account-id=<ACCOUNT_ID>
  --conf spark.sql.catalog.spark_catalog.client.region=<REGION>
  --conf spark.sql.catalog.spark_catalog.glue.endpoint=https://glue.<REGION>.amazonaws.com
  ```
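Programmatically, the same parameter can be passed through the `StartJobRun` API. The sketch below builds the `Arguments` map; the boto3 call is commented out because it requires AWS credentials and an existing job, and the job name is hypothetical:

```python
# Build the --conf value shown above as a single Glue job argument.
iceberg_confs = " --conf ".join([
    "spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog",
    "spark.sql.catalog.spark_catalog.warehouse=<S3_DATA_LOCATION>",
    "spark.sql.catalog.spark_catalog.glue.account-id=<ACCOUNT_ID>",
    "spark.sql.catalog.spark_catalog.client.region=<REGION>",
    "spark.sql.catalog.spark_catalog.glue.endpoint=https://glue.<REGION>.amazonaws.com",
])

arguments = {"--conf": iceberg_confs}

# Submitting the run (requires credentials; "my-lakeformation-job" is hypothetical):
# import boto3
# glue = boto3.client("glue")
# glue.start_job_run(JobName="my-lakeformation-job", Arguments=arguments)
```

Remember to replace the `<S3_DATA_LOCATION>`, `<ACCOUNT_ID>`, and `<REGION>` placeholders before submitting.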

## Using an Interactive Session


 After you finish setting up the Amazon Lake Formation grants, you can use Interactive Sessions on Amazon Glue. You must provide the following Spark configurations via the `%%configure` magic prior to executing code. 

```
%%configure
{
    "--enable-lakeformation-fine-grained-access": "true",
    "--conf": "spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog --conf spark.sql.catalog.spark_catalog.warehouse=<S3_DATA_LOCATION> --conf spark.sql.catalog.spark_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.spark_catalog.client.region=<REGION> --conf spark.sql.catalog.spark_catalog.glue.account-id=<ACCOUNT_ID> --conf spark.sql.catalog.spark_catalog.glue.endpoint=https://glue.<REGION>.amazonaws.com"
}
```

## FGAC for Amazon Glue 5.0 Notebook or interactive sessions


To enable Fine-Grained Access Control (FGAC) in Amazon Glue, you must specify the Spark configurations required for Lake Formation as part of the `%%configure` magic before you run the first cell.

Specifying them later using the calls `SparkSession.builder().conf("").get()` or `SparkSession.builder().conf("").create()` is not sufficient. This is a change from the Amazon Glue 4.0 behavior.

## Open-table format support

Amazon Glue version 5.0 or later includes support for fine-grained access control based on Lake Formation. Amazon Glue supports Hive and Iceberg table types. The following table describes all of the supported operations.

[\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/glue/latest/dg/security-lf-enable.html)

# Migrating from GlueContext/Glue DynamicFrame to Spark DataFrame


The following are Python and Scala examples of migrating `GlueContext`/Glue `DynamicFrame` in Glue 4.0 to Spark `DataFrame` in Glue 5.0.

**Python**  
Before:

```
escaped_table_name = '`<dbname>`.`<table_name>`'

additional_options = {
  "query": f'select * from {escaped_table_name} WHERE column1 = 1 AND column7 = 7'
}

# DynamicFrame example
dataset = glueContext.create_data_frame_from_catalog(
    database="<dbname>",
    table_name=escaped_table_name, 
    additional_options=additional_options)
```

After:

```
table_identifier = '`<catalogname>`.`<dbname>`.`<table_name>`'  # catalogname is optional

# DataFrame example
dataset = spark.sql(f'select * from {table_identifier} WHERE column1 = 1 AND column7 = 7')
```
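When migrating many queries, it can help to build the backtick-escaped identifier in one place. This is a hypothetical convenience, not part of Amazon Glue, and it assumes the name parts themselves contain no backticks:

```python
# Hypothetical helper: build the `catalog`.`db`.`table` identifier used in the
# Spark SQL example above. The catalog part is optional.
def table_identifier(dbname, table_name, catalog=None):
    parts = [catalog, dbname, table_name] if catalog else [dbname, table_name]
    return ".".join(f"`{part}`" for part in parts)

print(table_identifier("sales_db", "orders"))                  # `sales_db`.`orders`
print(table_identifier("sales_db", "orders", "glue_catalog"))  # `glue_catalog`.`sales_db`.`orders`
```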

**Scala**  
Before:

```
val escapedTableName = "`<dbname>`.`<table_name>`"

val additionalOptions = JsonOptions(Map(
    "query" -> s"select * from $escapedTableName WHERE column1 = 1 AND column7 = 7"
    )
)

// DynamicFrame example
val datasource0 = glueContext.getCatalogSource(
    database="<dbname>", 
    tableName=escapedTableName, 
    additionalOptions=additionalOptions).getDataFrame()
```

After:

```
val tableIdentifier = "`<catalogname>`.`<dbname>`.`<table_name>`" //catalogname is optional

// DataFrame example
val datasource0 = spark.sql(s"select * from $tableIdentifier WHERE column1 = 1 AND column7 = 7")
```

# Considerations and limitations


Note the following considerations and limitations when you use Lake Formation with Amazon Glue.

+ Amazon Glue with Lake Formation is available in all supported Regions except Amazon GovCloud (US-East) and Amazon GovCloud (US-West).
+ Amazon Glue supports fine-grained access control via Lake Formation only for Apache Hive and Apache Iceberg tables. Apache Hive formats include Parquet, ORC, and CSV. 
+ You can only use Lake Formation with Spark jobs.
+ Amazon Glue with Lake Formation only supports a single Spark session throughout a job.
+ When Lake Formation is enabled, Amazon Glue requires additional workers because it runs one system driver, system executors, one user driver, and optionally user executors (required when your job has UDFs or uses `spark.createDataFrame`).
+ Amazon Glue with Lake Formation only supports cross-account table queries shared through resource links. The resource link must have the same name as the resource in the source account.
+ To enable fine-grained access control for Amazon Glue jobs, pass the `--enable-lakeformation-fine-grained-access` job parameter.
+ You can configure your Amazon Glue jobs to work with the Amazon Glue multi-catalog hierarchy. For information on the configuration parameters to use with the Amazon Glue `StartJobRun` API, see [Working with Amazon Glue multi-catalog hierarchy on EMR Serverless](https://docs.amazonaws.cn/emr/latest/EMR-Serverless-UserGuide/external-metastore-glue-multi.html).
+ The following aren't supported:
  + Resilient distributed datasets (RDD)
  + Spark streaming
  + Write with Lake Formation granted permissions
  + Access control for nested columns
+ Amazon Glue blocks functionalities that might undermine the complete isolation of system driver, including the following:
  + UDTs, HiveUDFs, and any user-defined function that involves custom classes
  + Custom data sources
  + Supply of additional jars for Spark extension, connector, or metastore
  + `ANALYZE TABLE` command
+ To enforce access controls, `EXPLAIN PLAN` and DDL operations such as `DESCRIBE TABLE` don't expose restricted information.
+ Amazon Glue restricts access to system driver Spark logs on Lake Formation-enabled applications. Because the system driver runs with more access, events and logs that the system driver generates can include sensitive information. To prevent unauthorized users or code from accessing this sensitive data, Amazon Glue disables access to system driver logs. For troubleshooting, contact Amazon support.
+ If you registered a table location with Lake Formation, the data access path goes through the Lake Formation stored credentials regardless of the IAM permission for the Amazon Glue job runtime role. If you misconfigure the role registered with table location, jobs submitted that use the role with S3 IAM permission to the table location will fail.
+ Writing to a Lake Formation table uses IAM permission rather than Lake Formation granted permissions. If your job runtime role has the necessary S3 permissions, you can use it to run write operations.

The following are considerations and limitations when using Apache Iceberg:
+ You can use Apache Iceberg only with the session catalog, not with arbitrarily named catalogs.
+ Iceberg tables that are registered in Lake Formation only support the metadata tables `history`, `metadata_log_entries`, `snapshots`, `files`, `manifests`, and `refs`. Amazon Glue hides the columns that might have sensitive data, such as `partitions`, `path`, and `summaries`. This limitation doesn't apply to Iceberg tables that aren't registered in Lake Formation.
+ Tables that you don't register in Lake Formation support all Iceberg stored procedures. The `register_table` and `migrate` procedures aren't supported for any tables.
+ We recommend that you use Iceberg DataFrameWriterV2 instead of V1.

## Example worker allocation


For a job configured with the following parameters:

```
--enable-lakeformation-fine-grained-access=true  
--number-of-workers=20
```

The worker allocation would be:
+ One worker for the user driver.
+ One worker for the system driver.
+ 10% of the remaining 18 workers (that is, 2 workers) reserved for the user executors.
+ Up to 16 workers allocated for system executors.

With auto-scaling enabled, the user executors can utilize any of the unallocated capacity from the system executors if needed.
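The allocation described above can be sketched as simple arithmetic. The function below is illustrative, not a Glue API; it assumes the 10% user-executor share is rounded up, which matches the 2-of-18 figure in the example.

```python
import math

def fgac_worker_allocation(number_of_workers, user_executor_ratio=0.10):
    """Split workers as in the example above: one user driver, one
    system driver, a reserved share of the remainder for user
    executors, and the rest available for system executors."""
    remaining = number_of_workers - 2  # user driver + system driver
    user_executors = math.ceil(remaining * user_executor_ratio)
    system_executors = remaining - user_executors
    return {
        "user_driver": 1,
        "system_driver": 1,
        "user_executors": user_executors,
        "system_executors": system_executors,
    }

# With 20 workers: 1 user driver, 1 system driver,
# 2 user executors, and up to 16 system executors.
print(fgac_worker_allocation(20))
```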

## Controlling user executor allocation


You can adjust the reservation percentage for user executors using the following configuration:

```
--conf spark.dynamicAllocation.maxExecutorsRatio=<value between 0 and 1>
```

This configuration allows fine-tuned control over how many user executors are reserved relative to the total available capacity.
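As a rough model of what the ratio controls, user executors are capped at the given fraction of total executor capacity. This is an illustration of the arithmetic, not Glue's exact internal formula:

```python
def user_executor_cap(total_executors, max_executors_ratio):
    """Illustrative cap: limit user executors to the given
    fraction of the total executor capacity."""
    if not 0 <= max_executors_ratio <= 1:
        raise ValueError("max_executors_ratio must be between 0 and 1")
    return int(total_executors * max_executors_ratio)

# With 18 executors and a ratio of 0.5, up to 9 are reserved for user code.
print(user_executor_cap(18, 0.5))
```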

# Troubleshooting


See the following sections for troubleshooting solutions.

## Logging


Amazon Glue uses Spark resource profiles to split job execution: the user profile runs the code you supplied, while the system profile enforces Lake Formation policies. You can access the logs for tasks run under the user profile.

## Live UI and Spark History Server


The Live UI and the Spark History Server have all Spark events generated from the user profile and redacted events generated from the system driver.

You can see all of the tasks from both the user and system drivers in the **Executors** tab. However, log links are available only for the user profile. Also, some information is redacted from Live UI, such as the number of output records.

## Job failed with insufficient Lake Formation permissions


Make sure that your job runtime role has the permissions to run SELECT and DESCRIBE on the table that you are accessing.

## Job with RDD execution failed


Amazon Glue currently doesn't support resilient distributed dataset (RDD) operations on Lake Formation-enabled jobs.

## Unable to access data files in Amazon S3


Make sure you have registered the location of the data lake in Lake Formation.

## Security validation exception


Amazon Glue detected a security validation error. Contact Amazon support for assistance.

## Sharing Amazon Glue Data Catalog and tables across accounts


You can share databases and tables across accounts and still use Lake Formation. For more information, see [Cross-account data sharing in Lake Formation](https://docs.amazonaws.cn/lake-formation/latest/dg/cross-account-permissions.html) and [How do I share Amazon Glue Data Catalog and tables cross-account using Lake Formation?](https://repost.aws/knowledge-center/glue-lake-formation-cross-account).

The following table summarizes how to choose between fine-grained access control (FGAC) and full table access (FTA) for your workload.


| Feature | Fine-grained access control (FGAC) | Full table access (FTA) | 
| --- |--- |--- |
| Access Level | Column/row level | Full table | 
| Use Case | Queries and ETL with limited permissions | ETL | 
| Performance Impact | Requires system/user space transitions for access control evaluation, adding latency | Optimized performance | 