

# Integrating third-party services with Lake Formation

Integrating with Amazon Lake Formation enables third-party services to securely access data in their Amazon S3 based data lakes. You can use Lake Formation as your authorization engine to manage or enforce permissions to your data lake with integrated Amazon services such as Amazon Glue ETL, Amazon Athena, Amazon EMR, and Redshift Spectrum. Lake Formation provides two options for integrating services:

1. The Lake Formation application integration settings: Lake Formation can vend scoped-down temporary credentials in the form of Amazon STS tokens to registered Amazon S3 locations based on the effective permissions, so that authorized applications can access data on behalf of users.

1. Central enforcement: Lake Formation [querying API](https://docs.amazonaws.cn/lake-formation/latest/APIReference/API_StartQueryPlanning.html) operations retrieve data from Amazon S3 and filter the results based on effective permissions. The engine or application integrating with the querying API operation can depend on Lake Formation to evaluate the calling identity’s permissions and securely filter the data based on these permissions. Third-party query engines only see and operate on filtered data.

Lake Formation credential vending doesn't integrate with Spark SQL queries. Credential vending only works with queries that run through the Amazon Glue ETL library.

**Topics**
+ [Using Lake Formation application integration](using-cred-vending.md)

# Using Lake Formation application integration


Lake Formation allows third-party services to integrate with Lake Formation and get temporary access to Amazon S3 data on behalf of their users by using [GetTemporaryGlueTableCredentials](https://docs.amazonaws.cn/lake-formation/latest/APIReference/API_GetTemporaryGlueTableCredentials.html) and [GetTemporaryGluePartitionCredentials](https://docs.amazonaws.cn/lake-formation/latest/APIReference/API_GetTemporaryGluePartitionCredentials.html) operations. This allows third-party services to use the same authorization and credential vending feature that the rest of Amazon analytics services use. This section describes how to use these API operations to integrate a third-party query engine with Lake Formation.

 These API operations are disabled by default. There are two options to authorize Lake Formation to integrate applications:
+ Configure IAM session tags that are validated every time the application integration API operations are called

  For more information, see [Enabling permissions for a third-party query engine to call application integration API operations](permitting-third-party-call.md).
+ Enable the option that **Allows external engines to access data in Amazon S3 locations with full table access**

  This option allows query engines and applications to get credentials without IAM session tags if the user has full table access. It simplifies data access and gives query engines and applications performance benefits. Amazon EMR on Amazon EC2 can use this setting.

  For more information, see [Application integration for full table access](full-table-credential-vending.md).

**Topics**
+ [How Lake Formation application integration works](how-vending-works.md)
+ [Roles and responsibilities in Lake Formation application integration](roles-and-responsibilities.md)
+ [Lake Formation workflow for application integration API operations](api-overview.md)
+ [Registering a third-party query engine](register-query-engine.md)
+ [Enabling permissions for a third-party query engine to call application integration API operations](permitting-third-party-call.md)
+ [Application integration for full table access](full-table-credential-vending.md)

# How Lake Formation application integration works


This section describes how to use application integration API operations to integrate a third-party application (query engine) with Lake Formation.

![\[Lake Formation data access workflow with user authentication and service integration.\]](http://docs.amazonaws.cn/en_us/lake-formation/latest/dg/images/credential-vending-new.png)


1. The Lake Formation administrator performs the following activities:
   + Registers an Amazon S3 location with Lake Formation by providing an IAM role (used for vending credentials) that has appropriate permissions to access data within the Amazon S3 location
   + Registers a third-party application to be able to call Lake Formation's credential vending API operations. See [Registering a third-party query engine](register-query-engine.md)
   + Grants permissions for users to access databases and tables

     For example, suppose you want to publish a user sessions data set in which some columns contain personally identifiable information (PII). To restrict access, you assign these columns an [LF-TBAC](https://docs.amazonaws.cn/lake-formation/latest/dg/tag-based-access-control.html.html) tag named “classification” with a value of “sensitive”. Next, you define a permission that allows a business analyst to access the user sessions data but excludes the columns tagged with *classification = sensitive*.

1. A principal (user) submits a query to an integrated service.

1. The integrated application sends the request to Lake Formation asking for table information and credentials to access the table. 

1. If the querying principal is authorized to access the table, Lake Formation returns the credentials to the integrated application, which allows data access.
**Note**  
Lake Formation doesn't access the underlying data when vending credentials.

1. The integrated service reads data from Amazon S3, filters columns based on the policies it received, and returns the results back to the principal.

**Important**  
Lake Formation credential vending API operations enable a **distributed-enforcement with explicit deny on failure (fail-close) model.** This introduces a three-party security model between customers, third-party services and Lake Formation. Integrated services are trusted to properly enforce Lake Formation permissions (distributed-enforcement). 

The integrated service is responsible for filtering the data read from Amazon S3 based on the policies returned from Lake Formation before the filtered data is returned back to the user. Integrated services follow a fail-close model, which means that they must fail the query if they are unable to enforce required Lake Formation permissions. 
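The fail-close behavior described above can be illustrated with a minimal sketch. It assumes the engine has already read rows from Amazon S3 and holds the list of authorized columns and any cell filters returned by Lake Formation. The names `EnforcementError` and `filter_rows` are hypothetical and not part of any Lake Formation SDK. Because the sketch does not implement row-level filtering, it fails the query whenever cell filters are present.

```python
class EnforcementError(Exception):
    """Raised when the engine cannot enforce the permissions it received."""


def filter_rows(rows, authorized_columns, cell_filters):
    """Project rows to the authorized columns, failing closed when unsure."""
    if not authorized_columns:
        # No readable columns for this principal: fail instead of returning data.
        raise EnforcementError("no authorized columns for this principal")
    if cell_filters:
        # A row-level policy exists but this sketch cannot apply it: fail closed.
        raise EnforcementError("cell filters present but not supported")
    allowed = set(authorized_columns)
    # Keep only the columns the principal is allowed to see.
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


# Illustrative data: the "email" column is treated as sensitive.
rows = [
    {"session_id": 1, "page": "/home", "email": "a@example.com"},
    {"session_id": 2, "page": "/cart", "email": "b@example.com"},
]
visible = filter_rows(rows, ["session_id", "page"], cell_filters=[])
```

The design choice here is that every uncertain case raises an error and none returns unfiltered data.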

# Roles and responsibilities in Lake Formation application integration


The following are the roles and their associated responsibilities for enabling third-party application integration with Amazon Lake Formation.



| Role | Responsibility | 
| --- | --- | 
| The customer |  [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/lake-formation/latest/dg/roles-and-responsibilities.html)  | 
| The third-party |  [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/lake-formation/latest/dg/roles-and-responsibilities.html)  | 
| Amazon Lake Formation |  [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/lake-formation/latest/dg/roles-and-responsibilities.html)  | 

# Lake Formation workflow for application integration API operations


The following is the workflow for application integration API operations:

1. A user submits a query or request for data using an integrated third-party query engine. The query engine assumes an IAM role that represents the user or a group of users, and retrieves trusted credentials to be used when calling the application integration API operations.

1. The query engine calls `GetUnfilteredTableMetadata`, and if it is a partitioned table, the query engine calls `GetUnfilteredPartitionsMetadata` to retrieve metadata and policy information from the Data Catalog.

1. Lake Formation performs authorization for the request. If the user doesn't have appropriate permissions on the table, then *AccessDeniedException* is thrown.

1. As part of the request, the query engine sends the filtering it supports. There are two flags that can be sent within an array: *COLUMN\_PERMISSION* and *CELL\_FILTER\_PERMISSION*. If the query engine doesn't support one of these features, and a policy exists on the table for that feature, then a *PermissionTypeMismatchException* is thrown and the query fails. This is to avoid data leakage.

1. The returned response contains the following:
   + The entire schema for the table so that query engines can use it to parse the data from storage.
   + A list of authorized columns that the user has access to. If the authorized column list is empty, it indicates that the user has `DESCRIBE` permissions, but does not have `SELECT` permissions, and the query fails.
   + A flag, `IsRegisteredWithLakeFormation`, which indicates whether Lake Formation can vend credentials to this resource's data. If this returns false, then the customer's credentials should be used to access Amazon S3.
   + A list of `CellFilters`, if any, that should be applied to rows of data. This list contains columns and an expression to evaluate each row. This should only be populated if *CELL\_FILTER\_PERMISSION* is sent as part of the request and there is a data filter against the table for the calling user.

1. After the metadata is retrieved, the query engine calls `GetTemporaryGlueTableCredentials` or `GetTemporaryGluePartitionCredentials` to get Amazon credentials to retrieve data from the Amazon S3 location. 

1. The query engine reads relevant objects from Amazon S3, filters the data based on the policies it received in step 2, and returns the results to the user. 
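The steps above can be sketched in Python for an unpartitioned table. This is a minimal sketch, not a reference implementation. It assumes `glue` and `lakeformation` are client objects exposing the boto3 methods `get_unfiltered_table_metadata` and `get_temporary_glue_table_credentials` (for example, `boto3.client("glue")` and `boto3.client("lakeformation")`). The function name `fetch_table_access` and the shape of its return value are hypothetical.

```python
def fetch_table_access(glue, lakeformation, catalog_id, database, table, table_arn):
    """Retrieve metadata first, then credentials, failing closed on empty access."""
    # This sketch only declares support for column-level filtering.
    supported = ["COLUMN_PERMISSION"]
    meta = glue.get_unfiltered_table_metadata(
        CatalogId=catalog_id,
        DatabaseName=database,
        Name=table,
        SupportedPermissionTypes=supported,
    )
    columns = meta.get("AuthorizedColumns", [])
    if not columns:
        # DESCRIBE without SELECT: the query must fail.
        raise PermissionError("no authorized columns; failing the query")
    if not meta.get("IsRegisteredWithLakeFormation", False):
        # Lake Formation cannot vend credentials for this location;
        # the caller falls back to the customer's own credentials.
        return {"columns": columns, "credentials": None}
    creds = lakeformation.get_temporary_glue_table_credentials(
        TableArn=table_arn,
        Permissions=["SELECT"],
        SupportedPermissionTypes=supported,
    )
    return {
        "columns": columns,
        "credentials": {
            "AccessKeyId": creds["AccessKeyId"],
            "SecretAccessKey": creds["SecretAccessKey"],
            "SessionToken": creds["SessionToken"],
        },
    }
```

The returned column list is what the engine would then use to filter the objects it reads from Amazon S3 before returning results to the user.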

The application integration API operations for Lake Formation contain additional content for configuring integration with third-party query engines. You can see the operation details in the [Credential vending API operations section](aws-lake-formation-api-credential-vending.md).

 The `QuerySessionContext` is a structure that query engines can additionally send to Lake Formation for these application integration API operations. It allows Lake Formation to store and utilize additional context for a given query. The following provides an example of how [QuerySessionContext](https://docs.amazonaws.cn/glue/latest/webapi/API_QuerySessionContext.html) should be used:

1. The query engine makes a `GetInternalUnfilteredMetadata` call, passing in a `QuerySessionContext` (QSC) structure containing a unique query ID in the request:

   ```
   {
       "QuerySessionContext": {
           "QueryId": "your-unique-identifier-here"
       }
   }
   ```

1. The `GetInternalUnfilteredMetadata` call returns a `QueryAuthorizationId` string in the response. On the next (and any subsequent) call that accepts a QSC structure in the input, the query engine passes the same QSC structure, which now also contains the `QueryAuthorizationId` returned by Lake Formation. Suppose this next call is `GetTemporaryGlueTableCredentials`; the request will contain:

   ```
   {
       "QuerySessionContext": {
           "QueryAuthorizationId": "lf-returned-query-authz-id-here",
           "QueryId": "your-unique-identifier-here"
       }
   }
   ```
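As a rough illustration of the two steps above, the following sketch keeps one `QuerySessionContext` per query and adds the `QueryAuthorizationId` once a response provides it. The helper names `new_query_session_context` and `with_authorization_id` are hypothetical. Only the field names `QueryId` and `QueryAuthorizationId` come from the examples above.

```python
import uuid


def new_query_session_context(query_id=None):
    """Create the context sent with the first metadata call."""
    # Generate a unique identifier when the engine does not supply one.
    return {"QueryId": query_id or str(uuid.uuid4())}


def with_authorization_id(context, metadata_response):
    """Return a copy of the context that carries the returned authorization ID."""
    updated = dict(context)
    auth_id = metadata_response.get("QueryAuthorizationId")
    if auth_id:
        updated["QueryAuthorizationId"] = auth_id
    return updated


# First call: only the query ID is known.
qsc = new_query_session_context("your-unique-identifier-here")
# Later calls: reuse the same context plus the ID returned by Lake Formation.
qsc = with_authorization_id(
    qsc, {"QueryAuthorizationId": "lf-returned-query-authz-id-here"}
)
```

The resulting dictionary would be passed as the `QuerySessionContext` field of the next request that accepts it.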

# Registering a third-party query engine


Before a third-party query engine can use the application integration API operations, you need to explicitly enable permissions for the query engine to call the API operations on your behalf. This is done in a few steps:

1. Use the Amazon Lake Formation console, the Amazon CLI, or the API/SDK to specify the Amazon accounts and IAM session tags that require permission to call the application integration API operations.

1. When the third-party query engine assumes the execution role in your account, the query engine must attach a session tag that is registered with Lake Formation and represents the third-party engine. Lake Formation uses this tag to validate that the request is coming from an approved engine. For more information about session tags, see [Session tags](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_session-tags.html) in the IAM User Guide.

1. When you set up a third-party query engine execution role, the role must have the following minimum set of permissions in its IAM policy:


   ```
   {
     "Version": "2012-10-17",
     "Statement": {
       "Effect": "Allow",
       "Action": [
         "lakeformation:GetDataAccess",
         "glue:GetTable",
         "glue:GetTables",
         "glue:GetDatabase",
         "glue:GetDatabases",
         "glue:CreateDatabase",
         "glue:GetUserDefinedFunction",
         "glue:GetUserDefinedFunctions",
         "glue:GetPartition",
         "glue:GetPartitions"
       ],
       "Resource": "*"
     }
   }
   ```


1. Set up a role trust policy on the query engine execution role to control precisely which session tag key-value pairs can be attached to this role. In the following example, the role can only have the session tag key `"LakeFormationAuthorizedCaller"` with the session tag value `"engine1"` attached, and no other session tag key-value pair is allowed.

   ```
   {
       "Sid": "AllowPassSessionTags",
       "Effect": "Allow",
       "Principal": {
           "AWS": "arn:aws:iam::111122223333:role/query-execution-role"
       },
       "Action": "sts:TagSession",
       "Condition": {
           "StringLike": {
               "aws:RequestTag/LakeFormationAuthorizedCaller": "engine1"
           }
       }
   }
   ```

When the query engine calls the `sts:AssumeRole` API operation to fetch credentials, the `LakeFormationAuthorizedCaller` session tag must be included in the [AssumeRole request](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_session-tags.html#id_session-tags_adding-assume-role). The returned temporary credentials can be used to make Lake Formation application integration API requests.

Lake Formation application integration API operations require the calling principal to be an IAM role. The IAM role must include a session tag with a predetermined value that has been registered with Lake Formation. This tag allows Lake Formation to verify that the role used to call the application integration API operations is allowed to do so.
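To make the session tag requirement concrete, the following sketch builds the parameters for an `AssumeRole` request that carries the `LakeFormationAuthorizedCaller` tag. The function name `build_assume_role_request` and the session name are hypothetical. The role ARN and tag value mirror the examples above. The parameter names `RoleArn`, `RoleSessionName`, and `Tags` are those accepted by the STS `AssumeRole` operation.

```python
def build_assume_role_request(role_arn, session_name, engine_tag_value):
    """Build AssumeRole parameters that include the Lake Formation session tag."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        # The tag key is fixed; the value must be registered with Lake Formation.
        "Tags": [
            {"Key": "LakeFormationAuthorizedCaller", "Value": engine_tag_value}
        ],
    }


request = build_assume_role_request(
    "arn:aws:iam::111122223333:role/query-execution-role",
    "engine1-query-session",  # illustrative session name
    "engine1",
)
```

With boto3, for example, these parameters could be passed as `sts_client.assume_role(**request)`, assuming `sts_client` is an STS client created by the query engine.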

# Enabling permissions for a third-party query engine to call application integration API operations


Follow these steps to allow a third-party query engine to call application integration API operations through the Amazon Lake Formation console, the Amazon CLI or API/SDK.

------
#### [ Console ]

**To register your account for external data filtering:**

1. Sign in to the Amazon Web Services Management Console, and open the Lake Formation console at [https://console.amazonaws.cn/lakeformation/](https://console.amazonaws.cn/lakeformation/).

1. In the left-side navigation, expand **Administration**, and then choose **Application integration settings**.

1. On the **Application integration settings** page, choose the option **Allow external engines to filter data in Amazon S3 locations registered with Lake Formation**.

1. Enter the session tags that you created for the third-party engine. For information about session tags, see [Passing session tags in Amazon STS](https://docs.amazonaws.cn/IAM/latest/UserGuide/id_session-tags.html) in the *Amazon Identity and Access Management User Guide*.

1. Enter the account IDs for users that can use the third-party engine to access unfiltered metadata information and the data access credentials of resources in the current account.

   You can also use the Amazon account ID field for configuring cross-account access.  
![\[The screenshot shows the Application integration settings page for Lake Formation. The option Allow external engines to filter data in Amazon S3 locations registered with Lake Formation is selected. For Session tag values, the text box is empty, but there are six tags displayed below the field, with the values "engine1", "engine2", "engine3", "session1", "session2", and "session3". The last field shows the Amazon Web Services account IDs field. The text field is empty, but there are three tags displayed below this field with account IDs. The account ID values are redacted.\]](http://docs.amazonaws.cn/en_us/lake-formation/latest/dg/images/cred-vending-external-data-filtering.png)

------
#### [ CLI ]

Use the `put-data-lake-settings` CLI command to set the following parameters.

There are four fields to configure when using this Amazon CLI command:
+ `allow-external-data-filtering` – (boolean) Indicates that a third-party engine can access unfiltered metadata information and data access credentials of resources in the current account. 
+ `external-data-filtering-allow-list` – (array) A list of account IDs that can access unfiltered metadata information and data access credentials of resources in the current account when using a third-party engine. When `AllowExternalDataFiltering` is set to true, the `ExternalDataFilteringAllowList` property must include at least one account ID. An empty list is not allowed.
+ `authorized-sessions-tag-value-list` – (array) A list of authorized session tag values (strings). If an IAM role credential has an authorized key-value pair attached and the session tag value is included in the list, the session is granted access to unfiltered metadata information and data access credentials on resources in the configured account. The authorized session tag key is defined as `LakeFormationAuthorizedCaller`.
+ `AllowFullTableExternalDataAccess` – (boolean) Whether to allow a third-party query engine to get data access credentials without session tags when a caller has full data access permissions. 

For example:

```
aws lakeformation put-data-lake-settings --cli-input-json file://datalakesettings.json

{
  "DataLakeSettings": {
    "DataLakeAdmins": [
      {
        "DataLakePrincipalIdentifier": "arn:aws:iam::111111111111:user/lakeAdmin"
      }
    ],
    "CreateDatabaseDefaultPermissions": [],
    "CreateTableDefaultPermissions": [],
    "TrustedResourceOwners": [],
    "AllowExternalDataFiltering": true,
    "ExternalDataFilteringAllowList": [
        {"DataLakePrincipalIdentifier": "111111111111"}
        ],
    "AuthorizedSessionTagValueList": ["engine1"],
    "AllowFullTableExternalDataAccess": false
  }
}
```
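The JSON input above could also be generated programmatically. The following sketch builds the same structure and rejects an empty allow list, reflecting the rule that `ExternalDataFilteringAllowList` must contain at least one account ID when `AllowExternalDataFiltering` is true. The function name `build_data_lake_settings` is hypothetical, and the other fields from the example above are omitted for brevity.

```python
import json


def build_data_lake_settings(admin_arn, account_ids, tag_values,
                             allow_full_table_access=False):
    """Build the input document for put-data-lake-settings."""
    if not account_ids:
        # An empty allow list is not allowed when external filtering is enabled.
        raise ValueError(
            "ExternalDataFilteringAllowList needs at least one account ID"
        )
    return {
        "DataLakeSettings": {
            "DataLakeAdmins": [{"DataLakePrincipalIdentifier": admin_arn}],
            "AllowExternalDataFiltering": True,
            "ExternalDataFilteringAllowList": [
                {"DataLakePrincipalIdentifier": account_id}
                for account_id in account_ids
            ],
            "AuthorizedSessionTagValueList": list(tag_values),
            "AllowFullTableExternalDataAccess": allow_full_table_access,
        }
    }


settings = build_data_lake_settings(
    "arn:aws:iam::111111111111:user/lakeAdmin", ["111111111111"], ["engine1"]
)
settings_json = json.dumps(settings, indent=2)
```

The string in `settings_json` could then be saved as `datalakesettings.json` and passed to the CLI command shown above.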

------
#### [ API/SDK ]

Use the `PutDataLakeSettings` API operation to set the following parameters. 

There are four fields to configure when using this API operation:
+ `AllowExternalDataFiltering` – (Boolean) Indicates whether a third-party engine can access unfiltered metadata information and data access credentials of resources in the current account. 
+ `ExternalDataFilteringAllowList` – (array) A list of account IDs that can access unfiltered metadata information and the data access credentials of resources in the current account using a third-party engine. 
+ `AuthorizedSessionTagValueList` – (array) A list of authorized tag values (strings). If an IAM role credential has an authorized tag attached, then the session is granted access to unfiltered metadata information and the data access credentials on resources in the configured account. The authorized session tag key is defined as `LakeFormationAuthorizedCaller`. 
+ `AllowFullTableExternalDataAccess` – (Boolean) Whether to allow a third-party query engine to get data access credentials without session tags when a caller has full data access permissions. 

For example:

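```
import com.amazonaws.services.lakeformation.AWSLakeFormationClient;
import com.amazonaws.services.lakeformation.model.DataLakePrincipal;
import com.amazonaws.services.lakeformation.model.DataLakeSettings;
import com.amazonaws.services.lakeformation.model.GetDataLakeSettingsRequest;
import com.amazonaws.services.lakeformation.model.GetDataLakeSettingsResult;
import com.amazonaws.services.lakeformation.model.PutDataLakeSettingsRequest;
import java.util.ArrayList;
import java.util.List;

// Enable session tags on the existing data lake settings
public void sessionTagSetUpForExternalFiltering(AWSLakeFormationClient lakeformation) {
    // Read the current settings first so that unrelated settings are preserved
    GetDataLakeSettingsResult getDataLakeSettingsResult = lakeformation.getDataLakeSettings(new GetDataLakeSettingsRequest());
    DataLakeSettings dataLakeSettings = getDataLakeSettingsResult.getDataLakeSettings();

    // Set the account-level flag to allow external filtering
    dataLakeSettings.setAllowExternalDataFiltering(true);

    // Set the accounts that are allowed to call the credential vending and unfiltered metadata API operations
    List<DataLakePrincipal> allowlist = new ArrayList<>();
    allowlist.add(new DataLakePrincipal().withDataLakePrincipalIdentifier("111111111111"));
    dataLakeSettings.setExternalDataFilteringAllowList(allowlist);

    // Set the registered session tag values
    List<String> registeredTagValues = new ArrayList<>();
    registeredTagValues.add("engine1");
    dataLakeSettings.setAuthorizedSessionTagValueList(registeredTagValues);

    // Write the updated settings back
    lakeformation.putDataLakeSettings(new PutDataLakeSettingsRequest().withDataLakeSettings(dataLakeSettings));
}
```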

------

# Application integration for full table access

Follow these steps to enable third-party query engines to access data without the IAM session tag validation: 

------
#### [ Console ]

1. Sign in to the Lake Formation console at [https://console.amazonaws.cn/lakeformation/](https://console.amazonaws.cn/lakeformation/).

1. In the left-side navigation, expand **Administration**, and choose **Application integration settings**.

1. On the **Application integration settings** page, choose the **Allow external engines to access data in Amazon S3 locations with full table access** option. 

   When you enable this option, Lake Formation returns credentials to the querying application directly without IAM session tag validation. 

![\[The screenshot shows the Application integration setting page for Lake Formation. The option Allow external engines to access data in Amazon S3 locations with full table access is selected.\]](http://docs.amazonaws.cn/en_us/lake-formation/latest/dg/images/cred-vending-external-full-table.png)


------
#### [ Amazon CLI ]

Use the `put-data-lake-settings` CLI command to set the `AllowFullTableExternalDataAccess` parameter.

```
aws lakeformation put-data-lake-settings --cli-input-json file://put-data-lake-settings.json --region ap-northeast-1
{
    "DataLakeSettings": {
        "DataLakeAdmins": [
            {
                "DataLakePrincipalIdentifier": "arn:aws:iam::111111111111:user/lakeAdmin"
            }
        ],
        "AllowFullTableExternalDataAccess": true
    }
}
```

------