Lake Formation workflow for application integration API operations
The following is the workflow for application integration API operations:
- A user submits a query or request for data using an integrated third-party query engine. The query engine assumes an IAM role that represents the user or a group of users, and retrieves trusted credentials to be used when calling the application integration API operations.
- The query engine calls GetUnfilteredTableMetadata and, if the table is partitioned, GetUnfilteredPartitionsMetadata to retrieve metadata and policy information from the Data Catalog.
- Lake Formation performs authorization for the request. If the user doesn't have appropriate permissions on the table, then an AccessDeniedException is thrown.
- As part of the request, the query engine sends the types of filtering it supports. Two flags can be sent within an array: COLUMN_PERMISSION and CELL_FILTER_PERMISSION. If the query engine doesn't support one of these features and a policy exists on the table for that feature, then a PermissionTypeMismatchException is thrown and the query fails. This prevents data leakage.
- The returned response contains the following:
  - The entire schema for the table, so that query engines can use it to parse the data from storage.
  - A list of authorized columns that the user has access to. If the authorized column list is empty, the user has DESCRIBE permissions but does not have SELECT permissions, and the query fails.
  - A flag, IsRegisteredWithLakeFormation, which indicates whether Lake Formation can vend credentials for this resource's data. If this returns false, then the customer's credentials should be used to access Amazon S3.
  - A list of CellFilters, if any, that should be applied to rows of data. This list contains columns and an expression to evaluate against each row. It is populated only if CELL_FILTER_PERMISSION is sent as part of the request and a data filter exists on the table for the calling user.
- After the metadata is retrieved, the query engine calls GetTemporaryGlueTableCredentials or GetTemporaryGluePartitionCredentials to get AWS credentials to retrieve data from the Amazon S3 location.
- The query engine reads relevant objects from Amazon S3, filters the data based on the policies it received in step 2, and returns the results to the user.
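The response handling described in the list above can be sketched as a small decision function. This is a minimal illustration, not part of Lake Formation: `ReadPlan` and `plan_read` are hypothetical names, and the dictionary keys (`AuthorizedColumns`, `IsRegisteredWithLakeFormation`, `CellFilters`, `RowFilterExpression`) are assumed to match the GetUnfilteredTableMetadata response shape.

```python
from dataclasses import dataclass, field


@dataclass
class ReadPlan:
    """Hypothetical container for what an engine needs before reading from Amazon S3."""
    columns: list
    use_vended_credentials: bool
    row_filters: list = field(default_factory=list)


def plan_read(response: dict) -> ReadPlan:
    """Turn a GetUnfilteredTableMetadata-style response into a read plan."""
    authorized = response.get("AuthorizedColumns", [])
    if not authorized:
        # An empty list means DESCRIBE without SELECT, so the query must fail.
        raise PermissionError("user has no SELECT permission on any column")
    return ReadPlan(
        columns=list(authorized),
        # False means the caller's own credentials are used to access Amazon S3.
        use_vended_credentials=bool(response.get("IsRegisteredWithLakeFormation", False)),
        # Keep only the row-level expressions the engine must evaluate per row.
        row_filters=[
            f["RowFilterExpression"]
            for f in response.get("CellFilters", [])
            if f.get("RowFilterExpression")
        ],
    )


# Illustrative response; the table and column names are made up.
sample_response = {
    "AuthorizedColumns": ["order_id", "region"],
    "IsRegisteredWithLakeFormation": True,
    "CellFilters": [{"ColumnName": "region", "RowFilterExpression": "region = 'EU'"}],
}
plan = plan_read(sample_response)
```

Failing early on an empty authorized column list keeps the engine from requesting credentials or reading objects for a query that cannot succeed.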
The application integration API operations for Lake Formation contain additional content for configuring integration with third-party query engines. You can see the operation details in the Credential vending API operations section.
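As a rough illustration of how a query engine might chain the metadata and credential vending operations for a non-partitioned table, consider the sketch below. It assumes boto3-style clients are passed in by the caller (for example `boto3.client("glue")` and `boto3.client("lakeformation")`), that the catalog ID is the AWS account ID, and that the table lives in the standard `aws` partition. The function name `fetch_metadata_and_credentials` is hypothetical, and error handling for AccessDeniedException and PermissionTypeMismatchException is left to the caller.

```python
def fetch_metadata_and_credentials(glue, lakeformation, catalog_id, region, database, table):
    """Call the metadata operation, then request temporary credentials if possible."""
    # Filtering features this hypothetical engine declares support for.
    permission_types = ["COLUMN_PERMISSION", "CELL_FILTER_PERMISSION"]

    metadata = glue.get_unfiltered_table_metadata(
        CatalogId=catalog_id,
        DatabaseName=database,
        Name=table,
        SupportedPermissionTypes=permission_types,
    )

    if not metadata.get("IsRegisteredWithLakeFormation", False):
        # Lake Formation cannot vend credentials for this location, so the
        # engine falls back to the customer's own credentials.
        return metadata, None

    # Assumed ARN layout for a Data Catalog table in the standard partition.
    table_arn = f"arn:aws:glue:{region}:{catalog_id}:table/{database}/{table}"
    credentials = lakeformation.get_temporary_glue_table_credentials(
        TableArn=table_arn,
        Permissions=["SELECT"],
        SupportedPermissionTypes=permission_types,
    )
    return metadata, credentials
```

Passing the clients in as arguments keeps the sketch independent of how the engine obtained its trusted credentials, and makes it straightforward to exercise with stub clients.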