Step 12: Query the data in the data lake using Amazon Redshift Spectrum - Amazon Lake Formation
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Step 12: Query the data in the data lake using Amazon Redshift Spectrum

You can set up Amazon Redshift Spectrum to query the data that you imported into your Amazon Simple Storage Service (Amazon S3) data lake. First, create an Amazon Identity and Access Management (IAM) role that is used to launch the Amazon Redshift cluster and to query the Amazon S3 data. Then, grant this role the Select permissions on the tables that you want to query. Then, grant the user permissions to use the Amazon Redshift query editor. Finally, create an Amazon Redshift cluster and run queries.

You create the cluster as an administrator, and query the cluster as a data analyst.

For more information about Amazon Redshift Spectrum, see Using Amazon Redshift Spectrum to Query External Data in the Amazon Redshift Database Developer Guide.

To set up permissions to run Amazon Redshift queries
  1. Open the IAM console at https://console.amazonaws.cn/iam/. Sign in as the administrator user that you created in (user name Administrator) or as a user with the AdministratorAccess Amazon managed policy.

  2. In the navigation pane, choose Policies.

    If this is your first time choosing Policies, the Welcome to Managed Policies page appears. Choose Get Started.

  3. Choose Create policy.

  4. Choose the JSON tab.

  5. Paste in the following JSON policy document.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "lakeformation:GetDataAccess", "glue:GetTable", "glue:GetTables", "glue:SearchTables", "glue:GetDatabase", "glue:GetDatabases", "glue:GetPartitions", "lakeformation:GetResourceLFTags", "lakeformation:ListLFTags", "lakeformation:GetLFTag", "lakeformation:SearchTablesByLFTags", "lakeformation:SearchDatabasesByLFTags" ], "Resource": "*" } ] }
  6. When you are finished, choose Review to review the policy. The policy validator reports any syntax errors.

  7. On the Review policy page, enter the Name as RedshiftLakeFormationPolicy for the policy that you are creating. Enter a Description (optional). Review the policy Summary to see the permissions that are granted by your policy. Then choose Create policy to save your work.

  8. In the navigation pane of the IAM console, choose Roles, and then choose Create role.

  9. For Select trusted entity, choose Amazon service.

  10. Choose the Amazon Redshift service to assume this role.

  11. Choose the Redshift Customizable use case for your service. Then choose Next: Permissions.

  12. Search for the permissions policy that you created, RedshiftLakeFormationPolicy, and select the check box next to the policy name in the list.

  13. Choose Next: Tags.

  14. Choose Next: Review.

  15. For Role name, enter the name RedshiftLakeFormationRole.

  16. (Optional) For Role description, enter a description for the new role.

  17. Review the role, and then choose Create role.

To grant Select permissions on the table to be queried in the Lake Formation database
  1. Open the Lake Formation console at https://console.amazonaws.cn/lakeformation/. Sign in as the data lake administrator.

  2. In the navigation pane, under Permissions, choose Data lake permissions, and then choose Grant.

  3. Provide the following information:

    • For IAM users and roles, choose the IAM role you created, RedshiftLakeFormationRole. When you run the Amazon Redshift Query Editor, it uses this IAM role for permission to the data.

    • For Database, choose lakeformation_tutorial.

      The tables list populates.

    • For Table, choose a table within the data source to query.

    • Choose the Select table permission.

  4. Choose Grant.

To set up Amazon Redshift Spectrum and run queries
  1. Open the Amazon Redshift console at https://console.amazonaws.cn/redshift. Sign in as the user Administrator.

  2. Choose Create cluster.

  3. On the Create cluster page, enter redshift-lakeformation-demo for the Cluster identifier.

  4. For the Node type, select dc2.large.

  5. Scroll down, and under Database configurations, enter or accept these parameters:

    • Admin user name: awsuser

    • Admin user password: (Choose a password)

  6. Expand Cluster permissions, and for Available IAM roles, choose RedshiftLakeFormationRole. Then choose Add IAM role.

  7. If you must use a different port than the default value of 5439, next to Additional configurations, turn off the Use defaults option. Expand the section for Database configurations, and enter a new Database port number.

  8. Choose Create cluster.

    The Clusters page loads.

  9. Wait until the cluster status becomes Available. Choose the refresh icon periodically.

  10. Grant the data analyst permission to run queries against the cluster. To do so, complete the following steps.

    1. Open the IAM console at https://console.amazonaws.cn/iam/, and sign in as the Administrator user.

    2. In the navigation pane, choose Users, and attach the following managed policies to the user datalake_user.

      • AmazonRedshiftQueryEditor

      • AmazonRedshiftReadOnlyAccess

  11. Sign out of the Amazon Redshift console and sign back in as user datalake_user.

  12. In the left vertical toolbar, choose the EDITOR icon to open the query editor and connect to the cluster. If the Connect to database dialog box appears, choose the cluster name redshift-lakeformation-demo, and enter the database name dev, the user name awsuser, and the password that you created. Then choose Connect to database.

    Note

    If you are not prompted for connection parameters and another cluster is already selected in the query editor, choose Change Connection to open the Connect to database dialog box.

  13. In the New Query 1 text box, enter and run the following statement to map the database lakeformation_tutorial in Lake Formation to the Amazon Redshift schema name redshift_jdbc:

    Important

    Replace <account-id> with a valid Amazon account number, and <region> with a valid Amazon Region name (for example, us-east-1).

    create external schema if not exists redshift_jdbc from DATA CATALOG database 'lakeformation_tutorial' iam_role 'arn:aws:iam::<account-id>:role/RedshiftLakeFormationRole' region '<region>';
  14. In the schema list under Select schema, choose redshift_jdbc.

    The tables list populates. The query editor shows only the tables on which you were granted Lake Formation data lake permissions.

  15. On the pop-up menu next to a table name, choose Preview data.

    Amazon Redshift returns the first 10 rows.

    You can now run queries against the tables and columns for which you have permissions.