Querying the Amazon Glue Data Catalog - Amazon Redshift
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Querying the Amazon Glue Data Catalog

You can use query editor v2 to query data cataloged in your Amazon Glue Data Catalog by using specific SQL commands and granting the permissions outlined in this section. By default, the Amazon Glue Data Catalog is listed as a query editor v2 database named awsdatacatalog. Querying the Amazon Glue Data Catalog is not available in all Amazon Redshift Amazon Web Services Regions. Use the SHOW command to determine if this capability is available. For more information about Amazon Glue, see What is Amazon Glue? in the Amazon Glue Developer Guide.

Note

Querying the Amazon Glue Data Catalog is only supported in Amazon Redshift RA3 node type clusters and Amazon Redshift Serverless.

You can configure your data warehouse and view the Amazon Glue database objects cataloged using the following SQL commands:

  • SHOW – to display whether awsdatacatalog is mounted for the currently connected data warehouse. For example, to show the data_catalog_auto_mount parameter value, run:

    SHOW data_catalog_auto_mount;

    For more information, see SHOW in the Amazon Redshift Database Developer Guide.

  • ALTER SYSTEM – to change the system-level configuration of data_catalog_auto_mount. For example, to change the data_catalog_auto_mount parameter value to on, run:

    ALTER SYSTEM SET data_catalog_auto_mount = on;

    The change takes effect when a provisioned cluster is rebooted or a serverless workgroup is automatically paused and resumed. For more information, see ALTER SYSTEM in the Amazon Redshift Database Developer Guide.

  • SHOW SCHEMAS – shows a list of schemas. The schemas in the database named awsdatacatalog represent the Amazon Glue databases cataloged in the Amazon Glue Data Catalog. For example, to show these schemas, run:

    SHOW SCHEMAS FROM DATABASE awsdatacatalog;

    For more information, see SHOW SCHEMAS in the Amazon Redshift Database Developer Guide.

  • SHOW TABLES – shows a list of tables in a schema. For example, to show the tables in the Amazon Glue Data Catalog database named awsdatacatalog that are in schema myglue run:

    SHOW TABLES FROM SCHEMA awsdatacatalog.myschema;

    For more information, see SHOW TABLES in the Amazon Redshift Database Developer Guide.

  • SHOW COLUMNS – shows a list of columns in a table. For example, to show the columns in the Amazon Glue Data Catalog database named awsdatacatalog that are in schema myglue and table mytable run:

    SHOW COLUMNS FROM TABLE awsdatacatalog.myglue.mytable;

    For more information, see SHOW COLUMNS in the Amazon Redshift Database Developer Guide.

To grant your IAM user or role permission to query the Amazon Glue Data Catalog,
  1. In the tree-view pane, connect to your initial database in your provisioned cluster or serverless workgroup using the Database user name and password authentication method. For example, connect to the dev database using the admin user and password you used when you created the cluster or workgroup.

  2. In an editor tab, run the following SQL statement to grant an IAM user access to the Amazon Glue Data Catalog.

    GRANT USAGE ON DATABASE awsdatacatalog to "IAM:myIAMUser"

    Where IAM:myIAMUser is an IAM user that you want to grant usage privilege to the Amazon Glue Data Catalog. Alternatively, you can grant usage privilege to IAMR:myIAMRole for an IAM role.

  3. In the tree-view pane, edit or delete the connection to the cluster or workgroup you previously created. Connect to either your cluster or workgroup in one of the following ways:

    • To access the awsdatacatalog database from a cluster, you must use the authentication method Temporary credentials using your IAM identity. For more information about this authentication method, see Connecting to an Amazon Redshift database. Your query editor v2 administrator might need to configure the Account settings for the account to display this authentication method on the connection window.

    • To access the awsdatacatalog database from a workgroup, you must use the authentication method Federated user. For more information about this authentication method, see Connecting to an Amazon Redshift database.

  4. With the granted privilege, you can use your IAM identity to run SQL against your Amazon Glue Data Catalog.

After connecting, you can use query editor v2 to query data cataloged in Amazon Glue Data Catalog. On the query editor v2 tree-view pane, choose the cluster or workgroup and awsdatacatalog database. In the editor or notebook pane, confirm the correct cluster or workgroup is chosen. The database chosen should be the initial Amazon Redshift database such as dev. For information about authoring queries, see Authoring queries with Amazon Redshift and Notebooks in Amazon Redshift. The database named awsdatacatalog is reserved to reference the external Data Catalog database in your account. Queries against the awsdatacatalog database can only be read-only. Use three-part notation to reference the table in your SELECT statement. Where the first part is the database name, the second part is the Amazon Glue database name, and the third part is the Amazon Glue table name.

SELECT * FROM awsdatacatalog.<aws-glue-db-name>.<aws-glue-table-name>;

You can perform various scenarios that read the Amazon Glue Data Catalog data and populate Amazon Redshift tables.

The following example SQL joins two tables that are defined in Amazon Glue.

SELECT pn.emp_id, alias, role, project_name FROM "awsdatacatalog"."empl_db"."project_name_table" pn, "awsdatacatalog"."empl_db"."project_alias_table" pa WHERE pn.emp_id = pa.emp_id;

The following example SQL creates an Amazon Redshift table and populates it with data from a join of two Amazon Glue tables.

CREATE TABLE dev.public.glue AS SELECT pn.emp_id, alias, role, project_name FROM "awsdatacatalog"."empl_db"."project_name_table" pn, "awsdatacatalog"."empl_db"."project_alias_table" pa WHERE pn.emp_id = pa.emp_id;

Querying Amazon S3 tables (preview)

You can use query editor v2 to query data held in Amazon S3table catalogs mounted to the Amazon Glue Data Catalog. Amazon S3 table catalogs are mounted to the Amazon Glue Data Catalog on creation, and automatically appear as external databases on all provisioned clusters and serverless workgroups in the same Amazon Web Services Region under the same account. For more information on accessing Amazon S3 tables using Amazon Redshift, see Accessing Amazon S3 tables with Amazon Redshift in the Amazon Simple Storage Service User Guide.