Querying the Amazon Glue Data Catalog
You can use query editor v2 to query data cataloged in your Amazon Glue Data Catalog by using specific SQL
commands and granting the permissions outlined in this section. By default, the Amazon Glue Data Catalog
is listed as a query editor v2 database named awsdatacatalog
. Querying the Amazon Glue Data Catalog
is not available in all Amazon Redshift Amazon Web Services Regions. Use the SHOW command to determine if this
capability is available. For more information about Amazon Glue, see What is Amazon Glue? in the
Amazon Glue Developer Guide.
Note
Querying the Amazon Glue Data Catalog is only supported in Amazon Redshift RA3 node type clusters and Amazon Redshift Serverless.
You can configure your data warehouse and view the Amazon Glue database objects cataloged using the following SQL commands:
-
SHOW – to display whether
awsdatacatalog
is mounted for the currently connected data warehouse. For example, to show thedata_catalog_auto_mount
parameter value, run:SHOW data_catalog_auto_mount;
For more information, see SHOW in the Amazon Redshift Database Developer Guide.
-
ALTER SYSTEM – to change the system-level configuration of
data_catalog_auto_mount
. For example, to change thedata_catalog_auto_mount
parameter value toon
, run:ALTER SYSTEM SET data_catalog_auto_mount = on;
The change takes effect when a provisioned cluster is rebooted or a serverless workgroup is automatically paused and resumed. For more information, see ALTER SYSTEM in the Amazon Redshift Database Developer Guide.
-
SHOW SCHEMAS – shows a list of schemas. The schemas in the database named
awsdatacatalog
represent the Amazon Glue databases cataloged in the Amazon Glue Data Catalog. For example, to show these schemas, run:SHOW SCHEMAS FROM DATABASE awsdatacatalog;
For more information, see SHOW SCHEMAS in the Amazon Redshift Database Developer Guide.
-
SHOW TABLES – shows a list of tables in a schema. For example, to show the tables in the Amazon Glue Data Catalog database named
awsdatacatalog
that are in schemamyglue
run:SHOW TABLES FROM SCHEMA awsdatacatalog.myschema;
For more information, see SHOW TABLES in the Amazon Redshift Database Developer Guide.
-
SHOW COLUMNS – shows a list of columns in a table. For example, to show the columns in the Amazon Glue Data Catalog database named
awsdatacatalog
that are in schemamyglue
and tablemytable
run:SHOW COLUMNS FROM TABLE awsdatacatalog.myglue.mytable;
For more information, see SHOW COLUMNS in the Amazon Redshift Database Developer Guide.
To grant your IAM user or role permission to query the Amazon Glue Data Catalog,
-
In the tree-view pane, connect to your initial database in your provisioned cluster or serverless workgroup using the Database user name and password authentication method. For example, connect to the
dev
database using the admin user and password you used when you created the cluster or workgroup. -
In an editor tab, run the following SQL statement to grant an IAM user access to the Amazon Glue Data Catalog.
GRANT USAGE ON DATABASE awsdatacatalog to "
IAM:myIAMUser
"Where
IAM:myIAMUser
is an IAM user that you want to grant usage privilege to the Amazon Glue Data Catalog. Alternatively, you can grant usage privilege toIAMR:myIAMRole
for an IAM role. -
In the tree-view pane, edit or delete the connection to the cluster or workgroup you previously created. Connect to either your cluster or workgroup in one of the following ways:
-
To access the
awsdatacatalog
database from a cluster, you must use the authentication method Temporary credentials using your IAM identity. For more information about this authentication method, see Connecting to an Amazon Redshift database. Your query editor v2 administrator might need to configure the Account settings for the account to display this authentication method on the connection window. -
To access the
awsdatacatalog
database from a workgroup, you must use the authentication method Federated user. For more information about this authentication method, see Connecting to an Amazon Redshift database.
-
-
With the granted privilege, you can use your IAM identity to run SQL against your Amazon Glue Data Catalog.
After connecting, you can use query editor v2 to query data cataloged in Amazon Glue Data Catalog. On the query editor v2
tree-view pane, choose the cluster or workgroup and awsdatacatalog
database. In
the editor or notebook pane, confirm the correct cluster or workgroup is chosen. The
database chosen should be the initial Amazon Redshift database such as dev
. For
information about authoring queries, see Authoring queries with Amazon Redshift and Notebooks in Amazon Redshift. The
database named awsdatacatalog
is reserved to reference the external Data Catalog
database in your account. Queries against the awsdatacatalog
database can only
be read-only. Use three-part notation to reference the table in your SELECT statement. Where
the first part is the database name, the second part is the Amazon Glue database name, and the
third part is the Amazon Glue table name.
SELECT * FROM awsdatacatalog.
<aws-glue-db-name
>.<aws-glue-table-name
>;
You can perform various scenarios that read the Amazon Glue Data Catalog data and populate Amazon Redshift tables.
The following example SQL joins two tables that are defined in Amazon Glue.
SELECT pn.emp_id, alias, role, project_name FROM "awsdatacatalog"."empl_db"."project_name_table" pn, "awsdatacatalog"."empl_db"."project_alias_table" pa WHERE pn.emp_id = pa.emp_id;
The following example SQL creates an Amazon Redshift table and populates it with data from a join of two Amazon Glue tables.
CREATE TABLE dev.public.glue AS SELECT pn.emp_id, alias, role, project_name FROM "awsdatacatalog"."empl_db"."project_name_table" pn, "awsdatacatalog"."empl_db"."project_alias_table" pa WHERE pn.emp_id = pa.emp_id;
Querying Amazon S3 tables (preview)
You can use query editor v2 to query data held in Amazon S3table catalogs mounted to the Amazon Glue Data Catalog. Amazon S3 table catalogs are mounted to the Amazon Glue Data Catalog on creation, and automatically appear as external databases on all provisioned clusters and serverless workgroups in the same Amazon Web Services Region under the same account. For more information on accessing Amazon S3 tables using Amazon Redshift, see Accessing Amazon S3 tables with Amazon Redshift in the Amazon Simple Storage Service User Guide.