Use the Amazon CLI with Hive metastores
You can use aws athena CLI commands to manage the Hive metastore data
catalogs that you use with Athena. After you have defined one or more catalogs to use with
Athena, you can reference those catalogs in your aws athena DDL and DML
commands.
Using the Amazon CLI to manage Hive metastore catalogs
Registering a catalog: Create-data-catalog
To register a data catalog, you use the create-data-catalog command.
Use the name parameter to specify the name that you want to use for the
catalog. Pass the ARN of the Lambda function to the metadata-function
option of the parameters argument. To create tags for the new catalog,
use the tags parameter with one or more space-separated
Key=
argument pairs.key,Value=value
The following example registers the Hive metastore catalog named
hms-catalog-1. The command has been formatted for
readability.
$ aws athena create-data-catalog --name "hms-catalog-1" --type "HIVE" --description "Hive Catalog 1" --parameters "metadata-function=arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-v3,sdk-version=1.0" --tags Key=MyKey,Value=MyValue --region us-east-1
Showing catalog details: Get-data-catalog
To show the details of a catalog, pass the name of the catalog to the
get-data-catalog command, as in the following example.
$ aws athena get-data-catalog --name "hms-catalog-1" --region us-east-1
The following sample result is in JSON format.
{ "DataCatalog": { "Name": "hms-catalog-1", "Description": "Hive Catalog 1", "Type": "HIVE", "Parameters": { "metadata-function": "arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-v3", "sdk-version": "1.0" } } }
Listing registered catalogs: List-data-catalogs
To list the registered catalogs, use the list-data-catalogs command
and optionally specify a Region, as in the following example. The catalogs listed
always include Amazon Glue.
$ aws athena list-data-catalogs --region us-east-1
The following sample result is in JSON format.
{ "DataCatalogs": [ { "CatalogName": "AwsDataCatalog", "Type": "GLUE" }, { "CatalogName": "hms-catalog-1", "Type": "HIVE", "Parameters": { "metadata-function": "arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-v3", "sdk-version": "1.0" } } ] }
Updating a catalog: Update-data-catalog
To update a data catalog, use the update-data-catalog command, as in
the following example. The command has been formatted for readability.
$ aws athena update-data-catalog --name "hms-catalog-1" --type "HIVE" --description "My New Hive Catalog Description" --parameters "metadata-function=arn:aws:lambda:us-east-1:111122223333:function:external-hms-service-new,sdk-version=1.0" --region us-east-1
Deleting a catalog: Delete-data-catalog
To delete a data catalog, use the delete-data-catalog command, as in
the following example.
$ aws athena delete-data-catalog --name "hms-catalog-1" --region us-east-1
Showing database details: Get-database
To show the details of a database, pass the name of the catalog and the database
to the get-database command, as in the following example.
$ aws athena get-database --catalog-name hms-catalog-1 --database-name mydb
The following sample result is in JSON format.
{ "Database": { "Name": "mydb", "Description": "My database", "Parameters": { "CreatedBy": "Athena", "EXTERNAL": "TRUE" } } }
Listing databases in a catalog: List-databases
To list the databases in a catalog, use the list-databases command
and optionally specify a Region, as in the following example.
$ aws athena list-databases --catalog-name AwsDataCatalog --region us-west-2
The following sample result is in JSON format.
{ "DatabaseList": [ { "Name": "default" }, { "Name": "mycrawlerdatabase" }, { "Name": "mydatabase" }, { "Name": "sampledb", "Description": "Sample database", "Parameters": { "CreatedBy": "Athena", "EXTERNAL": "TRUE" } }, { "Name": "tpch100" } ] }
Showing table details: Get-table-metadata
To show the metadata for a table, including column names and datatypes, pass the
name of the catalog, database, and table name to the get-table-metadata
command, as in the following example.
$ aws athena get-table-metadata --catalog-name AwsDataCatalog --database-name mydb --table-name cityuseragent
The following sample result is in JSON format.
{ "TableMetadata": { "Name": "cityuseragent", "CreateTime": 1586451276.0, "LastAccessTime": 0.0, "TableType": "EXTERNAL_TABLE", "Columns": [ { "Name": "city", "Type": "string" }, { "Name": "useragent1", "Type": "string" } ], "PartitionKeys": [], "Parameters": { "COLUMN_STATS_ACCURATE": "false", "EXTERNAL": "TRUE", "inputformat": "org.apache.hadoop.mapred.TextInputFormat", "last_modified_by": "hadoop", "last_modified_time": "1586454879", "location": "s3://amzn-s3-demo-bucket/", "numFiles": "1", "numRows": "-1", "outputformat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "rawDataSize": "-1", "serde.param.serialization.format": "1", "serde.serialization.lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "totalSize": "61" } } }
Showing metadata for all tables in a database: List-table-metadata
To show the metadata for all tables in a database, pass the name of the catalog
and database name to the list-table-metadata command. The
list-table-metadata command is similar to the
get-table-metadata command, except that you do not specify a table
name. To limit the number of results, you can use the --max-results
option, as in the following example.
$ aws athena list-table-metadata --catalog-name AwsDataCatalog --database-name sampledb --region us-east-1 --max-results 2
The following sample result is in JSON format.
{ "TableMetadataList": [ { "Name": "cityuseragent", "CreateTime": 1586451276.0, "LastAccessTime": 0.0, "TableType": "EXTERNAL_TABLE", "Columns": [ { "Name": "city", "Type": "string" }, { "Name": "useragent1", "Type": "string" } ], "PartitionKeys": [], "Parameters": { "COLUMN_STATS_ACCURATE": "false", "EXTERNAL": "TRUE", "inputformat": "org.apache.hadoop.mapred.TextInputFormat", "last_modified_by": "hadoop", "last_modified_time": "1586454879", "location": "s3://amzn-s3-demo-bucket/", "numFiles": "1", "numRows": "-1", "outputformat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "rawDataSize": "-1", "serde.param.serialization.format": "1", "serde.serialization.lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "totalSize": "61" } }, { "Name": "clearinghouse_data", "CreateTime": 1589255544.0, "LastAccessTime": 0.0, "TableType": "EXTERNAL_TABLE", "Columns": [ { "Name": "location", "Type": "string" }, { "Name": "stock_count", "Type": "int" }, { "Name": "quantity_shipped", "Type": "int" } ], "PartitionKeys": [], "Parameters": { "EXTERNAL": "TRUE", "inputformat": "org.apache.hadoop.mapred.TextInputFormat", "location": "s3://amzn-s3-demo-bucket/", "outputformat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat", "serde.param.serialization.format": "1", "serde.serialization.lib": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe", "transient_lastDdlTime": "1589255544" } } ], "NextToken": "eyJsYXN0RXZhbHVhdGVkS2V5Ijp7IkhBU0hfS0VZIjp7InMiOiJ0Ljk0YWZjYjk1MjJjNTQ1YmU4Y2I5OWE5NTg0MjFjYTYzIn0sIlJBTkdFX0tFWSI6eyJzIjoiY2xlYXJpbmdob3VzZV9kYXRhIn19LCJleHBpcmF0aW9uIjp7InNlY29uZHMiOjE1ODkzNDIwMjIsIm5hbm9zIjo2NTUwMDAwMDB9fQ==" }
Running DDL and DML statements
When you use the Amazon CLI to run DDL and DML statements, you can pass the name of the Hive metastore catalog in one of two ways:
-
Directly into the statements that support it.
-
To the
--query-execution-contextCatalogparameter.
DDL statements
The following example passes in the catalog name directly as part of the
show create table DDL statement. The command has been formatted for
readability.
$ aws athena start-query-execution --query-string "show create table hms-catalog-1.hms_tpch_partitioned.lineitem" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"
The following example DDL show create table statement uses the
Catalog parameter of --query-execution-context to pass
the Hive metastore catalog name hms-catalog-1. The command has been
formatted for readability.
$ aws athena start-query-execution --query-string "show create table lineitem" --query-execution-context "Catalog=hms-catalog-1,Database=hms_tpch_partitioned" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"
DML statements
The following example DML select statement passes the catalog name
into the query directly. The command has been formatted for readability.
$ aws athena start-query-execution --query-string "select * from hms-catalog-1.hms_tpch_partitioned.customer limit 100" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"
The following example DML select statement uses the
Catalog parameter of --query-execution-context to pass
in the Hive metastore catalog name hms-catalog-1. The command has been
formatted for readability.
$ aws athena start-query-execution --query-string "select * from customer limit 100" --query-execution-context "Catalog=hms-catalog-1,Database=hms_tpch_partitioned" --result-configuration "OutputLocation=s3://amzn-s3-demo-bucket/lambda/results"