Understanding tables, databases, and data catalogs - Amazon Athena
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Understanding tables, databases, and data catalogs

In Athena, catalogs, databases, and tables are containers for the metadata definitions that define a schema for underlying source data.

Athena uses the following terms to refer to hierarchies of data objects:

  • Data source – a group of databases

  • Database – a group of tables

  • Table – data organized as a group of rows or columns

Sometimes these objects are also referred to with alternate but equivalent names such as the following:

  • A data source is sometimes referred to as a catalog.

  • A database is sometimes referred to as a schema.

Note

This terminology can vary in the federated data sources that you use with Athena. For more information, see Athena and federated table name qualifiers.

The following example query in the Athena console uses the awsdatacatalog data source, the default database, and the some_table table.

Hierarchy of data objects in the Athena query editor.

For each dataset, a table needs to exist in Athena. The metadata in the table tells Athena where the data is located in Amazon S3, and specifies the structure of the data, for example, column names, data types, and the name of the table. Databases are a logical grouping of tables, and also hold only metadata and schema information for a dataset.

For each dataset that you'd like to query, Athena must have an underlying table it will use for obtaining and returning query results. Therefore, before querying data, a table must be registered in Athena. The registration occurs when you either create tables automatically or manually.

You can create a table automatically using an Amazon Glue crawler. For more information about Amazon Glue and crawlers, see Integration with Amazon Glue. When Amazon Glue creates a table, it registers it in its own Amazon Glue Data Catalog. Athena uses the Amazon Glue Data Catalog to store and retrieve this metadata, using it when you run queries to analyze the underlying dataset.

Regardless of how the tables are created, the table creation process registers the dataset with Athena. This registration occurs in the Amazon Glue Data Catalog and enables Athena to run queries on the data. In the Athena query editor, this catalog (or data source) is referred to with the label AwsDataCatalog.

After you create a table, you can use SQL SELECT statements to query it, including getting specific file locations for your source data. Your query results are stored in Amazon S3 in the query result location that you specify.

The Amazon Glue Data Catalog is accessible throughout your Amazon Web Services account. Other Amazon Web Services can share the Amazon Glue Data Catalog, so you can see databases and tables created throughout your organization using Athena and vice versa.

  • To create a table manually:

    • Use the Athena console to run the Create Table Wizard.

    • Use the Athena console to write Hive DDL statements in the Query Editor.

    • Use the Athena API or CLI to run a SQL query string with DDL statements.

    • Use the Athena JDBC or ODBC driver.

When you create tables and databases manually, Athena uses HiveQL data definition language (DDL) statements such as CREATE TABLE, CREATE DATABASE, and DROP TABLE under the hood to create tables and databases in the Amazon Glue Data Catalog.

To get started, you can use a tutorial in the Athena console or work through a step-by-step guide in the Athena documentation.

  • To use the tutorial in the Athena console, choose the information icon on the upper right of the console, and then choose the Tutorial tab.

  • For a step-by-step tutorial on creating a table and writing queries in the Athena query editor, see Getting started.