Table location in Amazon S3 - Amazon Athena

Table location in Amazon S3

When you run a CREATE TABLE query in Athena, Athena registers your table with the Amazon Glue Data Catalog, which is where Athena stores your metadata.

To specify the path to your data in Amazon S3, use the LOCATION property, as shown in the following example:

CREATE EXTERNAL TABLE `test_table`(
...
)
ROW FORMAT ...
STORED AS INPUTFORMAT ...
OUTPUTFORMAT ...
LOCATION 's3://bucketname/folder/'

  • For information about naming buckets, see Bucket restrictions and limitations in the Amazon Simple Storage Service User Guide.

  • For information about using folders in Amazon S3, see Using folders in the Amazon Simple Storage Service User Guide.

The LOCATION in Amazon S3 specifies all of the files representing your table.
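
As a rough illustration of where the LOCATION property sits in a complete statement, the following Python sketch assembles a CREATE EXTERNAL TABLE statement as a string. The helper name `create_table_ddl`, the column names, and the comma-delimited text format are assumptions chosen for the example; they are not taken from this page, and your table will likely need a different ROW FORMAT and storage clause.

```python
def create_table_ddl(table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for comma-delimited text files.

    `columns` is a list of (name, type) pairs. The row format and storage
    clauses below are assumptions made for this sketch.
    """
    cols = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE `{table}` (\n{cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        # The LOCATION value is a quoted string that points to a folder.
        f"LOCATION '{location}'"
    )


# Hypothetical columns; the bucket and folder match the example above.
ddl = create_table_ddl(
    "test_table",
    [("id", "bigint"), ("name", "string")],
    "s3://bucketname/folder/",
)
print(ddl)
```

The generated text can be pasted into the Athena query editor or submitted through whichever client you already use.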

Important

Athena reads all data stored in the Amazon S3 folder that you specify. If you have data that you do not want Athena to read, do not store that data in the same Amazon S3 folder as the data that you do want Athena to read. If you use partitioning, your WHERE filter must include the partition columns so that Athena scans only the data within the matching partitions. For more information, see Table location and partitions.

When you specify the LOCATION in the CREATE TABLE statement, use the following guidelines:

Use:

s3://bucketname/folder/
s3://access-point-name-metadata-s3alias/folder/

Do not use any of the following items for specifying the LOCATION for your data.

  • Do not use filenames, underscores, wildcards, or glob patterns for specifying file locations.

  • Do not add the full HTTP notation, such as s3.amazon.com, to the Amazon S3 bucket path.

  • Do not use empty folders like // in the path, as follows: S3://bucketname/folder//folder/. While this is a valid Amazon S3 path, Athena does not allow it and changes it to s3://bucketname/folder/folder/, removing the extra /.

    Do not use:

    s3://path_to_bucket
    s3://path_to_bucket/*
    s3://path_to_bucket/mySpecialFile.dat
    s3://bucketname/prefix/filename.csv
    s3://test-bucket.s3.amazon.com
    S3://bucket/prefix//prefix/
    arn:aws:s3:::bucketname/prefix
    s3://arn:aws:s3:<region>:<account_id>:accesspoint/<accesspointname>
    https://<accesspointname>-<number>.s3-accesspoint.<region>.amazonaws.com
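
The guidelines above can be approximated with a small pre-flight check before you submit a CREATE TABLE statement. The following Python sketch is only a heuristic: the function name `location_problems` and the exact rules it applies are assumptions made for illustration, it does not cover every guideline (for example, it does not check for underscores), and it is not how Athena itself validates a LOCATION.

```python
GLOB_CHARS = set("*?[]{}")


def location_problems(location):
    """Return a list of guideline violations found in a LOCATION value.

    An empty list means that none of the checks in this sketch fired.
    """
    if not location.startswith("s3://"):
        # Covers ARNs, https:// URLs, and the uppercase S3:// scheme.
        return ["must start with s3://"]

    problems = []
    rest = location[len("s3://"):]
    bucket = rest.split("/", 1)[0]

    if ":" in rest:
        problems.append("ARNs are not allowed")
    if bucket.endswith((".amazon.com", ".amazonaws.com")):
        problems.append("HTTP endpoint notation is not allowed")
    if any(ch in GLOB_CHARS for ch in rest):
        problems.append("wildcards and glob patterns are not allowed")
    if "//" in rest:
        problems.append("empty path segments (//) are not allowed")
    if not location.endswith("/"):
        problems.append("must point to a folder and end with /")
    return problems


for candidate in ("s3://bucketname/folder/", "s3://bucketname/prefix/filename.csv"):
    print(candidate, location_problems(candidate))
```

Returning a list of messages rather than a single boolean makes it easier to report every issue with a path at once.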

Table location and partitions

Your source data may be grouped into Amazon S3 folders called partitions based on a set of columns. For example, these columns may represent the year, month, and day the particular record was created.

When you create a table, you can choose to make it partitioned. When Athena runs a SQL query against a non-partitioned table, it uses the LOCATION property from the table definition as the base path to list and then scan all available files. However, before a partitioned table can be queried, you must update the Amazon Glue Data Catalog with partition information. This information represents the schema of files within the particular partition and the LOCATION of files in Amazon S3 for the partition.
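
One common way to update the catalog with partition information is an ALTER TABLE ADD PARTITION statement that names the partition values and the LOCATION of that partition's files. The Python sketch below only builds the statement text. The helper name `add_partition_ddl`, the `year`/`month`/`day` columns, and the `year=2024/month=01/day=15/` folder layout are assumptions for the example; a partition's LOCATION can be any prefix that holds its files.

```python
def add_partition_ddl(table, partition, location):
    """Build an ALTER TABLE ... ADD PARTITION statement.

    `partition` maps partition column names to their values, in the
    order in which the columns were declared for the table.
    """
    spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


# Hypothetical partition for records created on 2024-01-15.
statement = add_partition_ddl(
    "test_table",
    {"year": "2024", "month": "01", "day": "15"},
    "s3://bucketname/folder/year=2024/month=01/day=15/",
)
print(statement)
```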

When Athena runs a query on a partitioned table, it checks whether any partition columns are used in the WHERE clause of the query. If they are, Athena asks the Amazon Glue Data Catalog to return the partition specification that matches the specified partition columns. The partition specification includes the LOCATION property that tells Athena which Amazon S3 prefix to use when reading data. In this case, only data stored in this prefix is scanned. If you do not use partition columns in the WHERE clause, Athena scans all the files that belong to the table's partitions.
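
The effect of a partition filter can be pictured with a toy model. The Python sketch below is a simplified simulation and not Athena's implementation: the `CATALOG` dictionary stands in for partition metadata in the Data Catalog, the helper `prefixes_to_scan` is hypothetical, and only equality filters are modeled.

```python
# Hypothetical partition metadata: partition values -> LOCATION prefix.
PARTITION_COLUMNS = ("year", "month")
CATALOG = {
    ("2023", "12"): "s3://bucketname/folder/year=2023/month=12/",
    ("2024", "01"): "s3://bucketname/folder/year=2024/month=01/",
    ("2024", "02"): "s3://bucketname/folder/year=2024/month=02/",
}


def prefixes_to_scan(where):
    """Return the prefixes that a query with the given equality filters reads.

    `where` maps column names to required values. Filters on columns that
    are not partition columns do not narrow the set of prefixes.
    """
    selected = []
    for values, location in CATALOG.items():
        partition = dict(zip(PARTITION_COLUMNS, values))
        if all(partition[col] == val
               for col, val in where.items() if col in partition):
            selected.append(location)
    return selected


# WHERE year = '2024' AND month = '01' reads a single prefix.
print(prefixes_to_scan({"year": "2024", "month": "01"}))
# No partition columns in the WHERE clause: every partition is read.
print(prefixes_to_scan({}))
```

In this model, each partition column that you add to the filter can only shrink the list of prefixes, which is why filtering on partition columns reduces the amount of data scanned.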

For examples of using partitioning with Athena to improve query performance and reduce query costs, see Top 10 performance tuning tips for Amazon Athena.