Table location in Amazon S3
When you run a CREATE TABLE query in Athena, Athena registers your table with the Amazon Glue Data Catalog, which is where Athena stores your metadata.
To specify the path to your data in Amazon S3, use the LOCATION property, as shown in the following example:
CREATE EXTERNAL TABLE `test_table`(
...
)
ROW FORMAT ...
STORED AS INPUTFORMAT ...
OUTPUTFORMAT ...
LOCATION 's3://bucketname/folder/'
- For information about naming buckets, see Bucket restrictions and limitations in the Amazon Simple Storage Service User Guide.
- For information about using folders in Amazon S3, see Using folders in the Amazon Simple Storage Service User Guide.
The LOCATION in Amazon S3 specifies all of the files representing your table.
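As a fuller illustration of the LOCATION property, the following is a minimal sketch of a complete statement. The table name example_events, its columns, and the comma-delimited text format are assumptions chosen for illustration only; they do not come from the example above, whose elided parts are left as they are.

```sql
-- Hypothetical table: the name, columns, and file format are illustrative.
CREATE EXTERNAL TABLE IF NOT EXISTS example_events (
  event_id   string,
  event_type string,
  amount     double
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
-- The path is a quoted string, points to a folder, and ends with a slash.
LOCATION 's3://bucketname/folder/';
```

With a definition like this, every file under s3://bucketname/folder/ is treated as part of the table.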
Important
Athena reads all data stored in the Amazon S3 folder that you specify. If you have data that you do not want Athena to read, do not store that data in the same Amazon S3 folder as the data that you do want Athena to read. If you use partitioning, your WHERE filter must include the partition columns so that Athena scans only the data within the matching partitions. For more information, see Table location and partitions.
When you specify the LOCATION in the CREATE TABLE statement, use the following guidelines:
- Use a trailing slash.
- You can use a path to an Amazon S3 folder or an Amazon S3 access point alias. For information about Amazon S3 access point aliases, see Using a bucket-style alias for your access point in the Amazon S3 User Guide.
Use:
s3://bucketname/folder/
s3://access-point-name-metadata-s3alias/folder/
Do not use any of the following items for specifying the LOCATION for your data.
- Do not use filenames, underscores, wildcards, or glob patterns for specifying file locations.
- Do not add the full HTTP notation, such as s3.amazon.com, to the Amazon S3 bucket path.
- Do not use empty folders like // in the path, as follows: S3://bucketname/folder//folder/. While this is a valid Amazon S3 path, Athena does not allow it and changes it to s3://bucketname/folder/folder/, removing the extra /.
Do not use:
s3://path_to_bucket
s3://path_to_bucket/*
s3://path_to_bucket/mySpecialFile.dat
s3://bucketname/prefix/filename.csv
s3://test-bucket.s3.amazon.com
S3://bucket/prefix//prefix/
arn:aws:s3:::bucketname/prefix
s3://arn:aws:s3:<region>:<account_id>:accesspoint/<accesspointname>
https://<accesspointname>-<number>.s3-accesspoint.<region>.amazonaws.com
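The guidelines above can be sketched in a single statement that uses the access point alias form. The table name example_alias_table and its column are hypothetical, and access-point-name-metadata-s3alias is the placeholder alias used earlier in this section, not a real alias. The comments are annotations for the reader; if your client does not accept comments in DDL, remove them before running the statement.

```sql
-- Hypothetical table that reads data through an access point alias.
CREATE EXTERNAL TABLE IF NOT EXISTS example_alias_table (
  line string
)
STORED AS TEXTFILE
-- Accepted form: alias used like a bucket name, folder path, trailing slash.
LOCATION 's3://access-point-name-metadata-s3alias/folder/';

-- Forms to avoid, based on the list above:
--   's3://bucketname/prefix/filename.csv'   (points to a file, not a folder)
--   's3://path_to_bucket/*'                 (wildcard)
--   'arn:aws:s3:::bucketname/prefix'        (ARN instead of an s3:// path)
```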
Table location and partitions
Your source data may be grouped into Amazon S3 folders called partitions based on a set of columns. For example, these columns may represent the year, month, and day the particular record was created.
When you create a table, you can choose to make it partitioned. When Athena runs a SQL query against a non-partitioned table, it uses the LOCATION property from the table definition as the base path to list and then scan all available files. However, before a partitioned table can be queried, you must update the Amazon Glue Data Catalog with partition information. This information represents the schema of files within the particular partition and the LOCATION of files in Amazon S3 for the partition.
- To learn how the Amazon Glue crawler adds partitions, see How does a crawler determine when to create partitions? in the Amazon Glue Developer Guide.
- To learn how to configure the crawler so that it creates tables for data in existing partitions, see Using multiple data sources with crawlers.
- You can also create partitions in a table directly in Athena. For more information, see Partitioning data in Athena.
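As a minimal sketch of creating partitions directly in Athena, the statements below define a partitioned table and then register one partition together with its own LOCATION. The table name example_logs, its columns, the partition values, and the folder layout are assumptions for illustration. Athena runs one statement per query, so submit each statement separately.

```sql
-- Hypothetical partitioned table; partition columns are declared in
-- PARTITIONED BY and are not repeated in the column list.
CREATE EXTERNAL TABLE IF NOT EXISTS example_logs (
  request_id string,
  status     int
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://bucketname/folder/';

-- Register a single partition and the Amazon S3 prefix that holds its files.
ALTER TABLE example_logs ADD IF NOT EXISTS
  PARTITION (year = '2024', month = '01', day = '15')
  LOCATION 's3://bucketname/folder/2024/01/15/';

-- Alternative when folders follow the Hive-style key=value layout
-- (for example .../year=2024/month=01/day=15/): discover partitions in bulk.
MSCK REPAIR TABLE example_logs;
```

The explicit ALTER TABLE form suits folder layouts that do not encode the column names in the path, since each partition's LOCATION is stated directly.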
When Athena runs a query on a partitioned table, it checks whether any partition columns are used in the WHERE clause of the query. If partition columns are used, Athena asks the Amazon Glue Data Catalog to return the partition specification matching the specified partition columns. The partition specification includes the LOCATION property that tells Athena which Amazon S3 prefix to use when reading data. In this case, only data stored in this prefix is scanned. If you do not use partition columns in the WHERE clause, Athena scans all the files that belong to the table's partitions.
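The difference can be illustrated with two queries. This is a sketch that assumes a hypothetical table named example_logs with a status column, partitioned by year, month, and day; none of these names come from the text above.

```sql
-- Partition columns appear in WHERE: Athena looks up the matching partition
-- and reads only the Amazon S3 prefix given by that partition's LOCATION.
SELECT status, COUNT(*) AS requests
FROM example_logs
WHERE year = '2024'
  AND month = '01'
  AND day = '15'
GROUP BY status;

-- No partition columns in WHERE: the files of every partition are scanned.
SELECT status, COUNT(*) AS requests
FROM example_logs
GROUP BY status;
```

Both queries are valid; the first typically scans far less data when the table has many partitions.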
For examples of using partitioning with Athena to improve query performance and reduce query costs, see Top performance tuning tips for Amazon Athena.