How to choose partition keys - Amazon Athena
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

How to choose partition keys

You can specify how partition projection maps the partition locations to partition keys. In the CREATE TABLE example in the previous section, the date and hour were combined into one partition key called datehour, but other schemes are possible. For example, you could also configure a table with separate partition keys for the year, month, day, and hour.

However, splitting dates into year, month, and day means that the date partition projection type can't be used. An alternative is to separate the date from the hour to still leverage the date partition projection type, but make queries that specify hour ranges easier to read.

With that in mind, the following CREATE TABLE example separates the date from the hour. Because date is a reserved word in SQL, the example uses day as the name for the partition key that represents the date.

CREATE EXTERNAL TABLE my_ingested_data2 ( ... ) ... PARTITIONED BY ( day STRING, hour INT ) LOCATION "s3://amzn-s3-demo-bucket/prefix/" TBLPROPERTIES ( "projection.enabled" = "true", "projection.day.type" = "date", "projection.day.format" = "yyyy/MM/dd", "projection.day.range" = "2021/01/01,NOW", "projection.day.interval" = "1", "projection.day.interval.unit" = "DAYS", "projection.hour.type" = "integer", "projection.hour.range" = "0,23", "projection.hour.digits" = "2", "storage.location.template" = "s3://amzn-s3-demo-bucket/prefix/${day}/${hour}/" )

In the example CREATE TABLE statement, the hour is a separate partition key, configured as an integer. The configuration for the hour partition key specifies the range 0 to 23, and that the hour should be formatted with two digits when Athena generates the partition locations.

A query for the my_ingested_data2 table might look like this:

SELECT * FROM my_ingested_data2 WHERE day = '2021/11/09' AND hour > 3

Understand partition key and partition projection data types

Note that datehour key in the first CREATE TABLE example is configured as date in the partition projection configuration, but the type of the partition key is string. The same is true for day in the second example. The types in the partition projection configuration only tell Athena how to format the values when it generates the partition locations. The types that you specify do not change the type of the partition key — in queries, datehour and day are of type string.

When a query includes a condition like day = '2021/11/09', Athena parses the string on the right side of the expression using the date format specified in the partition projection configuration. After Athena verifies that the date is within the configured range, it uses the date format again to insert the date as a string into the storage location template.

Similarly, for a query condition like day > '2021/11/09', Athena parses the right side and generates a list of all matching dates within the configured range. It then uses the date format to insert each date into the storage location template to create the list of partition locations.

Writing the same condition as day > '2021-11-09' or day > DATE '2021-11-09' does not work. In the first case, the date format does not match (note the hyphens instead of the forward slashes), and in the second case, the data types do not match.