Querying JSON - Amazon Athena
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Querying JSON

Amazon Athena lets you query JSON-encoded data, extract data from nested JSON, search for values, and find length and size of JSON arrays. To learn the basics of querying JSON data in Athena, consider the following sample planet data:

{name:"Mercury",distanceFromSun:0.39,orbitalPeriod:0.24,dayLength:58.65} {name:"Venus",distanceFromSun:0.72,orbitalPeriod:0.62,dayLength:243.02} {name:"Earth",distanceFromSun:1.00,orbitalPeriod:1.00,dayLength:1.00} {name:"Mars",distanceFromSun:1.52,orbitalPeriod:1.88,dayLength:1.03}

Notice how each record (essentially, each row in the table) is on a separate line. To query this JSON data, you can use a CREATE TABLE statement like the following:

CREATE EXTERNAL TABLE `planets_json`( `name` string, `distancefromsun` double, `orbitalperiod` double, `daylength` double) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' LOCATION 's3://DOC-EXAMPLE-BUCKET/json/'

To query the data, use a simple SELECT statement like the following example.

SELECT * FROM planets_json

The query results look like the following.

# name distancefromsun orbitalperiod daylength
1 Mercury 0.39 0.24 58.65
2 Venus 0.72 0.62 243.02
3 Earth 1.0 1.0 1.0
4 Mars 1.52 1.88 1.03

Notice how the CREATE TABLE statement uses the OpenX JSON SerDe, which requires each JSON record to be on a separate line. If the JSON is in pretty print format, or if all records are on a single line, the data will not be read correctly.

To query JSON data that is in pretty print format, you can use the Amazon Ion Hive SerDe instead of the OpenX JSON SerDe. Consider the previous data stored in pretty print format:

{ name:"Mercury", distanceFromSun:0.39, orbitalPeriod:0.24, dayLength:58.65 } { name:"Venus", distanceFromSun:0.72, orbitalPeriod:0.62, dayLength:243.02 } { name:"Earth", distanceFromSun:1.00, orbitalPeriod:1.00, dayLength:1.00 } { name:"Mars", distanceFromSun:1.52, orbitalPeriod:1.88, dayLength:1.03 }

To query this data without reformatting, you can use a CREATE TABLE statement like the following. Notice that, instead of specifying the OpenX JSON SerDe, the statement specifies STORED AS ION.

CREATE EXTERNAL TABLE `planets_ion`( `name` string, `distancefromsun` DECIMAL(10, 2), `orbitalperiod` DECIMAL(10, 2), `daylength` DECIMAL(10, 2)) STORED AS ION LOCATION 's3://DOC-EXAMPLE-BUCKET/json-ion/'

The query SELECT * FROM planets_ion produces the same results as before. For more information about creating tables in this way using the Amazon Ion Hive SerDe, see Using CREATE TABLE to create Amazon Ion tables.

The preceding example JSON data does not contain complex data types such as nested arrays or structs. For more information about querying nested JSON data, see Example: deserializing nested JSON.