Searching for values in JSON arrays - Amazon Athena
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Searching for values in JSON arrays

To determine if a specific value exists inside a JSON-encoded array, use the json_array_contains function.

The following query lists the names of the users who are participating in "project2".

WITH dataset AS ( SELECT * FROM (VALUES (JSON '{"name": "Bob Smith", "org": "legal", "projects": ["project1"]}'), (JSON '{"name": "Susan Smith", "org": "engineering", "projects": ["project1", "project2", "project3"]}'), (JSON '{"name": "Jane Smith", "org": "finance", "projects": ["project1", "project2"]}') ) AS t (users) ) SELECT json_extract_scalar(users, '$.name') AS user FROM dataset WHERE json_array_contains(json_extract(users, '$.projects'), 'project2')

This query returns a list of users.

+-------------+ | user | +-------------+ | Susan Smith | +-------------+ | Jane Smith | +-------------+

The following query example lists the names of users who have completed projects along with the total number of completed projects. It performs these actions:

  • Uses nested SELECT statements for clarity.

  • Extracts the array of projects.

  • Converts the array to a native array of key-value pairs using CAST.

  • Extracts each individual array element using the UNNEST operator.

  • Filters obtained values by completed projects and counts them.

Note

When using CAST to MAP you can specify the key element as VARCHAR (native String in Presto), but leave the value as JSON, because the values in the MAP are of different types: String for the first key-value pair, and Boolean for the second.

WITH dataset AS ( SELECT * FROM (VALUES (JSON '{"name": "Bob Smith", "org": "legal", "projects": [{"name":"project1", "completed":false}]}'), (JSON '{"name": "Susan Smith", "org": "engineering", "projects": [{"name":"project2", "completed":true}, {"name":"project3", "completed":true}]}'), (JSON '{"name": "Jane Smith", "org": "finance", "projects": [{"name":"project2", "completed":true}]}') ) AS t (users) ), employees AS ( SELECT users, CAST(json_extract(users, '$.projects') AS ARRAY(MAP(VARCHAR, JSON))) AS projects_array FROM dataset ), names AS ( SELECT json_extract_scalar(users, '$.name') AS name, projects FROM employees, UNNEST (projects_array) AS t(projects) ) SELECT name, count(projects) AS completed_projects FROM names WHERE cast(element_at(projects, 'completed') AS BOOLEAN) = true GROUP BY name

This query returns the following result:

+----------------------------------+ | name | completed_projects | +----------------------------------+ | Susan Smith | 2 | +----------------------------------+ | Jane Smith | 1 | +----------------------------------+