Examples of using semi-structured data in Amazon Redshift - Amazon Redshift
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Amazon Redshift will no longer support the creation of new Python UDFs starting November 1, 2025. If you would like to use Python UDFs, create the UDFs prior to that date. Existing Python UDFs will continue to function as normal. For more information, see the blog post .

Examples of using semi-structured data in Amazon Redshift

The following examples demonstrate how to work with semi-structured data in Amazon Redshift using PartiQL syntax. You will create a sample table to load a sample set of semi-structured data, then query semi-structured data objects in a variety of use cases.

Note

We recommend that you set the enable_case_sensitive_identifier and enable_case_sensitive_super_attribute configuration options before working with the SUPER data type. For more information, see enable_case_sensitive_identifier and enable_case_sensitive_super_attribute.

Loading semi-structured data

The following statements create a sample table and load a sample JSON object into the all_data SUPER column.

DROP TABLE IF EXISTS test_json; SET enable_case_sensitive_super_attribute TO true; SET enable_case_sensitive_identifier TO true; CREATE TABLE test_json (all_data SUPER); INSERT INTO test_json VALUES (JSON_PARSE(' { "data":{ "pnr":{ "type":"pnr", "pnrid":"123PQRS-2024-09-20", "bookingIdentifier":"123PQRS", "version":"5", "triggerType":"", "events":[ { "eventType":"UPDATED", "type":"PART", "id":"123PQRS-2024-09-20-HO-1" }, { "eventType":"CREATED", "type":"ABC", "id":"123PQRS-2024-09-20-OT-38" } ], "create":{ "pnrCreateDate":"2024-09-20T16:56:00Z", "officeID":"OFFCID1234", "officeIDCategory":"Email" }, "lastModification":{ "dateTime":"2024-09-20T17:09:00Z" }, "PARTDetails":[ { "path":"1", "TrainPARTs":[ { "PARTID":"123PQRS-2024-09-20-HO-1", "departure":{ "departureStation":"XYZ", "departureTimeLocal":"2024-10-03T06:30:00", "departureTimeGMT":"2024-10-03T10:30:00Z" }, "arrival":{ "arrivalStation":"ABC", "arrivalTimeLocal":"2024-10-03T08:20:00", "arrivalTimeGMT":"2024-10-03T15:20:00Z" }, "marketing":{ "carrierCode":"XX", "TrainNumber":"100" }, "operating":{ "carrierCode":"YY", "TrainNumber":"100-A" }, "status":"ON", "aircraft":{ "code":"222" }, "class":"WC", "first":"Y", "seating":[ ] } ] } ], "commuterInformation":[ { "commuterID":"2", "commuterPNR":"123PQRS-2024-09-20-RO-2", "commuterTypeCode":"DOM", "firstName":"JOHN", "lastName":"MILLER" } ], "contactDetail":[ { "emailContacts":[ { "id":"123PQRS-2024-09-20-OT-4", "contact":"JOHNMILLER@EXAMPLE.COM", "purpose":[ "BUSINESS" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ], "language":"EN" }, { "id":"123PQRS-2024-09-20-OT-5", "contact":"HARVEYCORMIER@EXAMPLE.COM", "purpose":[ "NOTIFICATION" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ], "language":"EN" } ] }, { "phoneContacts":[ { "id":"123PQRS-2024-09-20-OT-3", "contact":"1234567890", "purpose":[ "NOTIFICATION" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ], "language":"" } ] }, { "addressInfo":[ { "id":"123PQRS-2024-09-20-OT-6", "addressline":[ "112 PORT STREET" ], "provinceState":"CA", "cityName":"SAN JOSE", "postalCode":"12345", "countryCode":"USA", "purpose":[ "MAILING" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] } ] } ], "PendingService":[ { "id":"123PQRS-2024-09-20-OT-26", "code":"MONO", "status":"", "text":"Broken Seat at Coach-No XYZ123 Seat-No 567", "trainCode":"WC-1", "TrainsArray":[ "123PQRS-2024-09-20-HO-1" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] }, { "id":"123PQRS-2024-09-20-OT-27", "code":"OTHS", "status":"", "text":"Broken Seat at Coach-No XYZ567 Seat-No 111", "trainCode":"WC-1", "TrainsArray":[ "123PQRS-2024-09-20-HO-1" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] }, { "id":"123PQRS-2024-09-20-OT-28", "code":"OTHS", "status":"", "text":"Broken Seat at Coach-No XYZ890 Seat-No 123", "trainCode":"WC-1", "TrainsArray":[ "123PQRS-2024-09-20-HO-1" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] }, { "id":"123PQRS-2024-09-20-OT-29", "code":"OTHS", "status":"", "text":"Broken Seat at Coach-No XYZ111 Seat-No 333", "trainCode":"WC-1", "TrainsArray":[ "123PQRS-2024-09-20-HO-1" ], "commuter":[ "123PQRS-2024-09-20-RO-2" ] } ], "parts": [ { "partname": "prop", "manufacturer": "local parts co", "quality": 2, "price": 10.00 }, { "partname": "prop", "manufacturer": "big parts co", "quality": null, "price": 9.00 }, { "partname": "prop", "manufacturer": "small parts co", "quality": 1, "price": 12.00 }, { "partname": "rudder", "manufacturer": "local parts co", "quality": 1, "price": 2.50 }, { "partname": "rudder", "manufacturer": "big parts co", "quality": 2, "price": 3.75 }, { "partname": "rudder", "manufacturer": "small parts co", "quality": null, "price": 1.90 }, { "partname": "wing", "manufacturer": "local parts co", "quality": null, "price": 7.50 }, { "partname": "wing", "manufacturer": "big parts co", "quality": 1, "price": 15.20 }, { "partname": "wing", "manufacturer": "small parts co", "quality": null, "price": 11.80 } ], "count_by_color": [ { "quality": "high", "red": 15, "green": 20, "blue": 7 }, { "quality": "normal", "red": 35, "green": null, "blue": 40 }, { "quality": "low", "red": 10, "green": 23, "blue": null } ] } }, "id":"abcdefgh-ijklmnop-qrstuvwxyz123", "mainIds":[ { "ID":"pqrstuvwxyz-aabbcc123", "Source":"NYC" } ] } '));

Querying nested semi-structured data

The following statement uses PartiQL’s dot notation to extract the pnrid field, which is nested three levels deep inside the top-level all_data object.

select all_data.data.pnr.pnrid::varchar from test_json; pnrid -------------------- 123PQRS-2024-09-20

The following statement uses PartiQL’s bracket notation to specify and extract only the first element from the events array nested inside the top-level object.

SELECT all_data.data.pnr.events[0] FROM test_json; events --------------------------------- { "eventType":"UPDATED", "type":"PART", "id":"123PQRS-2024-09-20-HO-1" }

The following statement extracts the eventType property of only the specified element from the events array.

SELECT all_data.data.pnr.events[0].eventType FROM test_json; eventtype ----------- "UPDATED"

The following statements

Using enable_case_sensitive_identifier and enable_case_sensitive_super_attribute with semi-structured data

The following examples show how the configuration options enable_case_sensitive_identifier and enable_case_sensitive_super_attribute differ when used for querying semi-structured data. For more information on these configuration options, see Accessing JSON fields with uppercase and mixed-case field names or attributes.

In the following statement, resetting both configuration options to their default of false makes the query return NULL.

RESET enable_case_sensitive_identifier; RESET enable_case_sensitive_super_attribute; SELECT all_data.data.pnr.events[0].eventType FROM test_json; eventtype ----------- NULL

In following example, the sample query returns the desired result after you wrap the case sensitive attributes in double quotation marks and set enable_case_sensitive_identifier to true.

RESET enable_case_sensitive_identifier; RESET enable_case_sensitive_super_attribute; SELECT all_data.data.pnr.events[0]."eventType" FROM test_json; eventtype ----------- NULL SET enable_case_sensitive_identifier TO true; SELECT all_data.data.pnr.events[0]."eventType" FROM test_json; eventtype ----------- "UPDATED"

In the following example, the sample query returns the desired result after you set enable_case_sensitive_super_attribute to true without wrapping the case sensitive attributes in double quotation marks.

RESET enable_case_sensitive_identifier; RESET enable_case_sensitive_super_attribute; SELECT all_data.data.pnr.events[0].eventType FROM test_json; eventtype ----------- NULL SET enable_case_sensitive_super_attribute TO true; SELECT all_data.data.pnr.events[0].eventType FROM test_json; eventtype ----------- "UPDATED"

Filtering semi-structured data

The following statement uses PartiQL syntax in the WHERE clause of a statement that counts events of the type UPDATED to retrieve data of a certain attribute from inside an array. You can use this syntax in any portion of the query where you would normally reference columns.

SELECT COUNT(*) FROM test_json WHERE all_data.data.pnr.events[0].eventType = 'UPDATED'; count ------ 1

The following example uses PartiQL’s bracket and dot syntax in both GROUP BY and ORDER BY clauses.

SELECT all_data.data.pnr.events[0].eventType::varchar, COUNT(*) FROM test_json WHERE all_data.data.pnr.events[0].eventType IS NOT NULL GROUP BY all_data.data.pnr.events[0].eventType ORDER BY all_data.data.pnr.events[0].eventType; eventtype | count -----------+------- "UPDATED" | 1

Unnesting semi-structured data

The following statement uses PartiQL joins to unnest the events array. Note that this join works even when the number of indexes for the array aren’t static.

For examples of unnesting semi-structured data using UNNEST in the FROM clause, see UNNEST examples.

SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, b.eventType::varchar event_type, b.id::varchar event_id FROM test_json a, a.all_data.data.pnr.events b; type_info | pnr_id | booking_id | version_info | event_type | event_id -----------+---------------------+------------+--------------+------------+------------------------- pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | UPDATED | 123PQRS-2024-09-20-HO-1 pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | CREATED | 123PQRS-2024-09-20-OT-38

Unnesting nested arrays

The following statement uses PartiQL joins to unnest an array that’s nested inside another array.

For examples of unnesting semi-structured data using UNNEST in the FROM clause, see UNNEST examples.

SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, d.id::varchar email_record_id, d.contact::varchar email_contact, e::varchar email_purpose, f::varchar email_commuter FROM test_json a, a.all_data.data.pnr.contactDetail c, c."emailContacts" d, d.purpose e, d.commuter f; type_info | pnr_id | booking_id | version_info | email_record_id | email_contact | email_purpose | email_commuter -----------+---------------------+------------+--------------+-------------------------+---------------------------+---------------+------------------------- pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | 123PQRS-2024-09-20-OT-4 | JOHNMILLER@EXAMPLE.COM | BUSINESS | 123PQRS-2024-09-20-RO-2 pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | 123PQRS-2024-09-20-OT-5 | HARVEYCORMIER@EXAMPLE.COM | NOTIFICATION | 123PQRS-2024-09-20-RO-2

Using semi-structured data in subqueries

The following statement uses a subquery in the WHERE clause to return only a subsection of the results from the previous example.

SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, d.id::varchar email_record_id, d.contact::varchar email_contact FROM test_json a, a.all_data.data.pnr.contactDetail c, c."emailContacts" d WHERE (SELECT COUNT(*) FROM d.purpose e WHERE e = 'BUSINESS') > 0; type_info | pnr_id | booking_id | version_info | email_record_id | email_contact | email_purpose | email_commuter -----------+---------------------+------------+--------------+-------------------------+---------------------------+---------------+------------------------- pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | 123PQRS-2024-09-20-OT-4 | JOHNMILLER@EXAMPLE.COM | BUSINESS | 123PQRS-2024-09-20-RO-2

Aggregating queries using semi-structured data

The following statement uses the COUNT function to aggregate the number of elements in the PendingService array.

SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, COUNT(*) AS total_pending_service FROM test_json a, a.all_data.data.pnr.PendingService c GROUP BY 1,2,3,4; type_info | pnr_id | booking_id | version_info | total_pending_service -----------+--------------------+------------+--------------+----------------------- pnr | 123PQRS-2024-09-20 | 123PQRS | 5 | 4

Using semi-structured data in materialized views

The following statement uses the statement from the previous example to create a materialized view. The materialized view automatically refresh the number of pending services when the base table gets new data.

CREATE MATERIALIZED VIEW mv_total_pending_service AUTO REFRESH YES AS SELECT a.all_data.data.pnr.type::varchar type_info, a.all_data.data.pnr.pnrid::varchar pnr_id , a.all_data.data.pnr.bookingIdentifier::varchar booking_id, a.all_data.data.pnr.version::varchar version_info, COUNT(*) AS total_pending_service FROM test_json a, a.all_data.data.pnr.PendingService c GROUP BY 1,2,3,4;

Using PIVOT and UNPIVOT with semi-structured data

The following statement uses PIVOT on the partname column to return the average price of each part.

SELECT * FROM ( SELECT c.partname::varchar, c.price FROM test_json a, a.all_data.data.pnr.parts c) PIVOT (AVG(price) for partname IN ('prop', 'rudder', 'wing')); prop | rudder | wing ------------+--------------------+-------- 10.33 | 2.71 | 11.50

In the previous example, the results are transformed into columns. The following example shows a GROUP BY query that returns the average prices in rows, rather than in columns.

SELECT partname, avg(price) FROM ( SELECT c.partname::varchar, c.price FROM test_json a, a.all_data.data.pnr.parts c) WHERE partname IN ('prop', 'rudder', 'wing') GROUP BY partname; partname | avg ----------+------- prop | 10.33 rudder | 2.71 wing | 11.50

Following is a PIVOT example with manufacturer as an implicit column.

SELECT * FROM ( SELECT c.quality, c.manufacturer::varchar FROM test_json a, a.all_data.data.pnr.parts c) PIVOT ( count(*) FOR quality IN (1, 2, NULL) ); manufacturer | 1 | 2 | null -------------------+----+----+------ local parts co | 1 | 1 | 1 big parts co | 1 | 1 | 1 small parts co | 1 | 0 | 2

Following is an UNPIVOT example on the quality column.

SELECT * FROM ( SELECT c.quality as quality FROM test_json a, a.all_data.data.pnr.parts c) UNPIVOT (cnt FOR column_header IN (quality)); column_header | cnt -----------------+---- quality | 2 quality | 1 quality | 1 quality | 2 quality | 1