

 Amazon Redshift will no longer support the creation of new Python UDFs starting with Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [blog post](https://amazonaws-china.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/). 

# Semi-structured data in Amazon Redshift

By using *semi-structured data support* in Amazon Redshift, you can ingest and store semi-structured data in your Amazon Redshift data warehouses. Using the SUPER data type and PartiQL language, Amazon Redshift expands data warehouse capability to integrate with both SQL and NoSQL data sources. This way, Amazon Redshift enables efficient analytics on relational and semi-structured stored data such as JSON. For information on Amazon Redshift integration with PartiQL, see [PartiQL – an SQL-compatible query language for Amazon Redshift](super-partiql.md).

Amazon Redshift offers two forms of semi-structured data support: the SUPER data type and Amazon Redshift Spectrum. 

You can query semi-structured data either by ingesting it into Amazon Redshift and storing it in the SUPER data type, or by using Amazon Redshift Spectrum to query the data stored in Amazon S3.

## Best practices for working with semi-structured data


Consider the following best practices when working with semi-structured data.
+  We recommend that you set the `enable_case_sensitive_super_attribute` and `enable_case_sensitive_identifier` configuration options to true when working with SUPER data. For more information, see [enable\_case\_sensitive\_super\_attribute](r_enable_case_sensitive_super_attribute.md) and [enable\_case\_sensitive\_identifier](r_enable_case_sensitive_identifier.md). 
+  Use the COPY command to load data from Amazon S3 buckets into Amazon Redshift SUPER columns. 
+  Use PartiQL dynamic typing and lax semantics to run ad hoc queries on SUPER data values without having to impose a schema before querying. For information on dynamic typing, see [Dynamic typing](query-super.md#dynamic-typing-lax-processing). For information on lax semantics, see [Lax semantics](query-super.md#lax-semantics). 
+  Shred schemaless and semi-structured data into materialized views using PartiQL if you plan to query the data frequently. When you perform analytics on the shredded data, the columnar organization of Amazon Redshift materialized views provides better performance. Furthermore, users and business intelligence (BI) tools that require a conventional schema for ingested data can use views (either materialized or virtual) as the conventional schema presentation of the data. 

  After your PartiQL materialized views have extracted the data found in JSON or SUPER into conventional columnar materialized views, you can query the materialized views. For information on materialized views, see [Materialized views in Amazon Redshift](materialized-view-overview.md). For more information on how the SUPER data type works with materialized views, see [SUPER data type and materialized views](r_SUPER_MV.md).
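
As a minimal sketch of this pattern, the following materialized view shreds two attributes of a SUPER column into conventional typed columns. It assumes a table like `test_json` (defined later in this guide) whose SUPER column `all_data` holds objects with `name` and `age` attributes; the view name `people_mv` is illustrative.

```sql
-- Hypothetical example: shred attributes of the SUPER column all_data
-- into typed, columnar materialized-view columns.
CREATE MATERIALIZED VIEW people_mv AS
SELECT
    all_data.name.first_name::varchar AS first_name,
    all_data.age::int AS age
FROM test_json;

-- BI tools can then query people_mv as an ordinary relational view.
SELECT first_name, age FROM people_mv;
```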

## Concepts for SUPER data type use


Following, you can find some Amazon Redshift SUPER data type concepts.

**Understand what the SUPER data type is in Amazon Redshift** – The *SUPER* data type is an Amazon Redshift data type that enables the storage of schemaless arrays and structures that contain Amazon Redshift scalars and possibly nested arrays and structures. The SUPER data type can natively store different formats of semi-structured data, such as JSON or data originating from document-oriented sources. You can add a new SUPER column to store semi-structured data and write queries that access the SUPER column, along with the usual scalar columns. For more information about the SUPER data type, see [SUPER type](r_SUPER_type.md).

**Ingest schemaless JSON into SUPER** – With the flexible semi-structured SUPER data type, Amazon Redshift can receive and ingest schemaless JSON into columns with the SUPER data type. For example, you can ingest the JSON value `[10.5, "first"]` into a SUPER data type column by using the COPY command. The column would hold a SUPER value of `[10.5, 'first']`. You can also ingest JSON using the [JSON\_PARSE function](JSON_PARSE.md). Both COPY and `JSON_PARSE` ingest JSON using strict parsing semantics by default. You can also construct SUPER values, including arrays and structures, from data already in the database.

The SUPER column requires no schema modifications while ingesting the flexible structures of schemaless JSON. For example, while analyzing a clickstream, you initially store "click" structures with attributes "IP" and "time" in the SUPER column. You can later add an attribute "customer id" without changing your schema to ingest such changes.
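
A minimal sketch of this evolution (the table and attribute names here are illustrative):

```sql
-- A clickstream table with a single schemaless SUPER column.
CREATE TABLE clicks (click SUPER);

-- Early events carry only "IP" and "time".
INSERT INTO clicks VALUES (JSON_PARSE(
    '{"IP": "192.0.2.1", "time": "2025-01-01T00:00:00"}'));

-- Later events add "customer_id"; no ALTER TABLE is needed because
-- the SUPER column imposes no schema on the ingested structures.
INSERT INTO clicks VALUES (JSON_PARSE(
    '{"IP": "192.0.2.2", "time": "2025-01-01T00:01:00", "customer_id": 42}'));
```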

The native format used for the SUPER data type is a binary format that requires less space than the JSON value in its textual form. This enables faster ingestion and runtime processing of SUPER values at query time. 

**Query SUPER data with PartiQL** – PartiQL is a backward compatible extension of SQL-92 that many Amazon services currently use. With the use of PartiQL, familiar SQL constructs seamlessly combine access to both the classic, tabular SQL data and the semi-structured data of SUPER. You can perform object and array navigation and unnest arrays. PartiQL extends the standard SQL language to declaratively express and process nested and multivalued data.

PartiQL is an extension of SQL where the nested and schemaless data of SUPER columns are first-class citizens. PartiQL doesn't require all query expressions to be type-checked during query compilation time. This approach enables query expressions that contain the SUPER data type to be dynamically typed during query execution when the actual types of the data inside the SUPER columns are accessed. Also, PartiQL operates in a lax mode where type inconsistencies don't cause failures but return null. The combination of schemaless and lax query processing makes PartiQL ideal for extract, load, transform (ELT) applications where your SQL query evaluates the JSON data that are ingested in the SUPER columns.

For more information on PartiQL for Amazon Redshift, see [PartiQL – an SQL-compatible query language for Amazon Redshift](super-partiql.md). For information on dynamic typing, see [Dynamic typing](query-super.md#dynamic-typing-lax-processing). For information on lax query processing, see [Lax semantics](query-super.md#lax-semantics).
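
For instance, under lax semantics, navigating to an attribute that doesn't exist returns null rather than failing the query. The following is a sketch using the `test_json` table defined later in this guide; the attribute `salary` is hypothetical and not present in the ingested JSON:

```sql
-- The attribute "salary" doesn't exist in the ingested JSON,
-- so under lax semantics the navigation yields NULL, not an error.
SELECT all_data.salary FROM test_json;
```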

**Integrate with Redshift Spectrum** – Amazon Redshift supports multiple aspects of PartiQL when running Redshift Spectrum queries over JSON, Parquet, and other formats that have nested data. Redshift Spectrum only supports nested data that has schemas. For example, with Redshift Spectrum you can declare that your JSON data has the attribute `nested_schemaful_example` in the schema `ARRAY<STRUCT<a:INTEGER, b:DECIMAL(5,2)>>`. The schema of this attribute determines that the data always contains an array, which contains a structure with integer *a* and decimal *b*. If the data changes to include more attributes, the type also changes. In contrast, the SUPER data type requires no schema. You can store arrays with structure elements that have different attributes or types. Also, values can be stored outside arrays. 

## Considerations for using SUPER type data


When working with SUPER data, consider the following:
+ Use JDBC driver version 2.x, ODBC driver version 2.x, or Amazon Redshift Python driver version 2.0.872 or later. The ODBC driver version 1.x isn't supported. 

  For information about JDBC drivers, see [Configuring a connection for JDBC driver version 2.x for Amazon Redshift](https://docs.amazonaws.cn/redshift/latest/mgmt/jdbc20-install.html) in the *Amazon Redshift Management Guide*.

  For information about ODBC drivers, see [Configuring a connection for ODBC driver version 2.x for Amazon Redshift](https://docs.amazonaws.cn/redshift/latest/mgmt/odbc20-install.html) in the *Amazon Redshift Management Guide*.

  For information about Python drivers, see [Amazon Redshift Python connector](https://docs.amazonaws.cn/redshift/latest/mgmt/python-redshift-driver.html) in the *Amazon Redshift Management Guide*.

For more information about SUPER configurations, see [SUPER configurations](super-configurations.md).

# PartiQL – an SQL-compatible query language for Amazon Redshift

Amazon Redshift supports PartiQL, an SQL-compatible query language, to select, insert, update, and delete data in Amazon Redshift. Using PartiQL, you can easily interact with Amazon Redshift tables and run ad hoc queries using the Amazon Web Services Management Console, SQL Workbench/J, the Amazon Command Line Interface, and Amazon Redshift Data APIs for PartiQL. 

PartiQL operations provide the same availability, latency, and performance as the other Redshift data plane operations.

The following sections describe the Amazon Redshift implementation of PartiQL.

## What is PartiQL?


PartiQL provides SQL-compatible query access across multiple data stores containing structured, semi-structured, and nested data. It is widely used within Amazon and is now available as part of many Amazon services, including Amazon Redshift.

For the PartiQL specification and a tutorial on the core query language, see the [PartiQL overview](https://partiql.org/dql/overview.html).

For examples of using PartiQL queries in Amazon Redshift with semi-structured data, see [Examples of using semi-structured data in Amazon Redshift](super-examples.md).

**Note**  
Amazon Redshift supports a subset of the PartiQL query language.
Amazon Redshift doesn't support the [Amazon Ion](https://amazon-ion.github.io/ion-docs/index.html) data format or Amazon Ion literals.

## PartiQL in Amazon Redshift


To run PartiQL queries in Amazon Redshift, you can use the following methods:
+  The Amazon Web Services Management Console 
+  SQL Workbench/J 
+  The Amazon CLI 
+  Amazon Redshift Data APIs for PartiQL 

# Loading semi-structured data into Amazon Redshift

Use the SUPER data type to parse and query hierarchical and generic data in Amazon Redshift. Amazon Redshift introduces the [JSON\_PARSE function](JSON_PARSE.md) to parse data in JSON format and convert it into the SUPER representation. Amazon Redshift also supports loading SUPER columns using the COPY command. The supported file formats are JSON, Avro, text, comma-separated value (CSV), Parquet, and ORC.

You can use JSON\_PARSE to insert and update JSON data in SUPER columns, or use COPY to load JSON data into Amazon Redshift from outside sources such as Amazon S3 buckets.

**Topics**
+ [Using JSON\_PARSE to insert data into SUPER columns](parse_json.md)
+ [Using COPY to load data into SUPER columns](copy_json.md)

# Using JSON\_PARSE to insert data into SUPER columns

You can insert or update JSON data into a SUPER column using the [JSON\_PARSE function](JSON_PARSE.md). The function parses data in JSON format and converts it into the SUPER data type, which you can use in INSERT or UPDATE statements. 

The following example inserts JSON data into a SUPER column. Without the JSON\_PARSE function in the query, Amazon Redshift treats the value as a single string instead of a JSON-formatted string that must be parsed.

```
--Drop the table if it exists.
DROP TABLE IF EXISTS test_json;

--Create the table.
CREATE TABLE test_json (all_data SUPER);

--Populate the table.
INSERT INTO test_json VALUES (JSON_PARSE('
{
    "name": {
        "first_name": "Jake",
        "last_name": "Smith"
    },
    "age": 30,
    "hobby": "Biking"
}'
) );

SELECT * FROM test_json;

 all_data  
---------
{"name":{"first_name":"Jake","last_name":"Smith"},"age":30,"hobby":"Biking"}
```

# Using COPY to load data into SUPER columns

In the following sections, you can learn about different ways to use the COPY command to load JSON data into Amazon Redshift. For information about the data format parameters that Amazon Redshift uses to parse JSON in COPY commands, read the [JSON format for COPY](copy-parameters-data-format.md#copy-json) parameter description in [Data format parameters](copy-parameters-data-format.md).

**Topics**
+ [Copying data from JSON and Avro](#copy_json-from-JSON)
+ [Copying data from text and CSV](#copy_json-from-text-csv)
+ [Copying data from columnar-format Parquet and ORC](#copy_json-from-parquet-orc)

## Copying data from JSON and Avro


Amazon Redshift provides the following methods to ingest a JSON document using COPY, even with a JSON structure that is fully or partially unknown: 
+ Store the data deriving from a JSON document in a single SUPER data column using the `noshred` option. This method is useful when the schema isn't known or is expected to change, and it makes it easier to store the entire tuple in a single SUPER column.
+ Shred the JSON document into multiple Amazon Redshift columns using the `auto` or `jsonpaths` option. Attributes can be Amazon Redshift scalars or SUPER values. 

You can use these options with the JSON or Avro formats. For more information on JSON options such as `noshred`, `auto`, and `jsonpaths`, see [JSON format for COPY](copy-parameters-data-format.md#copy-json).

The maximum size for a JSON object in Amazon Redshift is 4 MB, which applies before any shredding or parsing.

### Method 1: Copying a JSON document into a single SUPER data column using `noshred`


You can copy entire JSON documents into single SUPER data columns using the `noshred` option in the COPY command. Consider the following example:

1. Create a table with a single SUPER data column.

   ```
   CREATE TABLE region_nations_noshred (rdata SUPER);
   ```

1. Copy the data from Amazon S3 into the single SUPER data column. To ingest the JSON source data into a single SUPER data column, specify the `noshred` option in the FORMAT JSON clause.

   ```
   COPY region_nations_noshred FROM 's3://redshift-downloads/semistructured/tpch-nested/data/json/region_nation'
   REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
   FORMAT JSON 'noshred';
   ```

   After COPY has successfully ingested the JSON, your table has an `rdata` SUPER data column that contains the data of the entire JSON object. The ingested data maintains all the properties of the JSON hierarchy. However, the leaves are converted to Amazon Redshift scalar types for efficient query processing.

1. Use the following query to retrieve the original JSON string.

   ```
   SELECT rdata FROM region_nations_noshred;
   ```

   When Amazon Redshift generates a SUPER data column, it becomes accessible using JDBC as a string through JSON serialization. For more information, see [Serializing complex nested JSON](serializing-complex-JSON.md).

### Method 2: Copying a JSON document into multiple SUPER data columns


You can shred a JSON document into multiple columns that can be either SUPER data columns or Amazon Redshift scalar types. Amazon Redshift spreads different portions of the JSON object to different columns. Consider the following example:

1. Create a table with multiple columns to hold the shredded JSON.

   ```
   CREATE TABLE region_nations
   (
    r_regionkey smallint
    ,r_name varchar
    ,r_comment varchar
    ,r_nations super
    );
   ```

1. To copy JSON into the `region_nations` table, specify the AUTO option in the FORMAT JSON clause to split the JSON value across multiple columns. COPY matches the top-level JSON attributes with column names and allows nested values to be ingested as SUPER values, such as JSON arrays and objects. 

   ```
   COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/json/region_nation'
   REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
   FORMAT JSON 'auto';
   ```

   When the JSON attribute names are in mixed uppercase and lowercase, specify the `auto ignorecase` option in the FORMAT JSON clause. For more information about the COPY command, see [Load from JSON data using the 'auto ignorecase' option](r_COPY_command_examples.md#copy-from-json-examples-using-auto-ignorecase).
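
For example, the preceding COPY command could be written with the `auto ignorecase` option as follows (a sketch; the sample data set itself uses lowercase attribute names, so the option is shown only to illustrate the syntax):

```sql
COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/json/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT JSON 'auto ignorecase';
```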

In some cases, there is a mismatch between column names and JSON attributes or the attribute to load is nested more than a level deep. If so, use a `jsonpaths` file to manually map JSON attributes to Amazon Redshift columns. Consider the following example: 

1. Create a table with multiple columns to hold the shredded JSON.

   ```
   CREATE TABLE region_nations
   (
    r_regionkey smallint
    ,r_name varchar
    ,r_comment varchar
    ,r_nations super
    );
   ```

1. In this case, the column names don't match the JSON attributes. You can create a `jsonpaths` file that maps the paths of attributes to the table columns by their position in the `jsonpaths` array, like the following:

   ```
   {"jsonpaths": [
          "$.r_regionkey",
          "$.r_name",
          "$.r_comment",
          "$.r_nations
       ]
   }
   ```

1. Use the location of the `jsonpaths` file as the argument to the FORMAT JSON option in COPY.

   ```
   COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/json/region_nation'
   REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
   FORMAT JSON 's3://redshift-downloads/semistructured/tpch-nested/data/jsonpaths/nations_jsonpaths.json';
   ```

1. Use the following query to access the table that shows data spread to multiple columns. The SUPER data columns are printed using the JSON format.

   ```
   SELECT r_regionkey,r_name,r_comment,r_nations[0].n_nationkey FROM region_nations ORDER BY 1,2,3 LIMIT 1;
   ```

`jsonpaths` files map fields in the JSON document to table columns. You can extract additional columns, such as distribution and sort keys, while still loading the complete document as a SUPER column. The following example loads the complete document into the `nations` column, where the `name` column is the sort key and the `regionkey` column is the distribution key. Consider the following example:

1. Create a table with multiple columns to hold the shredded JSON.

   ```
   CREATE TABLE nations_sorted (
       regionkey smallint,
       name varchar,
       nations super
   ) DISTKEY(regionkey) SORTKEY(name);
   ```

1. Map the root jsonpath `"$"` to the root of the document as follows:

   ```
   {"jsonpaths": [
          "$.r_regionkey",
          "$.r_name",
          "$"
       ]
   }
   ```

1. Use the location of the `jsonpaths` file as the argument to the FORMAT JSON option in COPY.

   ```
   COPY nations_sorted FROM 's3://redshift-downloads/semistructured/tpch-nested/data/json/region_nation'
   REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
   FORMAT JSON 's3://redshift-downloads/semistructured/tpch-nested/data/jsonpaths/nations_sorted_jsonpaths.json';
   ```

For more information on using `jsonpaths`, see [COPY from JSON format](copy-usage_notes-copy-from-json.md). 

## Copying data from text and CSV


Amazon Redshift represents SUPER columns in text and CSV formats as serialized JSON. Valid JSON formatting is required for SUPER columns to load with the correct type information. Don't quote objects, arrays, numbers, booleans, and null values; wrap string values in double quotation marks. SUPER columns use standard escaping rules for text and CSV formats. 

 When copying from CSV, delimiters are escaped according to the CSV standard. Consider the following example: 

```
CREATE TABLE region_nations
(
 r_regionkey smallint
 ,r_name varchar
 ,r_comment varchar
 ,r_nations super
 );
 
 COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/csv/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT CSV;
```

When copying from text, if the chosen delimiter might also appear in a SUPER field, use the ESCAPE option during COPY and UNLOAD. Consider the following example:

```
CREATE TABLE region_nations
(
 r_regionkey smallint
 ,r_name varchar
 ,r_comment varchar
 ,r_nations super
 );
 
COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/text/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
DELIMITER ','
ESCAPE;
```

## Copying data from columnar-format Parquet and ORC


If your semi-structured or nested data is already available in either Apache Parquet or Apache ORC format, you can use the COPY command to ingest data into Amazon Redshift. 

The Amazon Redshift table structure should match the number of columns and the column data types of the Parquet or ORC files. By specifying SERIALIZETOJSON in the COPY command, you can load any column type in the file that aligns with a SUPER column in the table as SUPER. This includes structure and array types.

The following example uses a Parquet format:

```
COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/parquet/region_nation'
REGION 'us-east-1' IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT PARQUET SERIALIZETOJSON;
```

The following example uses an ORC format.

```
COPY region_nations FROM 's3://redshift-downloads/semistructured/tpch-nested/data/orc/region_nation'
IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3'
FORMAT ORC SERIALIZETOJSON;
```

When ORC attributes use the date or time data types, Amazon Redshift converts them to varchar when encoding them in SUPER.

# Unloading semi-structured data

With Amazon Redshift, you can export semi-structured data from your Amazon Redshift cluster to Amazon S3 in a variety of formats, including text, Apache Parquet, Apache ORC, and Avro. The following sections describe how to configure and run unload operations for your semi-structured data in Amazon Redshift.

------
#### [ CSV or text formats ]

You can unload tables with SUPER data columns to Amazon S3 in a comma-separated value (CSV) or text format. Using a combination of navigation and unnest clauses, Amazon Redshift unloads hierarchical data in SUPER data format to Amazon S3 in CSV or text formats. Subsequently, you can create external tables against unloaded data and query them using Redshift Spectrum. For information on using UNLOAD and the required IAM permissions, see [UNLOAD](r_UNLOAD.md). 

The following example unloads all of the data from an Amazon Redshift table into an Amazon S3 bucket.

```
UNLOAD ('SELECT * FROM <redshift_table>')
TO '<S3_bucket>'
IAM_ROLE '<iam_role>'
DELIMITER AS '|'
GZIP
ALLOWOVERWRITE;
```

Unlike other data types, where a user-defined string can represent a null value, Amazon Redshift exports SUPER data columns using the JSON format and represents null values as JSON *null*. As a result, SUPER data columns ignore the NULL [AS] option used in UNLOAD commands.

------
#### [ Parquet format ]

You can unload tables with SUPER data columns to Amazon S3 in the Parquet format. Amazon Redshift represents SUPER columns in Parquet as the JSON data type. This enables semi-structured data to be represented in Parquet. You can query these columns using Redshift Spectrum or ingest them back to Amazon Redshift using the COPY command. For information on using UNLOAD and the required IAM permissions, see [UNLOAD](r_UNLOAD.md). 

The following example unloads all of the data from an Amazon Redshift table into an Amazon S3 bucket in the Parquet format.

```
UNLOAD ('SELECT * FROM <Amazon Redshift_table>')
TO '<S3_bucket>'
IAM_ROLE '<iam_role>'
FORMAT PARQUET;
```

------

# Querying semi-structured data

With Amazon Redshift, you can query and analyze semi-structured data, such as JSON and Avro, alongside your structured data. Semi-structured data refers to data that has a flexible schema, allowing for hierarchical or nested structures. The following sections demonstrate querying semi-structured data using Amazon Redshift's support for open data formats, allowing you to unlock valuable information from complex data structures.

Amazon Redshift uses the PartiQL language to offer SQL-compatible access to relational, semi-structured, and nested data. 

PartiQL operates with dynamic types. This approach enables intuitive filtering, joining, and aggregation on the combination of structured, semi-structured, and nested datasets. The PartiQL syntax uses dotted notation and array subscripts for path navigation when accessing nested data. It also enables FROM clause items to iterate over arrays and perform unnest operations. Following, you can find descriptions of the different query patterns that combine the use of the SUPER data type with path and array navigation, unnesting, unpivoting, and joins. For more information on PartiQL, see [PartiQL – an SQL-compatible query language for Amazon Redshift](super-partiql.md).

## Navigation

Amazon Redshift uses PartiQL to enable navigation into arrays and structures using the [...] bracket and dot notation respectively. Furthermore, you can mix navigation into structures using the dot notation and arrays using the bracket notation. For example, the following statement selects only the third element in an array nested one level deep in a SUPER object:

```
SELECT super_object.array[2];
         
 array
---------------
 third_element
```

You can use the dot and bracket notation when doing data operations such as filtering, joining, and aggregating. You can use these notations anywhere in a query where there are normally column references. For example, the following statement selects the number of events with the type `UPDATED`. 

```
SELECT COUNT(*)
FROM test_json
WHERE all_data.data.pnr.events[0]."eventType" = 'UPDATED';
         
 count
-------
     1
```

For more in-depth examples of using PartiQL navigation, see [Examples of using semi-structured data in Amazon Redshift](super-examples.md).

## Unnesting queries

To unnest arrays, Amazon Redshift provides two ways to iterate over SUPER arrays: PartiQL syntax and the UNNEST operation in the FROM clause. Both unnesting methods produce the same output. For information on the UNNEST operation, see [FROM clause](r_FROM_clause30.md). For examples of using the UNNEST operation, see [UNNEST examples](r_FROM_clause-unnest-examples.md). 

Amazon Redshift can navigate SUPER arrays using PartiQL syntax in the FROM clause of a query. For example, the following query iterates over the attribute values for `c_orders`.

```
SELECT orders.*, o FROM customer_orders orders, orders.c_orders o;
```
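
The same iteration can also be expressed with the UNNEST operation in the FROM clause (a sketch; see the FROM clause documentation referenced above for the full syntax):

```sql
-- Equivalent query using the UNNEST operation instead of PartiQL
-- navigation syntax; o iterates over the elements of c_orders.
SELECT orders.*, o FROM customer_orders orders, UNNEST(orders.c_orders) AS o;
```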

 The PartiQL syntax of unnesting using the FROM clause item `x (AS) y` means that `y` iterates over each (SUPER) value in the (SUPER) array expression `x`. In this case, `x` is a SUPER expression and `y` is an alias for the values of `x`.

The left operand can also use the dot and bracket notation for regular navigation. In the following example, `customer_orders_lineitem c` is the iteration over the `customer_order_lineitem` base table and `c.c_orders o` is the iteration over the `c.c_orders` array. To iterate over the `o_lineitems` attribute, which is an array within an array, you can add multiple clauses, like so:

```
SELECT c.*, o, l FROM customer_orders_lineitem c, c.c_orders o, o.o_lineitems l;
```

Amazon Redshift also supports an array index when iterating over the array using the AT keyword. The clause `x AS y AT z` iterates over array `x` and generates the field `z`, which is the array index. The following example shows how an array index works.

```
SELECT c_name,
       orders.o_orderkey AS orderkey,
       index AS orderkey_index
FROM customer_orders_lineitem c, c.c_orders AS orders AT index 
ORDER BY orderkey_index;

c_name             | orderkey | orderkey_index
-------------------+----------+----------------
Customer#000008251 | 3020007  |        0
Customer#000009452 | 4043971  |        0
  (2 rows)
```

The following is an example of iterating over a scalar array.

```
CREATE TABLE bar AS SELECT json_parse('{"scalar_array": [1, 2.3, 45000000]}') AS data;

SELECT element, index FROM bar AS b, b.data.scalar_array AS element AT index;

 element  | index
----------+-------
 1        |     0
 2.3      |     1
 45000000 |     2
(3 rows)
```

The following example iterates over an array of multiple levels. The example uses multiple unnest clauses to iterate into the innermost arrays. The `f.multi_level_array AS array` clause iterates over `multi_level_array`. The `array AS element` clause iterates over the arrays within `multi_level_array`.

```
CREATE TABLE foo AS SELECT json_parse('[[1.1, 1.2], [2.1, 2.2], [3.1, 3.2]]') AS multi_level_array;

SELECT array, element FROM foo AS f, f.multi_level_array AS array, array AS element;

   array   | element
-----------+---------
 [1.1,1.2] | 1.1
 [1.1,1.2] | 1.2
 [2.1,2.2] | 2.1
 [2.1,2.2] | 2.2
 [3.1,3.2] | 3.1
 [3.1,3.2] | 3.2
(6 rows)
```

For more information about the FROM clause, see [FROM clause](r_FROM_clause30.md). For more examples of unnesting SUPER queries, see [Examples of using semi-structured data in Amazon Redshift](super-examples.md).

## Object unpivoting

To perform object unpivoting, Amazon Redshift uses the PartiQL syntax to iterate over SUPER objects. It does this using the FROM clause of a query along with the UNPIVOT keyword. In the following example, the expression is the `c.c_orders[0]` object. The example query iterates over each attribute returned by the object.

```
SELECT attr as attribute_name, json_typeof(val) as value_type 
FROM customer_orders_lineitem c, UNPIVOT c.c_orders[0] AS val AT attr 
WHERE c_custkey = 9451;

 attribute_name  | value_type
-----------------+------------
 o_orderstatus   | string
 o_clerk         | string
 o_lineitems     | array
 o_orderdate     | string
 o_shippriority  | number
 o_totalprice    | number
 o_orderkey      | number
 o_comment       | string
 o_orderpriority | string
(9 rows)
```

As with unnesting, the unpivoting syntax is also an extension of the FROM clause. The difference is that the unpivoting syntax uses the UNPIVOT keyword to indicate that it's iterating over an object instead of an array. It uses AS `value_alias` to iterate over all the values inside an object and AT `attribute_alias` to iterate over all the attributes. Consider the following syntax:

```
UNPIVOT expression AS value_alias [ AT attribute_alias ]
```

Amazon Redshift supports using object unpivoting and array unnesting in a single FROM clause as follows:

```
SELECT attr as attribute_name, val as object_value
FROM customer_orders_lineitem c, c.c_orders AS o, UNPIVOT o AS val AT attr 
WHERE c_custkey = 9451;
```

When you use object unpivoting, Amazon Redshift doesn't support correlated unpivoting. Specifically, Amazon Redshift doesn't support cases where unpivot expressions occur at different query levels and the inner unpivoting references the outer one.

For more information about the FROM clause, see [FROM clause](r_FROM_clause30.md). For examples of using pivoting with the SUPER type, see [Examples of using semi-structured data in Amazon Redshift](super-examples.md).

## Dynamic typing

Dynamic typing doesn't require explicit casting of data that is extracted from the dot and bracket paths. Amazon Redshift uses dynamic typing to process schemaless SUPER data without the need to declare the data types before you use them in your query. Dynamic typing uses the results of navigating into SUPER data columns without having to explicitly cast them into Amazon Redshift types. Dynamic typing is most useful in joins and GROUP BY clauses. The following example uses a SELECT statement that requires no explicit casting of the dot and bracket expressions to the usual Amazon Redshift types. For information about type compatibility and conversion, see [Type compatibility and conversion](c_Supported_data_types.md#r_Type_conversion).

Consider the following example, which looks for rows where the status of an order is the string 'P':

```
SELECT c_orders[0].o_orderkey
FROM customer_orders_lineitem
WHERE c_orders[0].o_orderstatus = 'P';
```

The equality sign in this sample query evaluates to `true` when the value of c_orders[0].o_orderstatus is the string 'P'. In all other cases, the equality sign evaluates to `false`, including the cases where the arguments of the equality are different types. For example, if the order status is an integer, its row isn't selected.

### Dynamic and static typing


Without using dynamic typing, you can't determine whether c_orders[0].o_orderstatus is a string, an integer, or a structure. You can only determine that c_orders[0].o_orderstatus is a SUPER data type, which can be an Amazon Redshift scalar, an array, or a structure. The static type of c_orders[0].o_orderstatus is a SUPER data type. Conventionally, a type is implicitly a static type in SQL.

Amazon Redshift applies dynamic typing to the processing of schemaless data. When the query evaluates the data, c_orders[0].o_orderstatus turns out to be a specific type. For example, evaluating c_orders[0].o_orderstatus on the first record of customer_orders_lineitem might result in an integer, while evaluating it on the second record might result in a string. These are the dynamic types of the expression.

When you use an SQL operator or function with dot and bracket expressions that have dynamic types, Amazon Redshift produces results similar to using a standard SQL operator or function with the respective static types. In this example, when the dynamic type of the path expression is a string, the comparison with the string 'P' is meaningful. Whenever the dynamic type of c_orders[0].o_orderstatus is any data type other than a string, the equality returns false. Other functions return null when mistyped arguments are used.

The following example writes the previous query with static typing:

```
SELECT c_custkey
FROM customer_orders_lineitem
WHERE CASE WHEN JSON_TYPEOF(c_orders[0].o_orderstatus) = 'string'
           THEN c_orders[0].o_orderstatus::VARCHAR = 'P'
           ELSE FALSE END;
```

Note the following distinction between equality predicates and comparison predicates. In the previous example, if you replace the equality predicate with a less-than-or-equal predicate, the semantics produce null instead of false.

```
SELECT c_orders[0].o_orderkey
FROM customer_orders_lineitem
WHERE c_orders[0].o_orderstatus <= 'P';
```

In this example, if c_orders[0].o_orderstatus is a string, Amazon Redshift returns true if it is alphabetically equal to or smaller than 'P'. Amazon Redshift returns false if it is alphabetically larger than 'P'. However, if c_orders[0].o_orderstatus is not a string, Amazon Redshift returns null because it can't compare values of different types, as shown in the following query:

```
SELECT c_custkey
FROM customer_orders_lineitem
WHERE CASE WHEN JSON_TYPEOF(c_orders[0].o_orderstatus) = 'string'
           THEN c_orders[0].o_orderstatus::VARCHAR <= 'P'
           ELSE NULL END;
```

Dynamic typing doesn't exclude comparisons between types that are minimally comparable. For example, you can convert both CHAR and VARCHAR Amazon Redshift scalar types to SUPER. They are comparable as strings, including ignoring trailing white-space characters, similar to the Amazon Redshift CHAR and VARCHAR types. Similarly, integers, decimals, and floating-point values are comparable as SUPER values. Specifically for decimal columns, each value can also have a different scale. Amazon Redshift still considers them dynamic types.

Amazon Redshift also supports equality on objects and arrays that are evaluated as deep equal, such as evaluating deep into objects or arrays and comparing all attributes. Use deep equal with caution, because the process of performing deep equal can be time-consuming.
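As an illustrative sketch (the literals here are hypothetical), comparing two structurally identical SUPER values evaluates deep equality across all nested attributes and elements:

```
SELECT JSON_PARSE('{"a": 1, "b": [1, 2]}') = JSON_PARSE('{"a": 1, "b": [1, 2]}') AS deep_equal;
```

This comparison should return true, because every attribute and array element matches.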

### Using dynamic typing for joins


For joins, dynamic typing automatically matches values with different dynamic types without performing a long CASE WHEN analysis to find out what data types may appear. For example, assume that your organization changed the format that it was using for part keys over time.

The initial integer part keys issued are replaced by string part keys, such as 'A55', and later replaced again by array part keys, such as ['X', 10], combining a string and a number. Amazon Redshift doesn't have to perform a lengthy case analysis about part keys and can use joins, as shown in the following example.

```
SELECT c.c_name
    ,l.l_extendedprice
    ,l.l_discount
FROM customer_orders_lineitem c
    ,c.c_orders o
    ,o.o_lineitems l
    ,supplier_partsupp s
    ,s.s_partsupps ps
WHERE l.l_partkey = ps.ps_partkey
AND c.c_nationkey = s.s_nationkey
ORDER BY c.c_name;
```

The following example shows how complex and inefficient the same query can be without using dynamic typing:

```
SELECT c.c_name
    ,l.l_extendedprice
    ,l.l_discount
FROM customer_orders_lineitem c
    ,c.c_orders o
    ,o.o_lineitems l
    ,supplier_partsupp s
    ,s.s_partsupps ps
WHERE CASE WHEN IS_INTEGER(l.l_partkey) AND IS_INTEGER(ps.ps_partkey)
           THEN l.l_partkey::integer = ps.ps_partkey::integer
           WHEN IS_VARCHAR(l.l_partkey) AND IS_VARCHAR(ps.ps_partkey)
           THEN l.l_partkey::varchar = ps.ps_partkey::varchar
           WHEN IS_ARRAY(l.l_partkey) AND IS_ARRAY(ps.ps_partkey)
                AND IS_VARCHAR(l.l_partkey[0]) AND IS_VARCHAR(ps.ps_partkey[0])
                AND IS_INTEGER(l.l_partkey[1]) AND IS_INTEGER(ps.ps_partkey[1])
           THEN l.l_partkey[0]::varchar = ps.ps_partkey[0]::varchar
                AND l.l_partkey[1]::integer = ps.ps_partkey[1]::integer
           ELSE FALSE END
AND c.c_nationkey = s.s_nationkey
ORDER BY c.c_name;
```

## Case-insensitive querying

You can perform case-insensitive string comparisons on SUPER data using the COLLATE function or by setting the collation at the column or database level. For more information about setting collation at table creation, see [CREATE TABLE](r_CREATE_TABLE_NEW.md). For information about collation behavior with SUPER data operators and functions, see [Collation behavior](operators-functions.md#collation-behavior).

The following example uses the COLLATE function on string values extracted from SUPER data.

```
CREATE TABLE events (data SUPER);
INSERT INTO events VALUES (JSON_PARSE('{"status": "Active", "name": "Event1"}'));
INSERT INTO events VALUES (JSON_PARSE('{"status": "ACTIVE", "name": "Event2"}'));
INSERT INTO events VALUES (JSON_PARSE('{"status": "active", "name": "Event3"}'));

SELECT data.name FROM events 
WHERE COLLATE(data.status::VARCHAR, 'case_insensitive') = 'active';

 name
----------
 "Event1"
 "Event2"
 "Event3"
(3 rows)
```

You can also define a SUPER column with case-insensitive collation at table creation. In this case, all string comparisons on the column are case-insensitive.

```
CREATE TABLE events_ci (data SUPER COLLATE CASE_INSENSITIVE);
INSERT INTO events_ci VALUES (JSON_PARSE('{"status": "Active"}'));
INSERT INTO events_ci VALUES (JSON_PARSE('{"status": "ACTIVE"}'));

SELECT * FROM events_ci WHERE data.status::VARCHAR = 'active';

 data
-----------------------
 {"status":"Active"}
 {"status":"ACTIVE"}
(2 rows)
```

## Lax semantics

By default, navigation operations on SUPER values return null instead of returning an error when the navigation is invalid. Object navigation is invalid if the SUPER value is not an object, or if the SUPER value is an object but doesn't contain the attribute name used in the query. For example, the following query accesses an invalid attribute name in the SUPER data column c_orders:

```
SELECT c.c_orders.something FROM customer_orders_lineitem c;
```

Array navigation returns null if the SUPER value is not an array or the array index is out of bounds. The following query returns null because c_orders[1][1] is out of bounds.

```
SELECT c.c_orders[1][1] FROM customer_orders_lineitem c;
```

Lax semantics is especially useful when using dynamic typing to cast a SUPER value. Casting a SUPER value to the wrong type returns null instead of an error if the cast is invalid. For example, the following query returns null because it can't cast the string value 'Good' of the object attribute o_orderstatus to INTEGER. Amazon Redshift returns an error for a VARCHAR-to-INTEGER cast, but not for a SUPER cast.

```
SELECT c.c_orders.o_orderstatus::integer FROM customer_orders_lineitem c;
```

## Order by

Amazon Redshift doesn't define SUPER comparisons among values with different dynamic types. A SUPER value that is a string is neither smaller nor larger than a SUPER value that is a number. To use ORDER BY clauses with SUPER columns, Amazon Redshift defines a total ordering among different types to be observed when Amazon Redshift ranks SUPER values using ORDER BY clauses. The order among dynamic types is boolean, number, string, array, object.

For an example of using GROUP BY and ORDER BY in a SUPER query, see [Filtering semi-structured data](super-examples.md#super-examples-filter).
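The following sketch (using a hypothetical table named super_order_demo) illustrates that total ordering; the rows should sort in the order boolean, number, string, array, object:

```
CREATE TABLE super_order_demo (v SUPER);
INSERT INTO super_order_demo VALUES
    (JSON_PARSE('{"a": 1}')),
    (JSON_PARSE('[1, 2]')),
    (JSON_PARSE('"abc"')),
    (JSON_PARSE('10')),
    (JSON_PARSE('true'));

-- Total ordering across dynamic types: boolean, number, string, array, object
SELECT v FROM super_order_demo ORDER BY v;
```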

# Operators and functions

With Amazon Redshift, you can perform advanced analytics on large datasets of SUPER data using operators and functions. Operators and functions for SUPER data are SQL constructs that enable complex analysis and manipulation of semi-structured data stored in Amazon Redshift tables.

The following sections cover the syntax, examples, and best practices for using operators and functions for SUPER data in Amazon Redshift to unlock the full potential of your semi-structured data.

## Arithmetic operators

SUPER values support all basic arithmetic operators +, -, *, /, and % using dynamic typing. The resultant type of the operation remains SUPER. For all operators except the binary operator +, the input operands must be numbers. Otherwise, Amazon Redshift returns null. The distinction between decimals and floating-point values is retained when Amazon Redshift runs these operators, and the dynamic type doesn't change. However, decimal scale changes when you use multiplication and division. Arithmetic overflows still cause query errors; they aren't changed to null. The binary operator + performs addition if the inputs are numbers, or concatenation if the inputs are strings. If one operand is a string and the other operand is a number, the result is null. The unary prefix operators + and - return null if the SUPER value is not a number, as shown in the following example:

```
SELECT (c_orders[0].o_orderkey + 0.5) * c_orders[0].o_orderkey / 10 AS math FROM customer_orders_lineitem;
            math
----------------------------
 1757958232200.1500
(1 row)
```

Dynamic typing allows decimal values in SUPER to have different scales. Amazon Redshift treats decimal values as if they are different static types and allows all mathematical operations. Amazon Redshift computes the resulting scale dynamically based on the scales of the operands. If one of the operands is a floating-point number, then Amazon Redshift promotes the other operand to a floating-point number and generates the result as a floating-point number.

## Arithmetic functions

Amazon Redshift supports the following arithmetic functions for SUPER columns. They return null if the input isn't a number:
+ FLOOR. For more information, see [FLOOR function](r_FLOOR.md).
+ CEIL and CEILING. For more information, see [CEILING (or CEIL) function](r_CEILING_FLOOR.md).
+ ROUND. For more information, see [ROUND function](r_ROUND.md).
+ TRUNC. For more information, see [TRUNC function](r_TRUNC.md).
+ ABS. For more information, see [ABS function](r_ABS.md).

The following example uses arithmetic functions to query data:

```
SELECT x, FLOOR(x), CEIL(x), ROUND(x)
FROM (
    SELECT (c_orders[0].o_orderkey + 0.5) * c_orders[0].o_orderkey / 10 AS x
    FROM customer_orders_lineitem
    );

         x          |     floor     |     ceil      |     round
--------------------+---------------+---------------+---------------
 1389636795898.0500  | 1389636795898  | 1389636795899  | 1389636795898
```

The ABS function retains the scale of the input decimal, while FLOOR, CEIL, and ROUND eliminate the scale of the input decimal.

## Array functions

Amazon Redshift supports the following array composition and utility functions:
+ ARRAY. For more information, see [ARRAY function](r_array.md).
+ ARRAY_CONCAT. For more information, see [ARRAY_CONCAT function](r_array_concat.md).
+ ARRAY_CONTAINS. For more information, see [ARRAY_CONTAINS function](array_contains.md).
+ ARRAY_DISTINCT. For more information, see [ARRAY_DISTINCT function](array_distinct.md).
+ ARRAY_EXCEPT. For more information, see [ARRAY_EXCEPT function](array_except.md).
+ ARRAY_FLATTEN. For more information, see [ARRAY_FLATTEN function](array_flatten.md).
+ ARRAY_INTERSECTION. For more information, see [ARRAY_INTERSECTION function](array_intersection.md).
+ ARRAY_POSITION. For more information, see [ARRAY_POSITION function](array_position.md).
+ ARRAY_POSITIONS. For more information, see [ARRAY_POSITIONS function](array_positions.md).
+ ARRAY_SORT. For more information, see [ARRAY_SORT function](array_sort.md).
+ ARRAY_UNION. For more information, see [ARRAY_UNION function](array_union.md).
+ ARRAYS_OVERLAP. For more information, see [ARRAYS_OVERLAP function](arrays_overlap.md).
+ GET_ARRAY_LENGTH. For more information, see [GET_ARRAY_LENGTH function](get_array_length.md).
+ SPLIT_TO_ARRAY. For more information, see [SPLIT_TO_ARRAY function](split_to_array.md).
+ SUBARRAY. For more information, see [SUBARRAY function](r_subarray.md).

You can construct SUPER arrays from values of Amazon Redshift data types, including other SUPER values, using the ARRAY function. The following example uses the variadic function ARRAY:

```
SELECT ARRAY(1, c.c_custkey, NULL, c.c_name, 'abc') FROM customer_orders_lineitem c; 
                               array
 -------------------------------------------------------
 [1,8401,null,"Customer#000008401","abc"]
 [1,9452,null,"Customer#000009452","abc"]
 [1,9451,null,"Customer#000009451","abc"]
 [1,8251,null,"Customer#000008251","abc"]
 [1,5851,null,"Customer#000005851","abc"]
(5 rows)
```

The following example uses array concatenation with the ARRAY_CONCAT function:

```
SELECT ARRAY_CONCAT(JSON_PARSE('[10001,10002]'),JSON_PARSE('[10003,10004]'));

             array_concat
------------------------------------
 [10001,10002,10003,10004]
(1 row)
```

The following example uses array manipulation with the SUBARRAY function, which returns a subset of the input array.

```
SELECT SUBARRAY(ARRAY('a', 'b', 'c', 'd', 'e', 'f'), 2, 3);

   subarray
---------------
 ["c","d","e"]
(1 row)
```

The following example merges multiple levels of arrays into a single array using ARRAY_FLATTEN:

```
SELECT x, ARRAY_FLATTEN(x) FROM (SELECT ARRAY(1, ARRAY(2, ARRAY(3, ARRAY()))) AS x);

     x           | array_flatten
 ----------------+---------------
 [1,[2,[3,[]]]]  | [1,2,3]
(1 row)
```

The array functions ARRAY_CONCAT and ARRAY_FLATTEN use dynamic typing rules. They return null instead of an error if the input isn't an array. The GET_ARRAY_LENGTH function returns the length of a SUPER array given an object or array path.

```
SELECT c_name
FROM customer_orders_lineitem
WHERE GET_ARRAY_LENGTH(c_orders) = (
    SELECT MAX(GET_ARRAY_LENGTH(c_orders))
    FROM customer_orders_lineitem
    );
```

The following example splits a string into an array of strings using SPLIT_TO_ARRAY. The function takes a delimiter as an optional parameter. If the delimiter is absent, the default is a comma.

```
SELECT SPLIT_TO_ARRAY('12|345|6789', '|');

   split_to_array
---------------------
 ["12","345","6789"]
(1 row)
```
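Because the default delimiter is a comma, omitting the second argument should produce the same result for comma-separated input:

```
SELECT SPLIT_TO_ARRAY('12,345,6789');
```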

## Collation behavior


With SUPER, you can set the collation of a column with [CREATE TABLE](https://docs.amazonaws.cn/redshift/latest/dg/r_CREATE_TABLE_NEW.html), take the default collation set with [CREATE DATABASE](https://docs.amazonaws.cn/redshift/latest/dg/r_CREATE_DATABASE.html), and set the collation of an expression with the [COLLATE function](https://docs.amazonaws.cn/redshift/latest/dg/r_COLLATE.html).

The collation setting applies to all comparison operators and string values stored in SUPER, whether they are string values, strings inside SUPER arrays, or values of a SUPER object. For SUPER objects, the collation behavior applies only to the values, not to the attributes. For example, with case-insensitive collation, the comparison `{"attribute": "a"} = {"attribute": "A"}` returns true, while `{"attribute": "a"} = {"ATTRIBUTE": "a"}` returns false.

## Information functions

SUPER data columns support inspection functions that return the dynamic type and other type information about the SUPER value. The most common example is the JSON_TYPEOF scalar function, which returns a VARCHAR with values boolean, number, string, object, array, or null, depending on the dynamic type of the SUPER value. Amazon Redshift supports the following boolean functions for SUPER data columns:
+ [IS_ARRAY function](r_is_array.md)
+ [IS_BIGINT function](r_is_bigint.md)
+ [IS_CHAR function](r_is_char.md)
+ [IS_DECIMAL function](r_is_decimal.md)
+ [IS_FLOAT function](r_is_float.md)
+ [IS_INTEGER function](r_is_integer.md)
+ [IS_OBJECT function](r_is_object.md)
+ [IS_SCALAR function](r_is_scalar.md)
+ [IS_SMALLINT function](r_is_smallint.md)
+ [IS_VARCHAR function](r_is_varchar.md)

For more information on SUPER type information functions, see [SUPER type information functions](c_Type_Info_Functions.md).
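For example, the following sketch against the sample customer_orders_lineitem table inspects the dynamic types of SUPER paths:

```
SELECT JSON_TYPEOF(c_orders) AS orders_type,
       IS_ARRAY(c_orders) AS orders_is_array,
       IS_VARCHAR(c_orders[0].o_orderstatus) AS status_is_varchar
FROM customer_orders_lineitem;
```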

## Object functions

Following are the SQL object functions that Amazon Redshift supports to create and operate on SUPER type objects:
+ [GET_NUMBER_ATTRIBUTES function](get_number_attributes.md)
+ [LOWER_ATTRIBUTE_NAMES function](r_lower_attribute_names.md)
+ [OBJECT function](r_object_function.md)
+ [OBJECT_TRANSFORM function](r_object_transform_function.md)
+ [UPPER_ATTRIBUTE_NAMES function](r_upper_attribute_names.md)

For more information on object functions, see [Object functions](Object_Functions.md).
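As a brief sketch, the OBJECT function builds a SUPER object from alternating attribute names and values (the attribute names and values here are illustrative, and the casts to SUPER are shown explicitly for clarity):

```
SELECT OBJECT('name', 'TV'::super, 'price', 345::super) AS item;
```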

## String functions


To use string functions with string values in the SUPER data type, cast the SUPER value to a string type (such as VARCHAR) before applying the functions. The functions return null if the input is not a string.

[String functions](String_functions_header.md) now support up to 16,000,000 bytes.

The following example uses SUBSTRING to extract a preview of a 5,000,000-byte string stored in a SUPER JSON object:

```
CREATE TABLE customer_data (
   customer_id INT,
   profile SUPER
);

INSERT INTO customer_data VALUES (
   1,
   JSON_PARSE('{"name": "John Doe", "description": "' || REPEAT('A', 5000000) || '"}')
);

SELECT 
   customer_id,
   profile.name::VARCHAR AS name,
   SUBSTRING(profile.description::VARCHAR, 1, 50) AS description_preview
FROM customer_data;

 customer_id | name     | description_preview
-------------+----------+----------------------------------------------------
         1 | John Doe | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
```

The following example demonstrates LEFT, RIGHT, and CONCAT functions on string literals from a SUPER array:

```
CREATE TABLE documents (
   doc_id INT,
   chapters SUPER
);

INSERT INTO documents VALUES (
   1,
   JSON_PARSE('["' || REPEAT('hello', 400000) || '", "' || REPEAT('world', 600000) || '"]')
);

SELECT 
   doc_id,
   LEFT(chapters[0]::VARCHAR, 20) AS chapter1_start,
   RIGHT(chapters[1]::VARCHAR, 20) AS chapter2_end,
   LEN(CONCAT(chapters[0]::VARCHAR, chapters[1]::VARCHAR)) AS concat_size
FROM documents;

 doc_id |    chapter1_start    |     chapter2_end     | concat_size 
--------+----------------------+----------------------+-------------
      1 | hellohellohellohello | worldworldworldworld |     5000000
```

The following example stores standalone strings in SUPER:

```
CREATE TABLE text_storage (
   text_id INT,
   content SUPER
);

INSERT INTO text_storage VALUES 
   (1, REPEAT('A', 8000000)),
   (2, REPEAT('B', 16000000));

SELECT 
   text_id,
   LEN(content::VARCHAR) AS content_length
FROM text_storage;

 text_id | content_length
---------+----------------
      1 |        8000000
      2 |       16000000
```

# SUPER configurations

You can configure your SUPER data for specific scenarios. The following sections provide details on choosing and applying the appropriate SUPER configurations based on your data format requirements.

**Topics**
+ [Lax and strict modes for SUPER](#lax-strict-modes)
+ [Accessing JSON fields with uppercase and mixed-case field names or attributes](#upper-mixed-case)
+ [Parsing options for SUPER](#parsing-options-super)

## Lax and strict modes for SUPER

When you query SUPER data, the path expression may not match the actual SUPER data structure. If you try to access a non-existent member of an object or element of an array, Amazon Redshift returns a null value if your query is run in the default lax mode. If you run your query in strict mode, Amazon Redshift returns an error. You can set the following session parameters to turn lax mode on or off.

The following example uses session parameters to enable lax mode.

```
SET navigate_super_null_on_error=ON;  --default lax mode for navigation

SET cast_super_null_on_error=ON;  --default lax mode for casting

SET parse_super_null_on_error=OFF;  --default strict mode for ingestion
```
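For example, turning off lax navigation makes an invalid navigation raise an error instead of returning null. The following sketch reuses the invalid-attribute query from the lax semantics section:

```
SET navigate_super_null_on_error=OFF;  --strict mode for navigation

-- In strict mode, accessing a non-existent attribute produces
-- an error instead of returning null.
SELECT c.c_orders.something FROM customer_orders_lineitem c;
```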

## Accessing JSON fields with uppercase and mixed-case field names or attributes

When your JSON attribute names are in uppercase or mixed case, you must be able to navigate SUPER type structures in a case-sensitive way. To do so, you can set `enable_case_sensitive_identifier` to TRUE and wrap the uppercase and mixed-case attribute names in double quotation marks.

The following example illustrates how to set `enable_case_sensitive_identifier` to query data.

```
SET enable_case_sensitive_identifier to TRUE;
 
-- Accessing JSON attribute names with uppercase and mixed-case names
SELECT json_table.data."ITEMS"."Name",
       json_table.data."price"
FROM
  (SELECT json_parse('{"ITEMS":{"Name":"TV"}, "price": 345}') AS data) AS json_table;

 Name | price
------+-------
 "TV" | 345
(1 row)
 
RESET enable_case_sensitive_identifier;
 
-- After resetting the above configuration, the following query accessing JSON attribute names with uppercase and mixed-case names should return null (if in lax mode).
SELECT json_table.data."ITEMS"."Name",
       json_table.data."price"
FROM
  (SELECT json_parse('{"ITEMS":{"Name":"TV"}, "price": 345}') AS data) AS json_table;

 name | price 
------+-------
      | 345
(1 row)
```

## Parsing options for SUPER

When you use the JSON_PARSE function to parse JSON strings into SUPER values, certain restrictions apply:
+ The same attribute name can't appear more than once in the same object, but it can appear in a nested object. The `json_parse_dedup_attributes` configuration option allows JSON_PARSE to keep only the last occurrence of duplicate attributes instead of returning an error.
+ Individual string literals in a SUPER object have a size limit of 16,000,000 bytes. The `json_parse_truncate_strings` configuration option allows JSON_PARSE to automatically truncate strings that are longer than this limit without returning an error. This behavior affects string values only, not attribute names.

For more information about the JSON_PARSE function, see [JSON_PARSE function](JSON_PARSE.md).

The following example shows how to set the `json_parse_dedup_attributes` configuration option to the default behavior of returning an error for duplicate attributes.

```
SET json_parse_dedup_attributes=OFF;  --default behavior of returning error instead of de-duplicating attributes
```
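Conversely, turning the option on keeps the last occurrence of each duplicate attribute. A sketch:

```
SET json_parse_dedup_attributes=ON;

-- With de-duplication on, the last occurrence of "a" is kept,
-- so the parsed value should be {"a":2}.
SELECT JSON_PARSE('{"a": 1, "a": 2}');
```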

The following example shows how to set the `json_parse_truncate_strings` configuration option for the default behavior of returning an error for strings that are longer than this limit.

```
SET json_parse_truncate_strings=OFF;  --default behavior of returning error instead of truncating strings
```

# Limitations

With Amazon Redshift, you can work with the SUPER data type to store and query semi-structured data like JSON, Avro, or Ion. The SUPER data type limitations refer to the constraints and boundaries when using this data type in Amazon Redshift. The following sections provide details on the specific limitations of the SUPER data type, such as maximum size, nesting levels, and data types supported within semi-structured data.
+ You can't define SUPER columns as either a distribution or sort key.
+ An individual SUPER object can hold up to 16 MB of data.
+ The maximum nesting depth for arrays and structures of the SUPER data type is 1,000.
+ Any string literal stored in a single SUPER object is limited to 16,000,000 bytes.
+ Otherwise, an individual value within a SUPER object is limited to the maximum length of the corresponding Amazon Redshift type.
+ You can't perform partial update or transform operations on SUPER columns.
+ You can't use the SUPER data type and its alias in right joins or full outer joins.
+ The SUPER data type doesn't support XML as inbound or outbound serialization format. 
+ In the FROM clause of a subquery (correlated or not) that references a table variable for unnesting, the query can only refer to its parent table, not other tables.
+  You can’t perform unnesting or object unpivoting on SUPER columns produced by a recursive common table expression (`WITH RECURSIVE`). For more information on recursive CTEs, see [Recursive common table expressions](r_WITH_clause.md#r_WITH_clause-recursive-cte). 
+ Casting limitations

  SUPER values can be cast to and from other data types, with the following exceptions:
  + Amazon Redshift doesn't differentiate integers and decimals of scale 0.
  + If the scale isn't zero, the SUPER data type has the same behavior as other Amazon Redshift data types, except that Amazon Redshift converts SUPER-related errors to null, as shown in the following example.

    ```
    SELECT 5::bool;
     bool
    -------
     True
    (1 row)
    
    SELECT 5::decimal::bool;
    ERROR:  cannot cast type numeric to boolean
    
    SELECT 5::super::bool;
     bool
    -------
     True
    (1 row)
    
    SELECT 5.0::bool;
    ERROR:  cannot cast type numeric to boolean
    
    SELECT 5.0::super::bool;
     bool
    -------
    (1 row)
    ```
  + Amazon Redshift doesn't cast the date and time types to the SUPER data type. Amazon Redshift can only cast the date and time data types from the SUPER data type, as shown in the following example.

    ```
    SELECT o.o_orderdate FROM customer_orders_lineitem c,c.c_orders o;
      o_orderdate
    ------------------
     "2001-09-08"
    (1 row)
    
    
    SELECT JSON_TYPEOF(o.o_orderdate) FROM customer_orders_lineitem c,c.c_orders o;
     json_typeof
    -----------------
     string
    (1 row)
    
    
    SELECT o.o_orderdate::date FROM customer_orders_lineitem c,c.c_orders o;
     o_orderdate
    ----------------
     2001-09-08
    (1 row)
    
    
    --date/time cannot be cast to super 
    SELECT '2019-09-09'::date::super;
    ERROR:  cannot cast type date to super
    ```
  + Casting non-scalar values (objects and arrays) to string returns NULL. To properly serialize these non-scalar values, don't cast them; instead, use the `json_serialize` function, which returns a varchar. Typically, you don't need to serialize non-scalar values to varchar explicitly, because Amazon Redshift implicitly serializes them, as shown in the first of the following examples.

    ```
    SELECT r_nations FROM region_nations WHERE r_regionkey=300;
       r_nations
    ----------------
     [1,"abc",null]
    (1 row)
    
    SELECT r_nations::varchar FROM region_nations WHERE r_regionkey=300;
     r_nations
    -------------
    (1 row)
    
    SELECT JSON_SERIALIZE(r_nations) FROM region_nations WHERE r_regionkey=300;
     json_serialize
    -----------------
     [1,"abc",null]
    (1 row)
    ```
+ Amazon Redshift doesn't support volatile functions, such as RANDOM() or TIMEOFDAY(), in subqueries that unnest an outer table or in the left-hand side (LHS) of IN expressions that use such subqueries.

# SUPER data type and materialized views

With Amazon Redshift, you can use materialized views to enhance the performance and flexibility of queries run against the SUPER data type. The SUPER data type lets you store a superset of columns from the base tables in a materialized view, letting you query the materialized view directly without joining the base tables. The following sections show you how to create and use materialized views with the SUPER data type in Amazon Redshift.

Amazon Redshift supports materialized views that incorporate SUPER data type columns and PartiQL queries. Materialized views can refresh incrementally, meaning Amazon Redshift updates only the data that has changed in the base tables since the last refresh operation. This selective update approach makes the refresh process more efficient than full recomputation. For more information about materialized views, see [Materialized views in Amazon Redshift](materialized-view-overview.md).

## Accelerating PartiQL queries

You can use materialized views to accelerate PartiQL queries that navigate and unnest hierarchical data in SUPER columns. By creating one or more materialized views that shred the SUPER values into multiple columns, you can take advantage of the columnar organization of Amazon Redshift analytical queries, essentially extracting and normalizing nested data. The level of normalization depends on how much effort you put into turning the SUPER data into conventional columnar data.

The following topics showcase examples of breaking down, or shredding, complex data into smaller columns, as well as creating scalar columns out of shredded data to improve performance.

**Topics**
+ [Accelerating PartiQL queries](#r_accelerate_mv)
+ [Shredding semi-structured data into SUPER columns with materialized views](r_shred_super.md)
+ [Creating Amazon Redshift scalar columns out of shredded data](r_create_scalar.md)

# Shredding semi-structured data into SUPER columns with materialized views

With Amazon Redshift, you can improve query performance by shredding data into SUPER columns using materialized views. Shredding refers to the process of breaking down complex data types like semi-structured JSON or XML into smaller, flatter columns. SUPER columns are a specialized form of columnar storage optimized for rapidly scanning shredded data. 

The following sections describe the steps and considerations for shredding data into SUPER columns using materialized views in Amazon Redshift.

The following example shows a query that shreds the nested data, with the resulting columns still being the SUPER data type.

```
SELECT c.c_name, o.o_orderstatus
FROM customer_orders_lineitem c, c.c_orders o;
```

The following example shows a query that accesses conventional Amazon Redshift scalar values from the shredded data.

```
SELECT c.c_name, c.c_orders[0].o_totalprice
FROM customer_orders_lineitem c;
```

You can create a single materialized view super_mv to accelerate both queries.

To answer the first query, you must materialize the attribute `o_orderstatus`. You can omit the attribute `c_name` because it doesn't involve nested navigation or unnesting. You must also include in the materialized view the attribute `c_custkey` of `customer_orders_lineitem` so that you can join the base table with the materialized view.

To answer the second query, you must also materialize the attribute `o_totalprice` and the array index `o_idx` of `c_orders`. With the index materialized, you can access element 0 of `c_orders`.

```
CREATE MATERIALIZED VIEW super_mv distkey(c_custkey) sortkey(c_custkey) AS (
  SELECT c_custkey, o.o_orderstatus, o.o_totalprice, o_idx
  FROM customer_orders_lineitem c, c.c_orders o AT o_idx
);
```

The attributes `o_orderstatus` and `o_totalprice` of the materialized view `super_mv` are SUPER.

The materialized view `super_mv` is refreshed incrementally upon changes to the base table `customer_orders_lineitem`.

```
REFRESH MATERIALIZED VIEW super_mv;
INFO: Materialized view super_mv was incrementally updated successfully.
```

To rewrite the first PartiQL query as a regular SQL query, join `customer_orders_lineitem` with `super_mv` as follows.

```
SELECT c.c_name, v.o_orderstatus
FROM customer_orders_lineitem c 
JOIN super_mv v ON c.c_custkey = v.c_custkey;
```

Similarly, you can rewrite the second PartiQL query. The following example uses a filter on `o_idx = 0`.

```
SELECT c.c_name, v.o_totalprice
FROM customer_orders_lineitem c 
JOIN super_mv v ON c.c_custkey = v.c_custkey
WHERE v.o_idx = 0;
```

In the CREATE MATERIALIZED VIEW command, specify `c_custkey` as the distribution key and sort key for `super_mv`. Amazon Redshift performs an efficient merge join, assuming that `c_custkey` is also the distribution key and sort key of `customer_orders_lineitem`. If that isn't the case, you can specify `c_custkey` as the sort key and distribution key of `customer_orders_lineitem` as follows.

```
ALTER TABLE customer_orders_lineitem
ALTER DISTKEY c_custkey, ALTER SORTKEY (c_custkey);
```

Use the EXPLAIN statement to verify that Amazon Redshift performs a merge join on the rewritten queries.

```
EXPLAIN
      SELECT c.c_name, v.o_orderstatus
      FROM customer_orders_lineitem c JOIN super_mv v ON c.c_custkey = v.c_custkey;
      
      QUERY PLAN                                              
      ------------------------------------------------------------------------------------------------------
      XN Merge Join DS_DIST_NONE  (cost=0.00..34701.82 rows=1470776 width=27)
      Merge Cond: ("outer".c_custkey = "inner".c_custkey)
      ->  XN Seq Scan on mv_tbl__super_mv__0 derived_table2  (cost=0.00..14999.86 rows=1499986 width=13)
      ->  XN Seq Scan on customer_orders_lineitem c  (cost=0.00..999.96 rows=99996 width=30)
      (4 rows)
```

# Creating Amazon Redshift scalar columns out of shredded data

Schemaless data stored in SUPER columns can limit Amazon Redshift performance. For instance, range-restricted scans driven by filter predicates or join conditions on SUPER values can't effectively use zone maps. Users and BI tools can use materialized views as a conventional presentation of the data to increase the performance of analytical queries.

The following query scans the materialized view `super_mv` and filters on `o_orderstatus`.

```
SELECT c.c_name, v.o_totalprice
FROM customer_orders_lineitem c
JOIN super_mv v ON c.c_custkey = v.c_custkey
WHERE v.o_orderstatus = 'F';
```

Inspect `stl_scan` to verify that Amazon Redshift can't effectively use zone maps on the range-restricted scan over `o_orderstatus`.

```
SELECT slice, is_rrscan FROM stl_scan
WHERE query = pg_last_query_id() AND perm_table_name LIKE '%super_mv%';

 slice | is_rrscan 
-------+-----------
     0 | f
     1 | f
     5 | f
     4 | f
     2 | f
     3 | f
(6 rows)
```

The following example adapts the materialized view `super_mv` to create scalar columns out of the shredded data. In this case, Amazon Redshift casts `o_orderstatus` from SUPER to VARCHAR. In addition, the definition adds `o_orderstatus` to the sort key for `super_mv`. (Drop the existing `super_mv` before recreating it with the new definition.)

```
CREATE MATERIALIZED VIEW super_mv distkey(c_custkey) sortkey(c_custkey, o_orderstatus) AS (
  SELECT c_custkey, o.o_orderstatus::VARCHAR AS o_orderstatus, o.o_totalprice, o_idx
  FROM customer_orders_lineitem c, c.c_orders o AT o_idx
);
```

After re-running the query, verify that Amazon Redshift can now use zone maps.

```
SELECT v.o_totalprice
FROM super_mv v
WHERE v.o_orderstatus = 'F';
```

You can verify that the range-restricted scan now uses zone maps as follows.

```
SELECT slice, is_rrscan FROM stl_scan
WHERE query = pg_last_query_id() AND perm_table_name LIKE '%super_mv%';

 slice | is_rrscan 
-------+-----------
     0 | t
     1 | t
     2 | t
     3 | t
     4 | t
     5 | t
(6 rows)
```

# Examples of using semi-structured data in Amazon Redshift

The following examples demonstrate how to work with semi-structured data in Amazon Redshift using PartiQL syntax. You create a sample table, load a sample set of semi-structured data into it, and then query the data in a variety of use cases.

**Note**  
We recommend that you set the `enable_case_sensitive_identifier` and `enable_case_sensitive_super_attribute` configuration options before working with the SUPER data type. For more information, see [enable_case_sensitive_identifier](r_enable_case_sensitive_identifier.md) and [enable_case_sensitive_super_attribute](r_enable_case_sensitive_super_attribute.md).

## Loading semi-structured data


The following statements create a sample table and load a sample JSON object into the `all_data` SUPER column.

```
DROP TABLE IF EXISTS test_json;

SET enable_case_sensitive_super_attribute TO true;
SET enable_case_sensitive_identifier TO true;

CREATE TABLE test_json (all_data SUPER);

INSERT INTO test_json VALUES (JSON_PARSE('
{
   "data":{
      "pnr":{
         "type":"pnr",
         "pnrid":"123PQRS-2024-09-20",
         "bookingIdentifier":"123PQRS",
         "version":"5",
         "triggerType":"",
         "events":[
            {
               "eventType":"UPDATED",
               "type":"PART",
               "id":"123PQRS-2024-09-20-HO-1"
            },
            {
               "eventType":"CREATED",
               "type":"ABC",
               "id":"123PQRS-2024-09-20-OT-38"
            }
         ],
         "create":{
            "pnrCreateDate":"2024-09-20T16:56:00Z",
            "officeID":"OFFCID1234",
            "officeIDCategory":"Email"
         },
         "lastModification":{
            "dateTime":"2024-09-20T17:09:00Z"
         },
         "PARTDetails":[
            {
               "path":"1",
               "TrainPARTs":[
                  {
                     "PARTID":"123PQRS-2024-09-20-HO-1",
                     "departure":{
                        "departureStation":"XYZ",
                        "departureTimeLocal":"2024-10-03T06:30:00",
                        "departureTimeGMT":"2024-10-03T10:30:00Z"
                     },
                     "arrival":{
                        "arrivalStation":"ABC",
                        "arrivalTimeLocal":"2024-10-03T08:20:00",
                        "arrivalTimeGMT":"2024-10-03T15:20:00Z"
                     },
                     "marketing":{
                        "carrierCode":"XX",
                        "TrainNumber":"100"
                     },
                     "operating":{
                        "carrierCode":"YY",
                        "TrainNumber":"100-A"
                     },
                     "status":"ON",
                     "aircraft":{
                        "code":"222"
                     },
                     "class":"WC",
                     "first":"Y",
                     "seating":[
                        
                     ]
                  }
               ]
            }
         ],
         "commuterInformation":[
            {
               "commuterID":"2",
               "commuterPNR":"123PQRS-2024-09-20-RO-2",
               "commuterTypeCode":"DOM",
               "firstName":"JOHN",
               "lastName":"MILLER"
            }
         ],
         "contactDetail":[
            {
               "emailContacts":[
                  {
                     "id":"123PQRS-2024-09-20-OT-4",
                     "contact":"JOHNMILLER@EXAMPLE.COM",
                     "purpose":[
                        "BUSINESS"
                     ],
                     "commuter":[
                        "123PQRS-2024-09-20-RO-2"
                     ],
                     "language":"EN"
                  },
                  {
                     "id":"123PQRS-2024-09-20-OT-5",
                     "contact":"HARVEYCORMIER@EXAMPLE.COM",
                     "purpose":[
                        "NOTIFICATION"
                     ],
                     "commuter":[
                        "123PQRS-2024-09-20-RO-2"
                     ],
                     "language":"EN"
                  }
               ]
            },
            {
               "phoneContacts":[
                  {
                     "id":"123PQRS-2024-09-20-OT-3",
                     "contact":"1234567890",
                     "purpose":[
                        "NOTIFICATION"
                     ],
                     "commuter":[
                        "123PQRS-2024-09-20-RO-2"
                     ],
                     "language":""
                  }
               ]
            },
            {
               "addressInfo":[
                  {
                     "id":"123PQRS-2024-09-20-OT-6",
                     "addressline":[
                        "112 PORT STREET"
                     ],
                     "provinceState":"CA",
                     "cityName":"SAN JOSE",
                     "postalCode":"12345",
                     "countryCode":"USA",
                     "purpose":[
                        "MAILING"
                     ],
                     "commuter":[
                        "123PQRS-2024-09-20-RO-2"
                     ]
                  }
               ]
            }
         ],
         "PendingService":[
            {
               "id":"123PQRS-2024-09-20-OT-26",
               "code":"MONO",
               "status":"",
               "text":"Broken Seat at Coach-No XYZ123 Seat-No 567",
               "trainCode":"WC-1",
               "TrainsArray":[
                  "123PQRS-2024-09-20-HO-1"
               ],
               "commuter":[
                  "123PQRS-2024-09-20-RO-2"
               ]
            },
            {
               "id":"123PQRS-2024-09-20-OT-27",
               "code":"OTHS",
               "status":"",
               "text":"Broken Seat at Coach-No XYZ567 Seat-No 111",
               "trainCode":"WC-1",
               "TrainsArray":[
                  "123PQRS-2024-09-20-HO-1"
               ],
               "commuter":[
                  "123PQRS-2024-09-20-RO-2"
               ]
            },
            {
               "id":"123PQRS-2024-09-20-OT-28",
               "code":"OTHS",
               "status":"",
               "text":"Broken Seat at Coach-No XYZ890 Seat-No 123",
               "trainCode":"WC-1",
               "TrainsArray":[
                  "123PQRS-2024-09-20-HO-1"
               ],
               "commuter":[
                  "123PQRS-2024-09-20-RO-2"
               ]
            },
            {
               "id":"123PQRS-2024-09-20-OT-29",
               "code":"OTHS",
               "status":"",
               "text":"Broken Seat at Coach-No XYZ111 Seat-No 333",
               "trainCode":"WC-1",
               "TrainsArray":[
                  "123PQRS-2024-09-20-HO-1"
               ],
               "commuter":[
                  "123PQRS-2024-09-20-RO-2"
               ]
            }            
         ],
         "parts": [
            {
               "partname": "prop",
               "manufacturer": "local parts co",
               "quality": 2,
               "price": 10.00
            },
            {
               "partname": "prop",
               "manufacturer": "big parts co",
               "quality": null,
               "price": 9.00
            },
            {
               "partname": "prop",
               "manufacturer": "small parts co",
               "quality": 1,
               "price": 12.00
            },
            {
               "partname": "rudder",
               "manufacturer": "local parts co",
               "quality": 1,
               "price": 2.50
            },
            {
               "partname": "rudder",
               "manufacturer": "big parts co",
               "quality": 2,
               "price": 3.75
            },
            {
               "partname": "rudder",
               "manufacturer": "small parts co",
               "quality": null,
               "price": 1.90
            },
            {
               "partname": "wing",
               "manufacturer": "local parts co",
               "quality": null,
               "price": 7.50
            },
            {
               "partname": "wing",
               "manufacturer": "big parts co",
               "quality": 1,
               "price": 15.20
            },
            {
               "partname": "wing",
               "manufacturer": "small parts co",
               "quality": null,
               "price": 11.80
            }
         ],
         "count_by_color": [
            {
               "quality": "high",
               "red": 15,
               "green": 20,
               "blue": 7
            },
            {
               "quality": "normal",
               "red": 35,
               "green": null,
               "blue": 40
            },
            {
               "quality": "low",
               "red": 10,
               "green": 23,
               "blue": null
            }
         ]
       }
   },
   "id":"abcdefgh-ijklmnop-qrstuvwxyz123",
   "mainIds":[
      {
         "ID":"pqrstuvwxyz-aabbcc123",
         "Source":"NYC"
      }
   ]
}
'));
```
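
After the load, you can confirm that the object was parsed into SUPER by selecting a top-level attribute. The following sanity check returns the `id` value from the sample object.

```
SELECT all_data.id::varchar FROM test_json;

 id
---------------------------------
 abcdefgh-ijklmnop-qrstuvwxyz123
```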

## Querying nested semi-structured data


The following statement uses PartiQL’s dot notation to extract the `pnrid` field, which is nested three levels deep inside the top-level `all_data` object.

```
SELECT all_data.data.pnr.pnrid::varchar FROM test_json;

 pnrid
--------------------
 123PQRS-2024-09-20
```

The following statement uses PartiQL’s bracket notation to specify and extract only the first element from the `events` array nested inside the top-level object.

```
SELECT
    all_data.data.pnr.events[0]
FROM test_json;

 events
---------------------------------
{
   "eventType":"UPDATED",
   "type":"PART",
   "id":"123PQRS-2024-09-20-HO-1"
}
```

The following statement extracts the `eventType` property of only the specified element from the `events` array.

```
SELECT
    all_data.data.pnr.events[0].eventType
FROM test_json;

 eventtype
-----------
 "UPDATED"
```


## Using `enable_case_sensitive_identifier` and `enable_case_sensitive_super_attribute` with semi-structured data


The following examples show how the configuration options [enable_case_sensitive_identifier](r_enable_case_sensitive_identifier.md) and [enable_case_sensitive_super_attribute](r_enable_case_sensitive_super_attribute.md) differ when used for querying semi-structured data. For more information on these configuration options, see [Accessing JSON fields with uppercase and mixed-case field names or attributes](super-configurations.md#upper-mixed-case).

In the following statement, resetting both configuration options to their default of false makes the query return NULL.

```
RESET enable_case_sensitive_identifier;
RESET enable_case_sensitive_super_attribute;

SELECT
    all_data.data.pnr.events[0].eventType
FROM test_json;

 eventtype
-----------
NULL
```

In the following example, the sample query returns the desired result after you wrap the case-sensitive attribute name in double quotation marks and set `enable_case_sensitive_identifier` to true.

```
RESET enable_case_sensitive_identifier;
RESET enable_case_sensitive_super_attribute;

SELECT
    all_data.data.pnr.events[0]."eventType"
FROM test_json;
         
 eventtype
-----------
NULL

SET enable_case_sensitive_identifier TO true;
         
SELECT
    all_data.data.pnr.events[0]."eventType"
FROM test_json;
         
 eventtype
-----------
 "UPDATED"
```

In the following example, the sample query returns the desired result after you set `enable_case_sensitive_super_attribute` to true, without wrapping the case-sensitive attribute name in double quotation marks.

```
RESET enable_case_sensitive_identifier;
RESET enable_case_sensitive_super_attribute;

SELECT
    all_data.data.pnr.events[0].eventType
FROM test_json;
         
 eventtype
-----------
NULL
         
SET enable_case_sensitive_super_attribute TO true;
         
SELECT
    all_data.data.pnr.events[0].eventType
FROM test_json;
         
 eventtype
-----------
 "UPDATED"
```

## Filtering semi-structured data


The following statement uses PartiQL syntax in the WHERE clause to filter on an attribute inside an array, counting the events of type `UPDATED`. You can use this syntax in any portion of the query where you would normally reference columns.

```
SELECT COUNT(*)
FROM test_json
WHERE all_data.data.pnr.events[0].eventType = 'UPDATED';

 count
------
 1
```

The following example uses PartiQL’s bracket and dot syntax in both GROUP BY and ORDER BY clauses.

```
SELECT all_data.data.pnr.events[0].eventType::varchar,
       COUNT(*)
FROM test_json
WHERE all_data.data.pnr.events[0].eventType IS NOT NULL
GROUP BY all_data.data.pnr.events[0].eventType
ORDER BY all_data.data.pnr.events[0].eventType;

 eventtype | count
-----------+-------
 "UPDATED" | 1
```

## Unnesting semi-structured data


The following statement uses PartiQL joins to unnest the `events` array. Note that this join works even when the number of elements in the array isn't static.

For examples of unnesting semi-structured data using UNNEST in the FROM clause, see [UNNEST examples](r_FROM_clause-unnest-examples.md).

```
SELECT
a.all_data.data.pnr.type::varchar type_info,
a.all_data.data.pnr.pnrid::varchar pnr_id ,
a.all_data.data.pnr.bookingIdentifier::varchar booking_id,
a.all_data.data.pnr.version::varchar version_info,
b.eventType::varchar event_type,
b.id::varchar event_id
FROM test_json a, 
  a.all_data.data.pnr.events b;

 type_info | pnr_id              | booking_id | version_info | event_type | event_id
-----------+---------------------+------------+--------------+------------+-------------------------
 pnr       | 123PQRS-2024-09-20  | 123PQRS    | 5            | UPDATED    | 123PQRS-2024-09-20-HO-1
 pnr       | 123PQRS-2024-09-20  | 123PQRS    | 5            | CREATED    | 123PQRS-2024-09-20-OT-38
```
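
If you also need each event's position within the array, you can add an AT clause to the unnesting join, as the materialized view examples earlier in this topic do. The following sketch extends the previous query with a `b_idx` position column.

```
SELECT
b.eventType::varchar event_type,
b.id::varchar event_id,
b_idx
FROM test_json a,
  a.all_data.data.pnr.events b AT b_idx;

 event_type | event_id                 | b_idx
------------+--------------------------+-------
 UPDATED    | 123PQRS-2024-09-20-HO-1  |     0
 CREATED    | 123PQRS-2024-09-20-OT-38 |     1
```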

## Unnesting nested arrays


The following statement uses PartiQL joins to unnest an array that’s nested inside another array. 

For examples of unnesting semi-structured data using UNNEST in the FROM clause, see [UNNEST examples](r_FROM_clause-unnest-examples.md).

```
SELECT
a.all_data.data.pnr.type::varchar type_info,
a.all_data.data.pnr.pnrid::varchar pnr_id ,
a.all_data.data.pnr.bookingIdentifier::varchar booking_id,
a.all_data.data.pnr.version::varchar version_info,
d.id::varchar email_record_id,
d.contact::varchar email_contact,
e::varchar email_purpose,
f::varchar email_commuter
FROM test_json a,
  a.all_data.data.pnr.contactDetail c,
  c."emailContacts" d,
  d.purpose e,
  d.commuter f;

 type_info | pnr_id              | booking_id | version_info | email_record_id         | email_contact             | email_purpose | email_commuter
-----------+---------------------+------------+--------------+-------------------------+---------------------------+---------------+-------------------------
 pnr       | 123PQRS-2024-09-20  | 123PQRS    | 5            | 123PQRS-2024-09-20-OT-4 | JOHNMILLER@EXAMPLE.COM    | BUSINESS      | 123PQRS-2024-09-20-RO-2
 pnr       | 123PQRS-2024-09-20  | 123PQRS    | 5            | 123PQRS-2024-09-20-OT-5 | HARVEYCORMIER@EXAMPLE.COM | NOTIFICATION  | 123PQRS-2024-09-20-RO-2
```

## Using semi-structured data in subqueries


The following statement uses a subquery in the WHERE clause to return only a subsection of the results from the previous example.

```
SELECT
a.all_data.data.pnr.type::varchar type_info,
a.all_data.data.pnr.pnrid::varchar pnr_id ,
a.all_data.data.pnr.bookingIdentifier::varchar booking_id,
a.all_data.data.pnr.version::varchar version_info,
d.id::varchar email_record_id,
d.contact::varchar email_contact
FROM test_json a,
a.all_data.data.pnr.contactDetail c,
c."emailContacts" d
WHERE (SELECT COUNT(*) FROM d.purpose e WHERE e = 'BUSINESS') > 0;

 type_info | pnr_id              | booking_id | version_info | email_record_id         | email_contact
-----------+---------------------+------------+--------------+-------------------------+------------------------
 pnr       | 123PQRS-2024-09-20  | 123PQRS    | 5            | 123PQRS-2024-09-20-OT-4 | JOHNMILLER@EXAMPLE.COM
```

## Aggregating queries using semi-structured data


The following statement uses the COUNT function to aggregate the number of elements in the `PendingService` array.

```
SELECT
a.all_data.data.pnr.type::varchar type_info,
a.all_data.data.pnr.pnrid::varchar pnr_id ,
a.all_data.data.pnr.bookingIdentifier::varchar booking_id,
a.all_data.data.pnr.version::varchar version_info,
COUNT(*) AS total_pending_service
FROM test_json a, 
  a.all_data.data.pnr.PendingService c
GROUP BY 1,2,3,4;

 type_info | pnr_id             | booking_id | version_info | total_pending_service
-----------+--------------------+------------+--------------+-----------------------
 pnr       | 123PQRS-2024-09-20 | 123PQRS    | 5            | 4
```

## Using semi-structured data in materialized views


The following statement uses the query from the previous example to create a materialized view. The materialized view automatically refreshes the number of pending services when the base table receives new data.

```
CREATE MATERIALIZED VIEW mv_total_pending_service
AUTO REFRESH YES
AS
SELECT
a.all_data.data.pnr.type::varchar type_info,
a.all_data.data.pnr.pnrid::varchar pnr_id ,
a.all_data.data.pnr.bookingIdentifier::varchar booking_id,
a.all_data.data.pnr.version::varchar version_info,
COUNT(*) AS total_pending_service
FROM test_json a,
  a.all_data.data.pnr.PendingService c
GROUP BY 1,2,3,4;
```
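
To check whether auto refresh is enabled and whether the view is stale, you can query the SVV_MATERIALIZED_VIEWS system view, as in the following sketch. The column names shown here (`name`, `autorefresh`, `is_stale`) are assumptions; verify them against the system view documentation for your cluster version.

```
SELECT name, autorefresh, is_stale
FROM svv_materialized_views
WHERE name = 'mv_total_pending_service';
```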

## Using PIVOT and UNPIVOT with semi-structured data


The following statement uses PIVOT on the `partname` column to return the average price of each part.

```
SELECT *
FROM 
(
SELECT
c.partname::varchar, c.price
FROM test_json a, 
  a.all_data.data.pnr.parts c) 
PIVOT (AVG(price) for partname IN ('prop', 'rudder', 'wing'));
            
 
 prop       | rudder             |  wing
------------+--------------------+--------
 10.33      | 2.71               |  11.50
```

In the previous example, the results are transformed into columns. The following example shows a GROUP BY query that returns the average prices in rows, rather than in columns.

```
SELECT partname, avg(price)
FROM (
SELECT
c.partname::varchar, c.price
FROM test_json a, 
  a.all_data.data.pnr.parts c)
WHERE partname IN ('prop', 'rudder', 'wing')
GROUP BY partname;
            
 partname |  avg
----------+-------
 prop     | 10.33
 rudder   |  2.71
 wing     | 11.50
```

Following is a PIVOT example with `manufacturer` as an implicit column.

```
SELECT *
FROM (
SELECT
c.quality, c.manufacturer::varchar
FROM test_json a, 
  a.all_data.data.pnr.parts c) PIVOT (
count(*) FOR quality IN (1, 2, NULL)
);
            
 manufacturer      | 1  | 2  | null
-------------------+----+----+------
 local parts co    | 1  | 1  |  1
 big parts co      | 1  | 1  |  1
 small parts co    | 1  | 0  |  2
```

Following is an UNPIVOT example on the `quality` column.

```
SELECT *
FROM 
(
SELECT
c.quality as quality
FROM test_json a, 
  a.all_data.data.pnr.parts c) 
UNPIVOT (cnt FOR column_header IN (quality));
            
 column_header   | cnt
-----------------+----
 quality         | 2    
 quality         | 1    
 quality         | 1    
 quality         | 2    
 quality         | 1
```