本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。
Serializing complex nested JSON
本教程中展示的方法的替代方法是将顶级嵌套收集列查询为序列化JSON。您可以使用序列化来检查、转换和将嵌套数据的嵌套数据用作JSON Redshift Spectrum.
ORC、JSON、ION和Parquet格式支持此方法。使用会话配置参数 json_serialization_enable
配置序列化行为。设置后,复杂的JSON数据类型将序列化为Varchar(65535)。嵌套JSON可以使用 JSON 函数. 有关更多信息,请参阅 json_serialization_enable。)
例如,没有设置 json_serialization_enable
,访问嵌套列的以下查询直接失败。
SELECT * FROM spectrum.customers LIMIT 1;
=> ERROR: Nested tables do not support '*' in the SELECT clause.
SELECT name FROM spectrum.customers LIMIT 1;
=> ERROR: column "name" does not exist in customers
设置 json_serialization_enable
可直接查询顶级系列。
SET json_serialization_enable TO true;
SELECT * FROM spectrum.customers order by id LIMIT 1;
id | name | phones | orders
---+--------------------------------------+----------------+----------------------------------------------------------------------------------------------------------------------
1 | {"given": "John", "family": "Smith"} | ["123-457789"] | [{"shipdate": "2018-03-01T11:59:59.000Z", "price": 100.50}, {"shipdate": "2018-03-01T09:10:00.000Z", "price": 99.12}]
SELECT name FROM spectrum.customers order by id LIMIT 1;
name
---------
{"given": "John", "family": "Smith"}
在序列化嵌套JSON时,考虑以下项目。
-
When collection columns are serialized as VARCHAR(65535), their nested subfields can't be accessed directly as part of the query syntax (for example, in the filter clause). However, JSON functions can be used to access nested JSON.
-
The following specialized representations are not supported:
-
ORC unions
-
ORC maps with complex type keys
-
Ion datagrams
-
Ion SEXP
-
-
Timestamps are returned as ISO serialized strings.
-
Primitive map keys are promoted to string (for example,
1
to"1"
). -
Top-level null values are serialized as NULLs.
-
If the serialization overflows the maximum VARCHAR size of 65535, the cell is set to NULL.
Serializing complex types containing JSON strings
默认情况下,嵌套收集中包含的字符串值序列化为EscapedJSON字符串。当字符串是有效的JSON时,逃离可能是不希望的。相反,您可能希望将Varchar直接写入为JSON的嵌套子元素或字段。利用
json_serialization_parse_nested_strings
会话级别配置。当两个时间 json_serialization_enable
和 json_serialization_parse_nested_strings
已设置,有效JSON值在没有逃脱字符的情况下进行串行化。如果该值不是有效的JSON,则将其转移为 json_serialization_parse_nested_strings
配置值未设置。有关更多信息,请参阅 json_serialization_parse_nested_strings。)
例如,假设上一示例中的数据包含JSON为 structs
复杂类型 name
Varchar(20)字段:
name
---------
{"given": "{\"first\":\"John\",\"middle\":\"James\"}", "family": "Smith"}
当WhenWhen json_serialization_parse_nested_strings
设置, name
列的序列化如下:
SET json_serialization_enable TO true;
SET json_serialization_parse_nested_strings TO true;
SELECT name FROM spectrum.customers order by id LIMIT 1;
name
---------
{"given": {"first":"John","middle":"James"}, "family": "Smith"}
而不是像这样逃避:
SET json_serialization_enable TO true;
SELECT name FROM spectrum.customers order by id LIMIT 1;
name
---------
{"given": "{\"first\":\"John\",\"middle\":\"James\"}", "family": "Smith"}