Serializing complex nested JSON - Amazon Redshift
AWS 文档中描述的 AWS 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅中国的 AWS 服务入门

如果我们为英文版本指南提供翻译,那么如果存在任何冲突,将以英文版本指南为准。在提供翻译时使用机器翻译。

Serializing complex nested JSON

本教程中展示的方法的替代方法是将顶级嵌套收集列查询为序列化JSON。您可以使用序列化来检查、转换和将嵌套数据的嵌套数据用作JSON Redshift Spectrum. ORC、JSON、ION和Parquet格式支持此方法。使用会话配置参数 json_serialization_enable 配置序列化行为。设置后,复杂的JSON数据类型将序列化为Varchar(65535)。嵌套JSON可以使用 JSON 函数. 有关更多信息,请参阅 json_serialization_enable。)

例如,没有设置 json_serialization_enable,访问嵌套列的以下查询直接失败。

SELECT * FROM spectrum.customers LIMIT 1; => ERROR: Nested tables do not support '*' in the SELECT clause. SELECT name FROM spectrum.customers LIMIT 1; => ERROR: column "name" does not exist in customers

设置 json_serialization_enable 可直接查询顶级系列。

SET json_serialization_enable TO true; SELECT * FROM spectrum.customers order by id LIMIT 1; id | name | phones | orders ---+--------------------------------------+----------------+---------------------------------------------------------------------------------------------------------------------- 1 | {"given": "John", "family": "Smith"} | ["123-457789"] | [{"shipdate": "2018-03-01T11:59:59.000Z", "price": 100.50}, {"shipdate": "2018-03-01T09:10:00.000Z", "price": 99.12}] SELECT name FROM spectrum.customers order by id LIMIT 1; name --------- {"given": "John", "family": "Smith"}

在序列化嵌套JSON时,考虑以下项目。

  • When collection columns are serialized as VARCHAR(65535), their nested subfields can't be accessed directly as part of the query syntax (for example, in the filter clause). However, JSON functions can be used to access nested JSON.

  • The following specialized representations are not supported:

    • ORC unions

    • ORC maps with complex type keys

    • Ion datagrams

    • Ion SEXP

  • Timestamps are returned as ISO serialized strings.

  • Primitive map keys are promoted to string (for example, 1 to "1").

  • Top-level null values are serialized as NULLs.

  • If the serialization overflows the maximum VARCHAR size of 65535, the cell is set to NULL.

Serializing complex types containing JSON strings

默认情况下,嵌套收集中包含的字符串值序列化为EscapedJSON字符串。当字符串是有效的JSON时,逃离可能是不希望的。相反,您可能希望将Varchar直接写入为JSON的嵌套子元素或字段。利用 json_serialization_parse_nested_strings 会话级别配置。当两个时间 json_serialization_enablejson_serialization_parse_nested_strings 已设置,有效JSON值在没有逃脱字符的情况下进行串行化。如果该值不是有效的JSON,则将其转移为 json_serialization_parse_nested_strings 配置值未设置。有关更多信息,请参阅 json_serialization_parse_nested_strings。)

例如,假设上一示例中的数据包含JSON为 structs 复杂类型 name Varchar(20)字段:

name --------- {"given": "{\"first\":\"John\",\"middle\":\"James\"}", "family": "Smith"}

当WhenWhen json_serialization_parse_nested_strings 设置, name 列的序列化如下:

SET json_serialization_enable TO true; SET json_serialization_parse_nested_strings TO true; SELECT name FROM spectrum.customers order by id LIMIT 1; name --------- {"given": {"first":"John","middle":"James"}, "family": "Smith"}

而不是像这样逃避:

SET json_serialization_enable TO true; SELECT name FROM spectrum.customers order by id LIMIT 1; name --------- {"given": "{\"first\":\"John\",\"middle\":\"James\"}", "family": "Smith"}