# PySpark extension types

The types that are used by the Amazon Glue PySpark extensions.

## DataType

The base class for the other Amazon Glue types.

**`__init__(properties={})`**
+ `properties` – Properties of the data type (optional).

**`typeName(cls)`**

Returns the type of the Amazon Glue type class (that is, the class name with "Type" removed from the end).
+ `cls` – An Amazon Glue class instance derived from `DataType`.

**`jsonValue( )`**

Returns a JSON object that contains the data type and properties of the class:

```
{
  "dataType": typeName,
  "properties": properties
}
```

## AtomicType and simple derivatives

Inherits from and extends the [DataType](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-datatype) class, and serves as the base class for all the Amazon Glue atomic data types.

**`fromJsonValue(cls, json_value)`**

Initializes a class instance with values from a JSON object.
+ `cls` – An Amazon Glue type class to initialize.
+ `json_value` – The JSON object to load key-value pairs from.

The following types are simple derivatives of the [AtomicType](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-atomictype) class:
+ `BinaryType` – Binary data.
+ `BooleanType` – A Boolean value.
+ `ByteType` – A byte value.
+ `DateType` – A datetime value.
+ `DoubleType` – A double-precision floating-point value.
+ `IntegerType` – An integer value.
+ `LongType` – A long integer value.
+ `NullType` – A null value.
+ `ShortType` – A short integer value.
+ `StringType` – A text string.
+ `TimestampType` – A timestamp value (typically in seconds from 1/1/1970).
+ `UnknownType` – A value of unidentified type.

## DecimalType(AtomicType)

Inherits from and extends the [AtomicType](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-atomictype) class to represent a decimal number (a number expressed in decimal digits, as opposed to binary base-2 numbers).

**`__init__(precision=10, scale=2, properties={})`**
+ `precision` – The number of digits in the decimal number (optional; the default is 10).
+ `scale` – The number of digits to the right of the decimal point (optional; the default is 2).
+ `properties` – The properties of the decimal number (optional).

## EnumType(AtomicType)

Inherits from and extends the [AtomicType](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-atomictype) class to represent an enumeration of valid options.

**`__init__(options)`**
+ `options` – A list of the options being enumerated.

## Collection types
+ [ArrayType(DataType)](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-arraytype)
+ [ChoiceType(DataType)](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-choicetype)
+ [MapType(DataType)](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-maptype)
+ [Field(Object)](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-field)
+ [StructType(DataType)](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-structtype)
+ [EntityType(DataType)](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-entitytype)

## ArrayType(DataType)

**`__init__(elementType=UnknownType(), properties={})`**
+ `elementType` – The type of the elements in the array (optional; the default is `UnknownType`).
+ `properties` – Properties of the array (optional).

## ChoiceType(DataType)

**`__init__(choices=[], properties={})`**
+ `choices` – A list of possible choices (optional).
+ `properties` – Properties of these choices (optional).

**`add(new_choice)`**

Adds a new choice to the list of possible choices.
+ `new_choice` – The choice to add to the list of possible choices.

**`merge(new_choices)`**

Merges a list of new choices with the existing list of choices.
+ `new_choices` – A list of new choices to merge with the existing choices.

## MapType(DataType)

**`__init__(valueType=UnknownType, properties={})`**
+ `valueType` – The type of the values in the map (optional; the default is `UnknownType`).
+ `properties` – Properties of the map (optional).

## Field(Object)

Creates a field object out of an object that derives from [DataType](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-datatype).

**`__init__(name, dataType, properties={})`**
+ `name` – The name to assign to the field.
+ `dataType` – The object to create a field from.
+ `properties` – Properties of the field (optional).

## StructType(DataType)

Defines a data structure (`struct`).

**`__init__(fields=[], properties={})`**
+ `fields` – A list of the fields (of type `Field`) to include in the structure (optional).
+ `properties` – Properties of the structure (optional).

**`add(field)`**
+ `field` – An object of type `Field` to add to the structure.

**`hasField(field)`**

Returns `True` if this structure has a field of the same name, or `False` otherwise.
+ `field` – A field name, or an object of type `Field` whose name is used.

**`getField(field)`**

Returns the field of the same name, if this structure has one.
+ `field` – A field name, or an object of type `Field` whose name is used.

## EntityType(DataType)

**`__init__(entity, base_type, properties)`**

This class is not yet implemented.

## Other types
+ [DataSource(object)](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-data-source)
+ [DataSink(object)](#aws-glue-api-crawler-pyspark-extensions-types-awsglue-data-sink)

## DataSource(object)

**`__init__(j_source, sql_ctx, name)`**
+ `j_source` – The data source.
+ `sql_ctx` – The SQL context.
+ `name` – The data source name.

**`setFormat(format, **options)`**
+ `format` – The format to set for the data source.
+ `options` – A collection of options to set for the data source. For more information about format options, see [Data format options for inputs and outputs in Amazon Glue for Spark](aws-glue-programming-etl-format.md).

**`getFrame()`**

Returns a `DynamicFrame` for the data source.

## DataSink(object)

**`__init__(j_sink, sql_ctx)`**
+ `j_sink` – The sink to create.
+ `sql_ctx` – The SQL context for the data sink.

**`setFormat(format, **options)`**
+ `format` – The format to set for the data sink.
+ `options` – A collection of options to set for the data sink. For more information about format options, see [Data format options for inputs and outputs in Amazon Glue for Spark](aws-glue-programming-etl-format.md).

**`setAccumulableSize(size)`**
+ `size` – The accumulable size to set, in bytes.

**`writeFrame(dynamic_frame, info="")`**
+ `dynamic_frame` – The `DynamicFrame` to write.
+ `info` – Information about the `DynamicFrame` (optional).

**`write(dynamic_frame_or_dfc, info="")`**

Writes a `DynamicFrame` or a `DynamicFrameCollection`.
+ `dynamic_frame_or_dfc` – Either a `DynamicFrame` object or a `DynamicFrameCollection` object to be written.
+ `info` – Information about the `DynamicFrame` or `DynamicFrames` to be written (optional).
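
To make the `DataType`, `Field`, and `StructType` descriptions earlier on this page more concrete, here is a minimal sketch in plain Python. The classes below are illustrative stand-ins written for this example, not the Amazon Glue implementation: they only mimic the documented signatures so the snippet runs without the Glue libraries. Two details are assumptions: `getField` returning `None` for a missing name, and `typeName` leaving the letter case of the class name unchanged. The real library may behave differently on both.

```python
# Illustrative stand-ins only: these classes mimic the interface documented
# on this page and are NOT the Amazon Glue implementation.


class DataType:
    """Base type that carries an optional dictionary of properties."""

    def __init__(self, properties=None):
        # Copy the dict so instances never share one mutable default.
        self.properties = dict(properties or {})

    @classmethod
    def typeName(cls):
        # Class name with a trailing "Type" removed, as the page describes.
        # Assumption: case is left untouched.
        name = cls.__name__
        return name[:-4] if name.endswith("Type") else name

    def jsonValue(self):
        # Same shape as the JSON object shown in the DataType section.
        return {"dataType": self.typeName(), "properties": self.properties}


class StringType(DataType):
    """Stand-in for one of the simple atomic derivatives."""


class Field:
    """A named field wrapping an object derived from DataType."""

    def __init__(self, name, dataType, properties=None):
        self.name = name
        self.dataType = dataType
        self.properties = dict(properties or {})


class StructType(DataType):
    """A structure made of Field objects."""

    def __init__(self, fields=None, properties=None):
        super().__init__(properties)
        self.fields = list(fields or [])

    @staticmethod
    def _name_of(field):
        # Accept either a field name or a Field whose name is used.
        return field.name if isinstance(field, Field) else field

    def add(self, field):
        self.fields.append(field)

    def hasField(self, field):
        name = self._name_of(field)
        return any(f.name == name for f in self.fields)

    def getField(self, field):
        name = self._name_of(field)
        for f in self.fields:
            if f.name == name:
                return f
        return None  # assumption: the page does not say what happens here


# Build a one-field structure and serialize an atomic type.
schema = StructType()
schema.add(Field("title", StringType(), {"nullable": True}))
string_json = StringType({"maxLength": 64}).jsonValue()
```

In this sketch, `schema.hasField("title")` and `schema.hasField(Field("title", StringType()))` give the same answer, because only the field's name is compared, which matches the "field name, or an object of type `Field` whose name is used" wording above.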