PySpark extension types
The types that are used by the Amazon Glue PySpark extensions.
DataType
The base class for the other Amazon Glue types.
__init__(properties={})
- properties – Properties of the data type (optional).
typeName(cls)
Returns the type of the Amazon Glue type class (that is, the class name with "Type" removed from the end).
- cls – An Amazon Glue class instance derived from DataType.
jsonValue()
Returns a JSON object that contains the data type and properties of the class:
{ "dataType": typeName, "properties": properties }
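The naming and serialization rules above can be sketched in plain Python. This is a minimal stand-in written only from the descriptions on this page, not the awsglue implementation: the class bodies and the `nullable` property are hypothetical, and the real library may differ in details such as letter case.

```python
class DataType:
    """Illustrative stand-in; not the awsglue DataType class."""

    def __init__(self, properties=None):
        # Use a fresh dict per instance instead of a shared mutable default.
        self.properties = {} if properties is None else properties

    @classmethod
    def typeName(cls):
        # Drop a trailing "Type" from the class name, as described above.
        name = cls.__name__
        return name[:-len("Type")] if name.endswith("Type") else name

    def jsonValue(self):
        # Pair the type name with the instance's properties.
        return {"dataType": self.typeName(), "properties": self.properties}


class StringType(DataType):
    """Hypothetical derived type used only for this example."""


# "nullable" is a made-up property name for illustration.
string_json = StringType({"nullable": True}).jsonValue()
print(string_json)
```

Because typeName is a class method in this sketch, it can be called on the class itself as well as on an instance.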
AtomicType and simple derivatives
Inherits from and extends the DataType class, and serves as the base class for all the Amazon Glue atomic data types.
fromJsonValue(cls, json_value)
Initializes a class instance with values from a JSON object.
- cls – An Amazon Glue type class instance to initialize.
- json_value – The JSON object to load key-value pairs from.
The following types are simple derivatives of the AtomicType class:
- BinaryType – Binary data.
- BooleanType – Boolean values.
- ByteType – A byte value.
- DateType – A datetime value.
- DoubleType – A floating-point double value.
- IntegerType – An integer value.
- LongType – A long integer value.
- NullType – A null value.
- ShortType – A short integer value.
- StringType – A text string.
- TimestampType – A timestamp value (typically in seconds from 1/1/1970).
- UnknownType – A value of unidentified type.
DecimalType(AtomicType)
Inherits from and extends the AtomicType class to represent a decimal number (a number expressed in decimal digits, as opposed to binary base-2 numbers).
__init__(precision=10, scale=2, properties={})
- precision – The number of digits in the decimal number (optional; the default is 10).
- scale – The number of digits to the right of the decimal point (optional; the default is 2).
- properties – The properties of the decimal number (optional).
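To make precision and scale concrete, the following sketch checks whether a value fits a given pair, using Python's standard decimal module. It assumes the common SQL-style reading, in which at most precision − scale digits may appear to the left of the decimal point. The helper name `fits_decimal` is hypothetical and is not part of the Amazon Glue API.

```python
from decimal import Decimal


def fits_decimal(value, precision=10, scale=2):
    """Return True if value fits (precision, scale) under the SQL-style reading."""
    number = Decimal(str(value))
    if not number.is_finite():
        # NaN and infinities have no digit count to compare.
        return False
    _, digits, exponent = number.as_tuple()
    # Digits to the right of the decimal point.
    fraction_digits = max(0, -exponent)
    # Digits to the left of the decimal point.
    integer_digits = max(0, len(digits) + exponent)
    return fraction_digits <= scale and integer_digits <= precision - scale


print(fits_decimal("12345678.90"))   # 8 integer digits, 2 fraction digits
print(fits_decimal("123456789.0"))   # 9 integer digits
print(fits_decimal("1.234"))         # 3 fraction digits
```

With the defaults of precision 10 and scale 2, the first value fits and the other two do not.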
EnumType(AtomicType)
Inherits from and extends the AtomicType class to represent an enumeration of valid options.
__init__(options)
- options – A list of the options being enumerated.
Collection types
ArrayType(DataType)
__init__(elementType=UnknownType(), properties={})
- elementType – The type of elements in the array (optional; the default is UnknownType).
- properties – Properties of the array (optional).
ChoiceType(DataType)
__init__(choices=[], properties={})
- choices – A list of possible choices (optional).
- properties – Properties of these choices (optional).
add(new_choice)
Adds a new choice to the list of possible choices.
- new_choice – The choice to add to the list of possible choices.
merge(new_choices)
Merges a list of new choices with the existing list of choices.
- new_choices – A list of new choices to merge with existing choices.
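The difference between add and merge can be sketched with a small stand-in class. This is not the awsglue ChoiceType implementation: `ChoiceSketch` is a hypothetical name, plain strings stand in for type objects, and the handling of duplicates (merge skips a choice that is already present) is an assumption, because the descriptions above do not specify it.

```python
class ChoiceSketch:
    """Hypothetical stand-in; not the awsglue ChoiceType class."""

    def __init__(self, choices=None):
        self.choices = []
        self.merge(choices or [])

    def add(self, new_choice):
        # Add exactly one choice to the list.
        self.choices.append(new_choice)

    def merge(self, new_choices):
        # Assumption: a choice that is already present is not added again.
        for choice in new_choices:
            if choice not in self.choices:
                self.add(choice)


column_type = ChoiceSketch(["int"])
column_type.add("string")
column_type.merge(["string", "double"])
print(column_type.choices)
```

In this sketch, add takes a single choice while merge takes a list, which mirrors the two parameter descriptions above.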
MapType(DataType)
__init__(valueType=UnknownType, properties={})
- valueType – The type of values in the map (optional; the default is UnknownType).
- properties – Properties of the map (optional).
Field(object)
Creates a field object out of an object that derives from DataType.
__init__(name, dataType, properties={})
- name – The name to be assigned to the field.
- dataType – The object to create a field from.
- properties – Properties of the field (optional).
StructType(DataType)
Defines a data structure (struct).
__init__(fields=[], properties={})
- fields – A list of the fields (of type Field) to include in the structure (optional).
- properties – Properties of the structure (optional).
add(field)
- field – An object of type Field to add to the structure.
hasField(field)
Returns True if this structure has a field of the same name, or False if not.
- field – A field name, or an object of type Field whose name is used.
getField(field)
- field – A field name, or an object of type Field whose name is used. If the structure has a field of the same name, it is returned.
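The lookup behavior described for hasField and getField, which accept either a field name or a Field object, can be sketched as follows. The classes are simplified stand-ins rather than the awsglue implementations: properties are omitted from the structure, plain strings stand in for data types, and returning None for a missing field is an assumption, because the description above does not say what happens in that case.

```python
class Field:
    """Illustrative stand-in; not the awsglue Field class."""

    def __init__(self, name, dataType, properties=None):
        self.name = name
        self.dataType = dataType
        self.properties = {} if properties is None else properties


class StructType:
    """Illustrative stand-in; not the awsglue StructType class."""

    def __init__(self, fields=None):
        self.fields = []
        self._by_name = {}
        for field in fields or []:
            self.add(field)

    def add(self, field):
        self.fields.append(field)
        self._by_name[field.name] = field

    @staticmethod
    def _name_of(field):
        # Accept either a field name or a Field whose name is used.
        return field.name if isinstance(field, Field) else field

    def hasField(self, field):
        return self._name_of(field) in self._by_name

    def getField(self, field):
        # Assumption: a missing field yields None.
        return self._by_name.get(self._name_of(field))


person = StructType([Field("name", "string")])
person.add(Field("age", "int"))
print(person.hasField("age"), person.getField("age").dataType)
```

Keeping a name-to-field dictionary alongside the ordered list lets both lookups run without scanning every field.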
EntityType(DataType)
__init__(entity, base_type, properties)
This class is not yet implemented.
Other types
DataSource(object)
__init__(j_source, sql_ctx, name)
- j_source – The data source.
- sql_ctx – The SQL context.
- name – The data-source name.
setFormat(format, **options)
- format – The format to set for the data source.
- options – A collection of options to set for the data source. For more information about format options, see Data format options for inputs and outputs in Amazon Glue for Spark.
getFrame()
Returns a DynamicFrame for the data source.
DataSink(object)
__init__(j_sink, sql_ctx)
- j_sink – The sink to create.
- sql_ctx – The SQL context for the data sink.
setFormat(format, **options)
- format – The format to set for the data sink.
- options – A collection of options to set for the data sink. For more information about format options, see Data format options for inputs and outputs in Amazon Glue for Spark.
setAccumulableSize(size)
- size – The accumulable size to set, in bytes.
writeFrame(dynamic_frame, info="")
- dynamic_frame – The DynamicFrame to write.
- info – Information about the DynamicFrame (optional).
write(dynamic_frame_or_dfc, info="")
Writes a DynamicFrame or a DynamicFrameCollection.
- dynamic_frame_or_dfc – Either a DynamicFrame object or a DynamicFrameCollection object to be written.
- info – Information about the DynamicFrame or DynamicFrames to be written (optional).