FormatCase class
The FormatCase transform changes each string in a column to the specified case type.
Example
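from pyspark.context import SparkContext
from pyspark.sql import SparkSession
from awsgluedi.transforms import *

sc = SparkContext()
spark = SparkSession(sc)

# Read the source data; ${BUCKET} is a placeholder for the S3 bucket name.
datasource1 = spark.read.json("s3://${BUCKET}/json/zips/raw/data")

try:
    # Convert every value in the "city" column to lowercase.
    df_output = data_cleaning.FormatCase.apply(
        data_frame=datasource1,
        spark_context=sc,
        source_column="city",
        case_type="LOWER",
    )
except Exception:
    # Report the failure, then re-raise so the job does not continue silently.
    print("Unexpected error happened")
    raise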
Output
The FormatCase transformation converts the values in the `city` column to lowercase, as specified by the `case_type="LOWER"` parameter. The resulting `df_output` DataFrame contains all columns from the original `datasource1` DataFrame, with the `city` column values in lowercase.
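As a rough illustration of the output described above, the following plain-Python sketch models the effect on a few rows without Spark. The sample rows and the helper `lowercase_column` are hypothetical and are not part of the AWS Glue API. They only mimic the documented behavior: `city` is lowercased and every other column is left unchanged.

```python
# Hypothetical sample rows for illustration; not taken from the real dataset.
rows = [
    {"city": "AGAWAM", "state": "MA"},
    {"city": "CUSHMAN", "state": "MA"},
]


def lowercase_column(rows, source_column):
    """Return new rows with one column lowercased and the rest untouched."""
    return [
        {**row, source_column: row[source_column].lower()}
        for row in rows
    ]


result = lowercase_column(rows, "city")
for row in result:
    print(row)
```

Only the `city` values change. The `state` values keep their original case, and the input rows are not modified.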
Methods
__call__(spark_context, data_frame, source_column, case_type)
The FormatCase transform changes each string in a column to the specified case type.
- source_column – The name of an existing column.
- case_type – Supported case types are CAPITAL, LOWER, UPPER, and SENTENCE.
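The four case types can be sketched in plain Python to show the kind of result each one might produce. This is a minimal sketch, not the transform's implementation: the helper `format_case` is hypothetical, and the semantics of CAPITAL (first letter of every word capitalized) and SENTENCE (only the first letter of the string capitalized) are assumptions, since this page does not define them.

```python
def format_case(value: str, case_type: str) -> str:
    """Illustrative stand-in for the four case types (assumed semantics)."""
    if case_type == "LOWER":
        return value.lower()
    if case_type == "UPPER":
        return value.upper()
    if case_type == "CAPITAL":
        # Assumption: capitalize the first letter of every word.
        return " ".join(word.capitalize() for word in value.split(" "))
    if case_type == "SENTENCE":
        # Assumption: capitalize only the first letter of the string.
        return value.capitalize()
    raise ValueError(f"Unsupported case_type: {case_type!r}")


for case_type in ("CAPITAL", "LOWER", "UPPER", "SENTENCE"):
    print(case_type, "->", format_case("new YORK city", case_type))
```

Because the CAPITAL and SENTENCE rules here are assumed, check the transform's actual output on a small sample before relying on either.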
apply(cls, *args, **kwargs)
Inherited from GlueTransform apply.
name(cls)
Inherited from GlueTransform name.
describeArgs(cls)
Inherited from GlueTransform describeArgs.
describeReturn(cls)
Inherited from GlueTransform describeReturn.
describeTransform(cls)
Inherited from GlueTransform describeTransform.
describeErrors(cls)
Inherited from GlueTransform describeErrors.
describe(cls)
Inherited from GlueTransform describe.