FillWithMode class - Amazon Glue
FillWithMode class

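The FillWithMode transform replaces missing (null) values in a column with the mode of that column, that is, its most frequently occurring value. You can also specify tie-breaker logic for cases where several values are equally frequent. For example, consider the following values: 1, 2, 2, 3, 3, 4. Here 2 and 3 each occur twice, so they tie for the mode.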

A modeType of MINIMUM causes FillWithMode to return 2 as the mode value. If modeType is MAXIMUM, the mode is 3. For AVERAGE, the mode is 2.5.
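The tie-breaking rule above can be illustrated in plain Python. This is only a sketch of the described behavior, not the Glue implementation: `mode_with_tiebreak` is a hypothetical helper, and the `NONE` mode type is left out because its behavior is not described here.

```python
from collections import Counter


def mode_with_tiebreak(values, mode_type="MINIMUM"):
    """Hypothetical helper: return the mode of `values`, resolving ties by mode_type."""
    # Count only non-null values.
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    # Every value that reaches the highest frequency is a candidate mode.
    top = max(counts.values())
    candidates = sorted(v for v, c in counts.items() if c == top)
    if mode_type == "MINIMUM":
        return candidates[0]
    if mode_type == "MAXIMUM":
        return candidates[-1]
    if mode_type == "AVERAGE":
        return sum(candidates) / len(candidates)
    raise ValueError(f"mode_type not covered by this sketch: {mode_type}")


values = [1, 2, 2, 3, 3, 4]
print(mode_with_tiebreak(values, "MINIMUM"))  # 2
print(mode_with_tiebreak(values, "MAXIMUM"))  # 3
print(mode_with_tiebreak(values, "AVERAGE"))  # 2.5
```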

Example

    from pyspark.context import SparkContext
    from pyspark.sql import SparkSession
    from awsglue.context import *
    from awsgluedi.transforms import *

    sc = SparkContext()
    spark = SparkSession(sc)

    # Sample data: source_column_1 contains two nulls to be filled.
    input_df = spark.createDataFrame(
        [
            (105.111, 13.12),
            (1055.123, 13.12),
            (None, 13.12),
            (13.12, 13.12),
            (None, 13.12),
        ],
        ["source_column_1", "source_column_2"],
    )

    try:
        # Fill nulls in source_column_1 with the column's mode;
        # ties between equally frequent values resolve to the largest.
        df_output = data_quality.FillWithMode.apply(
            data_frame=input_df,
            spark_context=sc,
            source_column="source_column_1",
            mode_type="MAXIMUM",
        )
        df_output.show()
    except Exception:
        print("Unexpected Error happened ")
        raise

Output

The output of the given code will be:

```
+---------------+---------------+
|source_column_1|source_column_2|
+---------------+---------------+
|        105.111|          13.12|
|       1055.123|          13.12|
|       1055.123|          13.12|
|          13.12|          13.12|
|       1055.123|          13.12|
+---------------+---------------+
```

The `data_quality.FillWithMode` transform, imported from `awsgluedi.transforms`, is applied to the `input_df` DataFrame. It replaces the `null` values in the source_column_1 column with the mode of the non-null values in that column.

In this case, each non-null value in source_column_1 (`105.111`, `1055.123`, and `13.12`) occurs exactly once, so all three tie for the mode. Because `mode_type="MAXIMUM"`, the tie resolves to the largest of them, `1055.123`. Therefore, the `null` values in source_column_1 are replaced by `1055.123` in the output DataFrame `df_output`.
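The fill step for this example can be traced without Spark. The following is a minimal plain-Python sketch under the assumption that ties between equally frequent values resolve to the largest one when `mode_type="MAXIMUM"`; `fill_with_mode_max` is a hypothetical name, not part of the Glue API.

```python
from collections import Counter


def fill_with_mode_max(column):
    """Hypothetical helper: replace None with the column's mode, largest value wins ties."""
    non_null = [v for v in column if v is not None]
    if not non_null:
        # Nothing to compute a mode from; return the column unchanged.
        return list(column)
    counts = Counter(non_null)
    top = max(counts.values())
    # Among the most frequent values, pick the largest (MAXIMUM tie-break).
    fill_value = max(v for v, c in counts.items() if c == top)
    return [fill_value if v is None else v for v in column]


source_column_1 = [105.111, 1055.123, None, 13.12, None]
print(fill_with_mode_max(source_column_1))
# [105.111, 1055.123, 1055.123, 13.12, 1055.123]
```

Every non-null value occurs once here, so all of them are candidate modes and the largest, 1055.123, becomes the fill value.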

Methods

__call__(spark_context, data_frame, source_column, mode_type)

The FillWithMode transform fills missing (null) values in a column with the mode of that column.

  • source_column – The name of an existing column.

  • mode_type – How to resolve tie values in the data. This value must be one of MINIMUM, NONE, AVERAGE, or MAXIMUM.

apply(cls, *args, **kwargs)

Inherited from GlueTransform apply.

name(cls)

Inherited from GlueTransform name.

describeArgs(cls)

Inherited from GlueTransform describeArgs.

describeReturn(cls)

Inherited from GlueTransform describeReturn.

describeTransform(cls)

Inherited from GlueTransform describeTransform.

describeErrors(cls)

Inherited from GlueTransform describeErrors.

describe(cls)

Inherited from GlueTransform describe.