The FillWithMode transform fills missing values in a column with the column's mode, that is, its most frequent value. You can also specify tie-breaker logic for cases where several values are tied as the most frequent. For example, consider the following values: 1 2 2 3 3 4

Here 2 and 3 each occur twice, so they are tied for the mode. A modeType of MINIMUM causes FillWithMode to return 2 as the mode value. If modeType is MAXIMUM, the mode is 3. For AVERAGE, the mode is 2.5, the mean of the tied values.
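The tie-breaking rules above can be sketched in plain Python. This is a minimal illustration, not the transform's actual implementation: `resolve_mode` is a hypothetical helper, and the NONE option is left out because its behavior is not described here.

```python
from collections import Counter


def resolve_mode(values, mode_type):
    """Hypothetical helper: return the mode of `values`, breaking ties by `mode_type`."""
    # Count occurrences, ignoring missing values.
    counts = Counter(v for v in values if v is not None)
    if not counts:
        raise ValueError("no non-null values to compute a mode from")

    # Collect every value that reaches the highest frequency.
    top = max(counts.values())
    tied = [v for v, c in counts.items() if c == top]

    if mode_type == "MINIMUM":
        return min(tied)
    if mode_type == "MAXIMUM":
        return max(tied)
    if mode_type == "AVERAGE":
        return sum(tied) / len(tied)
    raise ValueError(f"unsupported mode_type in this sketch: {mode_type}")


values = [1, 2, 2, 3, 3, 4]
print(resolve_mode(values, "MINIMUM"))  # 2
print(resolve_mode(values, "MAXIMUM"))  # 3
print(resolve_mode(values, "AVERAGE"))  # 2.5
```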


```python
from awsglue.context import *
from pyspark.context import SparkContext
from pyspark.sql import SparkSession
from awsgluedi.transforms import *

sc = SparkContext()
spark = SparkSession(sc)

# source_column_1 contains two nulls; each non-null value occurs exactly once.
input_df = spark.createDataFrame(
    [
        (105.111, 13.12),
        (1055.123, 13.12),
        (None, 13.12),
        (13.12, 13.12),
        (None, 13.12),
    ],
    ["source_column_1", "source_column_2"],
)

try:
    # Fill the nulls in source_column_1 with the mode, resolving ties with MAXIMUM.
    df_output = data_quality.FillWithMode.apply(
        data_frame=input_df,
        spark_context=sc,
        source_column="source_column_1",
        mode_type="MAXIMUM",
    )
except:
    print("Unexpected Error happened ")
    raise

df_output.show()
```


The output of the given code will be:

```
+---------------+---------------+
|source_column_1|source_column_2|
+---------------+---------------+
|        105.111|          13.12|
|       1055.123|          13.12|
|       1055.123|          13.12|
|          13.12|          13.12|
|       1055.123|          13.12|
+---------------+---------------+
```

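The FillWithMode transform, accessed here as `data_quality.FillWithMode` after importing from `awsgluedi.transforms`, is applied to the `input_df` DataFrame. It replaces the `null` values in the source_column_1 column with that column's mode. Each non-null value in the column occurs exactly once, so all of them are tied for the mode, and `mode_type="MAXIMUM"` resolves the tie by choosing the largest of the non-null values.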

In this case, the maximum value in the source_column_1 column is `1055.123`. Therefore, the `null` values in source_column_1 are replaced by `1055.123` in the output DataFrame `df_output`.
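The same result can be reproduced without Spark as a rough check. This is a sketch under assumptions: `fill_nulls_with_max_mode` is a hypothetical helper that mimics only the MAXIMUM tie-breaker on a plain Python list, and it is not how the transform is implemented.

```python
from statistics import multimode


def fill_nulls_with_max_mode(column):
    """Hypothetical helper: replace None with the largest of the tied mode values."""
    non_null = [v for v in column if v is not None]
    # multimode returns every value that shares the highest frequency.
    fill_value = max(multimode(non_null))
    return [fill_value if v is None else v for v in column]


source_column_1 = [105.111, 1055.123, None, 13.12, None]
print(fill_nulls_with_max_mode(source_column_1))
# [105.111, 1055.123, 1055.123, 13.12, 1055.123]
```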


__call__(spark_context, data_frame, source_column, mode_type)

The FillWithMode transform fills missing values in a column with the column's mode.

  • source_column – The name of an existing column.

  • mode_type – How to resolve tie values in the data. This value must be one of MINIMUM, NONE, AVERAGE, or MAXIMUM.

