FillMissingValues class
The FillMissingValues class locates null values and empty strings in a specified DynamicFrame and uses machine learning methods, such as linear regression and random forest, to predict the missing values. The ETL job uses the values in the input dataset to train the machine learning model, which then predicts what the missing values should be.
Tip
If you use incremental datasets, each incremental set is used on its own as the training data for the machine learning model, so the results might be less accurate.
To import:
from awsglueml.transforms import FillMissingValues
Methods
apply(frame, missing_values_column, output_column="", transformation_ctx="", info="", stageThreshold=0, totalThreshold=0)
Fills a dynamic frame's missing values in a specified column and returns a new frame with estimates in a new column. For rows without missing values, the specified column's value is duplicated to the new column.
frame – The DynamicFrame in which to fill missing values. Required.
missing_values_column – The column containing missing values (null values and empty strings). Required.
output_column – The name of the new column that will contain estimated values for all rows whose value was missing. Optional; the default is the name of missing_values_column suffixed by "_filled".
transformation_ctx – A unique string that is used to identify state information (optional).
info – A string associated with errors in the transformation (optional).
stageThreshold – The maximum number of errors that can occur in the transformation before it errors out (optional; the default is zero).
totalThreshold – The maximum number of errors that can occur overall before processing errors out (optional; the default is zero).
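The row-level behavior and the default output column name can be pictured with a plain-Python sketch. This is not how AWS Glue computes its estimates: the column mean stands in for the machine learning model, and fill_missing_sketch and the list-of-dicts rows are hypothetical names used only for illustration.

```python
from statistics import mean


def fill_missing_sketch(rows, missing_values_column, output_column=""):
    """Mimic the shape of the result on a list of dicts (illustration only)."""
    # Default output name: the input column name suffixed by "_filled".
    target = output_column or missing_values_column + "_filled"

    def is_missing(value):
        # Null values and empty strings both count as missing.
        return value is None or value == ""

    known = [r[missing_values_column] for r in rows
             if not is_missing(r[missing_values_column])]
    # Stand-in estimate: the mean of the known values (NOT Glue's model).
    estimate = mean(known)

    filled = []
    for row in rows:
        value = row[missing_values_column]
        new_row = dict(row)  # leave the input rows untouched
        # Missing rows get the estimate; other rows keep their present value.
        new_row[target] = estimate if is_missing(value) else value
        filled.append(new_row)
    return filled


rows = [{"age": 30}, {"age": None}, {"age": ""}, {"age": 40}]
result = fill_missing_sketch(rows, "age")
```

In this sketch the original column is left as it was, and every row gains one new column, which mirrors the description of the returned frame.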
Returns a new DynamicFrame with one additional column that contains estimations for rows with missing values and the present value for other rows.
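A call to apply might look like the following minimal sketch. The helper name fill_missing_column, the database and table names, and the "age" column are placeholders rather than part of the API; the keyword arguments are the ones listed in the signature above. The import sits inside the function only so that the snippet can be loaded outside a Glue job, where the awsglueml package is not installed.

```python
def fill_missing_column(frame, column, output_column=""):
    """Return a new DynamicFrame with estimates for `column` in a new column."""
    # Imported lazily: awsglueml is only available inside the Glue job runtime.
    from awsglueml.transforms import FillMissingValues

    # An empty output_column keeps the default name, `column` + "_filled".
    return FillMissingValues.apply(
        frame=frame,
        missing_values_column=column,
        output_column=output_column,
    )


# Inside a Glue job (all names below are placeholders):
#
#   from awsglue.context import GlueContext
#   from pyspark.context import SparkContext
#
#   glue_context = GlueContext(SparkContext.getOrCreate())
#   people = glue_context.create_dynamic_frame.from_catalog(
#       database="example_db", table_name="people")
#   filled = fill_missing_column(people, "age")
#   filled.printSchema()  # should list "age" and the new "age_filled" column
```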