ADVANCED_DATATYPE_FILTER
Filters the current source column based on advanced data type detection. For example, given a column that DataBrew has identified as containing zip codes, this transform can filter the column based on timezone. The details that you can extract depend on the pattern that is detected, as described in Notes below.
Parameters
- sourceColumn – The name of a string source column.
- pattern – The pattern to extract.
- advancedDataType – Can be one of Phone, Zip Code, Date Time, State, Credit Card, URL, Email, SSN, or Gender.
- filterValues – A list of string values to filter the column on.
- strategy – One of KEEP_ROWS, DISCARD_ROWS, CLEAR_FILTERS, or CLEAR_OTHERS.
- clearWithEmpty – Boolean true or false, to clear rows with empty instead of null.
Notes
If advancedDataType is Phone, then the pattern can be AREA_CODE, TIME_ZONE, or COUNTRY_CODE.
If advancedDataType is Zip Code, then the pattern can be TIME_ZONE, COUNTRY, STATE, CITY, TYPE, or REGION.
If advancedDataType is Date Time, then the pattern can be DAY, MONTH, MONTH_NAME, WEEK, QUARTER, or YEAR.
If advancedDataType is State, then the pattern can be TIME_ZONE.
If advancedDataType is Credit Card, then the pattern can be LENGTH or NETWORK.
If advancedDataType is URL, then the pattern can be PROTOCOL, TLD, or DOMAIN.
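The pairings listed in these notes can be captured in a small lookup table and checked before a recipe step is written. The following is a minimal Python sketch, not part of DataBrew: the names ALLOWED_PATTERNS and is_valid_pattern are hypothetical, and the table covers only the data types for which the notes list patterns.

```python
# Hypothetical helper mirroring the Notes above; not part of any AWS SDK.
ALLOWED_PATTERNS = {
    "Phone": {"AREA_CODE", "TIME_ZONE", "COUNTRY_CODE"},
    "Zip Code": {"TIME_ZONE", "COUNTRY", "STATE", "CITY", "TYPE", "REGION"},
    "Date Time": {"DAY", "MONTH", "MONTH_NAME", "WEEK", "QUARTER", "YEAR"},
    "State": {"TIME_ZONE"},
    "Credit Card": {"LENGTH", "NETWORK"},
    "URL": {"PROTOCOL", "TLD", "DOMAIN"},
}


def is_valid_pattern(advanced_data_type: str, pattern: str) -> bool:
    """Return True if the notes list `pattern` for `advanced_data_type`."""
    # Data types with no listed patterns fall back to an empty set.
    return pattern in ALLOWED_PATTERNS.get(advanced_data_type, set())
```

With this table, is_valid_pattern("Phone", "AREA_CODE") is true, while is_valid_pattern("State", "CITY") is false because the notes list only TIME_ZONE for State.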
Example
{
    "RecipeAction": {
        "Operation": "ADVANCED_DATATYPE_FILTER",
        "Parameters": {
            "pattern": "AREA_CODE",
            "sourceColumn": "phoneColumn",
            "advancedDataType": "Phone",
            "filterValues": ["Ohio"],
            "strategy": "KEEP_ROWS"
        }
    }
}
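A step like the example above can also be assembled programmatically, which guarantees valid JSON quoting. This is a minimal Python sketch under stated assumptions: build_filter_action and STRATEGIES are hypothetical names, the output mirrors the structure shown in the example, and how the recipe is submitted to DataBrew is out of scope.

```python
import json

# Strategy values taken from the Parameters list above.
STRATEGIES = {"KEEP_ROWS", "DISCARD_ROWS", "CLEAR_FILTERS", "CLEAR_OTHERS"}


def build_filter_action(source_column, advanced_data_type, pattern,
                        filter_values, strategy="KEEP_ROWS"):
    """Hypothetical helper: build a dict shaped like the example above."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {
        "RecipeAction": {
            "Operation": "ADVANCED_DATATYPE_FILTER",
            "Parameters": {
                "pattern": pattern,
                "sourceColumn": source_column,
                "advancedDataType": advanced_data_type,
                "filterValues": list(filter_values),
                "strategy": strategy,
            },
        }
    }


# Reproduce the example step and serialize it as JSON.
action = build_filter_action("phoneColumn", "Phone", "AREA_CODE", ["Ohio"])
print(json.dumps(action, indent=4))
```

Serializing with json.dumps emits double-quoted strings, so the resulting text parses as JSON without manual quoting.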