Parsing an input CSV file
Because there isn't a standardized format to create and maintain data in CSV files, Step Functions parses CSV files based on the following rules:
- Commas (,) are a delimiter that separates individual fields.
- Newlines are a delimiter that separates individual records.
- Fields are treated as strings. For data type conversions, use the States.StringToJson intrinsic function in ItemSelector.
- Double quotation marks (" ") aren't required to enclose strings. However, strings that are enclosed by double quotation marks can contain commas and newlines without them functioning as delimiters.
- Escape double quotes by repeating them.
- If the number of fields in a row is less than the number of fields in the header, Step Functions provides empty strings for the missing values.
- If the number of fields in a row is more than the number of fields in the header, Step Functions skips the additional fields.
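The rules above can be approximated locally to preview how rows map onto headers. The following is a minimal Python sketch, not the Step Functions parser: it relies on Python's standard csv module, and the parse_csv helper, the sample text, and the header names are hypothetical. Python's quoting behavior may differ from Step Functions in edge cases, such as doubled quotation marks inside an unquoted field.

```python
import csv
import io


def parse_csv(text, headers):
    """Approximate the rules above (hypothetical helper, not an AWS API).

    Every field stays a string, short rows are padded with empty strings,
    and fields beyond the header count are dropped.
    """
    records = []
    for row in csv.reader(io.StringIO(text, newline="")):
        # Keep at most len(headers) fields, then pad any missing ones with "".
        fields = row[:len(headers)] + [""] * (len(headers) - len(row))
        records.append(dict(zip(headers, fields)))
    return records


# Row 1: plain fields. Row 2: a quoted field with a comma, an escaped
# double quote, and a newline, plus one extra field. Row 3: one field short.
sample = 'a,1\n"x, ""y""\nz",2,extra\nonly\n'
records = parse_csv(sample, ["First", "Second"])

for record in records:
    print(record)
```

In this sketch the extra field in the second row is discarded and the third row receives an empty string for Second, which mirrors the last two rules in the list.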
Example of parsing an input CSV file
Say that you have provided a CSV file named myCSVInput.csv that contains one row as input. Then, you've stored this file in an Amazon S3 bucket that's named my-bucket. The CSV file is as follows.
abc,123,"This string contains commas, a double quotation marks (""), and a newline (
)",{""MyKey"":""MyValue""},"[1,2,3]"
The following state machine reads this CSV file and uses ItemSelector to convert the data types of some of the fields.
{
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "DISTRIBUTED",
          "ExecutionType": "STANDARD"
        },
        "StartAt": "Pass",
        "States": {
          "Pass": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "End": true,
      "Label": "Map",
      "MaxConcurrency": 1000,
      "ItemReader": {
        "Resource": "arn:aws-cn:states:::s3:getObject",
        "ReaderConfig": {
          "InputType": "CSV",
          "CSVHeaderLocation": "GIVEN",
          "CSVHeaders": [
            "MyLetters",
            "MyNumbers",
            "MyString",
            "MyObject",
            "MyArray"
          ]
        },
        "Parameters": {
          "Bucket": "my-bucket",
          "Key": "myCSVInput.csv"
        }
      },
      "ItemSelector": {
        "MyLetters.$": "$$.Map.Item.Value.MyLetters",
        "MyNumbers.$": "States.StringToJson($$.Map.Item.Value.MyNumbers)",
        "MyString.$": "$$.Map.Item.Value.MyString",
        "MyObject.$": "States.StringToJson($$.Map.Item.Value.MyObject)",
        "MyArray.$": "States.StringToJson($$.Map.Item.Value.MyArray)"
      }
    }
  }
}
When you run this state machine, it produces the following output.
[
  {
    "MyNumbers": 123,
    "MyObject": {
      "MyKey": "MyValue"
    },
    "MyString": "This string contains commas, a double quote (\"), and a newline (\n)",
    "MyLetters": "abc",
    "MyArray": [
      1,
      2,
      3
    ]
  }
]
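To see why only some fields change type, the conversion step can be imitated outside of Step Functions. The following is a rough Python sketch that uses json.loads as a stand-in for States.StringToJson. That substitution is an assumption made for illustration, and the two functions aren't guaranteed to behave identically. The item dictionary holds the string values of the parsed fields from the example, with MyString omitted for brevity.

```python
import json

# String values as they arrive from the CSV parser: every field is a string.
item = {
    "MyLetters": "abc",
    "MyNumbers": "123",
    "MyObject": '{"MyKey":"MyValue"}',
    "MyArray": "[1,2,3]",
}

# Fields that the ItemSelector in the example passes through States.StringToJson.
fields_to_convert = {"MyNumbers", "MyObject", "MyArray"}

# json.loads is an assumed local stand-in for States.StringToJson.
converted = {
    key: json.loads(value) if key in fields_to_convert else value
    for key, value in item.items()
}

print(json.dumps(converted, indent=2))
```

Fields that aren't converted, such as MyLetters, remain strings, which matches the output shown above.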