
Parsing an input CSV file

Because there isn't a standardized format to create and maintain data in CSV files, Step Functions parses CSV files based on the following rules:

  • Commas (,) are a delimiter that separates individual fields.

  • Newlines are a delimiter that separates individual records.

  • Fields are treated as strings. For data type conversions, use the States.StringToJson intrinsic function in ItemSelector.

  • Double quotation marks (" ") aren't required to enclose strings. However, strings that are enclosed by double quotation marks can contain commas and newlines without them functioning as delimiters.

  • To escape a double quotation mark, repeat it. For example, "" represents a single ".

  • If the number of fields in a row is less than the number of fields in the header, Step Functions provides empty strings for the missing values.

  • If the number of fields in a row is more than the number of fields in the header, Step Functions skips the additional fields.
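The rules above can be approximated locally, which is useful for checking how a file is likely to be split into fields before you pass it to a Distributed Map state. The following is a minimal sketch in Python, not the parser that Step Functions uses. The function name parse_like_step_functions is hypothetical, and the sketch assumes that the default dialect of Python's csv module matches the quoting rules listed above.

```python
import csv
import io


def parse_like_step_functions(text, headers):
    """Approximate the documented rules: pad short rows, drop extra fields."""
    # newline="" keeps newlines inside quoted fields intact for csv.reader.
    reader = csv.reader(io.StringIO(text, newline=""))
    items = []
    for row in reader:
        # Missing values become empty strings; additional fields are skipped.
        padded = row[: len(headers)] + [""] * (len(headers) - len(row))
        # Every value stays a string; no type conversion happens here.
        items.append(dict(zip(headers, padded)))
    return items


# First row has one field too many, second row has two fields too few.
sample = 'a,"x, ""y""",c,extra\nd\n'
rows = parse_like_step_functions(sample, ["First", "Second", "Third"])
print(rows)
```

In this sketch, the first row loses its fourth field and the second row is padded with two empty strings, which mirrors the last two rules in the list.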

Example of parsing an input CSV file

Suppose that you provide a CSV file named myCSVInput.csv that contains one row as input, and that you've stored this file in an Amazon S3 bucket named my-bucket. The CSV file is as follows.

abc,123,"This string contains commas, a double quote (""), and a newline (
)","{""MyKey"":""MyValue""}","[1,2,3]"

The following state machine reads this CSV file and uses ItemSelector to convert the data types of some of the fields.

{
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "ItemProcessor": {
        "ProcessorConfig": {
          "Mode": "DISTRIBUTED",
          "ExecutionType": "STANDARD"
        },
        "StartAt": "Pass",
        "States": {
          "Pass": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "End": true,
      "Label": "Map",
      "MaxConcurrency": 1000,
      "ItemReader": {
        "Resource": "arn:aws-cn:states:::s3:getObject",
        "ReaderConfig": {
          "InputType": "CSV",
          "CSVHeaderLocation": "GIVEN",
          "CSVHeaders": [
            "MyLetters",
            "MyNumbers",
            "MyString",
            "MyObject",
            "MyArray"
          ]
        },
        "Parameters": {
          "Bucket": "my-bucket",
          "Key": "myCSVInput.csv"
        }
      },
      "ItemSelector": {
        "MyLetters.$": "$$.Map.Item.Value.MyLetters",
        "MyNumbers.$": "States.StringToJson($$.Map.Item.Value.MyNumbers)",
        "MyString.$": "$$.Map.Item.Value.MyString",
        "MyObject.$": "States.StringToJson($$.Map.Item.Value.MyObject)",
        "MyArray.$": "States.StringToJson($$.Map.Item.Value.MyArray)"
      }
    }
  }
}

When you run this state machine, it produces the following output.

[
  {
    "MyNumbers": 123,
    "MyObject": {
      "MyKey": "MyValue"
    },
    "MyString": "This string contains commas, a double quote (\"), and a newline (\n)",
    "MyLetters": "abc",
    "MyArray": [
      1,
      2,
      3
    ]
  }
]
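To see why each field ends up with the type shown in this output, you can walk the same row through a local approximation. The following Python sketch rests on several assumptions: csv.reader stands in for the ItemReader, json.loads stands in for States.StringToJson, and the MyObject field is written enclosed in double quotation marks, which Python's csv module requires in order to treat the doubled quotes inside it as escapes. It illustrates the conversions only and doesn't call Step Functions.

```python
import csv
import io
import json

headers = ["MyLetters", "MyNumbers", "MyString", "MyObject", "MyArray"]

# The example row; MyString spans two physical lines inside its quotes.
csv_text = (
    'abc,123,"This string contains commas, a double quote (""), and a newline (\n)",'
    '"{""MyKey"":""MyValue""}","[1,2,3]"\n'
)

# Stand-in for the ItemReader: every field starts out as a string.
row = next(csv.reader(io.StringIO(csv_text, newline="")))
raw = dict(zip(headers, row))

# Stand-in for ItemSelector: json.loads plays the role of States.StringToJson.
item = {
    "MyLetters": raw["MyLetters"],
    "MyNumbers": json.loads(raw["MyNumbers"]),
    "MyString": raw["MyString"],
    "MyObject": json.loads(raw["MyObject"]),
    "MyArray": json.loads(raw["MyArray"]),
}
print(json.dumps([item], indent=2))
```

The fields that pass through json.loads become a number, an object, and an array, while MyLetters and MyString remain strings because they are selected without conversion.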