Process a CSV file from Amazon S3 using a Distributed Map
This sample project demonstrates how you can use the Distributed Map state to iterate over 10,000 rows of a CSV file that is generated using a Lambda function. The CSV file contains shipping information of customer orders and is stored in an Amazon S3 bucket. The Distributed Map processes the CSV file in batches of 10 rows for data analysis.
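To make the batching arithmetic concrete, the following is a minimal local sketch, not the sample project's code. It builds an in-memory CSV of 10,000 rows and groups the rows into batches of 10. The column names (`order_id`, `expected_delivery`, `actual_delivery`) and the row values are hypothetical, because the sample project's actual CSV schema is not shown here.

```python
import csv
import io
from itertools import islice

TOTAL_ROWS = 10_000  # row count stated for the sample CSV file
BATCH_SIZE = 10      # rows per batch, as in the sample project


def generate_csv(total_rows):
    """Build an in-memory CSV with a header row (hypothetical columns)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["order_id", "expected_delivery", "actual_delivery"])
    for i in range(total_rows):
        writer.writerow([f"order-{i}", "2024-01-05", "2024-01-05"])
    return buf.getvalue()


def batches(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


csv_text = generate_csv(TOTAL_ROWS)
reader = csv.DictReader(io.StringIO(csv_text))
all_batches = list(batches(reader, BATCH_SIZE))
print(len(all_batches))  # prints 1000
```

With 10,000 rows and 10 rows per batch, the data splits into 1000 batches.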
The Distributed Map contains a Lambda function to detect any delayed orders. The Distributed Map also contains an Inline Map that processes the delayed orders in a batch and returns them in an array. For each delayed order, the Inline Map sends a message to an Amazon SQS queue. Finally, this sample project stores the Map Run results to another Amazon S3 bucket in your Amazon Web Services account.
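As an illustration of the detection step, here is a hedged sketch of what a Lambda handler for delayed orders might look like. It is not the function that the sample project deploys. It assumes that each batch arrives under an `Items` key, which is how Step Functions passes batched items to a child workflow, and that each row has hypothetical `expected_delivery` and `actual_delivery` columns in ISO date format. Returning a plain list is also an assumption.

```python
from datetime import date


def is_delayed(order):
    """Return True if the order arrived after its expected date.

    Both field names are hypothetical; adapt them to your CSV header.
    """
    expected = date.fromisoformat(order["expected_delivery"])
    actual = date.fromisoformat(order["actual_delivery"])
    return actual > expected


def lambda_handler(event, context):
    """Return the delayed orders from one batch as a list."""
    items = event.get("Items", [])
    return [order for order in items if is_delayed(order)]
```

For example, calling `lambda_handler({"Items": rows}, None)` locally with a list of row dictionaries returns only the rows whose actual delivery date is later than the expected one.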
With Distributed Map, you can run up to 10,000 parallel child workflow executions at a time. In this sample project, the maximum concurrency of the Distributed Map is set to 1000, which limits it to 1000 parallel child workflow executions.
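The settings described above could be expressed in a Distributed Map state roughly as follows. This is a sketch built as a Python dictionary and printed as JSON, not the definition that the sample project ships. The state name `DetectDelayedOrders`, the function name placeholder, the bucket names, the object key, and the result prefix are all hypothetical, and the `arn:aws` partition in the resource ARNs might differ in your Region.

```python
import json


def build_distributed_map(source_bucket, source_key, results_bucket):
    """Return a Distributed Map state as a dict (illustrative values only)."""
    return {
        "Type": "Map",
        "MaxConcurrency": 1000,  # caps parallel child workflow executions
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:getObject",
            "ReaderConfig": {"InputType": "CSV", "CSVHeaderLocation": "FIRST_ROW"},
            "Parameters": {"Bucket": source_bucket, "Key": source_key},
        },
        "ItemBatcher": {"MaxItemsPerBatch": 10},  # 10 CSV rows per child workflow
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "DetectDelayedOrders",
            "States": {
                "DetectDelayedOrders": {  # hypothetical state name
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {
                        "FunctionName": "<your-function-name>",  # placeholder
                        "Payload.$": "$",
                    },
                    "End": True,
                }
            },
        },
        "ResultWriter": {
            "Resource": "arn:aws:states:::s3:putObject",
            "Parameters": {"Bucket": results_bucket, "Prefix": "map-run-results"},
        },
        "End": True,
    }


definition = build_distributed_map("source-bucket", "orders.csv", "results-bucket")
print(json.dumps(definition, indent=2))
```

The Inline Map and the Amazon SQS step are omitted here to keep the sketch short.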
This sample project creates the state machine and the supporting Amazon resources, and configures the related IAM permissions. Explore this sample project to learn about using the Distributed Map for orchestrating large-scale, parallel workloads, or use it as a starting point for your own projects.
Step 1: Create the state machine
- Open the Step Functions console and choose Create state machine.
- Find and choose the starter template you want to work with. Choose Next to continue.
- Choose Run a demo to create a read-only and ready-to-deploy workflow, or choose Build on it to create an editable state machine definition that you can build on and later deploy.
- Choose Use template to continue with your selection.
Next steps depend on your previous choice:
- Run a demo – You can review the state machine before you create a read-only project with resources deployed by Amazon CloudFormation to your Amazon Web Services account.
  You can view the state machine definition, and when you are ready, choose Deploy and run to deploy the project and create the resources.
  Deploying can take up to 10 minutes to create resources and permissions. You can use the Stack ID link to monitor progress in Amazon CloudFormation.
  After the deployment completes, you should see your new state machine in the console.
- Build on it – You can review and edit the workflow definition. You might need to set values for placeholders in the sample project before attempting to run your custom workflow.
Note
Standard charges might apply for services deployed to your account.
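If you prefer to monitor the deployment from a script instead of the Stack ID link, the following sketch polls the stack status until it reaches a terminal state. It assumes that you pass in a boto3 CloudFormation client and the stack name yourself. The helper name `wait_for_stack` and the polling interval are hypothetical, and the 600-second timeout only mirrors the 10-minute estimate above.

```python
import time

# Stack statuses ending in these suffixes are treated as terminal.
TERMINAL_SUFFIXES = ("_COMPLETE", "_FAILED")


def wait_for_stack(cfn_client, stack_name, poll_seconds=15,
                   timeout_seconds=600, sleep=time.sleep):
    """Poll describe_stacks until the stack reaches a terminal status."""
    waited = 0
    while True:
        response = cfn_client.describe_stacks(StackName=stack_name)
        status = response["Stacks"][0]["StackStatus"]
        if status.endswith(TERMINAL_SUFFIXES):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Stack {stack_name} is still {status}")
        sleep(poll_seconds)
        waited += poll_seconds


# Example usage (requires boto3 and credentials; not run here):
#   import boto3
#   print(wait_for_stack(boto3.client("cloudformation"), "<your-stack-name>"))
```

A returned status of `CREATE_COMPLETE` indicates that the resources were created. Any other terminal status, such as `ROLLBACK_COMPLETE`, means the deployment did not succeed.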
Step 2: Run the state machine
On the State machines page, choose your sample project.
On the sample project page, choose Start execution.
In the Start execution dialog box, do the following:
- (Optional) Enter a custom execution name to override the generated default.
  Non-ASCII names and logging
  Step Functions accepts names for state machines, executions, activities, and labels that contain non-ASCII characters. Because such characters will not work with Amazon CloudWatch, we recommend using only ASCII characters so you can track metrics in CloudWatch.
- (Optional) In the Input box, enter input values as JSON. You can skip this step if you are running a demo.
- Choose Start execution.
  The Step Functions console will direct you to an Execution Details page where you can choose states in the Graph view to explore related information in the Step details pane.
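The console steps above can also be approximated in code. The following sketch starts an execution through a Step Functions client that you supply, and rejects custom execution names that are not ASCII, in line with the CloudWatch recommendation. The helper names `is_ascii_name` and `start_execution` are hypothetical, and the state machine ARN is a placeholder that you replace with your own.

```python
import json


def is_ascii_name(name):
    """Check that a name is ASCII only and 1 to 80 characters long."""
    return name.isascii() and 1 <= len(name) <= 80


def start_execution(sfn_client, state_machine_arn, name=None, payload=None):
    """Start an execution and return its ARN.

    `name` and `payload` are optional, matching the optional console fields.
    """
    kwargs = {
        "stateMachineArn": state_machine_arn,
        "input": json.dumps(payload or {}),
    }
    if name is not None:
        if not is_ascii_name(name):
            raise ValueError("Use an ASCII-only name of 1 to 80 characters")
        kwargs["name"] = name
    response = sfn_client.start_execution(**kwargs)
    return response["executionArn"]


# Example usage (requires boto3 and credentials; not run here):
#   import boto3
#   arn = start_execution(boto3.client("stepfunctions"), "<your-state-machine-arn>")
```

If you omit `name`, Step Functions generates an execution name for you, as the console does.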
Congratulations!
You should now have either a running demo or a state machine definition that you can customize.