
Migrating workloads from Amazon Data Pipeline to Step Functions

Amazon launched the Amazon Data Pipeline service in 2012. At that time, customers wanted a service that let them use a variety of compute options to move data between different data sources. As data transfer needs have changed over time, so have the solutions that address them. You can now choose the solution that most closely meets your business requirements. For example, you can do any of the following:

  • Use Step Functions to orchestrate workflows between multiple Amazon Web Services.

  • Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to manage workflow orchestration for Apache Airflow.

  • Use Amazon Glue to run and orchestrate Apache Spark applications.

You can migrate typical Amazon Data Pipeline use cases to Amazon Glue, Step Functions, or Amazon MWAA. The option you choose depends on your current workload on Amazon Data Pipeline. This topic explains how to migrate from Amazon Data Pipeline to Step Functions.

Migrating workloads from Amazon Data Pipeline

Step Functions is a serverless orchestration service where you build workflows for business-critical applications. With Step Functions' Workflow Studio, you can build workflows and integrate them with more than 11,000 API actions from over 250 Amazon Web Services. This includes Amazon Web Services such as Amazon Lambda, Amazon EMR, and Amazon DynamoDB. You can also use Step Functions to orchestrate data processing pipelines, handle errors, and work with throttling limits on the underlying Amazon Web Services. You can create workflows that process and publish machine learning models, orchestrate microservices, and handle extract, transform, and load (ETL) workflows with Amazon Glue. You can also create long-running, automated workflows for applications that require human interaction.

Step Functions is a fully managed service provided by Amazon. This means that Amazon manages tasks such as maintaining infrastructure, patching workers, and managing OS version updates for you.

When your use case matches the following conditions, we recommend that you migrate from Amazon Data Pipeline to Step Functions:

  • You prefer a serverless, highly available workflow orchestration service.

  • You need a solution that charges at the granularity of a single task execution.

  • Your workloads involve orchestrating tasks for multiple other Amazon Web Services, such as Amazon EMR, Lambda, Amazon Glue, or DynamoDB.

  • You need a low-code solution with a drag-and-drop visual designer for workflow creation. This solution shouldn't require learning unfamiliar, complex programming concepts.

  • You need a service that integrates with over 250 Amazon Web Services that cover over 11,000 API actions. This service must also integrate with custom services and activities outside of Amazon Web Services.

Concept mapping between Step Functions and Amazon Data Pipeline

Amazon Data Pipeline and Step Functions share some common concepts. For example, you define workflows in JSON in both services. In Step Functions, you use Amazon States Language (ASL), a JSON-based, structured language, to define your workflows, and you can switch between the textual and visual representations of a workflow. Because the format is JSON-based, it is simpler to store your workflows in a source control tool. It also helps you manage multiple versions of your workflows, control access to them, and automate their deployment with CI/CD methods.
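
As a minimal sketch of what an ASL definition might look like, the following Python snippet builds a two-state workflow as a dictionary and serializes it to JSON. The state names (ExtractData, Done) and the Lambda function ARN are hypothetical placeholders and do not come from this topic.

```python
import json

# Hypothetical placeholder ARN; replace it with a function in your own account.
EXTRACT_FUNCTION_ARN = (
    "arn:aws-cn:lambda:cn-north-1:123456789012:function:extract-data"
)


def build_definition(function_arn: str) -> dict:
    """Return a minimal Amazon States Language definition with one Task state."""
    return {
        "Comment": "Minimal sketch of a single-task workflow",
        "StartAt": "ExtractData",
        "States": {
            # A Task state performs one unit of work, here a Lambda invocation.
            "ExtractData": {
                "Type": "Task",
                "Resource": function_arn,
                "Next": "Done",
            },
            # A Succeed state ends the execution successfully.
            "Done": {"Type": "Succeed"},
        },
    }


# The JSON text is what you would store in source control.
definition_json = json.dumps(build_definition(EXTRACT_FUNCTION_ARN), indent=2)
print(definition_json)
```

Because the definition is plain JSON text, it can be reviewed and versioned like any other source file.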

The following table maps the major concepts used in both services. The Data pipeline concepts column on the left lists the concepts in Amazon Data Pipeline, and the Step Functions concepts column on the right lists the equivalent concepts in Step Functions.

Data pipeline concepts | Step Functions concepts
Pipelines | Workflows
Pipeline definition | Amazon States Language (ASL)
Activities | States and Task
Instances | Executions
Attempts | Catchers and retriers
Pipeline schedule |
Pipeline expressions and functions |
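
To illustrate the Attempts row, the following sketch shows how retry behavior might be expressed with a retrier and a catcher on a Task state. Retry, Catch, ErrorEquals, IntervalSeconds, MaxAttempts, and BackoffRate are ASL fields. The function ARN and the NotifyFailure state name are hypothetical, and the snippet is a fragment of a workflow rather than a complete definition. The retry_delays helper is also an illustration only: it computes the wait before each retry as IntervalSeconds multiplied by BackoffRate for every subsequent attempt, and it ignores optional settings such as a maximum delay or jitter.

```python
# A Task state fragment with one retrier and one catcher.
task_state = {
    "Type": "Task",
    # Hypothetical placeholder ARN.
    "Resource": "arn:aws-cn:lambda:cn-north-1:123456789012:function:copy-data",
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,   # wait before the first retry
            "MaxAttempts": 3,       # maximum number of retries
            "BackoffRate": 2.0,     # multiplier applied to each later wait
        }
    ],
    # If retries are exhausted, route the execution to a fallback state.
    "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
    "End": True,
}


def retry_delays(retrier: dict) -> list:
    """Return the wait in seconds before each retry for one retrier."""
    interval = retrier.get("IntervalSeconds", 1)
    max_attempts = retrier.get("MaxAttempts", 3)
    backoff = retrier.get("BackoffRate", 2.0)
    return [interval * backoff ** n for n in range(max_attempts)]


print(retry_delays(task_state["Retry"][0]))
```

With the values above, the retrier waits 2, 4, and 8 seconds before the first, second, and third retry.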

Step Functions sample projects

The following list outlines some sample projects that implement the most common Amazon Data Pipeline use cases with Step Functions. You can use these sample projects as a reference when you migrate from Amazon Data Pipeline to Step Functions. You can also use them as a starting point to build your own workflows and integrate with the supported Amazon Web Services for your use case.

To learn more about Step Functions, see the following topics and resources:

Pricing comparison

Amazon Data Pipeline is priced by the number of pipelines and their level of use. Activities that run more than once a day (high frequency) are priced at $1 per month per activity. Activities that run once a day or less (low frequency) are priced at $0.60 per month per activity. Inactive pipelines are priced at $1 per pipeline. For more information about pricing, see the Amazon Data Pipeline Pricing page.

Step Functions has two types of workflows: Standard and Express. Each workflow type has a different pricing model. This comparison is based on Standard workflows because they best match common Amazon Data Pipeline use cases. Standard workflows are priced at $0.025 per 1,000 state transitions. Inactive state machines incur no cost; you pay only for what you use. For more information about pricing, see the Amazon Step Functions Pricing page.
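
The arithmetic behind this comparison can be sketched as follows. The prices are the ones quoted in this topic. The workload figures (three high-frequency activities, two low-frequency activities, and an hourly workflow with five state transitions per execution) are hypothetical, and the sketch assumes that the inactive pipeline charge is monthly. It ignores any free tier and any Region-specific pricing, so treat it as an illustration rather than a billing estimate.

```python
from decimal import Decimal

# Prices quoted in this topic, in US dollars.
HIGH_FREQUENCY_ACTIVITY = Decimal("1.00")  # per activity per month
LOW_FREQUENCY_ACTIVITY = Decimal("0.60")   # per activity per month
INACTIVE_PIPELINE = Decimal("1.00")        # per pipeline (assumed monthly)
PER_1000_TRANSITIONS = Decimal("0.025")    # Standard workflows


def data_pipeline_monthly_cost(
    high_freq: int, low_freq: int, inactive: int = 0
) -> Decimal:
    """Estimate the monthly Amazon Data Pipeline cost from activity counts."""
    return (
        high_freq * HIGH_FREQUENCY_ACTIVITY
        + low_freq * LOW_FREQUENCY_ACTIVITY
        + inactive * INACTIVE_PIPELINE
    )


def step_functions_monthly_cost(state_transitions: int) -> Decimal:
    """Estimate the monthly Standard workflow cost from state transitions."""
    return PER_1000_TRANSITIONS * state_transitions / 1000


# Hypothetical workload: 5 transitions per execution, hourly, for 30 days.
transitions = 5 * 24 * 30

print("Data Pipeline:", data_pipeline_monthly_cost(high_freq=3, low_freq=2))
print("Step Functions:", step_functions_monthly_cost(transitions))
```

Under these assumptions, the Data Pipeline estimate is $4.20 per month, and 3,600 state transitions cost $0.09 per month on Standard workflows.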