Automate recurring clusters with Amazon Data Pipeline - Amazon EMR

Automate recurring clusters with Amazon Data Pipeline

Amazon Data Pipeline is a service that automates the movement and transformation of data. You can use it to schedule the transfer of input data into Amazon S3 and to schedule the launch of clusters that process that data. For example, suppose a web server records traffic logs and you want to run a cluster every week to analyze the traffic data. You can use Amazon Data Pipeline to schedule those clusters. Because Amazon Data Pipeline workflows are data-driven, one task (launching the cluster) can depend on another task (moving the input data to Amazon S3). The service also provides retry functionality for tasks that fail.
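The weekly log analysis scenario can be sketched as a pipeline definition. The following Python snippet is a minimal illustration, not a complete or validated definition: it builds the JSON structure for a weekly schedule, a staging task, and a cluster task that depends on it. The function name, object IDs, bucket URIs, instance types, release label, and start date are all hypothetical placeholders, and the `Default` object that normally carries IAM roles and a log location is omitted for brevity. Check the field names and values against the Amazon Data Pipeline Developer Guide before use.

```python
import json


def build_weekly_pipeline(source_uri, input_uri, step):
    """Build a pipeline definition (as a dict) in which an EMR step
    runs weekly, but only after the input data has been staged in S3.

    All IDs and values are illustrative assumptions.
    """
    schedule_ref = {"ref": "WeeklySchedule"}
    return {
        "objects": [
            # Run once a week, starting from a placeholder date.
            {"id": "WeeklySchedule", "type": "Schedule",
             "period": "1 weeks", "startDateTime": "2024-01-01T00:00:00"},
            # Short-lived EC2 instance that performs the staging copy.
            {"id": "StageHost", "type": "Ec2Resource",
             "schedule": schedule_ref, "terminateAfter": "1 hours"},
            # Task 1: move the input data into Amazon S3.
            {"id": "StageLogs", "type": "ShellCommandActivity",
             "schedule": schedule_ref, "runsOn": {"ref": "StageHost"},
             "command": f"aws s3 sync {source_uri} {input_uri}"},
            # Cluster launched for each scheduled run.
            {"id": "WeeklyCluster", "type": "EmrCluster",
             "schedule": schedule_ref, "releaseLabel": "emr-6.15.0",
             "masterInstanceType": "m5.xlarge",
             "coreInstanceType": "m5.xlarge", "coreInstanceCount": "2",
             "terminateAfter": "3 hours"},
            # Task 2: process the data; waits for StageLogs to succeed
            # and is retried up to two times if it fails.
            {"id": "AnalyzeLogs", "type": "EmrActivity",
             "schedule": schedule_ref, "runsOn": {"ref": "WeeklyCluster"},
             "dependsOn": {"ref": "StageLogs"},
             "maximumRetries": "2", "step": step},
        ]
    }


if __name__ == "__main__":
    definition = build_weekly_pipeline(
        "s3://example-source-bucket/logs/",
        "s3://example-input-bucket/logs/",
        "command-runner.jar,spark-submit,s3://example-code-bucket/analyze.py",
    )
    print(json.dumps(definition, indent=2))
```

The `dependsOn` reference expresses the data-driven ordering described above: the cluster step does not start until the staging task has completed. The `maximumRetries` field shows one way the retry behavior might be configured per activity.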

For more information about Amazon Data Pipeline, see the Amazon Data Pipeline Developer Guide, especially the tutorials regarding Amazon EMR: