Automate recurring Amazon EMR clusters with AWS Data Pipeline
AWS Data Pipeline is a service that automates the movement and transformation of data. You can use it to schedule the transfer of input data into Amazon S3 and to schedule the launch of clusters that process that data. For example, suppose a web server records traffic logs and you want to run a cluster every week to analyze them. You can use AWS Data Pipeline to schedule those clusters. Because AWS Data Pipeline workflows are data-driven, one task (launching the cluster) can depend on another task (moving the input data to Amazon S3). The service also automatically retries failed tasks.
For more information about AWS Data Pipeline, see the AWS Data Pipeline Developer Guide, especially the tutorials regarding Amazon EMR:
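The weekly log-analysis scenario above could be expressed as a pipeline definition along the following lines. This is a minimal sketch, not a tested production pipeline: the bucket name, log path, worker group, script location, instance types, release label, and start date are all hypothetical placeholders, and the two role names are the service's usual defaults, which may differ in your account. The code only builds the definition as JSON; it does not call any AWS API.

```python
import json


def build_weekly_pipeline(bucket: str) -> dict:
    """Build a pipeline definition in which an EMR step waits for a log upload.

    Every concrete value here (paths, worker group, instance types, dates)
    is an illustrative assumption, not something taken from the guide.
    """
    return {
        "objects": [
            {
                # Settings inherited by every other object in the pipeline.
                "id": "Default",
                "name": "Default",
                "scheduleType": "cron",
                "schedule": {"ref": "WeeklySchedule"},
                "role": "DataPipelineDefaultRole",  # assumed default role
                "resourceRole": "DataPipelineDefaultResourceRole",  # assumed
                "pipelineLogUri": f"s3://{bucket}/pipeline-logs/",
            },
            {
                # Run once every seven days.
                "id": "WeeklySchedule",
                "type": "Schedule",
                "period": "7 days",
                "startDateTime": "2024-01-01T00:00:00",  # hypothetical
            },
            {
                # Task 1: move the web server logs into Amazon S3.
                "id": "MoveLogsToS3",
                "type": "ShellCommandActivity",
                "workerGroup": "web-server-workers",  # hypothetical
                "command": f"aws s3 sync /var/log/httpd s3://{bucket}/input/",
            },
            {
                # The cluster that is launched for each scheduled run.
                "id": "WeeklyCluster",
                "type": "EmrCluster",
                "releaseLabel": "emr-6.15.0",  # hypothetical
                "masterInstanceType": "m5.xlarge",  # hypothetical
                "coreInstanceType": "m5.xlarge",  # hypothetical
                "coreInstanceCount": "2",
                "terminateAfter": "2 Hours",
            },
            {
                # Task 2: analyze the logs, but only after task 1 succeeds.
                "id": "AnalyzeTraffic",
                "type": "EmrActivity",
                "runsOn": {"ref": "WeeklyCluster"},
                "dependsOn": {"ref": "MoveLogsToS3"},
                "maximumRetries": "2",
                "step": (
                    "command-runner.jar,spark-submit,"
                    f"s3://{bucket}/scripts/analyze.py"  # hypothetical script
                ),
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_weekly_pipeline("example-bucket"), indent=2))
```

The `dependsOn` reference is what makes the cluster step wait for the upload, and `maximumRetries` caps how many times a failed attempt is retried.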