Tune a Machine Learning Model - Amazon Step Functions
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Tune a Machine Learning Model

This sample project demonstrates using SageMaker to tune the hyperparameters of a machine learning model, and to batch transform a test dataset.

In this project, Step Functions uses a Lambda function to seed an Amazon S3 bucket with a test dataset. It then creates a hyperparameter tuning job using the SageMaker service integration. It then uses a Lambda function to extract the data path, saves the tuning model, extracts the model name, and then runs a batch transform job to perform inference in SageMaker.

For more information about SageMaker and Step Functions service integrations, see the following:


This sample project may incur charges.

For new Amazon users, a free usage tier is available. On this tier, services are free below a certain level of usage. For more information about Amazon costs and the Free Tier, see SageMaker Pricing.

Step 1: Create the state machine and provision resources

  1. Open the Step Functions console and choose Create state machine.

  2. Type Tune a machine learning model in the search box, and then choose Tune a machine learning model from the search results that are returned.

  3. Choose Next to continue.

  4. Step Functions lists the Amazon Web Services used in the sample project you selected. It also shows a workflow graph for the sample project. Deploy this project to your Amazon Web Services account or use it as a starting point for building your own projects. Based on how you want to proceed, choose Run a demo or Build on it.

    This sample project deploys the following resources:

    • Three Amazon Lambda functions

    • An Amazon Simple Storage Service (Amazon S3) bucket

    • An Amazon Step Functions state machine

    • Related Amazon Identity and Access Management (IAM) roles

    The following image shows the workflow graph for the Tune a machine learning model sample project:

    Workflow graph of the Tune a machine learning model sample project.
  5. Choose Use template to continue with your selection.

  6. Do one of the following:

    • If you selected Build on it, Step Functions creates the workflow prototype for the sample project you selected. Step Functions doesn't deploy the resources listed in the workflow definition.

      In Workflow Studio's Design mode, drag and drop states from the States browser to continue building your workflow protoype. Or switch to the Code mode that provides an integrated code editor similar to VS Code for updating the Amazon States Language (ASL) definition of your state machine within the Step Functions console. For more information about using Workflow Studio to build your state machines, see Using Workflow Studio.


      Remember to update the placeholder Amazon Resource Name (ARN) for the resources used in the sample project before you run your workflow.

    • If you selected Run a demo, Step Functions creates a read-only sample project which uses an Amazon CloudFormation template to deploy the Amazon resources listed in that template to your Amazon Web Services account.


      To view the state machine definition of the sample project, choose Code.

      When you're ready, choose Deploy and run to deploy the sample project and create the resources.

      It can take up to 10 minutes for these resources and related IAM permissions to be created. While your resources are being deployed, you can open the CloudFormation Stack ID link to see which resources are being provisioned.

      After all the resources in the sample project are created, you can see the new sample project listed on the State machines page.


      Standard charges may apply for each service used in the CloudFormation template.

Step 2: Run the state machine

  1. On the State machines page, choose your sample project.

  2. On the sample project page, choose Start execution.

  3. In the Start execution dialog box, do the following:

    1. (Optional) To identify your execution, you can specify a name for it in the Name box. By default, Step Functions generates a unique execution name automatically.


      Step Functions allows you to create names for state machines, executions, and activities, and labels that contain non-ASCII characters. These non-ASCII names don't work with Amazon CloudWatch. To ensure that you can track CloudWatch metrics, choose a name that uses only ASCII characters.

    2. (Optional) In the Input box, enter input values in JSON format to run your workflow.

      If you chose to Run a demo, you need not provide any execution input.


      If the demo project you deployed contains prepopulated execution input data, use that input to run the state machine.

    3. Choose Start execution.

    4. The Step Functions console directs you to a page that's titled with your execution ID. This page is known as the Execution Details page. On this page, you can review the execution results as the execution progresses or after it's complete.

      To review the execution results, choose individual states on the Graph view, and then choose the individual tabs on the Step details pane to view each state's details including input, output, and definition respectively. For details about the execution information you can view on the Execution Details page, see Execution Details page – Interface overview.

Example State Machine Code

The state machine in this sample project integrates with SageMaker and Amazon Lambda by passing parameters directly to those resources, and uses an Amazon S3 bucket for the training data source and output.

Browse through this example state machine to see how Step Functions controls Lambda and SageMaker.

For more information about how Amazon Step Functions can control other Amazon services, see Using Amazon Step Functions with other services.

{ "StartAt": "Generate Training Dataset", "States": { "Generate Training Dataset": { "Resource": "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageMa-LambdaForDataGeneration-1TF67BUE5A12U", "Type": "Task", "Next": "HyperparameterTuning (XGBoost)" }, "HyperparameterTuning (XGBoost)": { "Resource": "arn:aws:states:::sagemaker:createHyperParameterTuningJob.sync", "Parameters": { "HyperParameterTuningJobName.$": "$.body.jobName", "HyperParameterTuningJobConfig": { "Strategy": "Bayesian", "HyperParameterTuningJobObjective": { "Type": "Minimize", "MetricName": "validation:rmse" }, "ResourceLimits": { "MaxNumberOfTrainingJobs": 2, "MaxParallelTrainingJobs": 2 }, "ParameterRanges": { "ContinuousParameterRanges": [{ "Name": "alpha", "MinValue": "0", "MaxValue": "1000", "ScalingType": "Auto" }, { "Name": "gamma", "MinValue": "0", "MaxValue": "5", "ScalingType": "Auto" } ], "IntegerParameterRanges": [{ "Name": "max_delta_step", "MinValue": "0", "MaxValue": "10", "ScalingType": "Auto" }, { "Name": "max_depth", "MinValue": "0", "MaxValue": "10", "ScalingType": "Auto" } ] } }, "TrainingJobDefinition": { "AlgorithmSpecification": { "TrainingImage": "433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest", "TrainingInputMode": "File" }, "OutputDataConfig": { "S3OutputPath": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/models" }, "StoppingCondition": { "MaxRuntimeInSeconds": 86400 }, "ResourceConfig": { "InstanceCount": 1, "InstanceType": "ml.m4.xlarge", "VolumeSizeInGB": 30 }, "RoleArn": "arn:aws:iam::012345678912:role/StepFunctionsSample-SageM-SageMakerAPIExecutionRol-1MNH1VS5CGGOG", "InputDataConfig": [{ "DataSource": { "S3DataSource": { "S3DataDistributionType": "FullyReplicated", "S3DataType": "S3Prefix", "S3Uri": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/csv/train.csv" } }, "ChannelName": "train", "ContentType": "text/csv" }, { "DataSource": { "S3DataSource": { "S3DataDistributionType": "FullyReplicated", "S3DataType": "S3Prefix", "S3Uri": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/csv/validation.csv" } }, "ChannelName": "validation", "ContentType": "text/csv" }], "StaticHyperParameters": { "precision_dtype": "float32", "num_round": "2" } } }, "Type": "Task", "Next": "Extract Model Path" }, "Extract Model Path": { "Resource": "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelPath-V0R37CVARUS9", "Type": "Task", "Next": "HyperparameterTuning - Save Model" }, "HyperparameterTuning - Save Model": { "Parameters": { "PrimaryContainer": { "Image": "433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest", "Environment": {}, "ModelDataUrl.$": "$.body.modelDataUrl" }, "ExecutionRoleArn": "arn:aws:iam::012345678912:role/StepFunctionsSample-SageM-SageMakerAPIExecutionRol-1MNH1VS5CGGOG", "ModelName.$": "$.body.bestTrainingJobName" }, "Resource": "arn:aws:states:::sagemaker:createModel", "Type": "Task", "Next": "Extract Model Name" }, "Extract Model Name": { "Resource": "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelName-8FUOB30SM5EM", "Type": "Task", "Next": "Batch transform" }, "Batch transform": { "Type": "Task", "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync", "Parameters": { "ModelName.$": "$.body.jobName", "TransformInput": { "CompressionType": "None", "ContentType": "text/csv", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/csv/test.csv" } } }, "TransformOutput": { "S3OutputPath": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/output" }, "TransformResources": { "InstanceCount": 1, "InstanceType": "ml.m4.xlarge" }, "TransformJobName.$": "$.body.jobName" }, "End": true } } }

For information about how to configure IAM when using Step Functions with other Amazon services, see IAM Policies for integrated services.

IAM Examples

These example Amazon Identity and Access Management (IAM) policies generated by the sample project include the least privilege necessary to execute the state machine and related resources. We recommend that you include only those permissions that are necessary in your IAM policies.

The following IAM policy is attached to the state machine, and allows the state machine execution to access necessary SageMaker, Lambda, and Amazon S3 resources.

{ "Version": "2012-10-17", "Statement": [ { "Action": [ "sagemaker:CreateHyperParameterTuningJob", "sagemaker:DescribeHyperParameterTuningJob", "sagemaker:StopHyperParameterTuningJob", "sagemaker:ListTags", "sagemaker:CreateModel", "sagemaker:CreateTransformJob", "iam:PassRole" ], "Resource": "*", "Effect": "Allow" }, { "Action": [ "lambda:InvokeFunction" ], "Resource": [ "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageMa-LambdaForDataGeneration-1TF67BUE5A12U", "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelPath-V0R37CVARUS9", "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelName-8FUOB30SM5EM" ], "Effect": "Allow" }, { "Action": [ "events:PutTargets", "events:PutRule", "events:DescribeRule" ], "Resource": [ "arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTrainingJobsRule", "arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTransformJobsRule", "arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTuningJobsRule" ], "Effect": "Allow" } ] }

The following IAM policy is referenced in the TrainingJobDefinition and HyperparameterTuning fields of the HyperparameterTuning state.

{ "Version": "2012-10-17", "Statement": [ { "Action": [ "cloudwatch:PutMetricData", "logs:CreateLogStream", "logs:PutLogEvents", "logs:CreateLogGroup", "logs:DescribeLogStreams", "ecr:GetAuthorizationToken", "ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer", "ecr:BatchGetImage", "sagemaker:DescribeHyperParameterTuningJob", "sagemaker:StopHyperParameterTuningJob", "sagemaker:ListTags" ], "Resource": "*", "Effect": "Allow" }, { "Action": [ "s3:GetObject", "s3:PutObject" ], "Resource": "arn:aws:s3:::stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/*", "Effect": "Allow" }, { "Action": [ "s3:ListBucket" ], "Resource": "arn:aws:s3:::stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f", "Effect": "Allow" } ] }

The following IAM policy allows the Lambda function to seed the Amazon S3 bucket with sample data.

{ "Version": "2012-10-17", "Statement": [ { "Action": [ "s3:PutObject" ], "Resource": "arn:aws:s3:::stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/*", "Effect": "Allow" } ] }

For information about how to configure IAM when using Step Functions with other Amazon services, see IAM Policies for integrated services.