Tune a machine learning model - AWS Step Functions

Tune a machine learning model

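This sample project demonstrates how to use Amazon SageMaker to tune the hyperparameters of a machine learning model and to batch transform a test dataset. This sample project creates the following:

  • Three AWS Lambda functions

  • An Amazon Simple Storage Service (Amazon S3) bucket

  • An AWS Step Functions state machine

  • Related AWS Identity and Access Management (IAM) roles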

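In this project, Step Functions uses a Lambda function to seed an Amazon S3 bucket with a test dataset. It then uses the Amazon SageMaker service integration to create a hyperparameter tuning job. Next, a Lambda function extracts the model data path, and the state machine saves the tuned model. A second Lambda function then extracts the model name, and finally the state machine runs a batch transform job to perform inference in Amazon SageMaker.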
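The Lambda code is not shown in this guide, so the following is only a hypothetical Python sketch of what the Extract Model Path step could return. It assumes that the tuning step's output is the DescribeHyperParameterTuningJob response (with its BestTrainingJob and TrainingJobDefinition fields) and that the best job's artifact sits at the usual SageMaker location, S3OutputPath/<training job name>/output/model.tar.gz. The body.modelDataUrl and body.bestTrainingJobName keys are the ones the HyperparameterTuning - Save Model state reads; the function names are illustrative.

```python
def extract_model_path(event: dict) -> dict:
    """Hypothetical logic for the "Extract Model Path" Lambda function.

    Assumes `event` is the DescribeHyperParameterTuningJob-style output that the
    createHyperParameterTuningJob.sync integration passes to the next state.
    """
    best_job = event["BestTrainingJob"]["TrainingJobName"]
    output_path = event["TrainingJobDefinition"]["OutputDataConfig"]["S3OutputPath"]
    # SageMaker stores training artifacts under <S3OutputPath>/<job name>/output/model.tar.gz
    model_data_url = f"{output_path.rstrip('/')}/{best_job}/output/model.tar.gz"
    # The next state reads $.body.modelDataUrl and $.body.bestTrainingJobName.
    return {
        "statusCode": 200,
        "body": {
            "modelDataUrl": model_data_url,
            "bestTrainingJobName": best_job,
        },
    }


def lambda_handler(event, context):
    # Standard Python Lambda entry point; delegates to the pure function above.
    return extract_model_path(event)


sample_event = {
    "BestTrainingJob": {"TrainingJobName": "tuning-job-001-abc"},
    "TrainingJobDefinition": {
        "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/models"}
    },
}
print(lambda_handler(sample_event, None))
```

Keeping the path-building logic in a plain function makes it testable without deploying anything; the handler is a thin wrapper.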

For more information about the Amazon SageMaker and Step Functions service integration, see the following:

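Note

This sample project may incur charges.

A free usage tier is available for new AWS users. On this tier, services are free below a certain level of usage. For more information about AWS costs and the Free Tier, see Amazon SageMaker Pricing.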

Create the state machine and provision resources

  1. Open the Step Functions console and choose Create a state machine.

  2. Choose Sample Projects, and then choose Tune a machine learning model.

    The state machine Code and Visual Workflow are displayed.

    (Figure: Hyperparameter tuning workflow.)

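  3. Choose Next.

    The Deploy resources page is displayed, listing the resources that will be created. For this sample project, the resources include:

    • Three Lambda functions

    • An Amazon S3 bucket

    • A Step Functions state machine

    • Related IAM roles

  4. Choose Deploy Resources.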

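    Note

    It can take up to 10 minutes for these resources and the related IAM permissions to be created. While the Deploy resources page is displayed, you can open the Stack ID link to see which resources are being provisioned.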
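If you prefer to follow the deployment from a script, a boto3 sketch like the following could wait for the CloudFormation stack behind the Stack ID link to finish creating and then list what it provisioned. The stack name is a placeholder that you would copy from the Deploy resources page, and the client is passed in (for example boto3.client("cloudformation")) so the logic can be exercised without AWS credentials. The defaults poll every 30 seconds for up to 20 minutes, which is an assumption chosen to exceed the 10-minute estimate above.

```python
def wait_for_stack(cfn, stack_name: str, delay: int = 30, max_attempts: int = 40) -> list:
    """Wait until a CloudFormation stack is created, then list its resources.

    `cfn` is expected to be a boto3 CloudFormation client.
    """
    # The built-in "stack_create_complete" waiter polls DescribeStacks and raises
    # a WaiterError if the stack fails or the attempts run out.
    waiter = cfn.get_waiter("stack_create_complete")
    waiter.wait(
        StackName=stack_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )
    # Report each provisioned resource as (type, physical ID).
    resources = cfn.describe_stack_resources(StackName=stack_name)["StackResources"]
    return [(r["ResourceType"], r["PhysicalResourceId"]) for r in resources]


# Usage (requires AWS credentials; "my-sample-stack" is a hypothetical name):
#   import boto3
#   for rtype, rid in wait_for_stack(boto3.client("cloudformation"), "my-sample-stack"):
#       print(rtype, rid)
```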

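Start a new execution

  1. Open the Step Functions console.

  2. On the State machines page, choose the HyperparamTuningAndBatchTransformStateMachine state machine that the sample project created, and then choose Start execution.

  3. On the New execution page, enter an execution name (optional), and then choose Start Execution.

  4. (Optional) To help identify your execution, you can specify an ID for it in the Enter an execution name box. If you don't enter an ID, Step Functions generates a unique ID automatically.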

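    Note

    Step Functions allows you to create state machine, execution, and activity names that contain non-ASCII characters. These non-ASCII names don't work with Amazon CloudWatch. To ensure that you can track CloudWatch metrics, choose a name that uses only ASCII characters.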

  5. (Optional) Go to the newly created state machine on the Step Functions Dashboard, and then choose New execution.

  6. When the execution is complete, you can choose states on the Visual workflow and browse the Input and Output under Step details.
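The console steps above can also be sketched with the boto3 Step Functions client, whose start_execution and describe_execution calls take the parameters used here. This is a minimal sketch under assumptions: the state machine ARN in the usage comment is hypothetical (built from the name in this guide and the sample's placeholder account and Region), and the name check covers only the ASCII advice from the note plus the 80-character length limit, not every character rule that Step Functions enforces.

```python
import time

# Execution statuses after which polling can stop.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def check_execution_name(name: str) -> None:
    """Reject names that would not work well with CloudWatch metrics."""
    if not name.isascii():
        raise ValueError("use only ASCII characters so CloudWatch metrics can be tracked")
    if not 1 <= len(name) <= 80:
        raise ValueError("execution names must be 1-80 characters long")


def run_execution(sfn, state_machine_arn: str, name=None, poll_seconds=30.0, sleep=time.sleep) -> dict:
    """Start an execution and poll until it reaches a terminal status.

    `sfn` is expected to be boto3.client("stepfunctions").
    """
    kwargs = {"stateMachineArn": state_machine_arn}
    if name is not None:
        check_execution_name(name)
        kwargs["name"] = name  # omit the name to let Step Functions generate one
    execution_arn = sfn.start_execution(**kwargs)["executionArn"]
    while True:
        description = sfn.describe_execution(executionArn=execution_arn)
        if description["status"] in TERMINAL_STATUSES:
            return description  # includes "output" when the execution succeeded
        sleep(poll_seconds)


# Usage (requires AWS credentials; the ARN is hypothetical):
#   import boto3
#   arn = "arn:aws:states:us-west-2:012345678912:stateMachine:HyperparamTuningAndBatchTransformStateMachine"
#   print(run_execution(boto3.client("stepfunctions"), arn, name="tuning-run-1")["status"])
```

Injecting the client and the sleep function keeps the polling loop testable offline; a tuning job plus batch transform can run for many minutes, so a long poll interval is reasonable.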

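Example state machine code

The state machine in this sample project integrates with Amazon SageMaker and AWS Lambda by passing parameters directly to those resources, and it uses an Amazon S3 bucket for the training data source and output.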
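In the state machine below, a parameter key that ends in ".$" (for example "ModelName.$": "$.body.jobName") takes its value from the state's input instead of being passed literally. The following Python sketch imitates that substitution so the data flow between the Lambda outputs and the SageMaker parameters is easy to follow. It is a simplified model, not Step Functions itself: it supports only dotted paths that start with "$.", not context-object paths ("$$."), array selectors, or intrinsic functions.

```python
from typing import Any


def get_path(data: Any, path: str) -> Any:
    """Resolve a simple JSONPath such as "$.body.jobName" (dotted keys only)."""
    if path == "$":
        return data
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    for key in path[2:].split("."):
        data = data[key]
    return data


def resolve_parameters(params: Any, state_input: Any) -> Any:
    """Apply ".$" substitution to a Parameters template, recursing into dicts and lists."""
    if isinstance(params, dict):
        resolved = {}
        for key, value in params.items():
            if key.endswith(".$"):
                # "Name.$": "$.a.b" becomes "Name": <value at a.b in the input>
                resolved[key[:-2]] = get_path(state_input, value)
            else:
                resolved[key] = resolve_parameters(value, state_input)
        return resolved
    if isinstance(params, list):
        return [resolve_parameters(item, state_input) for item in params]
    return params  # literal values pass through unchanged


# Fragment of the "Batch transform" state's Parameters, with a made-up input.
template = {
    "ModelName.$": "$.body.jobName",
    "TransformJobName.$": "$.body.jobName",
    "TransformResources": {"InstanceCount": 1, "InstanceType": "ml.m4.xlarge"},
}
print(resolve_parameters(template, {"body": {"jobName": "model-123"}}))
```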

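Browse through this example state machine to see how Step Functions controls Lambda and Amazon SageMaker.

For more information about how AWS Step Functions can control other AWS services, see Service integrations with AWS Step Functions.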
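The Resource field of each Task state tells Step Functions how to call a service. In this sample, the Lambda states use a function ARN directly, the tuning and batch transform states end in ".sync" (run a job and wait for it to complete), and the createModel state has no suffix (request-response: continue as soon as the API call returns). The small Python classifier below is an illustrative sketch of those conventions, not an AWS API; the ".waitForTaskToken" callback pattern is included for completeness although this sample does not use it.

```python
def integration_pattern(resource: str) -> str:
    """Classify a Task state's Resource ARN by how Step Functions calls the service."""
    if resource.startswith("arn:aws:lambda:"):
        return "lambda"  # Lambda function ARN used directly as the resource
    if resource.startswith("arn:aws:states:::"):
        if ".waitForTaskToken" in resource:
            return "callback"  # pause until a task token is returned
        if ".sync" in resource:
            return "run-a-job"  # wait for the job to finish
        return "request-response"  # continue once the API call returns
    return "unknown"


# Resource ARNs taken from the sample state machine.
for arn in [
    "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageMa-LambdaForDataGeneration-1TF67BUE5A12U",
    "arn:aws:states:::sagemaker:createHyperParameterTuningJob.sync",
    "arn:aws:states:::sagemaker:createModel",
    "arn:aws:states:::sagemaker:createTransformJob.sync",
]:
    print(integration_pattern(arn), arn)
```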

{
  "StartAt": "Generate Training Dataset",
  "States": {
    "Generate Training Dataset": {
      "Resource": "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageMa-LambdaForDataGeneration-1TF67BUE5A12U",
      "Type": "Task",
      "Next": "HyperparameterTuning (XGBoost)"
    },
    "HyperparameterTuning (XGBoost)": {
      "Resource": "arn:aws:states:::sagemaker:createHyperParameterTuningJob.sync",
      "Parameters": {
        "HyperParameterTuningJobName.$": "$.body.jobName",
        "HyperParameterTuningJobConfig": {
          "Strategy": "Bayesian",
          "HyperParameterTuningJobObjective": {
            "Type": "Minimize",
            "MetricName": "validation:rmse"
          },
          "ResourceLimits": {
            "MaxNumberOfTrainingJobs": 2,
            "MaxParallelTrainingJobs": 2
          },
          "ParameterRanges": {
            "ContinuousParameterRanges": [
              {
                "Name": "alpha",
                "MinValue": "0",
                "MaxValue": "1000",
                "ScalingType": "Auto"
              },
              {
                "Name": "gamma",
                "MinValue": "0",
                "MaxValue": "5",
                "ScalingType": "Auto"
              }
            ],
            "IntegerParameterRanges": [
              {
                "Name": "max_delta_step",
                "MinValue": "0",
                "MaxValue": "10",
                "ScalingType": "Auto"
              },
              {
                "Name": "max_depth",
                "MinValue": "0",
                "MaxValue": "10",
                "ScalingType": "Auto"
              }
            ]
          }
        },
        "TrainingJobDefinition": {
          "AlgorithmSpecification": {
            "TrainingImage": "433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest",
            "TrainingInputMode": "File"
          },
          "OutputDataConfig": {
            "S3OutputPath": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/models"
          },
          "StoppingCondition": {
            "MaxRuntimeInSeconds": 86400
          },
          "ResourceConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m4.xlarge",
            "VolumeSizeInGB": 30
          },
          "RoleArn": "arn:aws:iam::012345678912:role/StepFunctionsSample-SageM-SageMakerAPIExecutionRol-1MNH1VS5CGGOG",
          "InputDataConfig": [
            {
              "DataSource": {
                "S3DataSource": {
                  "S3DataDistributionType": "FullyReplicated",
                  "S3DataType": "S3Prefix",
                  "S3Uri": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/csv/train.csv"
                }
              },
              "ChannelName": "train",
              "ContentType": "text/csv"
            },
            {
              "DataSource": {
                "S3DataSource": {
                  "S3DataDistributionType": "FullyReplicated",
                  "S3DataType": "S3Prefix",
                  "S3Uri": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/csv/validation.csv"
                }
              },
              "ChannelName": "validation",
              "ContentType": "text/csv"
            }
          ],
          "StaticHyperParameters": {
            "precision_dtype": "float32",
            "num_round": "2"
          }
        }
      },
      "Type": "Task",
      "Next": "Extract Model Path"
    },
    "Extract Model Path": {
      "Resource": "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelPath-V0R37CVARUS9",
      "Type": "Task",
      "Next": "HyperparameterTuning - Save Model"
    },
    "HyperparameterTuning - Save Model": {
      "Parameters": {
        "PrimaryContainer": {
          "Image": "433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest",
          "Environment": {},
          "ModelDataUrl.$": "$.body.modelDataUrl"
        },
        "ExecutionRoleArn": "arn:aws:iam::012345678912:role/StepFunctionsSample-SageM-SageMakerAPIExecutionRol-1MNH1VS5CGGOG",
        "ModelName.$": "$.body.bestTrainingJobName"
      },
      "Resource": "arn:aws:states:::sagemaker:createModel",
      "Type": "Task",
      "Next": "Extract Model Name"
    },
    "Extract Model Name": {
      "Resource": "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelName-8FUOB30SM5EM",
      "Type": "Task",
      "Next": "Batch transform"
    },
    "Batch transform": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
      "Parameters": {
        "ModelName.$": "$.body.jobName",
        "TransformInput": {
          "CompressionType": "None",
          "ContentType": "text/csv",
          "DataSource": {
            "S3DataSource": {
              "S3DataType": "S3Prefix",
              "S3Uri": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/csv/test.csv"
            }
          }
        },
        "TransformOutput": {
          "S3OutputPath": "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/output"
        },
        "TransformResources": {
          "InstanceCount": 1,
          "InstanceType": "ml.m4.xlarge"
        },
        "TransformJobName.$": "$.body.jobName"
      },
      "End": true
    }
  }
}
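Because every state in this definition has either a Next field or "End": true, the execution order can be read off mechanically. The Python sketch below walks a trimmed copy of the definition (only Type, Next, and End are kept) from StartAt to the end state. It handles only linear chains like this one; Choice, Parallel, and Map states would need additional logic.

```python
import json

# Trimmed copy of the sample definition: only the fields needed for ordering.
DEFINITION = json.loads("""
{
  "StartAt": "Generate Training Dataset",
  "States": {
    "Generate Training Dataset": {"Type": "Task", "Next": "HyperparameterTuning (XGBoost)"},
    "HyperparameterTuning (XGBoost)": {"Type": "Task", "Next": "Extract Model Path"},
    "Extract Model Path": {"Type": "Task", "Next": "HyperparameterTuning - Save Model"},
    "HyperparameterTuning - Save Model": {"Type": "Task", "Next": "Extract Model Name"},
    "Extract Model Name": {"Type": "Task", "Next": "Batch transform"},
    "Batch transform": {"Type": "Task", "End": true}
  }
}
""")


def execution_order(definition: dict) -> list:
    """Follow StartAt and Next links until a state with "End": true is reached."""
    states = definition["States"]
    order, name, seen = [], definition["StartAt"], set()
    while True:
        if name in seen:
            raise ValueError(f"cycle detected at state {name!r}")
        if name not in states:
            raise KeyError(f"Next points to an undefined state: {name!r}")
        seen.add(name)
        order.append(name)
        state = states[name]
        if state.get("End"):
            return order
        name = state["Next"]


for step, state_name in enumerate(execution_order(DEFINITION), start=1):
    print(step, state_name)
```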

For information about how to configure IAM when using Step Functions with other AWS services, see IAM policies for integrated services.

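IAM examples

These example AWS Identity and Access Management (IAM) policies, generated by the sample project, include the least privilege necessary to run the state machine and its related resources. We recommend that you include only the permissions that are necessary in your IAM policies.

The following IAM policy is attached to the state machine. It allows the state machine execution to call the necessary Amazon SageMaker APIs and Lambda functions, to pass the SageMaker execution role (iam:PassRole), and to manage the Amazon EventBridge (CloudWatch Events) rules that Step Functions uses to track SageMaker jobs for the .sync integrations.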

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "sagemaker:CreateHyperParameterTuningJob",
        "sagemaker:DescribeHyperParameterTuningJob",
        "sagemaker:StopHyperParameterTuningJob",
        "sagemaker:ListTags",
        "sagemaker:CreateModel",
        "sagemaker:CreateTransformJob",
        "iam:PassRole"
      ],
      "Resource": "*",
      "Effect": "Allow"
    },
    {
      "Action": [
        "lambda:InvokeFunction"
      ],
      "Resource": [
        "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageMa-LambdaForDataGeneration-1TF67BUE5A12U",
        "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelPath-V0R37CVARUS9",
        "arn:aws:lambda:us-west-2:012345678912:function:StepFunctionsSample-SageM-LambdaToExtractModelName-8FUOB30SM5EM"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "events:PutTargets",
        "events:PutRule",
        "events:DescribeRule"
      ],
      "Resource": [
        "arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTrainingJobsRule",
        "arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTransformJobsRule",
        "arn:aws:events:*:*:rule/StepFunctionsGetEventsForSageMakerTuningJobsRule"
      ],
      "Effect": "Allow"
    }
  ]
}

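The following IAM policy belongs to the SageMaker execution role that the HyperparameterTuning (XGBoost) state references in the RoleArn field of its TrainingJobDefinition, and that the HyperparameterTuning - Save Model state references in its ExecutionRoleArn field.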

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "cloudwatch:PutMetricData",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:CreateLogGroup",
        "logs:DescribeLogStreams",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "sagemaker:DescribeHyperParameterTuningJob",
        "sagemaker:StopHyperParameterTuningJob",
        "sagemaker:ListTags"
      ],
      "Resource": "*",
      "Effect": "Allow"
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/*",
      "Effect": "Allow"
    },
    {
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f",
      "Effect": "Allow"
    }
  ]
}
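The S3 part of this policy is split in two because object-level actions (s3:GetObject, s3:PutObject) apply to object ARNs (bucket/key, hence the "/*"), while s3:ListBucket applies to the bucket ARN itself; ListBucket is presumably what lets SageMaker enumerate objects for the S3Prefix inputs. The Python sketch below converts the s3:// URIs from the state machine into ARNs and reports which permissions the policy would not cover. It is a simplified assumption-laden model: it understands only exact ARNs and a single trailing "*".

```python
BUCKET_ARN = "arn:aws:s3:::stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f"

# The two S3 statements of the execution role policy, as (actions, resource) pairs.
S3_STATEMENTS = [
    (("s3:GetObject", "s3:PutObject"), BUCKET_ARN + "/*"),
    (("s3:ListBucket",), BUCKET_ARN),
]


def s3_uri_to_arns(uri: str) -> tuple:
    """Return (object ARN, bucket ARN) for an s3://bucket/key URI."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    path = uri[len("s3://"):]
    bucket = path.split("/", 1)[0]
    return "arn:aws:s3:::" + path, "arn:aws:s3:::" + bucket


def _covers(pattern: str, arn: str) -> bool:
    # Only exact matches and one trailing "*" wildcard are handled here.
    if pattern.endswith("*"):
        return arn.startswith(pattern[:-1])
    return arn == pattern


def missing_permissions(uri: str, object_actions=("s3:GetObject", "s3:PutObject")) -> list:
    """List (action, ARN) pairs needed for `uri` that no statement grants."""
    object_arn, bucket_arn = s3_uri_to_arns(uri)
    needed = [(a, object_arn) for a in object_actions] + [("s3:ListBucket", bucket_arn)]
    return [
        (action, arn) for action, arn in needed
        if not any(action in actions and _covers(res, arn) for actions, res in S3_STATEMENTS)
    ]


bucket_uri = "s3://stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f"
for key in ["csv/train.csv", "csv/validation.csv", "models"]:
    print(key, missing_permissions(f"{bucket_uri}/{key}"))
```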

The following IAM policy allows the Lambda function to seed the Amazon S3 bucket with sample data.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f/*",
      "Effect": "Allow"
    }
  ]
}
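Because the seeding Lambda function only needs s3:PutObject, its work can be pictured as "generate files, then put them". The sample's actual data-generation code is not shown in this guide, so the following Python sketch is hypothetical: it builds a toy regression dataset (an invented linear target with noise) and splits it into the csv/train.csv, csv/validation.csv, and csv/test.csv keys that the state machine reads. It follows the SageMaker built-in XGBoost CSV convention (no header, target in the first column) and leaves the target out of the test file so that it can be fed to batch transform. The 70/15/15 split and all function names are assumptions.

```python
import csv
import io
import random


def make_split_csvs(n_rows: int = 100, seed: int = 0) -> dict:
    """Build hypothetical train/validation/test CSV text keyed by S3 object key."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
        y = 3.0 * x1 - 2.0 * x2 + rng.gauss(0, 1)  # invented linear target plus noise
        rows.append((y, x1, x2))  # target first, no header (XGBoost CSV convention)
    n_train, n_val = int(n_rows * 0.7), int(n_rows * 0.15)
    splits = {
        "csv/train.csv": rows[:n_train],
        "csv/validation.csv": rows[n_train:n_train + n_val],
        "csv/test.csv": [row[1:] for row in rows[n_train + n_val:]],  # features only
    }
    files = {}
    for key, split_rows in splits.items():
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(split_rows)
        files[key] = buffer.getvalue()
    return files


def upload(s3, bucket: str, files: dict) -> None:
    """Write each file with PutObject; `s3` is expected to be boto3.client("s3")."""
    for key, body in files.items():
        s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))


# Usage (requires AWS credentials):
#   import boto3
#   upload(boto3.client("s3"), "stepfunctionssample-sagemak-bucketformodelanddata-80fblmdlcs9f", make_split_csvs())
```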
