使用 Amazon EventBridge 自动执行 Amazon SageMaker - Amazon SageMaker
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅中国的 Amazon Web Services 服务入门

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

使用 Amazon EventBridge 自动执行 Amazon SageMaker

Amazon EventBridge 监控 Amazon SageMaker 中的状态更改事件。您可以使用 SageMaker 自动执行并自动响应事件,例如训练作业状态更改或终端节点状态更改。SageMaker 的事件近乎实时传输到 EventBridge。您可以编写简单规则来指示您关注的事件,并指示要在事件匹配规则时执行的自动化操作。有关如何创建规则的示例,请参阅使用 Amazon EventBridge 安排管道.

可自动触发的操作示例包括:

  • 调用 Amazon Lambda 函数

  • 调用 Amazon EC2 Run Command

  • 将事件中继到 Amazon Kinesis Data Streams

  • 激活 Amazon Step Functions 状态机

  • 通知 Amazon SNS 主题或Amazon SMS排队

训练作业状态更改

指示 SageMaker 训练作业的状态更改。

如果 TrainingJobStatus 的值为 Failed,则事件包含 FailureReason 字段,此字段描述训练作业为何失败。

{ "version": "0", "id": "844e2571-85d4-695f-b930-0153b71dcb42", "detail-type": "SageMaker Training Job State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2018-10-06T12:26:13Z", "region": "us-east-1", "resources": [ "arn:aws:sagemaker:us-east-1:123456789012:training-job/kmeans-1" ], "detail": { "TrainingJobName": "89c96cc8-dded-4739-afcc-6f1dc936701d", "TrainingJobArn": "arn:aws:sagemaker:us-east-1:123456789012:training-job/kmeans-1", "TrainingJobStatus": "Completed", "SecondaryStatus": "Completed", "HyperParameters": { "Hyper": "Parameters" }, "AlgorithmSpecification": { "TrainingImage": "TrainingImage", "TrainingInputMode": "TrainingInputMode" }, "RoleArn": "arn:aws:iam::123456789012:role/SMRole", "InputDataConfig": [ { "ChannelName": "Train", "DataSource": { "S3DataSource": { "S3DataType": "S3DataType", "S3Uri": "S3Uri", "S3DataDistributionType": "S3DataDistributionType" } }, "ContentType": "ContentType", "CompressionType": "CompressionType", "RecordWrapperType": "RecordWrapperType" } ], "OutputDataConfig": { "KmsKeyId": "KmsKeyId", "S3OutputPath": "S3OutputPath" }, "ResourceConfig": { "InstanceType": "InstanceType", "InstanceCount": 3, "VolumeSizeInGB": 20, "VolumeKmsKeyId": "VolumeKmsKeyId" }, "VpcConfig": { }, "StoppingCondition": { "MaxRuntimeInSeconds": 60 }, "CreationTime": "1583831889050", "TrainingStartTime": "1583831889050", "TrainingEndTime": "1583831889050", "LastModifiedTime": "1583831889050", "SecondaryStatusTransitions": [ ], "Tags": { } } }

更改超级参数优化作业状态

指示 SageMaker 超参数优化作业的状态更改。

{ "version": "0", "id": "844e2571-85d4-695f-b930-0153b71dcb42", "detail-type": "SageMaker HyperParameter Tuning Job State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2018-10-06T12:26:13Z", "region": "us-east-1", "resources": [ "arn:aws:sagemaker:us-east-1:123456789012:tuningJob/x" ], "detail": { "HyperParameterTuningJobName": "016bffd3-6d71-4d3a-9710-0a332b2759fc", "HyperParameterTuningJobArn": "arn:aws:sagemaker:us-east-1:123456789012:tuningJob/x", "TrainingJobDefinition": { "StaticHyperParameters": {}, "AlgorithmSpecification": { "TrainingImage": "trainingImageName", "TrainingInputMode": "inputModeFile", "MetricDefinitions": [ { "Name": "metricName", "Regex": "regex" } ] }, "RoleArn": "roleArn", "InputDataConfig": [ { "ChannelName": "channelName", "DataSource": { "S3DataSource": { "S3DataType": "s3DataType", "S3Uri": "s3Uri", "S3DataDistributionType": "s3DistributionType" } }, "ContentType": "contentType", "CompressionType": "gz", "RecordWrapperType": "RecordWrapper" } ], "VpcConfig": { "SecurityGroupIds": [ "securityGroupIds" ], "Subnets": [ "subnets" ] }, "OutputDataConfig": { "KmsKeyId": "kmsKeyId", "S3OutputPath": "s3OutputPath" }, "ResourceConfig": { "InstanceType": "instanceType", "InstanceCount": 10, "VolumeSizeInGB": 500, "VolumeKmsKeyId": "volumeKeyId" }, "StoppingCondition": { "MaxRuntimeInSeconds": 3600 } }, "HyperParameterTuningJobStatus": "status", "CreationTime": "1583831889050", "LastModifiedTime": "1583831889050", "TrainingJobStatusCounters": { "Completed": 1, "InProgress": 0, "RetryableError": 0, "NonRetryableError": 0, "Stopped": 0 }, "ObjectiveStatusCounters": { "Succeeded": 1, "Pending": 0, "Failed": 0 }, "Tags": {} } }

转换作业状态更改

指示 SageMaker 批次转换作业的状态更改。

如果 TransformJobStatus 的值为 Failed,则事件包含 FailureReason 字段,此字段描述训练作业为何失败。

{ "version": "0", "id": "844e2571-85d4-695f-b930-0153b71dcb42", "detail-type": "SageMaker Transform Job State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2018-10-06T12:26:13Z", "region": "us-east-1", "resources": ["arn:aws:sagemaker:us-east-1:123456789012:transform-job/myjob"], "detail": { "TransformJobName": "4b52bd8f-e034-4345-818d-884bdd7c9724", "TransformJobArn": "arn:aws:sagemaker:us-east-1:123456789012:transform-job/myjob", "TransformJobStatus": "another status... GO", "FailureReason": "failed why 1", "ModelName": "i am a beautiful model", "MaxConcurrentTransforms": 5, "MaxPayloadInMB": 10, "BatchStrategy": "Strategizing...", "Environment": { "environment1": "environment2" }, "TransformInput": { "DataSource": { "S3DataSource": { "S3DataType": "s3DataType", "S3Uri": "s3Uri" } }, "ContentType": "content type", "CompressionType": "compression type", "SplitType": "split type" }, "TransformOutput": { "S3OutputPath": "s3Uri", "Accept": "accept", "AssembleWith": "assemblyType", "KmsKeyId": "kmsKeyId" }, "TransformResources": { "InstanceType": "instanceType", "InstanceCount": 3 }, "CreationTime": "2018-10-06T12:26:13Z", "TransformStartTime": "2018-10-06T12:26:13Z", "TransformEndTime": "2018-10-06T12:26:13Z", "Tags": {} } }

端点状态更改

指示 SageMaker 托管的实时推理终端节点的状态更改。

下面显示了中包含终端节点的事件IN_SERVICE状态。

{ "version": "0", "id": "d2921b5a-b0ad-cace-a8e3-0f159d018e06", "detail-type": "SageMaker Endpoint State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "1583831889050", "region": "us-west-2", "resources": [ "arn:aws:sagemaker:us-west-2:123456789012:endpoint/myendpoint" ], "detail": { "EndpointName": "MyEndpoint", "EndpointArn": "arn:aws:sagemaker:us-west-2:123456789012:endpoint/myendpoint", "EndpointConfigName": "MyEndpointConfig", "ProductionVariants": [ { "DesiredWeight": 1.0, "DesiredInstanceCount": 1.0 } ], "EndpointStatus": "IN_SERVICE", "CreationTime": 1592411992203.0, "LastModifiedTime": 1592411994287.0, "Tags": { } } }

功能组状态更改

表示在FeatureGroupStatus或者OfflineStoreStatusSageMaker 功能组的数据。

{ "version": "0", "id": "93201303-abdb-36a4-1b9b-4c1c3e3671c0", "detail-type": "SageMaker Feature Group State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2021-01-26T01:22:01Z", "region": "us-east-1", "resources": [ "arn:aws:sagemaker:us-east-1:123456789012:feature-group/sample-feature-group" ], "detail": { "FeatureGroupArn": "arn:aws:sagemaker:us-east-1:123456789012:feature-group/sample-feature-group", "FeatureGroupName": "sample-feature-group", "RecordIdentifierFeatureName": "RecordIdentifier", "EventTimeFeatureName": "EventTime", "FeatureDefinitions": [ { "FeatureName": "RecordIdentifier", "FeatureType": "Integral" }, { "FeatureName": "EventTime", "FeatureType": "Fractional" } ], "CreationTime": 1611624059000, "OnlineStoreConfig": { "EnableOnlineStore": true }, "OfflineStoreConfig": { "S3StorageConfig": { "S3Uri": "s3://offline/s3/uri" }, "DisableGlueTableCreation": false, "DataCatalogConfig": { "TableName": "sample-feature-group-1611624059", "Catalog": "AwsDataCatalog", "Database": "sagemaker_featurestore" } }, "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole", "FeatureGroupStatus": "Active", "Tags": {} } }

模型包状态更改

指示 SageMaker 模型包的状态更改。

{ "version": "0", "id": "844e2571-85d4-695f-b930-0153b71dcb42", "detail-type": "SageMaker Model Package State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2021-02-24T17:00:14Z", "region": "us-east-2", "resources": [ "arn:aws:sagemaker:us-east-2:123456789012:model-package/versionedmp-p-idy6c3e1fiqj/2" ], "source": [ "aws.sagemaker" ], "detail": { "ModelPackageGroupName": "versionedmp-p-idy6c3e1fiqj", "ModelPackageVersion": 2, "ModelPackageArn": "arn:aws:sagemaker:us-east-2:123456789012:model-package/versionedmp-p-idy6c3e1fiqj/2", "CreationTime": "2021-02-24T17:00:14Z", "InferenceSpecification": { "Containers": [ { "Image": "257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3", "ImageDigest": "sha256:4dc8a7e4a010a19bb9e0a6b063f355393f6e623603361bd8b105f554d4f0c004", "ModelDataUrl": "s3://sagemaker-project-p-idy6c3e1fiqj/versionedmp-p-idy6c3e1fiqj/AbaloneTrain/pipelines-4r83jejmhorv-TrainAbaloneModel-xw869y8C4a/output/model.tar.gz" } ], "SupportedContentTypes": [ "text/csv" ], "SupportedResponseMIMETypes": [ "text/csv" ] }, "ModelPackageStatus": "Completed", "ModelPackageStatusDetails": { "ValidationStatuses": [], "ImageScanStatuses": [] }, "CertifyForMarketplace": false, "ModelApprovalStatus": "Rejected", "MetadataProperties": { "GeneratedBy": "arn:aws:sagemaker:us-east-2:123456789012:pipeline/versionedmp-p-idy6c3e1fiqj/execution/4r83jejmhorv" }, "ModelMetrics": { "ModelQuality": { "Statistics": { "ContentType": "application/json", "S3Uri": "s3://sagemaker-project-p-idy6c3e1fiqj/versionedmp-p-idy6c3e1fiqj/script-2021-02-24-10-55-15-413/output/evaluation/evaluation.json" } } }, "LastModifiedTime": "2021-02-24T17:00:14Z" } }

管道执行状态更改

指示 SageMaker 管道执行的状态更改。

{ "version": "0", "id": "315c1398-40ff-a850-213b-158f73kd93ir", "detail-type": "SageMaker Model Building Pipeline Execution Status Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2021-03-15T16:10:11Z", "region": "us-east-1", "resources": ["arn:aws:sagemaker:us-east-1:123456789012:pipeline/myPipeline-123", "arn:aws:sagemaker:us-east-1:123456789012:pipeline/myPipeline-123/execution/p4jn9xou8a8s"], "detail": { "pipelineExecutionDisplayName": "SomeDisplayName", "currentPipelineExecutionStatus": "Succeeded", "previousPipelineExecutionStatus": "Executing", "executionStartTime": "2021-03-15T16:03:13Z", "executionEndTime": "2021-03-15T16:10:10Z", "pipelineExecutionDescription": "SomeDescription", "pipelineArn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/myPipeline-123", "pipelineExecutionArn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/myPipeline-123/execution/p4jn9xou8a8s" } }

管道步骤状态更改

指示 SageMaker 管道步骤的状态更改。

如果有缓存命中,该事件将包含cacheHitResult字段中返回的子位置类型。

如果currentStepStatusFailed,该事件包含failureReason字段,提供了有关步骤失败的原因的说明。

{ "version": "0", "id": "ea37ccbb-5e2b-05e9-4073-1daazc940304", "detail-type": "SageMaker Model Building Pipeline Execution Step Status Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2021-03-15T16:10:10Z", "region": "us-east-1", "resources": ["arn:aws:sagemaker:us-east-1:123456789012:pipeline/myPipeline-123", "arn:aws:sagemaker:us-east-1:123456789012:pipeline/myPipeline-123/execution/p4jn9xou8a8s"], "detail": { "metadata": { "processingJob": { "arn": "arn:aws:sagemaker:us-east-1:123456789012:processing-job/pipelines-p4jn9xou8a8s-myprocessingstep1-tmgxry49ug" } }, "stepStartTime": "2021-03-15T16:03:14Z", "stepEndTime": "2021-03-15T16:10:09Z", "stepName": "myprocessingstep1", "stepType": "Processing", "previousStepStatus": "Executing", "currentStepStatus": "Succeeded", "pipelineArn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/myPipeline-123", "pipelineExecutionArn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/myPipeline-123/execution/p4jn9xou8a8s" } }

SageMaker 映像状态更改

指示 SageMaker 映像的状态更改。

{ "version": "0", "id": "cee033a3-17d8-49f8-865f-b9ebf485d9ee", "detail-type": "SageMaker Image State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2021-04-29T01:29:59Z", "region": "us-east-1", "resources": ["arn:aws:sagemaker:us-west-2:123456789012:image/cee033a3-17d8-49f8-865f-b9ebf485d9ee"], "detail": { "ImageName": "cee033a3-17d8-49f8-865f-b9ebf485d9ee", "ImageArn": "arn:aws:sagemaker:us-west-2:123456789012:image/cee033a3-17d8-49f8-865f-b9ebf485d9ee", "ImageStatus": "Creating", "Version": 1.0, "Tags": {} } }

SageMaker 映像版本状态更改

指示 SageMaker 映像版本的状态更改。

{ "version": "0", "id": "07fc4615-ebd7-15fc-1746-243411f09f04", "detail-type": "SageMaker Image Version State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2021-04-29T01:29:59Z", "region": "us-east-1", "resources": ["arn:aws:sagemaker:us-west-2:123456789012:image-version/07800032-2d29-48b7-8f82-5129225b2a85"], "detail": { "ImageArn": "arn:aws:sagemaker:us-west-2:123456789012:image/a70ff896-c832-4fe8-add6-eba25a0f43e6", "ImageVersionArn": "arn:aws:sagemaker:us-west-2:123456789012:image-version/07800032-2d29-48b7-8f82-5129225b2a85", "ImageVersionStatus": "Creating", "Version": 1.0, "Tags": {} } }

有关 SageMaker 作业、终端节点和管道的状态值及其含义的更多信息,请参阅以下链接:

有关更多信息,请参阅 Amazon EventBridge 用户指南

端节点部署状态更改

重要

以下示例可能不适用于所有终端节点。有关可能排除终端节点的功能列表,请参阅Exclusions页.

表示终端节点部署的状态更改。以下示例显示了使用蓝色/绿色 Canary 部署进行更新的端点。

{ "version": "0", "id": "0bd4a141-0a02-9d8a-f977-3924c3fb259c", "detail-type": "SageMaker Endpoint Deployment State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2021-10-25T01:52:12Z", "region": "us-west-2", "resources": [ "arn:aws:sagemaker:us-west-2:651393343886:endpoint/sample-endpoint" ], "detail": { "EndpointName": "sample-endpoint", "EndpointArn": "arn:aws:sagemaker:us-west-2:651393343886:endpoint/sample-endpoint", "EndpointConfigName": "sample-endpoint-config-1", "ProductionVariants": [ { "VariantName": "AllTraffic", "CurrentWeight": 1, "DesiredWeight": 1, "CurrentInstanceCount": 3, "DesiredInstanceCount": 3 } ], "EndpointStatus": "UPDATING", "CreationTime": 1635195148181, "LastModifiedTime": 1635195148181, "Tags": {}, "PendingDeploymentSummary": { "EndpointConfigName": "sample-endpoint-config-2", "StartTime": Timestamp, "ProductionVariants": [ { "VariantName": "AllTraffic", "CurrentWeight": 1, "DesiredWeight": 1, "CurrentInstanceCount": 1, "DesiredInstanceCount": 3, "VariantStatus": [ { "Status": "Baking", "StatusMessage": "Baking for 600 seconds (TerminationWaitInSeconds) with traffic enabled on canary capacity of 1 instance(s).", "StartTime": 1635195269181, } ] } ] } } }

以下示例表示终端节点部署的状态更改,该部署正在使用现有终端节点配置上的新容量进行更新。

{ "version": "0", "id": "0bd4a141-0a02-9d8a-f977-3924c3fb259c", "detail-type": "SageMaker Endpoint Deployment State Change", "source": "aws.sagemaker", "account": "123456789012", "time": "2021-10-25T01:52:12Z", "region": "us-west-2", "resources": [ "arn:aws:sagemaker:us-west-2:651393343886:endpoint/sample-endpoint" ], "detail": { "EndpointName": "sample-endpoint", "EndpointArn": "arn:aws:sagemaker:us-west-2:651393343886:endpoint/sample-endpoint", "EndpointConfigName": "sample-endpoint-config-1", "ProductionVariants": [ { "VariantName": "AllTraffic", "CurrentWeight": 1, "DesiredWeight": 1, "CurrentInstanceCount": 3, "DesiredInstanceCount": 6, "VariantStatus": [ { "Status": "Updating", "StatusMessage": "Scaling out desired instance count to 6.", "StartTime": 1635195269181, } ] } ], "EndpointStatus": "UPDATING", "CreationTime": 1635195148181, "LastModifiedTime": 1635195148181, "Tags": {}, }

以下辅助部署状态也可用于终端节点(在VariantStatus对象。

  • Creating:为生产变体创建实例。

    示例消息:"Launching X instance(s)."

  • Deleting:终止生产变体的实例。

    示例消息:"Terminating X instance(s)."

  • Updating:更新生产变体的容量。

    示例消息:"Launching X instance(s).""Scaling out desired instance count to X."

  • ActivatingTraffic:启用生产变体的流量。

    示例消息:"Activating traffic on canary capacity of X instance(s)."

  • Baking:在自动回滚配置中监控 CloudWatch 警报的等待时间。

    示例消息:"Baking for X seconds (TerminationWaitInSeconds) with traffic enabled on full capacity of Y instance(s)."