批量预测 - Amazon SageMaker
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

批量预测

批量预测,也称为离线推理,可根据一批观察数据生成模型预测。对于大型数据集或者在您不需要立即响应模型预测请求时,批量推理是很好的选择。

与之对比的是,在线推理(实时推理)会实时生成预测。

您可以使用 SageMaker APIs检索 AutoML 作业的最佳候选对象,然后使用该候选任务提交一批输入数据进行推理。

  1. 检索 AutoML 作业的详细信息。

    以下 Amazon CLI 命令示例使用DescribeAutoMLJobV2API来获取 AutoML 作业的详细信息,包括有关最佳候选模型的信息。

    aws sagemaker describe-auto-ml-job-v2 --auto-ml-job-name job-name --region region
  2. 从中提取容器定义InferenceContainers以获得最佳候选模型。

    容器定义是容器化环境,用于托管经过训练的 SageMaker 模型以进行预测。

    BEST_CANDIDATE=$(aws sagemaker describe-auto-ml-job-v2 \ --auto-ml-job-name job-name --region region \ --query 'BestCandidate.InferenceContainers[0]' \ --output json

    此命令提取最佳候选模型的容器定义并将其存储在BEST_CANDIDATE变量中。

  3. 使用最佳候选容器定义创建 SageMaker 模型。

    使用前面步骤中的容器定义通过使用创建 SageMaker 模型CreateModelAPI。

    aws sagemaker create-model \ --model-name 'model-name' \ --primary-container "$BEST_CANDIDATE" --execution-role-arn 'execution-role-arn>' \ --region 'region>

    --execution-role-arn参数指定使用模型进行推理时所IAM扮演的角色。 SageMaker有关该角色所需权限的详细信息,请参阅 CreateModel API:执行角色权限

  4. 创建批量转换作业。

    以下示例使用创建转换作业CreateTransformJobAPI。

    aws sagemaker create-transform-job \ --transform-job-name 'transform-job-name' \ --model-name 'model-name'\ --transform-input file://transform-input.json \ --transform-output file://transform-output.json \ --transform-resources file://transform-resources.json \ --region 'region'

    输入、输出和资源详细信息在不同的JSON文件中定义:

    • transform-input.json:

      { "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://my-input-data-bucket/path/to/input/data" } }, "ContentType": "text/csv", "SplitType": "None" }
    • transform-output.json:

      { "S3OutputPath": "s3://my-output-bucket/path/to/output", "AssembleWith": "Line" }
    • transform-resources.json:

      { "InstanceType": "instance-type", "InstanceCount": 1 }
  5. 使用监控转换作业的进度DescribeTransformJobAPI。

    以以下 Amazon CLI 命令为例。

    aws sagemaker describe-transform-job \ --transform-job-name 'transform-job-name' \ --region region
  6. 检索批量转换输出。

    作业完成后,预测结果可在中找到S3OutputPath

    输出文件名称格式如下:input_data_file_name.out。例如,如果您的输入文件是 text_x.csv,则输出文件名称是 text_x.csv.out

    aws s3 ls s3://my-output-bucket/path/to/output/

以下代码示例说明了如何使用用 Amazon SDK于 Python (boto3) 和 Amazon CLI 用于批量预测。

Amazon SDK for Python (boto3)

以下示例使用 Amazon SDKPython (boto3) 进行批量预测。

import sagemaker import boto3 session = sagemaker.session.Session() sm_client = boto3.client('sagemaker', region_name='us-west-2') role = 'arn:aws:iam::1234567890:role/sagemaker-execution-role' output_path = 's3://test-auto-ml-job/output' input_data = 's3://test-auto-ml-job/test_X.csv' best_candidate = sm_client.describe_auto_ml_job_v2(AutoMLJobName=job_name)['BestCandidate'] best_candidate_containers = best_candidate['InferenceContainers'] best_candidate_name = best_candidate['CandidateName'] # create model reponse = sm_client.create_model( ModelName = best_candidate_name, ExecutionRoleArn = role, Containers = best_candidate_containers ) # Lauch Transform Job response = sm_client.create_transform_job( TransformJobName=f'{best_candidate_name}-transform-job', ModelName=model_name, TransformInput={ 'DataSource': { 'S3DataSource': { 'S3DataType': 'S3Prefix', 'S3Uri': input_data } }, 'ContentType': "text/csv", 'SplitType': 'None' }, TransformOutput={ 'S3OutputPath': output_path, 'AssembleWith': 'Line', }, TransformResources={ 'InstanceType': 'ml.m5.2xlarge', 'InstanceCount': 1, }, )

批量推理作业返回以下格式的响应。

{'TransformJobArn': 'arn:aws:sagemaker:us-west-2:1234567890:transform-job/test-transform-job', 'ResponseMetadata': {'RequestId': '659f97fc-28c4-440b-b957-a49733f7c2f2', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '659f97fc-28c4-440b-b957-a49733f7c2f2', 'content-type': 'application/x-amz-json-1.1', 'content-length': '96', 'date': 'Thu, 11 Aug 2022 22:23:49 GMT'}, 'RetryAttempts': 0}}
Amazon Command Line Interface (Amazon CLI)
  1. 获取最佳候选容器定义

    aws sagemaker describe-auto-ml-job-v2 --auto-ml-job-name 'test-automl-job' --region us-west-2
  2. 创建模型

    aws sagemaker create-model --model-name 'test-sagemaker-model' --containers '[{ "Image": "348316444620.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3", "ModelDataUrl": "s3://amzn-s3-demo-bucket/out/test-job1/data-processor-models/test-job1-dpp0-1-e569ff7ad77f4e55a7e549a/output/model.tar.gz", "Environment": { "AUTOML_SPARSE_ENCODE_RECORDIO_PROTOBUF": "1", "AUTOML_TRANSFORM_MODE": "feature-transform", "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "application/x-recordio-protobuf", "SAGEMAKER_PROGRAM": "sagemaker_serve", "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code" } }, { "Image": "348316444620.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.3-1-cpu-py3", "ModelDataUrl": "s3://amzn-s3-demo-bucket/out/test-job1/tuning/flicdf10v2-dpp0-xgb/test-job1E9-244-7490a1c0/output/model.tar.gz", "Environment": { "MAX_CONTENT_LENGTH": "20971520", "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv", "SAGEMAKER_INFERENCE_OUTPUT": "predicted_label", "SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,probabilities" } }, { "Image": "348316444620.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sklearn-automl:2.5-1-cpu-py3", "ModelDataUrl": "s3://amzn-s3-demo-bucket/out/test-job1/data-processor-models/test-job1-dpp0-1-e569ff7ad77f4e55a7e549a/output/model.tar.gz", "Environment": { "AUTOML_TRANSFORM_MODE": "inverse-label-transform", "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT": "text/csv", "SAGEMAKER_INFERENCE_INPUT": "predicted_label", "SAGEMAKER_INFERENCE_OUTPUT": "predicted_label", "SAGEMAKER_INFERENCE_SUPPORTED": "predicted_label,probability,labels,probabilities", "SAGEMAKER_PROGRAM": "sagemaker_serve", "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code" } }]' \ --execution-role-arn 'arn:aws:iam::1234567890:role/sagemaker-execution-role' \ --region 'us-west-2'
  3. 创建转换作业

    aws sagemaker create-transform-job --transform-job-name 'test-tranform-job'\ --model-name 'test-sagemaker-model'\ --transform-input '{ "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://amzn-s3-demo-bucket/data.csv" } }, "ContentType": "text/csv", "SplitType": "None" }'\ --transform-output '{ "S3OutputPath": "s3://amzn-s3-demo-bucket/output/", "AssembleWith": "Line" }'\ --transform-resources '{ "InstanceType": "ml.m5.2xlarge", "InstanceCount": 1 }'\ --region 'us-west-2'
  4. 检查转换作业的进度

    aws sagemaker describe-transform-job --transform-job-name 'test-tranform-job' --region us-west-2

    以下是来自转换作业的响应。

    { "TransformJobName": "test-tranform-job", "TransformJobArn": "arn:aws:sagemaker:us-west-2:1234567890:transform-job/test-tranform-job", "TransformJobStatus": "InProgress", "ModelName": "test-model", "TransformInput": { "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://amzn-s3-demo-bucket/data.csv" } }, "ContentType": "text/csv", "CompressionType": "None", "SplitType": "None" }, "TransformOutput": { "S3OutputPath": "s3://amzn-s3-demo-bucket/output/", "AssembleWith": "Line", "KmsKeyId": "" }, "TransformResources": { "InstanceType": "ml.m5.2xlarge", "InstanceCount": 1 }, "CreationTime": 1662495635.679, "TransformStartTime": 1662495847.496, "DataProcessing": { "InputFilter": "$", "OutputFilter": "$", "JoinSource": "None" } }

    TransformJobStatus 更改为 Completed 后,您可以在中 S3OutputPath 查看推理结果。