
Prerequisites

Note

If you compiled your model using the Amazon SDK for Python (Boto3), the Amazon CLI, or the SageMaker console, follow the instructions in this section.

To create a SageMaker Neo-compiled model, you need the following:

  1. The Amazon ECR URI of a Docker image. You can select one that meets your needs from this list.

  2. An entry point script file:

    1. For PyTorch and MXNet models:

      If you trained your model using SageMaker, the training script must implement the functions described below. During inference, the training script serves as the entry point script. In the example detailed in MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo, the training script (mnist.py) implements the required functions.

      If you did not train your model using SageMaker, you need to provide an entry point script (inference.py) file that can be used at the time of inference. Based on the framework (MXNet or PyTorch), the inference script location must conform to the SageMaker Python SDK Model Directory Structure for MXNet or Model Directory Structure for PyTorch.

      When you use Neo Inference Optimized Container images with PyTorch and MXNet on CPU and GPU instance types, the inference script must implement the following functions:

      • model_fn: Loads the model. (Optional)

      • input_fn: Converts the incoming request payload into a numpy array.

      • predict_fn: Performs the prediction.

      • output_fn: Converts the prediction output into the response payload.

      • Alternatively, you can define transform_fn to combine input_fn, predict_fn, and output_fn.

      The following are examples of an inference.py script in a directory named code (code/inference.py) for PyTorch and MXNet (Gluon and Module). The examples first load the model and then serve it on image data on a GPU:

      MXNet Module
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      from collections import namedtuple

      Batch = namedtuple('Batch', ['data'])

      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()

      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          sym, arg_params, aux_params = mx.model.load_checkpoint('compiled', 0)
          mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
          exe = mod.bind(for_training=False,
                         data_shapes=[('data', (1, 3, 224, 224))],
                         label_shapes=mod._label_shapes)
          mod.set_params(arg_params, aux_params, allow_missing=True)
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1, 3, 224, 224), ctx=ctx)
          mod.forward(Batch([data]))
          return mod

      def transform_fn(mod, image, input_content_type, output_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                                mean=mx.nd.array([0.485, 0.456, 0.406]),
                                                std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)

          # prediction/inference
          mod.forward(Batch([processed_input]))

          # post-processing
          prob = mod.get_outputs()[0].asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type

      MXNet Gluon
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401

      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()

      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          block = mx.gluon.nn.SymbolBlock.imports('compiled-symbol.json', ['data'],
                                                  'compiled-0000.params', ctx=ctx)
          # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
          block.hybridize(static_alloc=True, static_shape=True)
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1, 3, 224, 224), ctx=ctx)
          warm_up = block(data)
          return block

      def input_fn(image, input_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                                mean=mx.nd.array([0.485, 0.456, 0.406]),
                                                std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
          return processed_input

      def predict_fn(processed_input_data, block):
          # prediction/inference
          prediction = block(processed_input_data)
          return prediction

      def output_fn(prediction, output_content_type):
          # post-processing
          prob = prediction.asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type

      PyTorch 1.4 and Older
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle

      def model_fn(model_dir):
          """Load the model and return it.

          Providing this function is optional.
          There is a default model_fn available which will load the model
          compiled using SageMaker Neo. You can override it here.

          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          with torch.neo.config(model_dir=model_dir, neo_runtime=True):
              model = torch.jit.load(model_path)
              device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
              model = model.to(device)

              # We recommend that you run warm-up inference during model load
              sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
              with open(sample_input_path, 'rb') as input_file:
                  model_input = pickle.load(input_file)
              if torch.is_tensor(model_input):
                  model_input = model_input.to(device)
                  model(model_input)
              elif isinstance(model_input, tuple):
                  model_input = (inp.to(device) for inp in model_input if torch.is_tensor(inp))
                  model(*model_input)
              else:
                  print("Only supports a torch tensor or a tuple of torch tensors")

              return model

      def transform_fn(model, request_body, request_content_type, response_content_type):
          """Run prediction and return the output.

          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)

          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)

          return json.dumps(output.cpu().numpy().tolist()), response_content_type

      PyTorch 1.5 and Newer

      Use neopytorch to specify the path of the model file if you trained your model using PyTorch version 1.5 or newer.

      import os
      import torch
      import neopytorch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle

      def model_fn(model_dir):
          """Load the model and return it.

          Providing this function is optional.
          There is a default_model_fn available, which will load the model
          compiled using SageMaker Neo. You can override the default here.
          The model_fn only needs to be defined if your model needs extra
          steps to load, and can otherwise be left undefined.

          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          neopytorch.config(model_dir=model_dir, neo_runtime=True)
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = torch.jit.load(model_path, map_location=device)
          model = model.to(device)

          # We recommend that you run warm-up inference during model load
          sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
          with open(sample_input_path, 'rb') as input_file:
              model_input = pickle.load(input_file)
          if torch.is_tensor(model_input):
              model_input = model_input.to(device)
              model(model_input)
          elif isinstance(model_input, tuple):
              model_input = (inp.to(device) for inp in model_input if torch.is_tensor(inp))
              model(*model_input)
          else:
              print("Only supports a torch tensor or a tuple of torch tensors")

          return model

      def transform_fn(model, request_body, request_content_type, response_content_type):
          """Run prediction and return the output.

          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)

          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)

          return json.dumps(output.cpu().numpy().tolist()), response_content_type

    2. For inf1 instances or onnx, xgboost, keras container images

      For all other Neo Inference-optimized container images, or Inferentia instance types, the entry point script must implement the following functions for the Neo Deep Learning Runtime:

      • neo_preprocess: Converts the incoming request payload into a numpy array.

      • neo_postprocess: Converts the prediction output from the Neo Deep Learning Runtime into the response body.

        Note

        The preceding two functions do not use any functionality of MXNet, PyTorch, or TensorFlow.

      For examples of how to use these functions, see the Neo Model Compilation Sample Notebooks; a minimal sketch is also shown below.
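      The following is a minimal sketch of neo_preprocess and neo_postprocess, assuming the signatures used in the Neo sample notebooks (payload and content type in, numpy array out; runtime result in, serialized response and content type out) and a CSV request body chosen purely for illustration. Adapt the parsing to the content types your model actually accepts:

      import io
      import json
      import numpy as np

      def neo_preprocess(payload, content_type):
          """Convert the incoming request payload into a numpy array.

          Assumption for this sketch: the client sends a CSV body such as '1.0,2.0,3.0'.
          """
          if content_type != 'text/csv':
              raise ValueError('Unsupported content type: {}'.format(content_type))
          stream = io.BytesIO(payload)
          values = np.genfromtxt(stream, dtype=np.float32, delimiter=',')
          # Add a batch dimension so a single row becomes shape (1, n)
          return values.reshape(1, -1)

      def neo_postprocess(result):
          """Convert the Neo Deep Learning Runtime output into the response body."""
          response_body = json.dumps(np.asarray(result).tolist())
          return response_body, 'application/json'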

    3. For TensorFlow models

      If your model requires custom pre-processing and post-processing logic before data is sent to the model, you must specify an entry point script (inference.py) file that can be used at the time of inference. The script should implement either a pair of input_handler and output_handler functions or a single handler function.

      Note

      Note that if a handler function is implemented, input_handler and output_handler are ignored. A sketch of a single handler function follows the example below.

      The following is a code example of an inference.py script that you can package with the compiled model to perform custom pre-processing and post-processing on an image classification model. The SageMaker client sends the image file as an application/x-image content type to the input_handler function, where it is converted to JSON. The converted image file is then sent to the TensorFlow Model Server (TFX) using the REST API.

      import json
      import numpy as np
      import io
      from PIL import Image

      def input_handler(data, context):
          """Pre-process request input before it is sent to the TensorFlow Serving REST API.

          Args:
              data (obj): the request data, in format of dict or string
              context (Context): an object containing request and configuration details

          Returns:
              (dict): a JSON-serializable dict that contains request body and headers
          """
          f = data.read()
          f = io.BytesIO(f)
          image = Image.open(f).convert('RGB')
          batch_size = 1
          image = np.asarray(image.resize((512, 512)))
          image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
          body = json.dumps({"signature_name": "serving_default",
                             "instances": image.tolist()})
          return body

      def output_handler(data, context):
          """Post-process TensorFlow Serving output before it is returned to the client.

          Args:
              data (obj): the TensorFlow Serving response
              context (Context): an object containing request and configuration details

          Returns:
              (bytes, string): data to return to client, response content type
          """
          if data.status_code != 200:
              raise ValueError(data.content.decode('utf-8'))

          response_content_type = context.accept_header
          prediction = data.content
          return prediction, response_content_type

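      As referenced in the note above, the following is a minimal sketch of a single handler function that replaces the input_handler/output_handler pair. It assumes the same (data, context) interface as the handlers above; the helpers _process_input and _process_output and the JSON-only request handling are illustrative assumptions, not code from the SageMaker documentation:

      import requests

      def handler(data, context):
          """Handle the full request: pre-process, call TensorFlow Serving, post-process.

          When handler is implemented, input_handler and output_handler are ignored.
          """
          # Pre-process the raw request body into the TensorFlow Serving JSON format
          body = _process_input(data, context)

          # Forward the request to the model server; context.rest_uri points to the
          # TensorFlow Serving REST endpoint inside the container
          response = requests.post(context.rest_uri, data=body)

          # Post-process the model server response before returning it to the client
          return _process_output(response, context)

      def _process_input(data, context):
          # Hypothetical helper for this sketch: pass a JSON request body through as-is,
          # assuming it already follows the {"instances": [...]} structure
          if context.request_content_type == 'application/json':
              return data.read().decode('utf-8')
          raise ValueError('Unsupported content type: {}'.format(context.request_content_type))

      def _process_output(response, context):
          # Hypothetical helper for this sketch: surface model server errors, otherwise
          # return the raw response body with the requested content type
          if response.status_code != 200:
              raise ValueError(response.content.decode('utf-8'))
          return response.content, context.accept_header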

      If there is no custom pre-processing or post-processing, the SageMaker client converts the file image to JSON in a similar way before sending it to the SageMaker endpoint.
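      For illustration, a minimal sketch of invoking such an endpoint with the Amazon SDK for Python (Boto3) follows; the endpoint name and image file are placeholders, and the ContentType must match what your input_handler (or the default handling) expects:

      import json
      import boto3

      runtime = boto3.client('sagemaker-runtime')

      # Placeholder input image read as raw bytes
      with open('cat.jpg', 'rb') as f:
          payload = f.read()

      response = runtime.invoke_endpoint(
          EndpointName='my-compiled-image-classifier',  # replace with your endpoint name
          ContentType='application/x-image',
          Body=payload,
      )

      # The response body contains the prediction returned by output_handler
      result = json.loads(response['Body'].read().decode('utf-8'))
      print(result)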

      For more information, see Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK.

  3. The Amazon S3 bucket URI that contains the compiled model artifacts. A sketch that puts these three pieces together follows this list.
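
Putting the prerequisites together, the following is a minimal sketch of creating and deploying a compiled PyTorch model with the SageMaker Python SDK. The image URI, S3 path, role, and instance type are placeholders that you replace with your own values:

from sagemaker.pytorch import PyTorchModel

# Placeholders: substitute your own image URI, compiled artifacts, and role
neo_image_uri = '<neo-inference-image-uri-from-the-list-above>'
compiled_model_data = 's3://<your-bucket>/<prefix>/model.tar.gz'

model = PyTorchModel(
    model_data=compiled_model_data,   # prerequisite 3: compiled model artifacts in S3
    image_uri=neo_image_uri,          # prerequisite 1: Docker image Amazon ECR URI
    entry_point='inference.py',       # prerequisite 2: entry point script
    source_dir='code',
    role='<your-sagemaker-execution-role-arn>',
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.xlarge',     # match the target you compiled the model for
)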