Prerequisites - Amazon SageMaker


Prerequisites

Note

If you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the SageMaker console, follow the instructions in this section.

To create a SageMaker Neo-compiled model, you need the following:

  1. The Amazon ECR URI of a Docker image. You can choose an image that satisfies your needs from this list; the sketch at the end of these prerequisites shows where the image URI is used.

  2. An entry point script file:

    1. For PyTorch and MXNet models:

      If you trained your model using SageMaker, the training script must implement the functions described below. During inference, the training script is used as the entry point script. In the example detailed in MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo, the training script (mnist.py) implements the required functions.

      If you did not train your model using SageMaker, you need to provide an entry point script (inference.py) file that can be used at the time of inference. Depending on the framework (MXNet or PyTorch), the inference script location must conform to the SageMaker Python SDK Model Directory Structure for MXNet or the Model Directory Structure for PyTorch.

      When using Neo Inference Optimized Container images with PyTorch and MXNet on CPU and GPU instance types, the inference script must implement the following functions:

      • model_fn: Loads the model.

      • input_fn: Converts the incoming request payload into a numpy array.

      • predict_fn: Performs the prediction.

      • output_fn: Converts the prediction output into the response payload.

      • Alternatively, you can define transform_fn to combine input_fn, predict_fn, and output_fn.

      The following are examples of inference.py scripts within a directory named code (code/inference.py) for PyTorch and MXNet (Gluon and Module). The examples first load the model and then serve it on image data on a GPU:

      MXNet Module
      import numpy as np
      import json
      import mxnet as mx
      import neomxnet  # noqa: F401
      from collections import namedtuple

      Batch = namedtuple('Batch', ['data'])
      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()

      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          sym, arg_params, aux_params = mx.model.load_checkpoint('compiled', 0)
          mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
          exe = mod.bind(for_training=False,
                         data_shapes=[('data', (1, 3, 224, 224))],
                         label_shapes=mod._label_shapes)
          mod.set_params(arg_params, aux_params, allow_missing=True)
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1, 3, 224, 224), ctx=ctx)
          mod.forward(Batch([data]))
          return mod

      def transform_fn(mod, image, input_content_type, output_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                                mean=mx.nd.array([0.485, 0.456, 0.406]),
                                                std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
          # prediction/inference
          mod.forward(Batch([processed_input]))
          # post-processing
          prob = mod.get_outputs()[0].asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      MXNet Gluon
      import numpy as np
      import json
      import mxnet as mx
      import neomxnet  # noqa: F401

      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()

      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          block = mx.gluon.nn.SymbolBlock.imports('compiled-symbol.json', ['data'],
                                                  'compiled-0000.params', ctx=ctx)
          # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
          block.hybridize(static_alloc=True, static_shape=True)
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1, 3, 224, 224), ctx=ctx)
          warm_up = block(data)
          return block

      def input_fn(image, input_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                                mean=mx.nd.array([0.485, 0.456, 0.406]),
                                                std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
          return processed_input

      def predict_fn(processed_input_data, block):
          # prediction/inference
          prediction = block(processed_input_data)
          return prediction

      def output_fn(prediction, output_content_type):
          # post-processing
          prob = prediction.asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      PyTorch 1.4 and Older
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle

      def model_fn(model_dir):
          """Load the model and return it.

          Providing this function is optional. There is a default model_fn
          available which will load the model compiled using SageMaker Neo.
          You can override it here.

          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          with torch.neo.config(model_dir=model_dir, neo_runtime=True):
              model = torch.jit.load(model_path)
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = model.to(device)

          # We recommend that you run warm-up inference during model load
          sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
          with open(sample_input_path, 'rb') as input_file:
              model_input = pickle.load(input_file)
          if torch.is_tensor(model_input):
              model_input = model_input.to(device)
              model(model_input)
          elif isinstance(model_input, tuple):
              model_input = (inp.to(device) for inp in model_input if torch.is_tensor(inp))
              model(*model_input)
          else:
              print("Only supports a torch tensor or a tuple of torch tensors")

          return model

      def transform_fn(model, request_body, request_content_type, response_content_type):
          """Run prediction and return the output.

          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)

          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)

          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      PyTorch 1.5 and Newer

      If you trained your model using PyTorch version 1.5 or newer, use neopytorch to specify the file path of the model.

      import os
      import torch
      import neopytorch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle

      def model_fn(model_dir):
          """Load the model and return it.

          Providing this function is optional. There is a default model_fn
          available which will load the model compiled using SageMaker Neo.
          You can override it here.

          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          neopytorch.config(model_dir=model_dir, neo_runtime=True)
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = torch.jit.load(model_path, map_location=device)
          model = model.to(device)

          # We recommend that you run warm-up inference during model load
          sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
          with open(sample_input_path, 'rb') as input_file:
              model_input = pickle.load(input_file)
          if torch.is_tensor(model_input):
              model_input = model_input.to(device)
              model(model_input)
          elif isinstance(model_input, tuple):
              model_input = (inp.to(device) for inp in model_input if torch.is_tensor(inp))
              model(*model_input)
          else:
              print("Only supports a torch tensor or a tuple of torch tensors")

          return model

      def transform_fn(model, request_body, request_content_type, response_content_type):
          """Run prediction and return the output.

          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[0.485, 0.456, 0.406],
                  std=[0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)

          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)

          return json.dumps(output.cpu().numpy().tolist()), response_content_type
    2. For inf1 instances or onnx, xgboost, and keras container images

      For all other Neo Inference Optimized Container images, or for Inferentia (inf1) instance types, the entry point script must implement the following functions for the Neo Deep Learning Runtime:

      • neo_preprocess: Converts the incoming request payload into a numpy array.

      • neo_postprocess: Converts the prediction output from the Neo Deep Learning Runtime into the response body.

        Note

        Neither of these two functions uses any functionality of MXNet, PyTorch, or TensorFlow.

      For examples of how to use these functions, see the Neo Model Compilation Sample Notebooks; a minimal sketch is also shown below.
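
      The following is a minimal sketch of such an entry point script. It assumes the signatures neo_preprocess(payload, content_type) and neo_postprocess(result) and the JSON and serialized-numpy content types shown in the comments; verify the exact signatures and content types against the sample notebook for your framework.

      import io
      import json

      import numpy as np

      def neo_preprocess(payload, content_type):
          """Convert the raw request payload into a numpy array for the runtime."""
          # application/json is assumed to carry a JSON-encoded (nested) list of numbers
          if content_type == 'application/json':
              return np.asarray(json.loads(payload), dtype=np.float32)
          # application/x-npy is assumed to carry a serialized numpy array
          if content_type == 'application/x-npy':
              return np.load(io.BytesIO(payload), allow_pickle=False)
          raise ValueError('Unsupported content type: {}'.format(content_type))

      def neo_postprocess(result):
          """Convert the runtime's prediction output into a response body and content type."""
          scores = np.squeeze(np.asarray(result))
          response_body = json.dumps(scores.tolist())
          return response_body, 'application/json'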

    3. For TensorFlow models

      If your model requires custom pre-processing and post-processing logic before data is sent to the model, you must specify an entry point script inference.py file that can be used at the time of inference. The script should implement either a pair of input_handler and output_handler functions or a single handler function.

      Note

      If the handler function is implemented, input_handler and output_handler are ignored.

      The following is a code example of an inference.py script that you can place alongside your compiled model to perform custom pre- and post-processing on an image classification model. The SageMaker client sends the image file as the application/x-image content type to the input_handler function, where it is converted to JSON. The converted image file is then sent to the TensorFlow Model Server (TFX) using the REST API.

      import io
      import json

      import numpy as np
      from PIL import Image

      def input_handler(data, context):
          """Pre-process request input before it is sent to TensorFlow Serving REST API.

          Args:
              data (obj): the request data, in format of dict or string
              context (Context): an object containing request and configuration details

          Returns:
              (dict): a JSON-serializable dict that contains request body and headers
          """
          f = data.read()
          f = io.BytesIO(f)
          image = Image.open(f).convert('RGB')
          batch_size = 1
          image = np.asarray(image.resize((512, 512)))
          image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
          body = json.dumps({"signature_name": "serving_default",
                             "instances": image.tolist()})
          return body

      def output_handler(data, context):
          """Post-process TensorFlow Serving output before it is returned to the client.

          Args:
              data (obj): the TensorFlow Serving response
              context (Context): an object containing request and configuration details

          Returns:
              (bytes, string): data to return to client, response content type
          """
          if data.status_code != 200:
              raise ValueError(data.content.decode('utf-8'))
          response_content_type = context.accept_header
          prediction = data.content
          return prediction, response_content_type

      If there is no custom pre- or post-processing, the SageMaker client converts the image file to JSON in a similar way before sending it to the SageMaker endpoint; a client-side invocation is sketched after this item.

      For more information, see Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK.
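
      The following is a minimal sketch of a client call that sends an image to a deployed endpoint as application/x-image, matching the input_handler above. The endpoint name and image file path are hypothetical placeholders.

      import boto3

      runtime = boto3.client('sagemaker-runtime')

      # Read the raw image bytes to send as the request body (hypothetical local file)
      with open('cat.jpg', 'rb') as f:
          payload = f.read()

      # Invoke the endpoint with the application/x-image content type expected by input_handler
      response = runtime.invoke_endpoint(
          EndpointName='my-neo-tensorflow-endpoint',  # hypothetical endpoint name
          ContentType='application/x-image',
          Body=payload,
      )
      print(response['Body'].read().decode('utf-8'))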

  3. The Amazon S3 bucket URI that contains the compiled model artifacts.
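
The following is a minimal sketch of how these three prerequisites come together when creating a deployable model with the SageMaker Python SDK. The image URI, S3 URI, role ARN, and file names are placeholders, and passing entry_point and source_dir directly to Model assumes SageMaker Python SDK v2; use the framework-specific model class if your SDK version requires it.

import sagemaker
from sagemaker.model import Model

# 1. Neo inference-optimized container image URI (placeholder; choose one from the list above)
image_uri = '<account-id>.dkr.ecr.<region>.amazonaws.com/<neo-inference-image>:<tag>'

# 3. S3 URI of the compiled model artifacts produced by the compilation job (placeholder)
model_data = 's3://<bucket>/<compilation-output-prefix>/model.tar.gz'

# 2. Entry point script implementing the functions described above
model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role='<execution-role-arn>',      # placeholder IAM execution role ARN
    entry_point='inference.py',       # entry_point/source_dir assume SageMaker Python SDK v2
    source_dir='code',
    sagemaker_session=sagemaker.Session(),
)

# Deploy to a real-time endpoint on the instance type the model was compiled for
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')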