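Adapt your own inference container for Amazon SageMaker AI

If none of the images listed in Pre-built SageMaker AI Docker images suit your use case, you can build your own Docker container and use it inside SageMaker AI for training and inference. To be compatible with SageMaker AI, your container must have the following characteristics:

Your container must have a web server listening on port 8080.
Your container must accept POST requests to the /invocations real-time endpoint and GET health check requests to the /ping endpoint. The requests that you send to these endpoints must return within 60 seconds and have a maximum size of 6 MB.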
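
As a rough illustration of this contract, the following sketch uses only the Python standard library to answer health checks on /ping and inference requests on /invocations. The handler name `ContractHandler`, the `make_server` helper, and the echo response are assumptions made for illustration; the example later in this guide uses NGINX, Gunicorn, and Flask instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class ContractHandler(BaseHTTPRequestHandler):
    """Hypothetical handler showing the two routes the container must serve."""

    def _send(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: an empty 200 response signals that the container is ready.
        self._send(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._send(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder "model": echo the request back to the caller.
        self._send(200, json.dumps({"echo": request}).encode("utf-8"))

    def log_message(self, format, *args):
        pass  # Keep the sketch quiet; a real server would log requests.


def make_server(port=8080):
    """Bind to all interfaces on the given port (8080 for SageMaker AI hosting)."""
    return HTTPServer(("0.0.0.0", port), ContractHandler)
```

Calling `make_server().serve_forever()` would start the server. A production container would typically put a dedicated web server in front of the application, as the rest of this guide does.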

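For more information and an example of how to build your own Docker container for training and inference with SageMaker AI, see Building your own algorithm container.

The following guide shows you how to use a JupyterLab space in Amazon SageMaker Studio Classic to adapt an inference container to work with SageMaker AI hosting. The example uses an NGINX web server, Gunicorn as a Python web server gateway interface, and Flask as a web application framework. You can use different applications to adapt your container as long as it meets the requirements listed previously. For more information about using your own inference code, see Custom Inference Code with Hosting Services.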

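Adapt your inference container

Use the following steps to adapt your own inference container to work with SageMaker AI hosting. The example shown in the following steps uses a pre-trained Named Entity Recognition (NER) model built with the spaCy natural language processing (NLP) library for Python, together with the following:

A Dockerfile to build the container that contains the NER model.
Inference scripts to serve the NER model.

If you adapt this example for your use case, you must use a Dockerfile and the inference scripts that are needed to deploy and serve your model.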

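Create a JupyterLab space with Amazon SageMaker Studio Classic (optional).

You can use any notebook to run scripts to adapt your inference container with SageMaker AI hosting. This example shows you how to use a JupyterLab space within Amazon SageMaker Studio Classic to launch a JupyterLab application that comes with a SageMaker AI Distribution image. For more information, see SageMaker JupyterLab.

Upload a Dockerfile and inference scripts.

Create a new folder in your home directory. If you're using JupyterLab, choose the New Folder icon in the upper-left corner, and enter a name for the folder that will contain your Dockerfile. In this example, the folder is called docker_test_folder.
Upload a Dockerfile text file into your new folder. The following is an example Dockerfile that creates a Docker container with a pre-trained Named Entity Recognition (NER) model from spaCy, along with the applications and environment variables needed to run the example: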
```
FROM python:3.8

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         python3 \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*

RUN wget https://bootstrap.pypa.io/get-pip.py && python3 get-pip.py && \
    pip install flask gevent gunicorn && \
        rm -rf /root/.cache

#pre-trained model package installation
RUN pip install spacy
RUN python -m spacy download en_core_web_sm


# Set environment variables
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

COPY NER /opt/program
WORKDIR /opt/program
```
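In the previous code example, the environment variable PYTHONUNBUFFERED keeps Python from buffering the standard output stream, which allows for faster delivery of logs to the user. The environment variable PYTHONDONTWRITEBYTECODE keeps Python from writing compiled bytecode .pyc files, which are unnecessary for this use case. The environment variable PATH is used to identify the location of the train and serve programs when the container is invoked.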
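
To illustrate why the PATH entry matters: SageMaker AI starts a hosting container by running it with the argument `serve`, so that program has to be resolvable by name and executable. The sketch below mimics the lookup on Linux with `shutil.which`. The temporary directory stands in for /opt/program and is an assumption made for illustration.

```python
import os
import shutil
import stat
import tempfile

# A temporary directory stands in for /opt/program (assumption for illustration).
program_dir = tempfile.mkdtemp()
serve_path = os.path.join(program_dir, "serve")
with open(serve_path, "w") as f:
    f.write("#!/usr/bin/env python\nprint('serving')\n")

# On Linux, the lookup finds nothing until the execute bit is set.
before_chmod = shutil.which("serve", path=program_dir)

# Equivalent of running `chmod +x serve`.
os.chmod(serve_path, os.stat(serve_path).st_mode | stat.S_IXUSR)
resolved = shutil.which("serve", path=program_dir)

print(before_chmod)
print(resolved == serve_path)
```

This is also why the push script later in this guide marks NER/serve as executable before building the image.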
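Create a new directory within your new folder to contain the scripts that serve your model. This example uses a directory called NER, which contains the following scripts needed to run this example:
- predictor.py: A Python script that contains the logic to load your model and perform inference with it.
- nginx.conf: A script to configure the web server.
- serve: A script that starts the inference server.
- wsgi.py: A helper script to serve the model.
Important
If you copy your inference scripts into a notebook ending in .ipynb and rename them, your scripts may contain formatting characters that prevent your endpoint from deploying. Instead, create a text file and rename it.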

Upload scripts to make your model available for inference. The following is an example script called predictor.py that uses Flask to provide the /ping and /invocations endpoints:


from flask import Flask
import flask
import spacy
import os
import json
import logging

# Load the pre-trained model
nlp = spacy.load('en_core_web_sm')
# If you plan to use your own model artifacts,
# store them in /opt/ml/model/


# The flask app for serving predictions
app = Flask(__name__)
@app.route('/ping', methods=['GET'])
def ping():
    # Check if the classifier was loaded correctly
    health = nlp is not None
    status = 200 if health else 404
    return flask.Response(response= '\n', status=status, mimetype='application/json')


@app.route('/invocations', methods=['POST'])
def transformation():
    
    #Process input
    input_json = flask.request.get_json()
    resp = input_json['input']
    
    #NER
    doc = nlp(resp)
    entities = [(X.text, X.label_) for X in doc.ents]

    # Transform predictions to JSON
    result = {
        'output': entities
        }

    resultjson = json.dumps(result)
    return flask.Response(response=resultjson, status=200, mimetype='application/json')

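The /ping endpoint in the previous script example returns a status code of 200 if the model is loaded correctly, and 404 if it is not. The /invocations endpoint processes a request formatted as JSON, extracts the input field, and uses the NER model to identify entities, which it stores in the variable entities. The Flask application returns a response that contains these entities. For more information about these required health requests, see How Your Container Should Respond to Health Check (Ping) Requests.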
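
The /invocations handler above assumes that every request body is a JSON object with an input field. A request without one would raise an exception inside the handler. As a minimal sketch of one way to guard against that, the hypothetical helper `extract_input` below validates the payload before it reaches the model. The helper name and the error messages are assumptions, not part of the example container.

```python
import json


def extract_input(raw_body):
    """Return the text under the 'input' key, or raise ValueError with a reason."""
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError) as err:
        raise ValueError("request body is not valid JSON") from err
    if not isinstance(payload, dict) or "input" not in payload:
        raise ValueError("request body must be a JSON object with an 'input' field")
    text = payload["input"]
    if not isinstance(text, str):
        raise ValueError("'input' must be a string")
    return text
```

A Flask handler could catch the `ValueError` and return a 400 response instead of letting the request fail with a server error.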

Upload a script to start your inference server. The following script example, called serve, uses Gunicorn as an application server and Nginx as a web server:


#!/usr/bin/env python

# This file implements the scoring service shell. You don't necessarily need to modify it for various
# algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until
# gunicorn exits.
#
# The flask server is specified to be the app object in wsgi.py
#
# We set the following parameters:
#
# Parameter                Environment Variable              Default Value
# ---------                --------------------              -------------
# number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
# timeout                  MODEL_SERVER_TIMEOUT              60 seconds

import multiprocessing
import os
import signal
import subprocess
import sys

cpu_count = multiprocessing.cpu_count()

model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)
model_server_workers = int(os.environ.get('MODEL_SERVER_WORKERS', cpu_count))

def sigterm_handler(nginx_pid, gunicorn_pid):
    try:
        os.kill(nginx_pid, signal.SIGQUIT)
    except OSError:
        pass
    try:
        os.kill(gunicorn_pid, signal.SIGTERM)
    except OSError:
        pass

    sys.exit(0)

def start_server():
    print('Starting the inference server with {} workers.'.format(model_server_workers))


    # link the log streams to stdout/err so they will be logged to the container logs
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])
    gunicorn = subprocess.Popen(['gunicorn',
                                 '--timeout', str(model_server_timeout),
                                 '-k', 'sync',
                                 '-b', 'unix:/tmp/gunicorn.sock',
                                 '-w', str(model_server_workers),
                                 'wsgi:app'])

    signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))

    # Exit the inference server upon exit of either subprocess
    pids = set([nginx.pid, gunicorn.pid])
    while True:
        pid, _ = os.wait()
        if pid in pids:
            break

    sigterm_handler(nginx.pid, gunicorn.pid)
    print('Inference server exiting')

# The main routine to invoke the start function.

if __name__ == '__main__':
    start_server()

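The previous script example defines a signal handler function, sigterm_handler, which shuts down the Nginx and Gunicorn subprocesses when it receives a SIGTERM signal. The start_server function registers the signal handler, starts and monitors the Nginx and Gunicorn subprocesses, and redirects the Nginx log streams to the container logs.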
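
The way the two environment variables feed into the Gunicorn command line can be isolated in a small function. The following is a sketch under the assumption that the same flags as in the serve script are wanted; the function name `build_gunicorn_command` is hypothetical.

```python
import multiprocessing


def build_gunicorn_command(env):
    """Build the gunicorn argument list from a mapping of environment variables."""
    # Fall back to the same defaults as the serve script: 60 seconds, one worker per CPU.
    timeout = int(env.get("MODEL_SERVER_TIMEOUT", 60))
    workers = int(env.get("MODEL_SERVER_WORKERS", multiprocessing.cpu_count()))
    return [
        "gunicorn",
        "--timeout", str(timeout),
        "-k", "sync",
        "-b", "unix:/tmp/gunicorn.sock",
        "-w", str(workers),
        "wsgi:app",
    ]
```

Passing `os.environ` reproduces the behavior of the script, while passing a plain dictionary makes the configuration easy to check without starting any process.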

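Upload a script to configure your web server. The following script example, called nginx.conf, configures an Nginx web server that uses Gunicorn as an application server to serve your model for inference: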


worker_processes 1;
daemon off; # Prevent forking


pid /tmp/nginx.pid;
error_log /var/log/nginx/error.log;

events {
  # defaults
}

http {
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  access_log /var/log/nginx/access.log combined;
  
  upstream gunicorn {
    server unix:/tmp/gunicorn.sock;
  }

  server {
    listen 8080 deferred;
    client_max_body_size 5m;

    keepalive_timeout 5;
    proxy_read_timeout 1200s;

    location ~ ^/(ping|invocations) {
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_redirect off;
      proxy_pass http://gunicorn;
    }

    location / {
      return 404 "{}";
    }
  }
}

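The previous script example configures Nginx to run in the foreground, sets the location of the error_log, and defines upstream as the socket of the Gunicorn server. The server block listens on port 8080 and sets limits on the client request body size and on timeout values. The server block forwards requests whose paths begin with /ping or /invocations to the Gunicorn server at http://gunicorn, and returns a 404 error for other paths.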
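
The routing rule in the server block can be checked outside of Nginx, because the location uses an ordinary case-sensitive regular expression. The sketch below applies the same pattern in Python; the `route` function and its return values are assumptions made for illustration.

```python
import re

# Same pattern as the nginx `location ~ ^/(ping|invocations)` block.
PROXIED = re.compile(r"^/(ping|invocations)")


def route(path):
    """Return 'gunicorn' if nginx would proxy the path, else the 404 fallback."""
    return "gunicorn" if PROXIED.search(path) else "404"


for path in ["/ping", "/invocations", "/", "/health", "/pingtest"]:
    print(path, "->", route(path))
```

Because the pattern is anchored only at the start, a path such as /pingtest is also proxied. Appending `$` to the pattern would restrict it to the two exact paths.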

Upload any other scripts needed to serve your model. This example needs the following script, called wsgi.py, to help Gunicorn find your application:


import predictor as myapp

# This is just a simple wrapper for gunicorn to find your app.
# If you want to change the algorithm file, simply change "predictor" above to the
# new file.

app = myapp.app

Within the folder docker_test_folder, your directory structure should contain a Dockerfile and the folder NER. The NER folder should contain the files nginx.conf, predictor.py, serve, and wsgi.py, as follows:

The Dockerfile structure has inference scripts under the NER directory next to the Dockerfile.
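
Before building, the layout can be verified with a short script. This is a minimal sketch; the `missing_files` helper is hypothetical, and the expected paths are taken from the file names listed above.

```python
from pathlib import Path

# Files the example expects, relative to docker_test_folder.
EXPECTED = [
    "Dockerfile",
    "NER/nginx.conf",
    "NER/predictor.py",
    "NER/serve",
    "NER/wsgi.py",
]


def missing_files(root):
    """Return the expected files that are not present under root."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).is_file()]
```

Calling `missing_files("docker_test_folder")` returns an empty list when every file is in place.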

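Build your own container.

From the folder docker_test_folder, build your Docker container. The following example command builds the Docker container that is configured in your Dockerfile: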
```
! docker build -t byo-container-test .
```
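The previous command builds a container image called byo-container-test from the current working directory. For more information about the Docker build parameters, see Build arguments.
Note
If you get the following error message that Docker cannot find the Dockerfile, make sure that the Dockerfile has the correct name and has been saved to the directory.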
```
unable to prepare context: unable to evaluate symlinks in Dockerfile path:
lstat /home/ec2-user/SageMaker/docker_test_folder/Dockerfile: no such file or directory
```
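Docker looks for a file specifically called Dockerfile, without any extension, in the current directory. If you named it something else, you can pass in the file name manually with the -f flag. For example, if you named your Dockerfile Dockerfile-text.txt, build your Docker container using the -f flag followed by your file name, as follows: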
```
! docker build -t byo-container-test -f Dockerfile-text.txt .
```

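Push your Docker image to Amazon Elastic Container Registry (Amazon ECR)

In a notebook cell, push your Docker image to ECR. The following code example shows you how to build your container locally, log in, and push the image to ECR: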


%%sh
# Name of algo -> ECR
algorithm_name=sm-pretrained-spacy

#make serve executable
chmod +x NER/serve
account=$(aws sts get-caller-identity --query Account --output text)
# Region, defaults to us-east-1 if none is configured
region=$(aws configure get region)
region=${region:-us-east-1}
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"
# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1
if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi
# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}
# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

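The previous example shows how to perform the following steps, which are necessary to push the example Docker container to ECR:

Define the algorithm name as sm-pretrained-spacy.
Make the serve file inside the NER folder executable.
Set the AWS Region.
Create an ECR repository if it doesn't already exist.
Log in to ECR.
Build the Docker container locally.
Push the Docker image to ECR.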
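
The image URI that the shell script assembles in the variable fullname is needed again later when the model is created. As a small sketch, the same string can be composed in Python. The function name `ecr_image_uri` is hypothetical, and the sketch assumes the amazonaws.com domain used in the shell script.

```python
def ecr_image_uri(account, region, repository, tag="latest"):
    """Compose the image URI in the same shape as the shell variable fullname."""
    registry = "{}.dkr.ecr.{}.amazonaws.com".format(account, region)
    return "{}/{}:{}".format(registry, repository, tag)


print(ecr_image_uri("123456789012", "us-east-2", "sm-pretrained-spacy"))
# 123456789012.dkr.ecr.us-east-2.amazonaws.com/sm-pretrained-spacy:latest
```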

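Set up the SageMaker AI client

If you want to use SageMaker AI hosting services for inference, you must create a model, create an endpoint configuration, and create an endpoint. To get inferences from your endpoint, you can use the SageMaker AI boto3 Runtime client to invoke the endpoint. The following code shows you how to use boto3 to set up both the SageMaker AI client and the SageMaker AI Runtime client: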
```
import boto3
from sagemaker import get_execution_role

sm_client = boto3.client(service_name='sagemaker')
runtime_sm_client = boto3.client(service_name='sagemaker-runtime')

account_id = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name

#used to store model artifacts which SageMaker AI will extract to /opt/ml/model in the container, 
#in this example case we will not be making use of S3 to store the model artifacts
#s3_bucket = '<S3Bucket>'

role = get_execution_role()
```
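In the previous code example, the Amazon S3 bucket is not used, but it is included as a comment to show how to store model artifacts.

If you receive a permission error after you run the previous code example, you may need to add permissions to your IAM role. For more information about IAM roles, see Amazon SageMaker Role Manager. For more information about adding permissions to your current role, see AWS managed policies for Amazon SageMaker AI.

Create your model.

If you want to use SageMaker AI hosting services for inference, you must create a model in SageMaker AI. The following code example shows you how to create the spaCy NER model inside of SageMaker AI: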


from time import gmtime, strftime

model_name = 'spacy-nermodel-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
# MODEL S3 URL containing model artifacts as either model.tar.gz or extracted artifacts.
# Here we are not  
#model_url = 's3://{}/spacy/'.format(s3_bucket) 

container = '{}.dkr.ecr.{}.amazonaws.com/sm-pretrained-spacy:latest'.format(account_id, region)
instance_type = 'ml.c5d.18xlarge'

print('Model name: ' + model_name)
#print('Model data Url: ' + model_url)
print('Container image: ' + container)

container = {
'Image': container
}

create_model_response = sm_client.create_model(
    ModelName = model_name,
    ExecutionRoleArn = role,
    Containers = [container])

print("Model Arn: " + create_model_response['ModelArn'])

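The previous code example shows how to define a model_url using the s3_bucket if you were to use the Amazon S3 bucket from the comments in Step 5, and defines the ECR URI for the container image. The example also defines ml.c5d.18xlarge as the instance type. You can choose a different instance type. For more information about available instance types, see Amazon EC2 instance types.

In the previous code example, the Image key points to the container image URI. The create_model_response definition calls the create_model method with the model name, the role, and a list containing the container information, and the response contains the ARN of the new model.

Example output from the previous script follows: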
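
If you do store model artifacts in Amazon S3, the container definition also carries their location. The following sketch shows one way to build the dictionary for both cases. The helper name `build_container_definition` is hypothetical, and the S3 path in the usage line is a placeholder rather than a real bucket.

```python
def build_container_definition(image_uri, model_data_url=None):
    """Return a container definition dict suitable for the Containers list."""
    definition = {"Image": image_uri}
    if model_data_url is not None:
        # Only needed when model artifacts are stored in Amazon S3.
        definition["ModelDataUrl"] = model_data_url
    return definition


# Placeholder values for illustration only.
print(build_container_definition("<image-uri>", "s3://<S3Bucket>/spacy/"))
```

The result can be passed as the single element of the Containers list in the create_model call.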


Model name: spacy-nermodel-YYYY-MM-DD-HH-MM-SS
Model data Url: s3://spacy-sagemaker-us-east-1-bucket/spacy/
Container image: 123456789012.dkr.ecr.us-east-2.amazonaws.com/sm-pretrained-spacy:latest
Model Arn: arn:aws:sagemaker:us-east-2:123456789012:model/spacy-nermodel-YYYY-MM-DD-HH-MM-SS

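Configure and create an endpoint

To use SageMaker AI hosting for inference, you must also configure and create an endpoint. SageMaker AI will use this endpoint for inference. The following configuration example shows how to generate and configure an endpoint with the instance type and model name that you defined previously: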


endpoint_config_name = 'spacy-ner-config' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint config name: ' + endpoint_config_name)

create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': instance_type,
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])
        
print("Endpoint config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

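In the previous configuration example, create_endpoint_config_response associates the model_name with a unique endpoint configuration name, endpoint_config_name, that is created with a timestamp.

Example output from the previous script follows: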


Endpoint config name: spacy-ner-configYYYY-MM-DD-HH-MM-SS
Endpoint config Arn: arn:aws:sagemaker:us-east-2:123456789012:endpoint-config/spacy-ner-configYYYY-MM-DD-HH-MM-SS
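
The timestamp suffix is what keeps model, endpoint configuration, and endpoint names unique across runs. The sketch below wraps that pattern in a hypothetical helper, `timestamped_name`, and adds a sanity check. The length limit and character pattern in the check are assumptions about the naming rules, so confirm them against the API reference before relying on them.

```python
import re
from time import gmtime, strftime

# Assumed constraint: up to 63 characters, alphanumerics separated by hyphens.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def timestamped_name(prefix, when=None):
    """Append a UTC timestamp to prefix, as the examples in this guide do."""
    name = prefix + strftime("%Y-%m-%d-%H-%M-%S", when or gmtime())
    if len(name) > 63 or not NAME_PATTERN.match(name):
        raise ValueError("invalid resource name: " + name)
    return name


print(timestamped_name("spacy-ner-config", gmtime(0)))
# spacy-ner-config1970-01-01-00-00-00
```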

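For more information about endpoint errors, see Why does my Amazon SageMaker AI endpoint go into the failed state when I create or update an endpoint?

Create an endpoint and wait for the endpoint to be in service.

The following code example creates the endpoint using the configuration from the previous configuration example and deploys the model: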


%%time

import time

endpoint_name = 'spacy-ner-endpoint' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print('Endpoint name: ' + endpoint_name)

create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print('Endpoint Arn: ' + create_endpoint_response['EndpointArn'])

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Endpoint Status: " + status)

print('Waiting for {} endpoint to be in service...'.format(endpoint_name))
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)

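In the previous code example, the create_endpoint method creates the endpoint with the endpoint name generated earlier in the same example, and prints the Amazon Resource Name (ARN) of the endpoint. The describe_endpoint method returns information about the endpoint and its status. A SageMaker AI waiter waits for the endpoint to be in service.

Test your endpoint.

Once your endpoint is in service, send an invocation request to it. The following code example shows how to send a test request to your endpoint: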
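
Conceptually, the waiter polls describe_endpoint until the status settles. The sketch below spells out such a loop so that the logic is visible. The function name `wait_for_endpoint`, the default delay, and the attempt limit are assumptions; `client` can be any object that offers a describe_endpoint method shaped like the boto3 one.

```python
import time


def wait_for_endpoint(client, endpoint_name, delay=30, max_attempts=120, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService or has failed."""
    for _ in range(max_attempts):
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("endpoint {} failed to deploy".format(endpoint_name))
        # Any other status (for example Creating) means: keep waiting.
        sleep(delay)
    raise TimeoutError(
        "endpoint {} not in service after {} attempts".format(endpoint_name, max_attempts)
    )
```

With the clients from this guide, the call would be `wait_for_endpoint(sm_client, endpoint_name)`. The built-in waiter remains the simpler choice when no custom handling is needed.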


import json
content_type = "application/json"
request_body = {"input": "This is a test with NER in America with \
    Amazon and Microsoft in Seattle, writing random stuff."}

#Serialize data for endpoint
#data = json.loads(json.dumps(request_body))
payload = json.dumps(request_body)

#Endpoint invocation
response = runtime_sm_client.invoke_endpoint(
EndpointName=endpoint_name,
ContentType=content_type,
Body=payload)

#Parse results
result = json.loads(response['Body'].read().decode())['output']
result

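In the previous code example, the method json.dumps serializes the request_body into a JSON-formatted string and saves it in the variable payload. The SageMaker AI Runtime client then uses the invoke_endpoint method to send the payload to your endpoint. The variable result contains the output field extracted from the endpoint's response.

The previous code example should return the following output: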


[['NER', 'ORG'],
 ['America', 'GPE'],
 ['Amazon', 'ORG'],
 ['Microsoft', 'ORG'],
 ['Seattle', 'GPE']]
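
The output contains lists even though predictor.py builds tuples, because JSON has no tuple type. The following standalone sketch reproduces the round trip with two hand-written entities standing in for model output:

```python
import json

# Same shape as the entities list built in predictor.py (values are illustrative).
entities = [("Amazon", "ORG"), ("Seattle", "GPE")]

# Server side: tuples are serialized as JSON arrays.
body = json.dumps({"output": entities})

# Client side: JSON arrays are parsed back as Python lists.
result = json.loads(body)["output"]
print(result)  # [['Amazon', 'ORG'], ['Seattle', 'GPE']]
```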

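Delete your endpoint

After you have completed your invocations, delete your endpoint to conserve resources. The following code example shows you how to delete your endpoint, its configuration, and the model: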
```
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_client.delete_model(ModelName=model_name)
```
For a complete notebook containing the code in this example, see BYOC-Single-Model.
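
If one of the three delete calls raises an error, the calls after it never run and the remaining resources stay in place. As a sketch of one way to handle that, the hypothetical helper `cleanup` below attempts every deletion and reports which ones failed. Catching every exception is a simplification made for this sketch.

```python
def cleanup(client, endpoint_name, endpoint_config_name, model_name):
    """Attempt all three deletions and return the names of the calls that failed."""
    steps = [
        ("delete_endpoint", {"EndpointName": endpoint_name}),
        ("delete_endpoint_config", {"EndpointConfigName": endpoint_config_name}),
        ("delete_model", {"ModelName": model_name}),
    ]
    failed = []
    for method, kwargs in steps:
        try:
            getattr(client, method)(**kwargs)
        except Exception as err:  # broad on purpose for this sketch
            print("{} failed: {}".format(method, err))
            failed.append(method)
    return failed
```

With the names from this guide, the call would be `cleanup(sm_client, endpoint_name, endpoint_config_name, model_name)`.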
