Manage models
The Edge Manager agent can load multiple models at a time and run inference with the loaded models on edge devices. The number of models the agent can load is determined by the available memory on the device. The agent validates the model signature and loads into memory all of the artifacts produced by the edge packaging job. This step requires that all the required certificates described in the earlier steps are installed, along with the rest of the binary installation. If the model's signature cannot be validated, loading of the model fails with an appropriate return code and reason.
The SageMaker Edge Manager agent provides a set of model management APIs that implement control plane and data plane APIs on edge devices. Along with this documentation, we recommend going through the sample client implementation, which demonstrates canonical usage of the APIs described below.
The proto file is made available as part of the release artifacts (inside the release tarball). In this document, we list and describe the usage of the APIs defined in this proto file.
These APIs have a one-to-one mapping on the Windows release, and sample code for an application implemented in C# is shared with the release artifacts for Windows. The instructions below are for running the agent as a standalone process, and apply to the release artifacts for Linux.
Extract the archive for your operating system, where VERSION has three components: <MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>. For information on how to get the release version (<MAJOR_VERSION>), the timestamp of the release artifact (<YYYY-MM-DD>), and the repository commit ID (SHA-7), see Installing Edge Manager agent.
- Linux
-
The archive can be extracted with the following command:
tar -xvzf <VERSION>.tgz
- Windows
-
The zip archive can be extracted with the UI or with the following command:
unzip <VERSION>.zip
The release artifact hierarchy (after extracting the tar/zip archive) is shown below. The agent proto file is available under api/.
0.20201205.7ee4b0b
├── bin
│ ├── sagemaker_edge_agent_binary
│ └── sagemaker_edge_agent_client_example
└── docs
├── api
│ └── agent.proto
├── attributions
│ ├── agent.txt
│ └── core.txt
└── examples
└── ipc_example
├── CMakeLists.txt
├── sagemaker_edge_client.cc
├── sagemaker_edge_client_example.cc
├── sagemaker_edge_client.hh
├── sagemaker_edge.proto
├── README.md
├── shm.cc
├── shm.hh
└── street_small.bmp
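Because the agent serves these APIs over gRPC and ships agent.proto under docs/api/, you can generate client stubs for a language of your choice. The sketch below is not part of the release; it is a minimal Python example that assumes grpcio and grpcio-tools are installed, that the generated modules follow grpcio's default naming (agent_pb2, agent_pb2_grpc), that the service declared in agent.proto is named Agent (check the shipped proto), and that the agent is serving on a local Unix domain socket whose path you chose when starting it. The C++ client under examples/ipc_example and the C# sample shipped with the Windows artifacts remain the canonical references.

# Generate Python stubs from the shipped proto (grpcio-tools workflow, an assumption):
#   python -m grpc_tools.protoc -I docs/api --python_out=. --grpc_python_out=. docs/api/agent.proto
# This produces agent_pb2.py (message classes) and agent_pb2_grpc.py (service stubs).
import grpc

import agent_pb2        # generated message classes
import agent_pb2_grpc   # generated service stub

# Hypothetical endpoint; replace with the socket path you pass when launching the agent.
AGENT_ENDPOINT = "unix:///tmp/sagemaker_edge_agent_example.sock"

def make_stub(endpoint: str = AGENT_ENDPOINT):
    """Open an insecure local channel to the agent and return a client stub."""
    channel = grpc.insecure_channel(endpoint)
    # The stub class name follows the service name in agent.proto (assumed to be Agent).
    return agent_pb2_grpc.AgentStub(channel)

The sketches in the following sections reuse make_stub() and the same generated modules.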
Load Model
The Edge Manager agent supports loading multiple models. This API validates the model signature and loads into memory all of the artifacts produced by the EdgePackagingJob operation. This step requires that all the required certificates are installed along with the rest of the agent binary installation. If the model's signature cannot be validated, this step fails with an appropriate return code and an error message in the logs.
// perform load for a model
// Note:
// 1. currently only local filesystem paths are supported for loading models.
// 2. multiple models can be loaded at the same time, as limited by available device memory
// 3. users are required to unload any loaded model to load another model.
// Status Codes:
// 1. OK - load is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
// 5. ALREADY_EXISTS - model with the same name is already loaded
// 6. RESOURCE_EXHAUSTED - memory is not available to load the model
// 7. FAILED_PRECONDITION - model is not compiled for the machine.
//
rpc LoadModel(LoadModelRequest) returns (LoadModelResponse);
- Input
-
//
// request for LoadModel rpc call
//
message LoadModelRequest {
string url = 1;
string name = 2; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}
- Output
-
//
//
// response for LoadModel rpc call
//
message LoadModelResponse {
Model model = 1;
}
//
// Model represents the metadata of a model
// url - url representing the path of the model
// name - name of model
// input_tensor_metadatas - TensorMetadata array for the input tensors
// output_tensor_metadatas - TensorMetadata array for the output tensors
//
// Note:
// 1. input and output tensor metadata could be empty for dynamic models.
//
message Model {
string url = 1;
string name = 2;
repeated TensorMetadata input_tensor_metadatas = 3;
repeated TensorMetadata output_tensor_metadatas = 4;
}
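As a usage illustration only (not shipped with the release), the following Python sketch calls LoadModel through the assumed agent_pb2/agent_pb2_grpc stubs and the make_stub() helper from the earlier sketch; the model directory and name are placeholders.

import grpc

import agent_pb2

def load_model(stub, model_dir: str, model_name: str):
    """Load a model from a local path; the name must match ^[a-zA-Z0-9](-*[a-zA-Z0-9])*$."""
    request = agent_pb2.LoadModelRequest(url=model_dir, name=model_name)
    try:
        response = stub.LoadModel(request)
    except grpc.RpcError as err:
        # Maps to the status codes above, e.g. NOT_FOUND, ALREADY_EXISTS, RESOURCE_EXHAUSTED.
        print(f"LoadModel failed: {err.code()} - {err.details()}")
        raise
    for meta in response.model.input_tensor_metadatas:
        print("input tensor:", meta.name, list(meta.shape))
    return response.model

# Placeholder usage:
# model = load_model(make_stub(), "/path/to/unpacked/model", "demo-model")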
UnLoad Model
Unloads a previously loaded model. The model is identified by the alias provided during loadModel. If the alias is not found or the model is not loaded, an error is returned.
//
// perform unload for a model
// Status Codes:
// 1. OK - unload is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc UnLoadModel(UnLoadModelRequest) returns (UnLoadModelResponse);
- Input
-
//
// request for UnLoadModel rpc call
//
message UnLoadModelRequest {
string name = 1; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}
- Output
-
//
// response for UnLoadModel rpc call
//
message UnLoadModelResponse {}
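A matching Python sketch (same assumptions as the earlier sketches) that unloads a model by the alias used when it was loaded:

import agent_pb2

def unload_model(stub, model_name: str):
    """Unload a previously loaded model; raises grpc.RpcError with NOT_FOUND if it is not loaded."""
    stub.UnLoadModel(agent_pb2.UnLoadModelRequest(name=model_name))

# unload_model(make_stub(), "demo-model")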
List Models
Lists all the models that are loaded, along with their aliases.
//
// lists the loaded models
// Status Codes:
// 1. OK - list is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
//
rpc ListModels(ListModelsRequest) returns (ListModelsResponse);
- Input
-
//
// request for ListModels rpc call
//
message ListModelsRequest {}
- Output
-
//
// response for ListModels rpc call
//
message ListModelsResponse {
repeated Model models = 1;
}
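A short Python sketch (same assumptions as the earlier sketches) that returns the aliases of the currently loaded models:

import agent_pb2

def list_model_names(stub):
    """Return the aliases of all models currently loaded by the agent."""
    response = stub.ListModels(agent_pb2.ListModelsRequest())
    return [model.name for model in response.models]

# print(list_model_names(make_stub()))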
Describe Model
Describes a model that is loaded on the agent.
//
// Status Codes:
// 1. OK - describe is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
//
rpc DescribeModel(DescribeModelRequest) returns (DescribeModelResponse);
- Input
-
//
// request for DescribeModel rpc call
//
message DescribeModelRequest {
string name = 1;
}
- Output
-
//
// response for DescribeModel rpc call
//
message DescribeModelResponse {
Model model = 1;
}
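A Python sketch (same assumptions as above) that prints the tensor metadata of a loaded model:

import agent_pb2

def describe_model(stub, model_name: str):
    """Fetch the url and the input/output tensor metadata of a loaded model."""
    model = stub.DescribeModel(agent_pb2.DescribeModelRequest(name=model_name)).model
    for meta in model.input_tensor_metadatas:
        print("input:", meta.name, list(meta.shape))
    for meta in model.output_tensor_metadatas:
        print("output:", meta.name, list(meta.shape))
    return model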
Capture Data
Allows the client application to capture input and output tensors in the Amazon S3 bucket, and optionally the auxiliary data as well. The client application is expected to pass a unique capture ID with each call to this API; this ID can later be used to query the status of the capture.
//
// allows users to capture input and output tensors along with auxiliary data.
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 5. ALREADY_EXISTS - capture initiated for the given capture_id
// 6. RESOURCE_EXHAUSTED - buffer is full, cannot accept any more requests.
// 7. OUT_OF_RANGE - timestamp is in the future.
// 8. INVALID_ARGUMENT - capture_id is not of expected format.
//
rpc CaptureData(CaptureDataRequest) returns (CaptureDataResponse);
- Input
-
enum Encoding {
CSV = 0;
JSON = 1;
NONE = 2;
BASE64 = 3;
}
//
// AuxilaryData represents a payload of extra data to be captured along with inputs and outputs of inference
// encoding - supports the encoding of the data
// data - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment id and
// offset in bytes to location of multi-dimensional tensor array.
//
message AuxilaryData {
string name = 1;
Encoding encoding = 2;
oneof data {
bytes byte_data = 3;
SharedMemoryHandle shared_memory_handle = 4;
}
}
//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
// tensor_metadata - represents metadata of the shared memory segment
// data_or_handle - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment
// id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
TensorMetadata tensor_metadata = 1; //optional in the predict request
oneof data {
bytes byte_data = 4;
// will only be used for input tensors
SharedMemoryHandle shared_memory_handle = 5;
}
}
//
// request for CaptureData rpc call
//
message CaptureDataRequest {
string model_name = 1;
string capture_id = 2; //uuid string
Timestamp inference_timestamp = 3;
repeated Tensor input_tensors = 4;
repeated Tensor output_tensors = 5;
repeated AuxilaryData inputs = 6;
repeated AuxilaryData outputs = 7;
}
- Output
-
//
// response for CaptureData rpc call
//
message CaptureDataResponse {}
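To illustrate how the messages above fit together, here is a hedged Python sketch (same stub assumptions as earlier; NumPy is used only to produce raw tensor bytes). It leaves inference_timestamp unset because the Timestamp message is defined in agent.proto and is not reproduced in this document, and it leaves data_type at its default for the same reason; fill both in from the shipped proto.

import uuid

import numpy as np  # only used to produce example raw tensor bytes

import agent_pb2

def capture_inference(stub, model_name: str, input_array: np.ndarray, output_array: np.ndarray):
    """Initiate an asynchronous capture of one inference's input and output tensors."""
    capture_id = str(uuid.uuid4())  # unique per call; keep it to query the status later

    def to_tensor(name, array):
        # data_type is left at its default here; set it per the DataType enum in agent.proto.
        return agent_pb2.Tensor(
            tensor_metadata=agent_pb2.TensorMetadata(name=name, shape=array.shape),
            byte_data=array.tobytes(),
        )

    request = agent_pb2.CaptureDataRequest(
        model_name=model_name,
        capture_id=capture_id,
        # inference_timestamp: construct per the Timestamp message in agent.proto (omitted here).
        input_tensors=[to_tensor("input", input_array)],
        output_tensors=[to_tensor("output", output_array)],
    )
    stub.CaptureData(request)
    return capture_id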
Get Capture Status
Depending on the model loaded, the input and output tensors can be large (for many edge devices), and capturing to the cloud can be time consuming. Therefore, CaptureData() is implemented as an asynchronous operation. The capture ID is a unique identifier that the client provides during the CaptureData call; this ID can be used to query the status of the asynchronous call.
//
// allows users to query status of capture data operation
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - given capture id doesn't exist.
//
rpc GetCaptureDataStatus(GetCaptureDataStatusRequest) returns (GetCaptureDataStatusResponse);
- Input
-
//
// request for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusRequest {
string capture_id = 1;
}
- Output
-
enum CaptureDataStatus {
FAILURE = 0;
SUCCESS = 1;
IN_PROGRESS = 2;
NOT_FOUND = 3;
}
//
// response for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusResponse {
CaptureDataStatus status = 1;
}
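Because CaptureData() is asynchronous, a client typically polls this API with the capture ID it generated. A minimal Python sketch (same assumptions as above, and assuming CaptureDataStatus is a top-level enum in agent.proto, as shown here):

import time

import agent_pb2

def wait_for_capture(stub, capture_id: str, poll_seconds: float = 1.0):
    """Poll the status of an asynchronous CaptureData call until it leaves IN_PROGRESS."""
    while True:
        response = stub.GetCaptureDataStatus(
            agent_pb2.GetCaptureDataStatusRequest(capture_id=capture_id)
        )
        if response.status != agent_pb2.CaptureDataStatus.IN_PROGRESS:
            return response.status  # FAILURE, SUCCESS, or NOT_FOUND
        time.sleep(poll_seconds)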
Predict
The predict API performs inference on a previously loaded model. It accepts a request in the form of a tensor that is fed directly into the neural network, and the output is the output tensor (or scalar) of the model. This is a blocking call.
//
// perform inference on a model.
//
// Note:
// 1. users can choose to send the tensor data in the protobuf message or
// through a shared memory segment on a per tensor basis, the Predict
// method will handle the decode transparently.
// 2. serializing large tensors into the protobuf message can be quite expensive,
// based on our measurements it is recommended to use shared memory for
// tensors larger than 256KB.
// 3. SMEdge IPC server will not use shared memory for returning output tensors,
// i.e., the output tensor data will always be sent in byte form encoded
// in the tensors of PredictResponse.
// 4. currently SMEdge IPC server cannot handle concurrent predict calls, all
// these calls will be serialized under the hood. this shall be addressed
// in a later release.
// Status Codes:
// 1. OK - prediction is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - when model not found
// 5. INVALID_ARGUMENT - when tensor types mismatch
//
rpc Predict(PredictRequest) returns (PredictResponse);
- Input
-
// request for Predict rpc call
//
message PredictRequest {
string name = 1;
repeated Tensor tensors = 2;
}
//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
// tensor_metadata - represents metadata of the shared memory segment
// data_or_handle - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment
// id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
TensorMetadata tensor_metadata = 1; //optional in the predict request
oneof data {
bytes byte_data = 4;
// will only be used for input tensors
SharedMemoryHandle shared_memory_handle = 5;
}
}
//
// TensorMetadata represents the metadata for a tensor
// name - name of the tensor
// data_type - data type of the tensor
// shape - array of dimensions of the tensor
//
message TensorMetadata {
string name = 1;
DataType data_type = 2;
repeated int32 shape = 3;
}
//
// SharedMemoryHandle represents a posix shared memory segment
// offset - offset in bytes from the start of the shared memory segment.
// segment_id - shared memory segment id corresponding to the posix shared memory segment.
// size - size in bytes of shared memory segment to use from the offset position.
//
message SharedMemoryHandle {
uint64 size = 1;
uint64 offset = 2;
uint64 segment_id = 3;
}
- Output
-
PredictResponse returns only Tensors and not SharedMemoryHandle.
// response for Predict rpc call
//
message PredictResponse {
repeated Tensor tensors = 1;
}
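To tie the request and response messages together, here is a hedged Python sketch of a Predict call (same stub assumptions as the earlier sketches; NumPy only supplies the example input bytes). Per the notes above, the input is sent inline as byte_data here; for inputs larger than roughly 256 KB the proto recommends a shared_memory_handle instead, and the output tensors always come back as byte_data.

import numpy as np  # only used to build the example input bytes

import agent_pb2

def predict(stub, model_name: str, input_name: str, input_array: np.ndarray):
    """Run a blocking inference and return the raw output tensors with their metadata."""
    request = agent_pb2.PredictRequest(
        name=model_name,
        tensors=[
            agent_pb2.Tensor(
                # tensor_metadata is optional in the request; name/shape are included for clarity,
                # and data_type is left at its default (set it per the DataType enum in agent.proto).
                tensor_metadata=agent_pb2.TensorMetadata(name=input_name, shape=input_array.shape),
                byte_data=input_array.tobytes(),
            )
        ],
    )
    response = stub.Predict(request)
    # Output tensors always arrive as byte_data; reinterpret them using their metadata.
    return [
        (t.tensor_metadata.name, list(t.tensor_metadata.shape), t.byte_data)
        for t in response.tensors
    ]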