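The AWS services or features described in AWS documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with AWS services in China.
This page is a machine translation. If the translated content differs from the original English version, the English version prevails.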
Manage models
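The SageMaker Edge Manager agent provides a set of model management APIs that implement control plane and data plane APIs on edge devices. In addition to this documentation, we recommend going through the sample client implementation, which shows canonical usage of the APIs described below.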
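The proto file is provided as part of the release artifacts (inside the release tarball). In this document, we list and describe the usage of the APIs defined in this proto file.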
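The Windows release provides a one-to-one mapping of these APIs, and sample code for an application implemented in C# is shared with the Windows release artifacts. The following instructions apply to running the agent as a standalone process, using the release artifacts for Linux.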
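Extract the archive for your operating system, where VERSION is broken into three components: <MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>. See Installing Edge Manager agent for information on how to obtain the release version (<MAJOR_VERSION>), the timestamp of the release artifact (<YYYY-MM-DD>), and the repository commit ID (<SHA-7>).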
- Linux: extract the .tgz archive with the following command:
  tar -xvzf <VERSION>.tgz
- Windows: extract the zip archive using the UI or the following command:
  unzip <VERSION>.zip
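If you script the Linux extraction, Python's standard tarfile module can do the same job as `tar -xvzf`. The following is a minimal sketch; `extract_release` is a hypothetical helper, and rather than assuming a fixed directory layout it searches the extracted tree for any file named agent.proto.

```python
import tarfile
from pathlib import Path


def extract_release(archive: Path, dest: Path) -> list[Path]:
    """Extract a release .tgz (the equivalent of `tar -xvzf`) and return agent.proto paths.

    Hypothetical helper: it searches the whole extracted tree for agent.proto
    instead of assuming a fixed directory layout.
    """
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive, mode="r:gz") as tar:
        # Refuse members whose paths would land outside `dest`.
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Use the "data" extraction filter where this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(dest.rglob("agent.proto"))
```

For example, `extract_release(Path("<VERSION>.tgz"), Path("release"))` extracts the archive into `release/` and returns the location of the proto file, with `<VERSION>` replaced by your actual release version.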
The release artifact hierarchy (after extracting the tar/zip archive) is shown below. The agent proto file is located under api/.
0.20201205.7ee4b0b
├── bin
│   ├── sagemaker_edge_agent_binary
│   └── sagemaker_edge_agent_client_example
└── docs
    ├── api
    │   └── agent.proto
    ├── attributions
    │   ├── agent.txt
    │   └── core.txt
    └── examples
        └── ipc_example
            ├── CMakeLists.txt
            ├── sagemaker_edge_client.cc
            ├── sagemaker_edge_client_example.cc
            ├── sagemaker_edge_client.hh
            ├── sagemaker_edge.proto
            ├── README.md
            ├── shm.cc
            ├── shm.hh
            └── street_small.bmp
Load model
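The Edge Manager agent supports loading one model at a time and running inference on that model. This API validates the model signature and loads all the artifacts produced by the EdgePackagingJob operation into memory. This step requires all the required certificates to be installed along with the rest of the agent binary installation. If the model's signature cannot be validated, this step fails, and the log contains the corresponding return code and error message.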
// perform load for a model
// Note:
// 1. currently only local filesystem paths are supported for loading models.
// 2. currently only one model could be loaded at any time, loading of multiple
// models simultaneously shall be implemented in the future.
// 3. users are required to unload any loaded model to load another model.
// Status Codes:
// 1. OK - load is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
// 5. ALREADY_EXISTS - model with the same name is already loaded
// 6. RESOURCE_EXHAUSTED - memory is not available to load the model
// 7. FAILED_PRECONDITION - model is not compiled for the machine.
//
rpc LoadModel(LoadModelRequest) returns (LoadModelResponse);
Input:
//
// request for LoadModel rpc call
//
message LoadModelRequest {
string url = 1;
string name = 2; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}
Output:
//
//
// response for LoadModel rpc call
//
message LoadModelResponse {
Model model = 1;
}
//
// Model represents the metadata of a model
// url - url representing the path of the model
// name - name of model
// input_tensor_metadatas - TensorMetadata array for the input tensors
// output_tensor_metadatas - TensorMetadata array for the output tensors
//
// Note:
// 1. input and output tensor metadata could be empty for dynamic models.
//
message Model {
string url = 1;
string name = 2;
repeated TensorMetadata input_tensor_metadatas = 3;
repeated TensorMetadata output_tensor_metadatas = 4;
}
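Both LoadModelRequest and UnLoadModelRequest require the model name to match the regular expression in their comments, so a client can reject a bad name before calling the agent. The following Python sketch checks a name against that pattern; `is_valid_model_name` is a hypothetical helper, not part of the agent API. It uses `re.fullmatch` instead of the `^...$` anchors because Python's `$` also matches just before a trailing newline.

```python
import re

# Same pattern as the comment in agent.proto, without the ^ and $ anchors:
# fullmatch() below anchors both ends of the string.
MODEL_NAME_RE = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9])*")


def is_valid_model_name(name: str) -> bool:
    """Return True if `name` is alphanumeric with hyphens only between alphanumerics."""
    return MODEL_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_model_name("resnet-50")` returns True, while `"resnet_50"` (underscore) and `"-resnet"` (leading hyphen) are rejected.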
Unload model
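Unloads a previously loaded model. The model is identified by the model alias that was provided during LoadModel. If the alias is not found or no model is loaded, an error is returned.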
//
// perform unload for a model
// Status Codes:
// 1. OK - unload is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc UnLoadModel(UnLoadModelRequest) returns (UnLoadModelResponse);
Input:
//
// request for UnLoadModel rpc call
//
message UnLoadModelRequest {
string name = 1; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}
Output:
//
// response for UnLoadModel rpc call
//
message UnLoadModelResponse {}
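Because only one model can be loaded at a time and a loaded model must be unloaded before another can be loaded, replacing a model means an unload followed by a load. A minimal Python sketch, assuming a hypothetical client object that wraps the ListModels, UnLoadModel, and LoadModel RPCs; the `ModelClient` interface and its method names are illustrative and are not the generated gRPC stubs.

```python
from typing import Protocol


class ModelClient(Protocol):
    """Hypothetical wrapper around the agent's model management RPCs.

    The method names and signatures are assumptions for illustration;
    they are not the generated gRPC stub API.
    """

    def load_model(self, url: str, name: str) -> None: ...
    def unload_model(self, name: str) -> None: ...
    def list_model_names(self) -> list[str]: ...


def swap_model(client: ModelClient, url: str, name: str) -> None:
    """Unload every currently loaded model, then load the model at `url` as `name`."""
    # list_model_names() returns a fresh list, so unloading while iterating is safe.
    for loaded in client.list_model_names():
        client.unload_model(loaded)
    client.load_model(url, name)
```

If LoadModel fails after the unload (for example with FAILED_PRECONDITION), the agent is left with no model loaded, so a caller may want to reload the previous model when the new one fails.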
List models
Lists all loaded models and their aliases.
//
// lists the loaded models
// Status Codes:
// 1. OK - list is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
//
rpc ListModels(ListModelsRequest) returns (ListModelsResponse);
Input:
//
// request for ListModels rpc call
//
message ListModelsRequest {}
Output:
//
// response for ListModels rpc call
//
message ListModelsResponse {
repeated Model models = 1;
}
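ListModelsResponse returns a Model message for each loaded model, so checking whether a given alias is loaded is a search over that list. A minimal sketch, assuming the response has already been converted into the plain `Model` dataclass shown here, which is a hypothetical stand-in for the generated message class with the tensor metadata fields left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    """Hypothetical stand-in for the Model message (tensor metadata omitted)."""
    url: str
    name: str


def find_model(models: list[Model], name: str) -> Optional[Model]:
    """Return the loaded model registered under `name`, or None if it is not loaded."""
    return next((m for m in models if m.name == name), None)
```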
Describe model
Describes a model that is loaded on the agent.
//
// Status Codes:
// 1. OK - describe is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
//
rpc DescribeModel(DescribeModelRequest) returns (DescribeModelResponse);
Input:
//
// request for DescribeModel rpc call
//
message DescribeModelRequest {
string name = 1;
}
Output:
//
// response for DescribeModel rpc call
//
message DescribeModelResponse {
Model model = 1;
}
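The Model returned by DescribeModel carries TensorMetadata for each input and output tensor, and the element count of a contiguous tensor is the product of its shape. A sketch, assuming a hypothetical `TensorMetadata` dataclass that stores `data_type` as a plain string (the DataType enum values are not listed in this section); it returns nothing for the empty metadata that dynamic models may report.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class TensorMetadata:
    """Hypothetical stand-in for the TensorMetadata message; data_type is a plain string here."""
    name: str
    data_type: str
    shape: list[int] = field(default_factory=list)


def describe_tensors(tensors: list[TensorMetadata]) -> list[str]:
    """Format one line per tensor with its name, type, shape, and element count.

    Dynamic models may report no tensor metadata, in which case this returns
    an empty list instead of guessing shapes.
    """
    lines = []
    for t in tensors:
        count = prod(t.shape)  # elements in a contiguous tensor; an empty shape gives 1 (scalar)
        lines.append(f"{t.name}: {t.data_type} shape={t.shape} elements={count}")
    return lines
```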
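Capture data
Allows the client application to capture input and output tensors, and optionally auxiliary data, in an Amazon S3 bucket. The client application is expected to pass a unique capture ID with each call to this API; this ID can later be used to query the status of the capture.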
//
// allows users to capture input and output tensors along with auxiliary data.
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. ALREADY_EXISTS - capture initiated for the given capture_id
// 5. RESOURCE_EXHAUSTED - buffer is full, cannot accept any more requests.
// 6. OUT_OF_RANGE - timestamp is in the future.
// 7. INVALID_ARGUMENT - capture_id is not of expected format.
//
rpc CaptureData(CaptureDataRequest) returns (CaptureDataResponse);
Input:
enum Encoding {
CSV = 0;
JSON = 1;
NONE = 2;
BASE64 = 3;
}
//
// AuxilaryData represents a payload of extra data to be captured along with inputs and outputs of inference
// encoding - supports the encoding of the data
// data - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment id and
// offset in bytes to location of multi-dimensional tensor array.
//
message AuxilaryData {
string name = 1;
Encoding encoding = 2;
oneof data {
bytes byte_data = 3;
SharedMemoryHandle shared_memory_handle = 4;
}
}
//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
// tensor_metadata - represents metadata of the shared memory segment
// data_or_handle - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment
// id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
TensorMetadata tensor_metadata = 1; //optional in the predict request
oneof data {
bytes byte_data = 4;
// will only be used for input tensors
SharedMemoryHandle shared_memory_handle = 5;
}
}
//
// request for CaptureData rpc call
//
message CaptureDataRequest {
string model_name = 1;
string capture_id = 2; //uuid string
Timestamp inference_timestamp = 3;
repeated Tensor input_tensors = 4;
repeated Tensor output_tensors = 5;
repeated AuxilaryData inputs = 6;
repeated AuxilaryData outputs = 7;
}
Output:
//
// response for CaptureData rpc call
//
message CaptureDataResponse {}
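The capture_id field is commented as a UUID string, and the CaptureData status codes reject a reused ID (ALREADY_EXISTS), a badly formatted ID (INVALID_ARGUMENT), and a timestamp in the future (OUT_OF_RANGE). A minimal sketch of preparing those fields on the client, assuming the inference time is held as seconds since the epoch before being converted to the proto Timestamp; `CaptureRequestFields` and `new_capture_request` are hypothetical names.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class CaptureRequestFields:
    """Hypothetical container for the identifying CaptureDataRequest fields."""
    model_name: str
    capture_id: str
    inference_timestamp_s: float  # seconds since the epoch (assumed representation)


def new_capture_request(model_name: str, inference_time_s: float) -> CaptureRequestFields:
    """Build the identifying fields for a single CaptureData call.

    A fresh UUID4 string is generated for every call so that capture IDs are
    never reused, and a future timestamp is rejected locally before the call.
    """
    if inference_time_s > time.time():
        raise ValueError("inference timestamp is in the future")
    return CaptureRequestFields(model_name, str(uuid.uuid4()), inference_time_s)
```

Keep the returned capture_id; it is the key for querying the capture status later.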
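Get capture status
Depending on the model loaded, the input and output tensors can be large for many edge devices, so capturing them to the cloud can take a long time. For this reason, CaptureData() is implemented as an asynchronous operation. The capture ID is a unique identifier that the client provides during the CaptureData call, and it can be used to query the status of the asynchronous call.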
//
// allows users to query status of capture data operation
// Status Codes:
// 1. OK - capture status retrieved successfully
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - given capture id doesn't exist.
//
rpc GetCaptureDataStatus(GetCaptureDataStatusRequest) returns (GetCaptureDataStatusResponse);
Input:
//
// request for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusRequest {
string capture_id = 1;
}
Output:
enum CaptureDataStatus {
FAILURE = 0;
SUCCESS = 1;
IN_PROGRESS = 2;
NOT_FOUND = 3;
}
//
// response for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusResponse {
CaptureDataStatus status = 1;
}
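Because CaptureData() is asynchronous, a client typically polls GetCaptureDataStatus until the status leaves IN_PROGRESS. A sketch, assuming a hypothetical `get_status` callable that wraps the RPC; the `CaptureDataStatus` values mirror the enum in this section, and the timeout and poll interval are arbitrary defaults rather than agent settings.

```python
import time
from enum import IntEnum
from typing import Callable


class CaptureDataStatus(IntEnum):
    """Mirrors the CaptureDataStatus enum values in agent.proto."""
    FAILURE = 0
    SUCCESS = 1
    IN_PROGRESS = 2
    NOT_FOUND = 3


def wait_for_capture(
    get_status: Callable[[str], CaptureDataStatus],
    capture_id: str,
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
) -> CaptureDataStatus:
    """Poll until the capture leaves IN_PROGRESS or the timeout expires.

    Returns the last status seen, which is still IN_PROGRESS if the timeout
    expired first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(capture_id)
        if status != CaptureDataStatus.IN_PROGRESS or time.monotonic() >= deadline:
            return status
        time.sleep(poll_interval_s)
```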
Predict
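The Predict API performs inference on a previously loaded model. It accepts a request in the form of tensors that are fed directly into the neural network. The outputs are the output tensors (or scalars) from the model. This is a blocking call.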
//
// perform inference on a model.
//
// Note:
// 1. users can choose to send the tensor data in the protobuf message or
// through a shared memory segment on a per tensor basis, the Predict
// method will handle the decode transparently.
// 2. serializing large tensors into the protobuf message can be quite expensive,
// based on our measurements it is recommended to use shared memory for
// tensors larger than 256KB.
// 3. SMEdge IPC server will not use shared memory for returning output tensors,
// i.e., the output tensor data will always be sent in byte form encoded
// in the tensors of PredictResponse.
// 4. currently SMEdge IPC server cannot handle concurrent predict calls, all
// these calls will be serialized under the hood. this shall be addressed
// in a later release.
// Status Codes:
// 1. OK - prediction is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - when model not found
// 5. INVALID_ARGUMENT - when tensor types mismatch
//
rpc Predict(PredictRequest) returns (PredictResponse);
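Note 2 in the Predict comments recommends shared memory for tensors larger than 256KB, and the choice is made per tensor. A minimal sketch of that decision, assuming 256KB means 256 * 1024 bytes (the comment does not say) and that each input tensor's size in bytes is already known; the function names are hypothetical. Output tensors always come back as byte_data, so this applies to inputs only.

```python
# Threshold from the Predict comments; 1 KB = 1024 bytes is an assumption.
SHM_THRESHOLD_BYTES = 256 * 1024


def use_shared_memory(tensor_nbytes: int, threshold: int = SHM_THRESHOLD_BYTES) -> bool:
    """Return True if an input tensor of this size should go through shared memory."""
    return tensor_nbytes > threshold


def plan_inputs(tensor_sizes: dict[str, int]) -> dict[str, str]:
    """Map each input tensor name to the Tensor `data` oneof field to populate.

    The values are the names of the two oneof fields in the Tensor message:
    'shared_memory_handle' for large tensors, 'byte_data' otherwise.
    """
    return {
        name: "shared_memory_handle" if use_shared_memory(nbytes) else "byte_data"
        for name, nbytes in tensor_sizes.items()
    }
```

For example, a 1x3x224x224 float32 image (602,112 bytes) would be sent through shared memory, while a single 4-byte scalar would be sent inline as byte_data.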
Input:
// request for Predict rpc call
//
message PredictRequest {
string name = 1;
repeated Tensor tensors = 2;
}
//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
// tensor_metadata - represents metadata of the shared memory segment
// data_or_handle - represents the data of shared memory, this could be passed in two ways:
// a. send across the raw bytes of the multi-dimensional tensor array
// b. send a SharedMemoryHandle which contains the posix shared memory segment
// id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
TensorMetadata tensor_metadata = 1; //optional in the predict request
oneof data {
bytes byte_data = 4;
// will only be used for input tensors
SharedMemoryHandle shared_memory_handle = 5;
}
}
//
// TensorMetadata represents the metadata for a tensor
// name - name of the tensor
// data_type - data type of the tensor
// shape - array of dimensions of the tensor
//
message TensorMetadata {
string name = 1;
DataType data_type = 2;
repeated int32 shape = 3;
}
//
// SharedMemoryHandle represents a posix shared memory segment
// offset - offset in bytes from the start of the shared memory segment.
// segment_id - shared memory segment id corresponding to the posix shared memory segment.
// size - size in bytes of shared memory segment to use from the offset position.
//
message SharedMemoryHandle {
uint64 size = 1;
uint64 offset = 2;
uint64 segment_id = 3;
}
Output:
PredictResponse returns only Tensors, not SharedMemoryHandle.
// response for Predict rpc call
//
message PredictResponse {
repeated Tensor tensors = 1;
}
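Because PredictResponse returns output tensors only as byte_data, the client decodes the raw bytes itself, using the shape from the tensor's TensorMetadata. A sketch for float32 tensors, assuming little-endian, row-major (contiguous) layout; check the DataType your model actually uses. `encode_float32` and `decode_float32` are hypothetical helpers, and the same packing can build byte_data for small input tensors.

```python
import struct
from math import prod


def encode_float32(values: list[float]) -> bytes:
    """Pack a flat, row-major float32 tensor into bytes for Tensor.byte_data.

    Assumes little-endian float32 (an assumption, not stated by agent.proto).
    """
    return struct.pack(f"<{len(values)}f", *values)


def decode_float32(byte_data: bytes, shape: list[int]) -> list[float]:
    """Unpack Tensor.byte_data into a flat list of floats.

    The shape (from TensorMetadata) is used to check that the payload holds
    exactly the expected number of 4-byte elements.
    """
    count = prod(shape)
    if len(byte_data) != 4 * count:
        raise ValueError(f"expected {4 * count} bytes for shape {shape}, got {len(byte_data)}")
    return list(struct.unpack(f"<{count}f", byte_data))
```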