Manage Model

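The Edge Manager agent can load multiple models at a time and run inference with the loaded models on edge devices. The number of models the agent can load is determined by the memory available on the device. The agent validates the model signature and loads into memory all the artifacts produced by the edge packaging job. This step requires that all the certificates described in the previous steps are installed along with the rest of the binaries. If the model signature cannot be validated, loading fails with an appropriate return code and reason.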

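The SageMaker Edge Manager agent provides a set of Model Management APIs that implement the control plane and data plane APIs on edge devices. Along with this documentation, we recommend reading the sample client implementation, which shows the canonical usage of the APIs described below.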

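The proto file is available as part of the release artifacts (inside the release tarball). This document lists the APIs defined in that proto file and describes how to use them.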

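Note

There is a one-to-one mapping for these APIs in the Windows release, and sample code for an application implemented in C# is shared with the release artifacts for Windows. The instructions below are for running the agent as a standalone process and apply to the release artifacts for Linux.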

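Extract the archive for your operating system. VERSION is made up of three components: <MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>. For information on how to obtain the release version (<MAJOR_VERSION>), the timestamp of the release artifact (<YYYY-MM-DD>), and the repository commit ID (SHA-7), see Installing the Edge Manager agent.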

The release artifact hierarchy (after extracting the tar/zip archive) is shown below. The agent proto file is available under api/.


0.20201205.7ee4b0b
├── bin
│   ├── sagemaker_edge_agent_binary
│   └── sagemaker_edge_agent_client_example
└── docs
    ├── api
    │   └── agent.proto
    ├── attributions
    │   ├── agent.txt
    │   └── core.txt
    └── examples
        └── ipc_example
            ├── CMakeLists.txt
            ├── sagemaker_edge_client.cc
            ├── sagemaker_edge_client_example.cc
            ├── sagemaker_edge_client.hh
            ├── sagemaker_edge.proto
            ├── README.md
            ├── shm.cc
            ├── shm.hh
            └── street_small.bmp
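
As a rough illustration, the release directory name can be split into its three parts and used to locate the proto file. This sketch assumes the dot-separated form of the directory name in the listing (`0.20201205.7ee4b0b`, with the date written as YYYYMMDD) and that `agent.proto` sits under `docs/api/`. `ReleaseVersion`, `parse_release_dir`, and `proto_path` are hypothetical helper names, not part of the agent.

```python
from dataclasses import dataclass
from datetime import date, datetime
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ReleaseVersion:
    major: int
    released: date
    sha7: str


def parse_release_dir(name: str) -> ReleaseVersion:
    """Split '<major>.<YYYYMMDD>.<sha7>' (assumed layout) into its parts."""
    major, stamp, sha7 = name.split(".")
    if len(sha7) != 7:
        raise ValueError(f"expected a 7-character commit id, got {sha7!r}")
    released = datetime.strptime(stamp, "%Y%m%d").date()
    return ReleaseVersion(int(major), released, sha7)


def proto_path(release_dir: str) -> PurePosixPath:
    # Assumed location of agent.proto inside the extracted archive.
    return PurePosixPath(release_dir) / "docs" / "api" / "agent.proto"


version = parse_release_dir("0.20201205.7ee4b0b")
print(version.major, version.released.isoformat(), version.sha7)
print(proto_path("0.20201205.7ee4b0b"))
```

Parsing the name up front makes it easy to log which agent build a client was tested against.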

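Load Model

The Edge Manager agent supports loading multiple models. This API validates the model signature and loads into memory all the artifacts produced by the EdgePackagingJob operation. This step requires that all the required certificates are installed along with the rest of the agent binaries. If the model signature cannot be validated, this step fails with an appropriate return code and an error message in the log.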


// perform load for a model
// Note:
// 1. currently only local filesystem paths are supported for loading models.
// 2. multiple models can be loaded at the same time, as limited by available device memory
// 3. users are required to unload any loaded model to load another model.
// Status Codes:
// 1. OK - load is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
// 5. ALREADY_EXISTS - model with the same name is already loaded
// 6. RESOURCE_EXHAUSTED - memory is not available to load the model
// 7. FAILED_PRECONDITION - model is not compiled for the machine.
//
rpc LoadModel(LoadModelRequest) returns (LoadModelResponse);

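Unload Model

Unloads a previously loaded model. The model is identified by the model alias that was provided during LoadModel. If the alias is not found or the model is not loaded, an error is returned.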


//
// perform unload for a model
// Status Codes:
// 1. OK - unload is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc UnLoadModel(UnLoadModelRequest) returns (UnLoadModelResponse);

List Models

Lists all the loaded models and their aliases.


//
// lists the loaded models
// Status Codes:
// 1. OK - list is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
//
rpc ListModels(ListModelsRequest) returns (ListModelsResponse);
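
The status codes documented for LoadModel, UnLoadModel, and ListModels can be explored with a small in-memory stand-in. This is only a sketch, not the agent: `FakeAgent`, its method names, the order in which conditions are checked, the file paths, and the use of a model count in place of device memory are all assumptions made for illustration.

```python
class FakeAgent:
    """In-memory stand-in that mimics the documented status codes (not the real agent)."""

    def __init__(self, known_paths, capacity):
        self.known_paths = set(known_paths)  # paths that "exist" on the local filesystem
        self.capacity = capacity             # number of models that "fit in memory"
        self.loaded = {}                     # alias -> path

    def load_model(self, path, name):
        if name in self.loaded:
            return "ALREADY_EXISTS"
        if path not in self.known_paths:
            return "NOT_FOUND"
        if len(self.loaded) >= self.capacity:
            return "RESOURCE_EXHAUSTED"
        self.loaded[name] = path
        return "OK"

    def unload_model(self, name):
        if name not in self.loaded:
            return "NOT_FOUND"
        del self.loaded[name]
        return "OK"

    def list_models(self):
        return sorted(self.loaded)


agent = FakeAgent(known_paths={"/models/a", "/models/b"}, capacity=1)
print(agent.load_model("/models/a", "detector"))    # OK
print(agent.load_model("/models/a", "detector"))    # ALREADY_EXISTS
print(agent.load_model("/models/missing", "x"))     # NOT_FOUND
print(agent.load_model("/models/b", "classifier"))  # RESOURCE_EXHAUSTED
print(agent.unload_model("detector"))               # OK
print(agent.load_model("/models/b", "classifier"))  # OK
print(agent.list_models())                          # ['classifier']
```

A client would typically treat ALREADY_EXISTS as a cue to unload or pick another alias, and RESOURCE_EXHAUSTED as a cue to unload a model before retrying.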

Describe Model

Describes a model that is loaded on the agent.


//
// Status Codes:
// 1. OK - describe is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc DescribeModel(DescribeModelRequest) returns (DescribeModelResponse);

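Capture Data

Allows the client application to capture input and output tensors, and optionally auxiliary data, in an Amazon S3 bucket. The client application is expected to pass a unique capture ID with each call to this API. The ID can be used later to query the status of the capture.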


//
// allows users to capture input and output tensors along with auxiliary data.
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. ALREADY_EXISTS - capture initiated for the given capture_id
// 5. RESOURCE_EXHAUSTED - buffer is full cannot accept any more requests.
// 6. OUT_OF_RANGE - timestamp is in the future.
// 7. INVALID_ARGUMENT - capture_id is not of expected format.
//
rpc CaptureData(CaptureDataRequest) returns (CaptureDataResponse);

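Get Capture Status

Depending on the models loaded, the input and output tensors can be large (for many edge devices), and capturing them to the cloud can be time consuming. CaptureData() is therefore implemented as an asynchronous operation. The capture ID is a unique identifier that the client provides in the capture data call, and it can be used to query the status of the asynchronous call.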


//
// allows users to query status of capture data operation
// Status Codes:
// 1. OK - capture status successfully retrieved
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - given capture id doesn't exist.
//
rpc GetCaptureDataStatus(GetCaptureDataStatusRequest) returns (GetCaptureDataStatusResponse);
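
Because the capture runs asynchronously, a client usually generates a capture ID, starts the capture, and then polls for completion. The following is a minimal sketch of that pattern under stated assumptions: a random UUID string is assumed to be an acceptable capture ID, the status values `IN_PROGRESS` and `SUCCESS` are placeholders, and `get_status` stands in for whatever wrapper the client puts around GetCaptureDataStatus. None of these names come from the proto excerpt above.

```python
import time
import uuid


def new_capture_id() -> str:
    # Assumption: a random UUID string satisfies the expected capture_id format.
    return str(uuid.uuid4())


def wait_for_capture(get_status, capture_id, interval_s=0.5, timeout_s=30.0):
    """Poll get_status(capture_id) until it stops reporting 'IN_PROGRESS'."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(capture_id)
        if status != "IN_PROGRESS":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"capture {capture_id} is still in progress")
        time.sleep(interval_s)


def make_stub(pending_polls):
    """Build a fake status function that reports IN_PROGRESS a few times, then SUCCESS."""
    remaining = {"n": pending_polls}

    def get_status(capture_id):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            return "IN_PROGRESS"
        return "SUCCESS"

    return get_status


capture_id = new_capture_id()
print(wait_for_capture(make_stub(2), capture_id, interval_s=0.01))  # SUCCESS
```

Generating a fresh ID per call also avoids the ALREADY_EXISTS status that CaptureData returns for a reused capture_id.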

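Predict

The predict API performs inference on a previously loaded model. It accepts a request in the form of tensors that are fed directly into the neural network. The output is the output tensor (or scalar) of the model. This is a blocking call.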


//
// perform inference on a model.
//
// Note:
// 1. users can choose to send the tensor data in the protobuf message or
// through a shared memory segment on a per tensor basis, the Predict
// method will handle the decode transparently.
// 2. serializing large tensors into the protobuf message can be quite expensive,
// based on our measurements it is recommended to use shared memory for
// tensors larger than 256KB.
// 3. SMEdge IPC server will not use shared memory for returning output tensors,
// i.e., the output tensor data will always be sent in byte form encoded
// in the tensors of PredictResponse.
// 4. currently SMEdge IPC server cannot handle concurrent predict calls, all
// these calls will be serialized under the hood. this shall be addressed
// in a later release.
// Status Codes:
// 1. OK - prediction is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - when model not found
// 5. INVALID_ARGUMENT - when tensor types mismatch
//
rpc Predict(PredictRequest) returns (PredictResponse);
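
The 256KB guidance in the notes above can be turned into a simple per-tensor decision. The sketch below computes a tensor's size from its shape and element type and picks a transport accordingly. It reads "256KB" as 256 × 1024 bytes, and the type names and element sizes in `ITEM_SIZE` are an illustrative subset; both are assumptions, as are the helper names.

```python
from math import prod

SHARED_MEMORY_THRESHOLD = 256 * 1024  # bytes; treating "256KB" as 256 KiB is an assumption

# Element sizes in bytes for a few common types (illustrative subset).
ITEM_SIZE = {"float32": 4, "float64": 8, "int32": 4, "uint8": 1}


def tensor_nbytes(shape, dtype):
    """Size in bytes of a dense tensor with the given shape and element type."""
    return prod(shape) * ITEM_SIZE[dtype]


def choose_transport(shape, dtype):
    """Return 'shared_memory' for tensors above the threshold, else 'byte_data'."""
    if tensor_nbytes(shape, dtype) > SHARED_MEMORY_THRESHOLD:
        return "shared_memory"
    return "byte_data"


image = [1, 3, 224, 224]
print(tensor_nbytes(image, "float32"), choose_transport(image, "float32"))  # 602112 shared_memory
print(tensor_nbytes(image, "uint8"), choose_transport(image, "uint8"))      # 150528 byte_data
```

Since output tensors always come back as bytes, this choice only applies to the tensors in the request.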

Input


// request for Predict rpc call
//
message PredictRequest {
  string name = 1;
  repeated Tensor tensors = 2;
}

//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//    tensor_metadata - represents metadata of the tensor
//    data - represents the tensor data, this could be passed in two ways:
//                        a. send across the raw bytes of the multi-dimensional tensor array
//                        b. send a SharedMemoryHandle which contains the posix shared memory segment
//                            id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1; //optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}

//
// TensorMetadata represents the metadata for a tensor
//    name - name of the tensor
//    data_type  - data type of the tensor
//    shape - array of dimensions of the tensor
//
message TensorMetadata {
  string name = 1;
  DataType data_type = 2;
  repeated int32 shape = 3;
}

//
// SharedMemoryHandle represents a posix shared memory segment
//    offset - offset in bytes from the start of the shared memory segment.
//    segment_id - shared memory segment id corresponding to the posix shared memory segment.
//    size - size in bytes of shared memory segment to use from the offset position.
//
message SharedMemoryHandle {
  uint64 size = 1;
  uint64 offset = 2;
  uint64 segment_id = 3;
}
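
To make the `offset` and `size` fields concrete, the sketch below lays two tensors back to back in a single buffer and records where each one starts. A plain `bytearray` stands in for a real shared memory segment, and `segment_id` is a placeholder value. The `Handle` class merely mirrors the message fields, and native byte order for float32 data is an assumption, not something the proto excerpt states.

```python
import struct
from dataclasses import dataclass


@dataclass
class Handle:
    """Mirrors the SharedMemoryHandle fields; segment_id is a placeholder here."""
    size: int
    offset: int
    segment_id: int


def pack_float32(values):
    # Native byte order, 4 bytes per element; callers pass values already flattened row by row.
    return struct.pack(f"={len(values)}f", *values)


def place_tensors(segment, payloads, segment_id=0):
    """Copy payloads back to back into segment and return one Handle per payload."""
    handles, offset = [], 0
    for payload in payloads:
        end = offset + len(payload)
        if end > len(segment):
            raise ValueError("segment too small for payloads")
        segment[offset:end] = payload
        handles.append(Handle(size=len(payload), offset=offset, segment_id=segment_id))
        offset = end
    return handles


segment = bytearray(64)  # stand-in for a real shared memory segment
a = pack_float32([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # a 2x3 tensor, flattened row by row
b = pack_float32([0.5, 0.25])                     # a tensor with two elements
handles = place_tensors(segment, [a, b])
print([(h.offset, h.size) for h in handles])  # [(0, 24), (24, 8)]
```

The same flattened bytes could instead be sent inline as `byte_data` when the tensor is small.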

Output

Note

PredictResponse returns only Tensors and not SharedMemoryHandle.


// response for Predict rpc call
//
message PredictResponse {
   repeated Tensor tensors = 1;
}
