Manage Model - Amazon SageMaker

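Manage Model

The SageMaker Edge Manager agent provides a list of model management APIs that implement control plane and data plane APIs on edge devices. In addition to this documentation, we recommend going through the sample client implementation, which shows canonical usage of the APIs described below.

The proto file is available as part of the release artifacts (inside the release tarball). This document lists and describes the usage of the APIs defined in this proto file.

Note

On the Windows release, these APIs have a one-to-one mapping, and sample code for an application implemented in C# is shared with the release artifacts for Windows. The instructions below are for running the agent as a standalone process and apply to the release artifacts for Linux.

Extract the archive based on your operating system. VERSION is broken into three components: <MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>. See Installing the Edge Manager agent for information on how to obtain the release version (<MAJOR_VERSION>), the time stamp of the release artifact (<YYYY-MM-DD>), and the repository commit ID (<SHA-7>).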

Linux

Extract the archive with the following command:

tar -xvzf <VERSION>.tgz
Windows

Extract the archive with the UI or with the following command:

unzip <VERSION>.tgz

The release artifact hierarchy (after extracting the tar/zip archive) is shown below. The agent proto file is available under api/.

0.20201205.7ee4b0b
├── bin
│   ├── sagemaker_edge_agent_binary
│   └── sagemaker_edge_agent_client_example
└── docs
    ├── api
    │   └── agent.proto
    ├── attributions
    │   ├── agent.txt
    │   └── core.txt
    └── examples
        └── ipc_example
            ├── CMakeLists.txt
            ├── sagemaker_edge_client.cc
            ├── sagemaker_edge_client_example.cc
            ├── sagemaker_edge_client.hh
            ├── sagemaker_edge.proto
            ├── README.md
            ├── shm.cc
            ├── shm.hh
            └── street_small.bmp

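Load Model

The Edge Manager agent supports loading one model at a time and invoking inference against that model. This API validates the model signature and loads into memory all the artifacts produced by the EdgePackagingJob operation. This step requires that all required certificates be installed along with the rest of the agent binary installation. If the model's signature cannot be validated, this step fails with an appropriate return code and error messages in the log.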

// perform load for a model
// Note:
// 1. currently only local filesystem paths are supported for loading models.
// 2. currently only one model could be loaded at any time, loading of multiple
//    models simultaneously shall be implemented in the future.
// 3. users are required to unload any loaded model to load another model.
// Status Codes:
// 1. OK - load is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
// 5. ALREADY_EXISTS - model with the same name is already loaded
// 6. RESOURCE_EXHAUSTED - memory is not available to load the model
// 7. FAILED_PRECONDITION - model is not compiled for the machine.
//
rpc LoadModel(LoadModelRequest) returns (LoadModelResponse);
Input
//
// request for LoadModel rpc call
//
message LoadModelRequest {
  string url = 1;
  string name = 2;  // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}
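The name constraint in the comment above can be checked on the client before the request is sent. The following is a minimal sketch, not part of the release artifacts: the regular expression is copied from the proto comment, while the helper name is_valid_model_name is hypothetical.

```python
import re

# Pattern copied from the comment on LoadModelRequest.name.
MODEL_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def is_valid_model_name(name: str) -> bool:
    """Return True if the name matches the documented model-name regex.

    Hypothetical client-side helper; the agent performs its own validation.
    """
    return MODEL_NAME_PATTERN.fullmatch(name) is not None
```

Under this pattern a name must start and end with a letter or digit, and hyphens may appear only between them, so "resnet-50" is accepted while "-resnet", "resnet-", and "my_model" are rejected.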
Output
//
// response for LoadModel rpc call
//
message LoadModelResponse {
  Model model = 1;
}
//
// Model represents the metadata of a model
//  url - url representing the path of the model
//  name - name of model
//  input_tensor_metadatas - TensorMetadata array for the input tensors
//  output_tensor_metadatas - TensorMetadata array for the output tensors
//
// Note:
//  1. input and output tensor metadata could be empty for dynamic models.
//
message Model {
  string url = 1;
  string name = 2;
  repeated TensorMetadata input_tensor_metadatas = 3;
  repeated TensorMetadata output_tensor_metadatas = 4;
}

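Unload Model

Unloads a previously loaded model. The model is identified by the model alias that was provided during LoadModel. If the alias is not found or the model is not loaded, an error is returned.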

//
// perform unload for a model
// Status Codes:
// 1. OK - unload is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist
//
rpc UnLoadModel(UnLoadModelRequest) returns (UnLoadModelResponse);
Input
//
// request for UnLoadModel rpc call
//
message UnLoadModelRequest {
  string name = 1;  // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$"
}
Output
//
// response for UnLoadModel rpc call
//
message UnLoadModelResponse {}
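Because the LoadModel notes state that a loaded model must be unloaded before another one is loaded, a client may want to track which model it has loaded. The following is a minimal sketch of that bookkeeping. The class SingleModelSlot and the load and unload callables are hypothetical; in a real client the callables would wrap the LoadModel and UnLoadModel rpc calls.

```python
from typing import Callable, Optional


class SingleModelSlot:
    """Client-side bookkeeping for an agent that holds one model at a time."""

    def __init__(
        self,
        load: Callable[[str, str], None],
        unload: Callable[[str], None],
    ) -> None:
        self._load = load      # hypothetical wrapper around the LoadModel rpc
        self._unload = unload  # hypothetical wrapper around the UnLoadModel rpc
        self.current: Optional[str] = None

    def switch_to(self, url: str, name: str) -> None:
        """Unload the tracked model, if any, then load the requested one."""
        if self.current == name:
            # Loading the same name again would be reported as ALREADY_EXISTS.
            return
        if self.current is not None:
            self._unload(self.current)
            self.current = None
        self._load(url, name)
        self.current = name
```

This sketch only tracks models loaded through the same object. If another client shares the agent, ListModels is the more reliable source for what is currently loaded.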

List Models

Lists all the loaded models and their aliases.

//
// lists the loaded models
// Status Codes:
// 1. OK - list is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
//
rpc ListModels(ListModelsRequest) returns (ListModelsResponse);
Input
//
// request for ListModels rpc call
//
message ListModelsRequest {}
Output
//
// response for ListModels rpc call
//
message ListModelsResponse {
  repeated Model models = 1;
}

Describe Model

Describes a model that is loaded on the agent.

//
// Status Codes:
// 1. OK - describe is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - model doesn't exist at the url
//
rpc DescribeModel(DescribeModelRequest) returns (DescribeModelResponse);
Input
//
// request for DescribeModel rpc call
//
message DescribeModelRequest {
  string name = 1;
}
Output
//
// response for DescribeModel rpc call
//
message DescribeModelResponse {
  Model model = 1;
}

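Capture Data

Allows the client application to capture input and output tensors, and optionally auxiliary data, in an Amazon S3 bucket. The client application is expected to pass a unique capture ID with each call to this API. This ID can later be used to query the status of the capture.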

//
// allows users to capture input and output tensors along with auxiliary data.
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 5. ALREADY_EXISTS - capture initiated for the given capture_id
// 6. RESOURCE_EXHAUSTED - buffer is full cannot accept any more requests.
// 7. OUT_OF_RANGE - timestamp is in the future.
// 8. INVALID_ARGUMENT - capture_id is not of expected format.
//
rpc CaptureData(CaptureDataRequest) returns (CaptureDataResponse);
Input
enum Encoding {
  CSV = 0;
  JSON = 1;
  NONE = 2;
  BASE64 = 3;
}
//
// AuxilaryData represents a payload of extra data to be captured along with inputs and outputs of inference
// encoding - supports the encoding of the data
// data - represents the data of shared memory, this could be passed in two ways:
//   a. send across the raw bytes of the multi-dimensional tensor array
//   b. send a SharedMemoryHandle which contains the posix shared memory segment id and
//      offset in bytes to location of multi-dimensional tensor array.
//
message AuxilaryData {
  string name = 1;
  Encoding encoding = 2;
  oneof data {
    bytes byte_data = 3;
    SharedMemoryHandle shared_memory_handle = 4;
  }
}
//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//   tensor_metadata - represents metadata of the shared memory segment
//   data_or_handle - represents the data of shared memory, this could be passed in two ways:
//     a. send across the raw bytes of the multi-dimensional tensor array
//     b. send a SharedMemoryHandle which contains the posix shared memory segment
//        id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1;  // optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}
//
// request for CaptureData rpc call
//
message CaptureDataRequest {
  string model_name = 1;
  string capture_id = 2;  // uuid string
  Timestamp inference_timestamp = 3;
  repeated Tensor input_tensors = 4;
  repeated Tensor output_tensors = 5;
  repeated AuxilaryData inputs = 6;
  repeated AuxilaryData outputs = 7;
}
Output
//
// response for CaptureData rpc call
//
message CaptureDataResponse {}
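The request comments say that capture_id is a UUID string and that a timestamp in the future is rejected with OUT_OF_RANGE. The following is a minimal sketch of preparing those two fields on the client. The helper names are hypothetical, and the exact UUID form that the agent accepts is not specified here, so the random (version 4) UUID is an assumption.

```python
import uuid
from datetime import datetime, timezone


def new_capture_id() -> str:
    """Return a fresh random UUID string to pass as capture_id.

    Assumption: a version 4 UUID in its canonical text form is accepted.
    """
    return str(uuid.uuid4())


def is_timestamp_acceptable(inference_time: datetime, now: datetime) -> bool:
    """Return False for a timestamp in the future (documented as OUT_OF_RANGE)."""
    return inference_time <= now


capture_id = new_capture_id()
captured_at = datetime.now(timezone.utc)
```

Generating a new ID for each call avoids the ALREADY_EXISTS status, which is returned when a capture was already initiated for the given capture_id.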

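Get Capture Status

Depending on the models loaded, the input and output tensors can be large for many edge devices, and capturing them to the cloud can be time consuming. For this reason, CaptureData() is implemented as an asynchronous operation. The capture ID is a unique identifier that the client provides during the capture data call. This ID can be used to query the status of the asynchronous call.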

//
// allows users to query status of capture data operation
// Status Codes:
// 1. OK - data capture successfully initiated
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - given capture id doesn't exist.
//
rpc GetCaptureDataStatus(GetCaptureDataStatusRequest) returns (GetCaptureDataStatusResponse);
Input
//
// request for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusRequest {
  string capture_id = 1;
}
Output
enum CaptureDataStatus {
  FAILURE = 0;
  SUCCESS = 1;
  IN_PROGRESS = 2;
  NOT_FOUND = 3;
}
//
// response for GetCaptureDataStatus rpc call
//
message GetCaptureDataStatusResponse {
  CaptureDataStatus status = 1;
}
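Because the capture runs asynchronously, a client typically polls until the status leaves IN_PROGRESS. The following is a minimal sketch of such a loop. The enum values mirror the proto definition above, while wait_for_capture, its parameters, and the get_status callable (standing in for a wrapper around the GetCaptureDataStatus rpc) are hypothetical.

```python
import time
from enum import IntEnum
from typing import Callable


class CaptureDataStatus(IntEnum):
    """Mirror of the CaptureDataStatus enum in the proto file."""
    FAILURE = 0
    SUCCESS = 1
    IN_PROGRESS = 2
    NOT_FOUND = 3


def wait_for_capture(
    get_status: Callable[[str], CaptureDataStatus],
    capture_id: str,
    interval_s: float = 0.5,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> CaptureDataStatus:
    """Poll until the capture leaves IN_PROGRESS or max_polls is reached."""
    for attempt in range(max_polls):
        status = get_status(capture_id)
        if status != CaptureDataStatus.IN_PROGRESS:
            return status
        if attempt + 1 < max_polls:
            sleep(interval_s)  # wait before asking again
    # Still running after the last poll; the caller decides what to do next.
    return CaptureDataStatus.IN_PROGRESS
```

The polling interval and the poll limit are placeholder values. Suitable values depend on tensor size and on the upload bandwidth of the device.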

Predict

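The predict API performs inference on a previously loaded model. It accepts a request in the form of tensors that are fed directly into the neural network. The output is the output tensor (or scalar) from the model. This is a blocking call.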

//
// perform inference on a model.
//
// Note:
// 1. users can choose to send the tensor data in the protobuf message or
//    through a shared memory segment on a per tensor basis, the Predict
//    method will handle the decode transparently.
// 2. serializing large tensors into the protobuf message can be quite expensive,
//    based on our measurements it is recommended to use shared memory for
//    tensors larger than 256KB.
// 3. SMEdge IPC server will not use shared memory for returning output tensors,
//    i.e., the output tensor data will always be sent in byte form encoded
//    in the tensors of PredictResponse.
// 4. currently SMEdge IPC server cannot handle concurrent predict calls, all
//    these calls will be serialized under the hood. this shall be addressed
//    in a later release.
// Status Codes:
// 1. OK - prediction is successful
// 2. UNKNOWN - unknown error has occurred
// 3. INTERNAL - an internal error has occurred
// 4. NOT_FOUND - when model not found
// 5. INVALID_ARGUMENT - when tensor types mismatch
//
rpc Predict(PredictRequest) returns (PredictResponse);
Input
// request for Predict rpc call
//
message PredictRequest {
  string name = 1;
  repeated Tensor tensors = 2;
}
//
// Tensor represents a tensor, encoded as contiguous multi-dimensional array.
//   tensor_metadata - represents metadata of the shared memory segment
//   data_or_handle - represents the data of shared memory, this could be passed in two ways:
//     a. send across the raw bytes of the multi-dimensional tensor array
//     b. send a SharedMemoryHandle which contains the posix shared memory segment
//        id and offset in bytes to location of multi-dimensional tensor array.
//
message Tensor {
  TensorMetadata tensor_metadata = 1;  // optional in the predict request
  oneof data {
    bytes byte_data = 4;
    // will only be used for input tensors
    SharedMemoryHandle shared_memory_handle = 5;
  }
}
//
// TensorMetadata represents the metadata for a tensor
//   name - name of the tensor
//   data_type - data type of the tensor
//   shape - array of dimensions of the tensor
//
message TensorMetadata {
  string name = 1;
  DataType data_type = 2;
  repeated int32 shape = 3;
}
//
// SharedMemoryHandle represents a posix shared memory segment
//   offset - offset in bytes from the start of the shared memory segment.
//   segment_id - shared memory segment id corresponding to the posix shared memory segment.
//   size - size in bytes of shared memory segment to use from the offset position.
//
message SharedMemoryHandle {
  uint64 size = 1;
  uint64 offset = 2;
  uint64 segment_id = 3;
}
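The Predict notes recommend shared memory for tensors larger than 256KB. The following is a minimal sketch of how a client might make that choice per tensor from the shape and data type in TensorMetadata. The DataType enum is not shown in this document, so the data type names and element sizes below are assumptions, as is reading 256KB as 256 * 1024 bytes. The function names are hypothetical.

```python
from math import prod
from typing import Sequence

# Threshold from the Predict notes; 256KB is assumed to mean 256 * 1024 bytes.
SHARED_MEMORY_THRESHOLD_BYTES = 256 * 1024

# Hypothetical element sizes; the DataType enum is not shown in this document.
ELEMENT_SIZE_BYTES = {"float32": 4, "int32": 4, "uint8": 1}


def tensor_size_bytes(shape: Sequence[int], data_type: str) -> int:
    """Return the byte size of a contiguous tensor with the given shape."""
    return prod(shape) * ELEMENT_SIZE_BYTES[data_type]


def should_use_shared_memory(shape: Sequence[int], data_type: str) -> bool:
    """Return True when the tensor is larger than the recommended threshold."""
    return tensor_size_bytes(shape, data_type) > SHARED_MEMORY_THRESHOLD_BYTES
```

For example, a float32 tensor of shape [1, 3, 224, 224] occupies 602,112 bytes, so under these assumptions it would be passed through a SharedMemoryHandle, while a uint8 tensor of shape [1, 3, 64, 64] (12,288 bytes) would be sent inline as byte_data.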
Output
Note

The PredictResponse returns only Tensors, not a SharedMemoryHandle.

// response for Predict rpc call
//
message PredictResponse {
  repeated Tensor tensors = 1;
}
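Because output tensors always arrive as raw bytes, the client has to interpret byte_data using the accompanying tensor metadata. The following is a minimal sketch for one case only. It assumes that the tensor holds 32-bit floats in little-endian byte order, which this document does not state, and the function name decode_float32_tensor is hypothetical.

```python
import struct
from math import prod
from typing import List, Sequence


def decode_float32_tensor(byte_data: bytes, shape: Sequence[int]) -> List[float]:
    """Decode raw output bytes into a flat list of floats.

    Assumption: elements are little-endian 32-bit floats; check the tensor's
    data_type and the device byte order before relying on this.
    """
    count = prod(shape)
    if len(byte_data) != count * 4:
        raise ValueError(
            f"expected {count * 4} bytes for shape {list(shape)}, got {len(byte_data)}"
        )
    return list(struct.unpack(f"<{count}f", byte_data))
```

The length check catches a mismatch between the reported shape and the payload before any values are decoded. The result is flat, so reshaping to the dimensions in shape is left to the caller.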