管理模型模型 - Amazon SageMaker
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅中国的 Amazon Web Services 服务入门

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

管理模型模型

边缘管理器代理可以一次加载多个模型,并在边缘设备上使用加载的模型进行推断。代理可加载的型号数取决于设备上的可用内存。代理会验证模型签名,并将边缘打包作业生成的所有伪影加载到内存中。此步骤要求安装前面步骤中描述的所有必需证书以及其余的二进制安装。如果无法验证模型的签名,则模型加载失败,并显示适当的返回代码和原因。

SageMaker 边缘管理器代理提供了在边缘设备上实现控制平面和数据平面 API 的模型管理 API 的列表。除了本文档之外,我们建议您阅读示例客户端实现,其中显示了下面描述的 API 的规范用法。

这些区域有:proto文件作为发布工件的一部分(在发布 tarball 内部)可用。在本文档中,我们列出并描述了此proto文件。

注意

Windows 版本上的这些 API 有一对一的映射,C# 中的应用程序实现的示例代码与 Windows 的发布工件共享。以下说明用于将代理作为独立进程运行,适用于 Linux 的发布项目。

根据您的操作系统解压缩档案。其中VERSION分为三个组件:<MAJOR_VERSION>.<YYYY-MM-DD>-<SHA-7>. 请参阅安装边缘管理器代理,了解有关如何获取发行版本(<MAJOR_VERSION>)、释放工件的时间戳 (<YYYY-MM-DD>)和存储库提交 ID(SHA-7

Linux

可以使用以下命令提取 zip 档案:

tar -xvzf <VERSION>.tgz
Windows

zip 存档可以使用 UI 或命令解压缩:

unzip <VERSION>.tgz

发布对象层次结构(在提取tar/zip存档) 如下所示。代理proto文件位于api/.

0.20201205.7ee4b0b ├── bin │ ├── sagemaker_edge_agent_binary │ └── sagemaker_edge_agent_client_example └── docs ├── api │ └── agent.proto ├── attributions │ ├── agent.txt │ └── core.txt └── examples └── ipc_example ├── CMakeLists.txt ├── sagemaker_edge_client.cc ├── sagemaker_edge_client_example.cc ├── sagemaker_edge_client.hh ├── sagemaker_edge.proto ├── README.md ├── shm.cc ├── shm.hh └── street_small.bmp

加载模型

边缘管理器代理支持加载多个模型。此 API 验证模型签名,并将EdgePackagingJoboperation. 此步骤要求安装所有必需的证书以及代理二进制安装的其余部分。如果无法验证模型的签名,则此步骤将失败,并在日志中显示相应的返回代码和错误消息。

// perform load for a model // Note: // 1. currently only local filesystem paths are supported for loading models. // 2. multiple models can be loaded at the same time, as limited by available device memory // 3. users are required to unload any loaded model to load another model. // Status Codes: // 1. OK - load is successful // 2. UNKNOWN - unknown error has occurred // 3. INTERNAL - an internal error has occurred // 4. NOT_FOUND - model doesn't exist at the url // 5. ALREADY_EXISTS - model with the same name is already loaded // 6. RESOURCE_EXHAUSTED - memory is not available to load the model // 7. FAILED_PRECONDITION - model is not compiled for the machine. // rpc LoadModel(LoadModelRequest) returns (LoadModelResponse);
Input
// // request for LoadModel rpc call // message LoadModelRequest { string url = 1; string name = 2; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$" }
Output
// // // response for LoadModel rpc call // message LoadModelResponse { Model model = 1; } // // Model represents the metadata of a model // url - url representing the path of the model // name - name of model // input_tensor_metadatas - TensorMetadata array for the input tensors // output_tensor_metadatas - TensorMetadata array for the output tensors // // Note: // 1. input and output tensor metadata could empty for dynamic models. // message Model { string url = 1; string name = 2; repeated TensorMetadata input_tensor_metadatas = 3; repeated TensorMetadata output_tensor_metadatas = 4; }

卸载模型

卸载先前加载的模型。它通过模型别名进行识别,该别名在loadModel. 如果找不到别名或未加载模型,则返回错误。

// // perform unload for a model // Status Codes: // 1. OK - unload is successful // 2. UNKNOWN - unknown error has occurred // 3. INTERNAL - an internal error has occurred // 4. NOT_FOUND - model doesn't exist // rpc UnLoadModel(UnLoadModelRequest) returns (UnLoadModelResponse);
Input
// // request for UnLoadModel rpc call // message UnLoadModelRequest { string name = 1; // Model name needs to match regex "^[a-zA-Z0-9](-*[a-zA-Z0-9])*$" }
Output
// // response for UnLoadModel rpc call // message UnLoadModelResponse {}

列出模型

列出所有加载的模型及其别名。

// // lists the loaded models // Status Codes: // 1. OK - unload is successful // 2. UNKNOWN - unknown error has occurred // 3. INTERNAL - an internal error has occurred // rpc ListModels(ListModelsRequest) returns (ListModelsResponse);
Input
// // request for ListModels rpc call // message ListModelsRequest {}
Output
// // response for ListModels rpc call // message ListModelsResponse { repeated Model models = 1; }

描述型号

描述加载到代理上的模型。

// // Status Codes: // 1. OK - load is successful // 2. UNKNOWN - unknown error has occurred // 3. INTERNAL - an internal error has occurred // 4. NOT_FOUND - model doesn't exist at the url // rpc DescribeModel(DescribeModelRequest) returns (DescribeModelResponse);
Input
// // request for DescribeModel rpc call // message DescribeModelRequest { string name = 1; }
Output
// // response for DescribeModel rpc call // message DescribeModelResponse { Model model = 1; }

捕获数据

允许客户端应用程序捕获 Amazon S3 存储桶中的输入和输出张量,并可选择使用辅助。客户端应用程序需要在每次调用此 API 时传递唯一的捕获 ID。这可以稍后用于查询捕获的状态。

// // allows users to capture input and output tensors along with auxiliary data. // Status Codes: // 1. OK - data capture successfully initiated // 2. UNKNOWN - unknown error has occurred // 3. INTERNAL - an internal error has occurred // 5. ALREADY_EXISTS - capture initiated for the given capture_id // 6. RESOURCE_EXHAUSTED - buffer is full cannot accept any more requests. // 7. OUT_OF_RANGE - timestamp is in the future. // 8. INVALID_ARGUMENT - capture_id is not of expected format. // rpc CaptureData(CaptureDataRequest) returns (CaptureDataResponse);
Input
enum Encoding { CSV = 0; JSON = 1; NONE = 2; BASE64 = 3; } // // AuxilaryData represents a payload of extra data to be capture along with inputs and outputs of inference // encoding - supports the encoding of the data // data - represents the data of shared memory, this could be passed in two ways: // a. send across the raw bytes of the multi-dimensional tensor array // b. send a SharedMemoryHandle which contains the posix shared memory segment id and // offset in bytes to location of multi-dimensional tensor array. // message AuxilaryData { string name = 1; Encoding encoding = 2; oneof data { bytes byte_data = 3; SharedMemoryHandle shared_memory_handle = 4; } } // // Tensor represents a tensor, encoded as contiguous multi-dimensional array. // tensor_metadata - represents metadata of the shared memory segment // data_or_handle - represents the data of shared memory, this could be passed in two ways: // a. send across the raw bytes of the multi-dimensional tensor array // b. send a SharedMemoryHandle which contains the posix shared memory segment // id and offset in bytes to location of multi-dimensional tensor array. // message Tensor { TensorMetadata tensor_metadata = 1; //optional in the predict request oneof data { bytes byte_data = 4; // will only be used for input tensors SharedMemoryHandle shared_memory_handle = 5; } } // // request for CaptureData rpc call // message CaptureDataRequest { string model_name = 1; string capture_id = 2; //uuid string Timestamp inference_timestamp = 3; repeated Tensor input_tensors = 4; repeated Tensor output_tensors = 5; repeated AuxilaryData inputs = 6; repeated AuxilaryData outputs = 7; }
Output
// // response for CaptureData rpc call // message CaptureDataResponse {}

获取捕获状态

根据加载的型号,输入和输出张量可能很大(对于许多边缘设备)。捕获到云可能非常耗时。因此,CaptureData()是作为异步操作实现的。捕获 ID 是客户端在捕获数据调用期间提供的唯一标识符,此 ID 可用于查询异步调用的状态。

// // allows users to query status of capture data operation // Status Codes: // 1. OK - data capture successfully initiated // 2. UNKNOWN - unknown error has occurred // 3. INTERNAL - an internal error has occurred // 4. NOT_FOUND - given capture id doesn't exist. // rpc GetCaptureDataStatus(GetCaptureDataStatusRequest) returns (GetCaptureDataStatusResponse);
Input
// // request for GetCaptureDataStatus rpc call // message GetCaptureDataStatusRequest { string capture_id = 1; }
Output
enum CaptureDataStatus { FAILURE = 0; SUCCESS = 1; IN_PROGRESS = 2; NOT_FOUND = 3; } // // response for GetCaptureDataStatus rpc call // message GetCaptureDataStatusResponse { CaptureDataStatus status = 1; }

Predict

这些区域有:predictAPI 对先前加载的模型执行推断。它接受直接输入神经网络的张量形式的请求。输出是模型的输出张量(或标量)。这是一个阻止性调用。

// // perform inference on a model. // // Note: // 1. users can chose to send the tensor data in the protobuf message or // through a shared memory segment on a per tensor basis, the Predict // method with handle the decode transparently. // 2. serializing large tensors into the protobuf message can be quite expensive, // based on our measurements it is recommended to use shared memory of // tenors larger than 256KB. // 3. SMEdge IPC server will not use shared memory for returning output tensors, // i.e., the output tensor data will always send in byte form encoded // in the tensors of PredictResponse. // 4. currently SMEdge IPC server cannot handle concurrent predict calls, all // these call will be serialized under the hood. this shall be addressed // in a later release. // Status Codes: // 1. OK - prediction is successful // 2. UNKNOWN - unknown error has occurred // 3. INTERNAL - an internal error has occurred // 4. NOT_FOUND - when model not found // 5. INVALID_ARGUMENT - when tenors types mismatch // rpc Predict(PredictRequest) returns (PredictResponse);
Input
// request for Predict rpc call // message PredictRequest { string name = 1; repeated Tensor tensors = 2; } // // Tensor represents a tensor, encoded as contiguous multi-dimensional array. // tensor_metadata - represents metadata of the shared memory segment // data_or_handle - represents the data of shared memory, this could be passed in two ways: // a. send across the raw bytes of the multi-dimensional tensor array // b. send a SharedMemoryHandle which contains the posix shared memory segment // id and offset in bytes to location of multi-dimensional tensor array. // message Tensor { TensorMetadata tensor_metadata = 1; //optional in the predict request oneof data { bytes byte_data = 4; // will only be used for input tensors SharedMemoryHandle shared_memory_handle = 5; } } // // Tensor represents a tensor, encoded as contiguous multi-dimensional array. // tensor_metadata - represents metadata of the shared memory segment // data_or_handle - represents the data of shared memory, this could be passed in two ways: // a. send across the raw bytes of the multi-dimensional tensor array // b. send a SharedMemoryHandle which contains the posix shared memory segment // id and offset in bytes to location of multi-dimensional tensor array. // message Tensor { TensorMetadata tensor_metadata = 1; //optional in the predict request oneof data { bytes byte_data = 4; // will only be used for input tensors SharedMemoryHandle shared_memory_handle = 5; } } // // TensorMetadata represents the metadata for a tensor // name - name of the tensor // data_type - data type of the tensor // shape - array of dimensions of the tensor // message TensorMetadata { string name = 1; DataType data_type = 2; repeated int32 shape = 3; } // // SharedMemoryHandle represents a posix shared memory segment // offset - offset in bytes from the start of the shared memory segment. // segment_id - shared memory segment id corresponding to the posix shared memory segment. // size - size in bytes of shared memory segment to use from the offset position. // message SharedMemoryHandle { uint64 size = 1; uint64 offset = 2; uint64 segment_id = 3; }
Output
注意

这些区域有:PredictResponse仅返回Tensors而不是SharedMemoryHandle.

// response for Predict rpc call // message PredictResponse { repeated Tensor tensors = 1; }