Using the Graviton GPU PyTorch DLAMI - Deep Learning AMI

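The Amazon Deep Learning AMI is ready to use with Arm processor-based Graviton GPU instances and is optimized for PyTorch. The Graviton GPU PyTorch DLAMI includes a Python environment preconfigured with PyTorch, TorchVision, and TorchServe for deep learning training and inference use cases. See the release notes for more details about the Graviton GPU PyTorch DLAMI.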

Verify the PyTorch Python environment

Connect to your G5g instance, then activate the base Conda environment with the following command:

source activate base

Your command prompt should indicate that you are working in the base Conda environment, which contains PyTorch, TorchVision, and other libraries.

(base) $

Verify the default tool paths of the PyTorch environment:

(base) $ which python
/opt/conda/bin/python
(base) $ which pip
/opt/conda/bin/pip
(base) $ which conda
/opt/conda/bin/conda
(base) $ which mamba
/opt/conda/bin/mamba
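
The same path check can be scripted, which may help when validating several instances. The following is a minimal sketch, not part of the DLAMI: the helper names `resolve_tools` and `outside_prefix` are hypothetical, and the expected prefix `/opt/conda/bin/` is simply taken from the sample output above.

```python
import shutil

# Tools checked with `which` in the walkthrough.
TOOLS = ("python", "pip", "conda", "mamba")
# Assumption: the prefix shown in the sample output above.
EXPECTED_PREFIX = "/opt/conda/bin/"


def resolve_tools(tools=TOOLS):
    """Return {tool: absolute path or None}, like running `which` per tool."""
    return {tool: shutil.which(tool) for tool in tools}


def outside_prefix(paths, prefix=EXPECTED_PREFIX):
    """List tools that are missing or that resolve outside the prefix."""
    return [
        tool
        for tool, path in paths.items()
        if path is None or not path.startswith(prefix)
    ]


if __name__ == "__main__":
    paths = resolve_tools()
    for tool, path in paths.items():
        print(f"{tool}: {path}")
    unexpected = outside_prefix(paths)
    if unexpected:
        print(f"Check these tools: {unexpected}")
    else:
        print(f"All tools resolve under {EXPECTED_PREFIX}")
```

A tool reported under "Check these tools" usually means the base Conda environment is not active in the current shell.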

Verify that Torch and TorchVision are available, check their versions, and test basic functionality:

(base) $ python
Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 23:06:28)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch, torchvision
>>> torch.__version__
'1.10.0'
>>> torchvision.__version__
'0.11.1'
>>> v = torch.autograd.Variable(torch.randn(10, 3, 224, 224))
>>> v = torch.autograd.Variable(torch.randn(10, 3, 224, 224)).cuda()
>>> assert isinstance(v, torch.Tensor)
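
For a non-interactive version of this check, the sketch below reads package versions from installed metadata and then reports whether CUDA is visible. It is an illustration under assumptions: `installed_versions` is a hypothetical helper, and it assumes the environment exposes distribution metadata under the names `torch` and `torchvision`, which may not hold for every installation method.

```python
from importlib import metadata


def installed_versions(names=("torch", "torchvision")):
    """Return {distribution name: version string, or None if not found}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not found'}")
    try:
        import torch
    except ImportError:
        print("torch is not importable in this environment")
    else:
        # The .cuda() call in the session above needs a visible GPU.
        print("CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False`, the `.cuda()` call in the interactive session would be expected to fail, so checking it first gives a clearer signal.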

Run a training sample with PyTorch

Run a sample MNIST training job:

git clone https://github.com/pytorch/examples.git
cd examples/mnist
python main.py

Your output should look similar to the following:

...
Train Epoch: 14 [56320/60000 (94%)] Loss: 0.021424
Train Epoch: 14 [56960/60000 (95%)] Loss: 0.023695
Train Epoch: 14 [57600/60000 (96%)] Loss: 0.001973
Train Epoch: 14 [58240/60000 (97%)] Loss: 0.007121
Train Epoch: 14 [58880/60000 (98%)] Loss: 0.003717
Train Epoch: 14 [59520/60000 (99%)] Loss: 0.001729

Test set: Average loss: 0.0275, Accuracy: 9916/10000 (99%)
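
If you capture the training output to a file, the final test summary can be extracted programmatically. The sketch below assumes the log uses the "Test set:" line format shown above; `last_test_summary` is a hypothetical helper, not part of the PyTorch examples repository.

```python
import re

# Matches the summary line format shown in the sample output above.
SUMMARY_RE = re.compile(
    r"Test set: Average loss: (?P<loss>\d+\.\d+), "
    r"Accuracy: (?P<correct>\d+)/(?P<total>\d+)"
)


def last_test_summary(log_text):
    """Return metrics from the last 'Test set:' line, or None if absent."""
    matches = list(SUMMARY_RE.finditer(log_text))
    if not matches:
        return None
    last = matches[-1]
    correct, total = int(last["correct"]), int(last["total"])
    return {
        "loss": float(last["loss"]),
        "correct": correct,
        "total": total,
        "accuracy": correct / total,
    }


if __name__ == "__main__":
    sample = (
        "Train Epoch: 14 [59520/60000 (99%)] Loss: 0.001729\n"
        "\n"
        "Test set: Average loss: 0.0275, Accuracy: 9916/10000 (99%)\n"
    )
    print(last_test_summary(sample))
```

Taking the last match rather than the first matters because the script prints one summary per epoch.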

Run an inference sample with PyTorch

Use the following commands to download a pre-trained densenet161 model and run inference with TorchServe:

# Set up TorchServe
cd $HOME
git clone https://github.com/pytorch/serve.git
mkdir -p serve/model_store
cd serve

# Download a pre-trained densenet161 model
wget https://download.pytorch.org/models/densenet161-8d451a50.pth >/dev/null

# Save the model using torch-model-archiver
torch-model-archiver --model-name densenet161 \
    --version 1.0 \
    --model-file examples/image_classifier/densenet_161/model.py \
    --serialized-file densenet161-8d451a50.pth \
    --handler image_classifier \
    --extra-files examples/image_classifier/index_to_name.json \
    --export-path model_store

# Start the model server
torchserve --start --no-config-snapshots \
    --model-store model_store \
    --models densenet161=densenet161.mar &> torchserve.log

# Wait for the model server to start
sleep 30

# Run a prediction request
curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg
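
The fixed `sleep 30` can be too short or needlessly long depending on the instance. As an alternative, the sketch below polls the TorchServe health check endpoint (`/ping` on the inference port) until it answers or a timeout expires. The function name `wait_until_healthy` and the default timeout and interval values are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url="http://127.0.0.1:8080/ping", timeout=60.0, interval=1.0):
    """Poll `url` until it returns HTTP 200; return False once `timeout` passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not accepting connections yet, or not healthy yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    if wait_until_healthy():
        print("TorchServe is responding; safe to send prediction requests.")
    else:
        print("TorchServe did not respond in time; check torchserve.log.")
```

Note that a healthy ping only shows the server is up. If model loading is slow, the first prediction request may still need a retry.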

Your output should look similar to the following:

{
  "tiger_cat": 0.4693363308906555,
  "tabby": 0.4633873701095581,
  "Egyptian_cat": 0.06456123292446136,
  "lynx": 0.0012828150065615773,
  "plastic_bag": 0.00023322898778133094
}
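
The response is a JSON object that maps class labels to scores, so picking the most likely class takes only a few lines. This is a minimal sketch that assumes the response has the flat label-to-score shape shown above; `top_prediction` is a hypothetical helper.

```python
import json


def top_prediction(response_text):
    """Return (label, score) for the highest-scoring class in the response."""
    scores = json.loads(response_text)
    label = max(scores, key=scores.get)
    return label, scores[label]


if __name__ == "__main__":
    sample = """
    {
      "tiger_cat": 0.4693363308906555,
      "tabby": 0.4633873701095581,
      "Egyptian_cat": 0.06456123292446136,
      "lynx": 0.0012828150065615773,
      "plastic_bag": 0.00023322898778133094
    }
    """
    label, score = top_prediction(sample)
    print(f"{label}: {score:.4f}")
```

In the sample above, the top two scores are very close, so an application that acts on the result may want to look at the score margin as well as the top label.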

Use the following commands to unregister the densenet161 model and stop the server:

curl -X DELETE http://localhost:8081/models/densenet161/1.0
torchserve --stop

Your output should look similar to the following:

{
  "status": "Model \"densenet161\" unregistered"
}
TorchServe has stopped.