使用 Graviton GPU PyTorch DLAMI - 深度学习 AMI
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅中国的 Amazon Web Services 服务入门

本文属于机器翻译版本。若本译文内容与英语原文存在差异,则一律以英文原文为准。

使用 Graviton GPU PyTorch DLAMI

这些区域有:Amazon Deep Learning AMI已准备好与基于 Arm 处理器的 Graviton GPU 一起使用,并且已针对以下方面进行了优化 PyTorch. Graviton GPU PyTorch DLAMI 包括一个预先配置的 Python 环境PyTorch,TorchVision,以及TorchServe用于深度学习训练和推理用例。检查发布说明了解有关 Graviton GPU 的更多详细信息 PyTorch DLAMI。

验证 PyTorch Python 环境

Connect 您的 G5G 实例并使用以下命令激活基本 Conda 环境:

source activate base

您的命令提示符应表明您正在基本 Conda 环境中工作,其中包含 PyTorch, TorchVision,以及其他库。

(base) $

验证的默认刀具路径 PyTorch 环境:

(base) $ which python /opt/conda/bin/python (base) $ which pip /opt/conda/bin/pip (base) $ which conda /opt/conda/bin/conda (base) $ which mamba /opt/conda/bin/mamba

验证 Torch 和 TorchVersion 可用,请检查其版本并测试基本功能:

(base) $ python Python 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 23:06:28) [GCC 9.4.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import torch, torchvision >>> torch.__version__ '1.10.0' >>> torchvision.__version__ '0.11.1' >>> v = torch.autograd.Variable(torch.randn(10, 3, 224, 224)) >>> v = torch.autograd.Variable(torch.randn(10, 3, 224, 224)).cuda() >>> assert isinstance(v, torch.Tensor)

使用以下命令运行训练样本 PyTorch

运行示例 MNIST 训练作业:

git clone https://github.com/pytorch/examples.git cd examples/mnist python main.py

您的输出应类似于以下内容:

... Train Epoch: 14 [56320/60000 (94%)] Loss: 0.021424 Train Epoch: 14 [56960/60000 (95%)] Loss: 0.023695 Train Epoch: 14 [57600/60000 (96%)] Loss: 0.001973 Train Epoch: 14 [58240/60000 (97%)] Loss: 0.007121 Train Epoch: 14 [58880/60000 (98%)] Loss: 0.003717 Train Epoch: 14 [59520/60000 (99%)] Loss: 0.001729 Test set: Average loss: 0.0275, Accuracy: 9916/10000 (99%)

使用以下命令运行推理示例 PyTorch

使用以下命令下载预训练的 densenet161 模型并使用以下命令运行推理 TorchServe:

# Set up TorchServe cd $HOME git clone https://github.com/pytorch/serve.git mkdir -p serve/model_store cd serve # Download a pre-trained densenet161 model wget https://download.pytorch.org/models/densenet161-8d451a50.pth >/dev/null # Save the model using torch-model-archiver torch-model-archiver --model-name densenet161 \ --version 1.0 \ --model-file examples/image_classifier/densenet_161/model.py \ --serialized-file densenet161-8d451a50.pth \ --handler image_classifier \ --extra-files examples/image_classifier/index_to_name.json \ --export-path model_store # Start the model server torchserve --start --no-config-snapshots \ --model-store model_store \ --models densenet161=densenet161.mar &> torchserve.log # Wait for the model server to start sleep 30 # Run a prediction request curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg

您的输出应类似于以下内容:

{ "tiger_cat": 0.4693363308906555, "tabby": 0.4633873701095581, "Egyptian_cat": 0.06456123292446136, "lynx": 0.0012828150065615773, "plastic_bag": 0.00023322898778133094 }

使用以下命令注销 densenet161 模型并停止服务器:

curl -X DELETE http://localhost:8081/models/densenet161/1.0 torchserve --stop

您的输出应类似于以下内容:

{ "status": "Model \"densenet161\" unregistered" } TorchServe has stopped.