
Monitoring and Usage Tracking in Amazon Deep Learning Containers

Your Amazon Deep Learning Containers do not come with monitoring utilities. For information on monitoring, see GPU Monitoring and Optimization, Monitoring Amazon EC2, Monitoring Amazon ECS, Monitoring Amazon EKS, and Monitoring Amazon SageMaker Studio.

Usage Tracking

Amazon uses customer feedback and usage information to improve the quality of the services and software we offer to customers. We have added usage data collection to the supported Amazon Deep Learning Containers in order to better understand customer usage and guide future improvements. Usage tracking for Deep Learning Containers is activated by default. Customers can change their settings at any time to activate or deactivate usage tracking.

Usage tracking for Amazon Deep Learning Containers collects the instance ID, frameworks, framework versions, container types, and Python versions used for the containers. Amazon also logs the time at which it receives this metadata.

No information on the commands used within the containers is collected or retained. No other information about the containers is collected or retained.

To opt out of usage tracking, set the OPT_OUT_TRACKING environment variable to true.

OPT_OUT_TRACKING=true
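As a sketch, the variable can be exported in the shell that launches the container and passed through to the container environment; the image URI below is a placeholder, not a specific supported tag:

```shell
# Set the opt-out flag in the shell that launches the container.
export OPT_OUT_TRACKING=true

# Pass it through to the container environment at launch time.
# (<dlc-image-uri> is a placeholder; substitute the Deep Learning
# Containers image you actually use.)
# docker run -e OPT_OUT_TRACKING <dlc-image-uri>

echo "$OPT_OUT_TRACKING"
```

With `docker run -e OPT_OUT_TRACKING` (no value given), Docker copies the variable's current value from the launching shell into the container.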

Failure Rate Tracking

When you use a first-party Amazon Deep Learning Containers container on Amazon SageMaker, the SageMaker team collects failure rate metadata to improve the quality of Amazon Deep Learning Containers. Failure rate tracking for Amazon Deep Learning Containers is active by default. Customers can activate or deactivate failure rate tracking when creating an Amazon SageMaker endpoint.

Failure rate tracking for Amazon Deep Learning Containers collects the instance ID, ModelServer name, ModelServer version, ErrorType, and ErrorCode. Amazon also logs the time at which it receives this metadata.

No information on the commands used within the containers is collected or retained. No other information about the containers is collected or retained.

To opt out of failure rate tracking, set the OPT_OUT_TRACKING environment variable to true.

OPT_OUT_TRACKING=true
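As a sketch (the model name, image URI, and role ARN below are placeholders, not real values), the variable can be supplied in the container environment when registering the model behind a SageMaker endpoint, for example with the AWS CLI:

```shell
# Sketch: register a model with OPT_OUT_TRACKING set in the container
# environment; endpoints created from this model inherit the setting.
# <dlc-image-uri> and <role-arn> are placeholders.
aws sagemaker create-model \
    --model-name my-model \
    --execution-role-arn <role-arn> \
    --primary-container Image=<dlc-image-uri>,Environment={OPT_OUT_TRACKING=true}
```

The same environment map can be passed programmatically, for example through the `Environment` field of `PrimaryContainer` in the `CreateModel` API.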

Usage Tracking in the Following Framework Versions

These framework versions are no longer supported:

  • TensorFlow 1.15

  • TensorFlow 2.0

  • TensorFlow 2.1

  • PyTorch 1.2

  • PyTorch 1.3.1

  • MXNet 1.6

For a full description of our support policy, see Framework Support Policy.

We recommend updating to supported Deep Learning Containers. However, to opt out of usage tracking for Deep Learning Containers that use these framework versions, set the OPT_OUT_TRACKING environment variable to true and use a custom entry point to disable the call for the following services: