Monitor Amazon resources provisioned while using Amazon SageMaker

Monitoring is an important part of maintaining the reliability, availability, and performance of SageMaker and your other Amazon solutions. Amazon provides the following monitoring tools to watch SageMaker, report when something is wrong, and take automatic actions when appropriate:

  • Amazon CloudWatch monitors your Amazon resources and the applications that you run on Amazon in real time. You can collect and track metrics, create customized dashboards, and set alarms that notify you or take actions when a specified metric reaches a threshold that you specify. For example, you can have CloudWatch track CPU usage or other metrics of your Amazon EC2 instances and automatically launch new instances when needed. For more information, see the Amazon CloudWatch User Guide.

  • Amazon CloudWatch Logs enables you to monitor, store, and access your log files from EC2 instances, Amazon CloudTrail, and other sources. CloudWatch Logs can monitor information in the log files and notify you when certain thresholds are met. You can also archive your log data in highly durable storage. For more information, see the Amazon CloudWatch Logs User Guide.

  • Amazon CloudTrail captures API calls and related events made by or on behalf of your Amazon account and delivers the log files to an Amazon S3 bucket that you specify. You can identify which users and accounts called Amazon, the source IP address from which the calls were made, and when the calls occurred. For more information, see the Amazon CloudTrail User Guide.

  • CloudWatch Events delivers a near real-time stream of system events that describe changes in Amazon resources. Create CloudWatch Events rules that react to a status change in a SageMaker training, hyperparameter tuning, or batch transform job.
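
As an illustration of the last item, the following Python sketch builds an event pattern that matches SageMaker job status changes and registers it as a CloudWatch Events rule with boto3. The helper names (`build_event_pattern`, `create_rule`), the `JOB_EVENTS` table, and the rule name in the usage note are hypothetical and chosen only for this example. The detail-type strings and status field names are assumptions about the events SageMaker emits; verify them against the events delivered in your account before relying on them. Creating the rule requires credentials with permission to call `events:PutRule`.

```python
import json

# Assumed detail types and status fields for SageMaker job state-change events.
# Verify these against the events that SageMaker emits in your account.
JOB_EVENTS = {
    "training": ("SageMaker Training Job State Change", "TrainingJobStatus"),
    "tuning": (
        "SageMaker HyperParameter Tuning Job State Change",
        "HyperParameterTuningJobStatus",
    ),
    "transform": ("SageMaker Transform Job State Change", "TransformJobStatus"),
}


def build_event_pattern(job_kind, statuses):
    """Return an event pattern (as a dict) matching the given job statuses."""
    detail_type, status_field = JOB_EVENTS[job_kind]
    return {
        "source": ["aws.sagemaker"],
        "detail-type": [detail_type],
        "detail": {status_field: list(statuses)},
    }


def create_rule(rule_name, job_kind, statuses, region_name=None):
    """Create or update an enabled rule and return its ARN."""
    # Imported here so the pattern builder can be used without boto3 installed.
    import boto3

    events = boto3.client("events", region_name=region_name)
    response = events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern(job_kind, statuses)),
        State="ENABLED",
    )
    return response["RuleArn"]
```

For example, `create_rule("sagemaker-training-failed", "training", ["Failed", "Stopped"])` would register a rule for failed or stopped training jobs. A rule does nothing on its own until you attach a target to it, such as an Amazon SNS topic or a Lambda function, with a separate `put_targets` call.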