Monitor experiment training metrics with Amazon CloudTrail
The training metrics for Amazon SageMaker Experiments are integrated with Amazon CloudTrail, a service that
provides a record of actions taken by a user, role, or an Amazon service. CloudTrail captures all API
calls for BatchPutMetrics as events. SageMaker automatically calls BatchPutMetrics when you create an
experiment run using the SageMaker SDK for Python. Amazon CloudTrail captures data related to calls for
resource type AWS::SageMaker::ExperimentTrialComponent.
Note
In the Studio Classic Experiments UI, trials are referred to as run groups and trial components are referred to as runs.
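As a rough illustration of where these calls originate, the following sketch logs one metric value per training step through a run object. It assumes the Run class and its log_metric method from the SageMaker SDK for Python (sagemaker.experiments.run). The helper name log_losses, the metric name, and the experiment and run names in the commented usage are hypothetical.

```python
from typing import Iterable


def log_losses(run, losses: Iterable[float], metric_name: str = "train:loss") -> int:
    """Log one metric value per training step on an experiment run.

    `run` is expected to expose log_metric(name=..., value=..., step=...),
    as sagemaker.experiments.run.Run does. Returns the number of values logged.
    """
    count = 0
    for step, loss in enumerate(losses):
        run.log_metric(name=metric_name, value=float(loss), step=step)
        count += 1
    return count


# Hypothetical usage (requires the sagemaker package and AWS credentials):
#
#   from sagemaker.experiments.run import Run
#
#   with Run(experiment_name="my-experiment", run_name="my-run") as run:
#       log_losses(run, [0.9, 0.7, 0.5])
```

The helper takes the run object as a parameter so that it can be exercised with a stand-in object when no Amazon credentials are available.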
When you create an experiment run, you can also configure the continuous delivery of CloudTrail
events to an Amazon S3 bucket. Use CloudTrail to monitor all ingested training metrics for an experiment
run, including information such as the metric name, the training step of the recorded metric,
the timestamp, and the metric value. CloudTrail events also include the experiment run ARN, the ID of the
account that created the run, and the resource type, which should be
AWS::SageMaker::ExperimentTrialComponent.
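Once events are delivered to an Amazon S3 bucket, they can be inspected offline. The following is a minimal sketch, assuming the standard CloudTrail log file layout (a gzip-compressed JSON document with a top-level Records array) and assuming the file bytes have already been downloaded. The function name batch_put_metrics_events is hypothetical.

```python
import gzip
import json
from typing import Any, Dict, List

RESOURCE_TYPE = "AWS::SageMaker::ExperimentTrialComponent"


def batch_put_metrics_events(log_file_bytes: bytes) -> List[Dict[str, Any]]:
    """Return the BatchPutMetrics events found in one gzipped CloudTrail log file."""
    document = json.loads(gzip.decompress(log_file_bytes).decode("utf-8"))
    matches = []
    for event in document.get("Records") or []:
        if event.get("eventName") != "BatchPutMetrics":
            continue
        # Keep only events that reference an experiment run (trial component).
        resource_types = {r.get("type") for r in event.get("resources") or []}
        if RESOURCE_TYPE in resource_types:
            matches.append(event)
    return matches
```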
To monitor BatchPutMetrics API calls as CloudTrail events, you must first set up the
logging of data plane API activity in CloudTrail. For more information, see Logging data
events for trails. For granular control over which API calls you log and pay for, you can
filter CloudTrail events by resource type. Specify AWS::SageMaker::ExperimentTrialComponent
as a resource type to monitor calls to the BatchPutMetrics API. For more information, see
DataResource in the Amazon CloudTrail API reference. To learn more about CloudTrail, see the
Amazon CloudTrail User Guide.
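The resource type filter described above could be expressed with the AWS SDK for Python (Boto3) roughly as follows. This is a sketch under stated assumptions, not a definitive configuration: it assumes an existing trail and uses the put_event_selectors operation with advanced event selectors. The function names and the selector name are hypothetical.

```python
from typing import Any, Dict, List

RESOURCE_TYPE = "AWS::SageMaker::ExperimentTrialComponent"


def build_event_selectors(name: str = "Log experiment run metrics") -> List[Dict[str, Any]]:
    """Build an advanced event selector that matches data events for experiment runs."""
    return [
        {
            "Name": name,
            "FieldSelectors": [
                # Data events only (management events are logged separately).
                {"Field": "eventCategory", "Equals": ["Data"]},
                # Restrict logging to the experiment run resource type.
                {"Field": "resources.type", "Equals": [RESOURCE_TYPE]},
            ],
        }
    ]


def apply_event_selectors(trail_name: str) -> Dict[str, Any]:
    """Apply the selectors to an existing trail (trail_name is an assumption)."""
    import boto3  # imported lazily so the payload builder works without boto3

    cloudtrail = boto3.client("cloudtrail")
    return cloudtrail.put_event_selectors(
        TrailName=trail_name,
        AdvancedEventSelectors=build_event_selectors(),
    )
```

Because this call sets the selectors for the trail, include any other selectors that the trail should keep in the same request.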
For an in-depth explanation of how Amazon SageMaker works with Amazon CloudTrail, see Log Amazon SageMaker API Calls with Amazon CloudTrail.
The following is an example CloudTrail event for a training metric in an experiment run:
{
    ...
    "eventTime": "2022-12-14T21:53:41Z",
    "eventSource": "metrics-sagemaker.amazonaws.com",
    "eventName": "BatchPutMetrics",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "192.0.2.0",
    "userAgent": "aws-cli/2.7.25 Python/3.9.11 Linux/5.4.214-134.408.amzn2int.x86_64 exe/x86_64.amzn.2 prompt/off command/sm-metrics.batch-put-metrics",
    "requestParameters": {
        "trialComponentName": "trial-component-name",
        "metricData": [
            {
                "metricName": "foo",
                "timestamp": 1670366870000,
                "step": 101,
                "value": 0.9
            }
        ]
    },
    ...
    "resources": [
        {
            "accountId": "abcdef01234567890",
            "type": "AWS::SageMaker::ExperimentTrialComponent",
            "ARN": "arn:aws:sagemaker:us-east-1:1234567890abcdef0:experiment-trial-component/trial-component-name"
        }
    ],
    ...
}
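To work with an event like the one above, it can help to flatten it into one row per metric. The following is a minimal sketch that assumes the event has already been parsed into a Python dictionary and relies only on the fields shown in the example. The function name metric_rows and the output keys are hypothetical.

```python
from typing import Any, Dict, List


def metric_rows(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one BatchPutMetrics CloudTrail event into one row per metric."""
    params = event.get("requestParameters") or {}
    # ARNs of the experiment runs (trial components) referenced by the event.
    run_arns = [r.get("ARN") for r in event.get("resources") or []]
    rows = []
    for metric in params.get("metricData") or []:
        rows.append(
            {
                "run_name": params.get("trialComponentName"),
                "run_arns": run_arns,
                "metric_name": metric.get("metricName"),
                "step": metric.get("step"),
                "timestamp": metric.get("timestamp"),  # kept as recorded in the event
                "value": metric.get("value"),
            }
        )
    return rows
```

Missing fields come back as None or empty lists rather than raising an error, which keeps the helper usable on events where parts of the record are absent.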