Monitoring Progress Across Iterations

You can track metrics via MLflow.

Create an MLflow app

Using Studio UI: If you create a training job through the Studio UI, a default MLflow app is created automatically and selected by default under Advanced Options.

Using CLI: If you use the CLI, you must create an MLflow app and pass it as an input to the training job API request.

mlflow_app_name="<enter your MLflow app name>"
role_arn="<enter your role ARN>"
bucket_name="<enter your bucket name>"
region="<enter your region>"

mlflow_app_arn=$(aws sagemaker create-mlflow-app \
  --name $mlflow_app_name \
  --artifact-store-uri "s3://$bucket_name" \
  --role-arn $role_arn \
  --region $region \
  --output text)

Access the MLflow app

Using CLI: Create a pre-signed URL to access the MLflow app UI:

aws sagemaker create-presigned-mlflow-app-url \
  --arn $mlflow_app_arn \
  --region $region \
  --output text
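
If you prefer to call the same operation from Python, a minimal sketch follows. It assumes your boto3 version exposes the operation behind the create-presigned-mlflow-app-url command above under the standard SDK name create_presigned_mlflow_app_url, with the --arn parameter mapping to Arn; verify the method name and response shape against your installed boto3 version.

# Minimal sketch, assuming boto3 exposes the CLI operation shown above.
import boto3

sagemaker = boto3.client("sagemaker", region_name="<enter your region>")

response = sagemaker.create_presigned_mlflow_app_url(
    Arn="<enter your MLflow app ARN>"  # for example, the value captured in $mlflow_app_arn
)
print(response)  # the response contains the pre-signed URL to open in a browser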

Using Studio UI: The Studio UI displays key metrics stored in MLflow and provides a link to the MLflow app UI.

Key metrics to track

Monitor these metrics across iterations to assess improvement and track job progress (a sketch for pulling them from MLflow follows the lists below):

For SFT

  • Training loss curves

  • Number of samples consumed and time to process samples

  • Accuracy on held-out test sets

  • Format compliance (e.g., valid JSON output rate)

  • Perplexity on domain-specific evaluation data

For RFT

  • Average reward scores over training

  • Reward distribution (percentage of high-reward responses)

  • Validation reward trends (watch for over-fitting)

  • Task-specific success rates (e.g., code execution pass rate, math problem accuracy)

General

  • Benchmark performance deltas between iterations

  • Human evaluation scores on representative samples

  • Production metrics (if deploying iteratively)
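
The following minimal sketch shows one way to pull these metrics out of MLflow with the Python client. It assumes the mlflow package (and the sagemaker-mlflow plugin) is installed, that the MLflow app ARN is accepted as the tracking URI, and that your training job logged a metric named train_loss under the experiment name you supply; substitute the experiment and metric names your job actually records.

import mlflow
from mlflow.tracking import MlflowClient

# Assumption: the MLflow app ARN can be used as the tracking URI
# (as with SageMaker managed MLflow tracking); replace the placeholders.
mlflow.set_tracking_uri("<enter your MLflow app ARN>")

# Most recent runs for the experiment, newest first.
runs = mlflow.search_runs(
    experiment_names=["<enter your experiment name>"],
    order_by=["start_time DESC"],
    max_results=5,
)
print(runs[["run_id", "metrics.train_loss"]])  # assumes a metric named train_loss

# Step-by-step history of a single metric, for example to plot a loss curve.
client = MlflowClient()
for point in client.get_metric_history(runs.iloc[0]["run_id"], "train_loss"):
    print(point.step, point.value)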

Determining when to stop

Stop iterating when any of the following apply (a simple sketch of such a check follows this list):

  • Performance plateaus: Additional training no longer meaningfully improves target metrics

  • Technique switching no longer helps: If one technique plateaus, first try switching (e.g., SFT → RFT → SFT) to break through performance ceilings; stop only when switching also stops producing gains

  • Target metrics achieved: Your success criteria are met

  • Regression detected: New iterations degrade performance (see rollback procedures below)
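
As a hypothetical illustration of the plateau and regression criteria above, the helper below assumes you record one held-out metric value per iteration (higher is better); the thresholds are placeholders to tune for your task.

# Hypothetical stopping check: one target-metric value per iteration, higher is better.
def should_stop(metric_history, min_gain=0.005, patience=2):
    if len(metric_history) < patience + 1:
        return False                       # too few iterations to judge
    best_so_far = max(metric_history[:-1])
    latest = metric_history[-1]
    if latest < best_so_far:
        return True                        # regression detected: consider rolling back
    recent_gain = latest - metric_history[-1 - patience]
    return recent_gain < min_gain          # performance plateau

# Example: held-out accuracy after each iteration; recent gains have flattened.
print(should_stop([0.71, 0.78, 0.80, 0.803, 0.804]))  # True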

For detailed evaluation procedures, refer to the Evaluation section.