
Experiments FAQs

Refer to the following FAQ items for answers to commonly asked questions about SageMaker Experiments.

A: An experiment is a collection of runs aimed at finding the best model for a problem. To initialize a run within an experiment, use the SageMaker Python SDK Run class. For more examples, see Create an Amazon SageMaker Experiment.
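
The following is a minimal sketch of this pattern; the experiment name, run name, parameter, and metric values are placeholders:

from sagemaker.experiments.run import Run

# "my-experiment" and "baseline-run" are placeholder names for this sketch.
with Run(experiment_name="my-experiment", run_name="baseline-run") as run:
    run.log_parameter("learning_rate", 0.01)                # record a hyperparameter
    run.log_metric(name="train_loss", value=0.35, step=1)   # record a metric value at a given step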

A: Yes. You can create experiments using SageMaker script mode. In the Jupyter notebook or Python file that you use to define your estimator, initialize a run with the Run class. Within that run context, launch an estimator with your custom entry point script. Within the entry point script, use the load_run method to retrieve the run you defined in the notebook and log your metrics. For in-depth examples, see Track experiments for SageMaker training jobs using script mode.
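
The following sketch shows both halves of this pattern. The entry point name (train.py), the estimator settings, the execution role, and the logged values are placeholder assumptions:

# In the notebook: define the run, then launch the estimator inside the run context.
from sagemaker.experiments.run import Run
from sagemaker.pytorch import PyTorch

with Run(experiment_name="my-experiment", run_name="script-mode-run") as run:
    estimator = PyTorch(
        entry_point="train.py",        # placeholder entry point script
        role=role,                     # assumes an IAM execution role defined earlier in the notebook
        framework_version="1.13",
        py_version="py39",
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )
    estimator.fit()

# In train.py: load the run that was defined in the notebook and log metrics.
from sagemaker.experiments.run import load_run

with load_run() as run:                # a sagemaker_session can be passed if needed (see the example later in this FAQ)
    run.log_metric(name="train_loss", value=0.25, step=1)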

A: SageMaker Hyperparameter Optimization (HPO) jobs (also known as tuning jobs) automatically create experiments to track all the training jobs launched during a hyperparameter search. All other SageMaker jobs create unassigned runs unless they are launched from within an experiment.
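
For example, launching a tuning job with the HyperparameterTuner class is enough for SageMaker to create an experiment that tracks every training job in the search. The estimator, objective metric name, and regex below are placeholder assumptions:

from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

tuner = HyperparameterTuner(
    estimator=estimator,                              # any SageMaker estimator defined earlier
    objective_metric_name="validation:accuracy",      # placeholder metric name
    hyperparameter_ranges={"learning_rate": ContinuousParameter(1e-4, 1e-1)},
    metric_definitions=[{"Name": "validation:accuracy", "Regex": "val_acc=([0-9\\.]+)"}],
    max_jobs=4,
    max_parallel_jobs=2,
)
tuner.fit()   # the tuning job and its training jobs are tracked as an experiment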

A: You can use SageMaker Experiments to track metrics from training jobs, processing jobs, and transform jobs.

A: Experiment runs that are automatically created by SageMaker jobs and containers are visible in the Experiments Studio Classic UI by default. To hide runs created by SageMaker jobs for a given experiment, choose the settings icon and toggle Show jobs.

A: Yes, the SageMaker Experiments SDK is still supported. However, as of v2.123.0, SageMaker Experiments is fully integrated with the SageMaker Python SDK. We recommend using the SageMaker Python SDK to create experiments and runs. For more information, see Create an Amazon SageMaker Experiment.

A: Yes. However, metrics for distributed training can be logged only at the epoch level. Be sure to log only the metrics generated by the leader node, as shown in the following example:

...
if rank == 0:
    test_loss, correct, target, pred = test(model, test_loader, device, tracker)
    # accuracy derived from the count of correct predictions (assumption for this sketch)
    test_accuracy = 100.0 * correct / len(test_loader.dataset)
    logger.info(
        "Test Average loss: {:.4f}, Test Accuracy: {:.0f}%;\n".format(
            test_loss, test_accuracy
        )
    )
    run.log_metric(name="train_loss", value=loss.item(), step=epoch)
    run.log_metric(name="test_loss", value=test_loss, step=epoch)
    run.log_metric(name="test_accuracy", value=test_accuracy, step=epoch)
...

For more information, see the Run a SageMaker Experiment with PyTorch Distributed Data Parallel - MNIST Handwritten Digits Classification example notebook.

A: All jobs in SageMaker (training jobs, processing jobs, and transform jobs) correspond to runs. When you launch these jobs, SageMaker creates TrialComponents by default, and each TrialComponent maps directly to a run. If these jobs are launched without being explicitly associated with an experiment or run, they are created as unassigned runs.

A: Yes. You need to load the run context in the training script and pass in the SageMaker session information, as in the following example:

import boto3

from sagemaker.session import Session
from sagemaker.experiments.run import load_run

# args.region is assumed to come from the script's argument parser
session = Session(boto3.session.Session(region_name=args.region))
with load_run(sagemaker_session=session) as run:
    run.log_parameters(
        {"num_train_samples": len(train_set.data), "num_test_samples": len(test_set.data)}
    )

A: If you already created a comparison for your experiment and want to add a new run to the analysis, select all the runs from your previous analysis as well as the new run, and then choose Analyze. If you don't see the new run on the resulting analysis page, refresh the Studio Classic browser. Note that refreshing the browser may affect your other open Studio Classic tabs.