Use your recipe as a SageMaker training job

Running a SageMaker training job

SageMaker HyperPod Recipes supports submitting a SageMaker training job. Before you submit the training job, you must update the cluster configuration file, sm_job.yaml, and install the corresponding environment.

You can run your recipe as a SageMaker training job if you aren't hosting a cluster. To do so, you must modify the SageMaker training job configuration file, sm_job.yaml.


sm_jobs_config:
  output_path: null 
  tensorboard_config:
    output_path: null 
    container_logs_path: null
  wait: True 
  inputs: 
    s3: 
      train: null
      val: null
    file_system:  
      directory_path: null
  additional_estimator_kwargs: 
    max_run: 1800

output_path: The Amazon S3 URL where your model is saved.
tensorboard_config: TensorBoard-related settings, such as the output path (output_path) or the TensorBoard logs path (container_logs_path).
wait: Whether the submitting process waits for the training job to complete.
inputs: The paths for your training and validation data. The data source can be a shared file system such as Amazon FSx, or an Amazon S3 URL.
additional_estimator_kwargs: Additional estimator arguments that are passed when the training job is submitted to the SageMaker training job platform, such as max_run, the maximum run time in seconds. For more information, see Algorithm Estimator.
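
As an illustration, a filled-in sm_job.yaml for a job that reads its data from Amazon S3 might look like the following. This is a sketch under stated assumptions, not a tested configuration: the bucket name amzn-s3-demo-bucket, the key prefixes, and the container logs path are hypothetical placeholders to replace with your own values.

```yaml
sm_jobs_config:
  # Hypothetical S3 location for the trained model artifacts
  output_path: s3://amzn-s3-demo-bucket/recipes/output
  tensorboard_config:
    # Hypothetical S3 location for TensorBoard output
    output_path: s3://amzn-s3-demo-bucket/recipes/tensorboard
    # Assumed path inside the training container; adjust to your setup
    container_logs_path: /opt/ml/output/tensorboard
  # Block until the training job finishes
  wait: True
  inputs:
    s3:
      # Hypothetical training and validation data prefixes
      train: s3://amzn-s3-demo-bucket/recipes/data/train
      val: s3://amzn-s3-demo-bucket/recipes/data/val
    file_system:
      # Left unset because this sketch reads data from S3 only
      directory_path: null
  additional_estimator_kwargs:
    # Maximum run time in seconds (one hour here)
    max_run: 3600
```

To read data from a shared file system instead, this sketch assumes you would set directory_path under file_system and leave the s3 entries as null.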
