How Amazon SageMaker Processing Configures Input and Output For Your Processing Container

When you create a processing job using the CreateProcessingJob operation, you can specify multiple ProcessingInput and ProcessingOutput. values.

You use the ProcessingInput parameter to specify an Amazon Simple Storage Service (Amazon S3) URI to download data from, and a path in your processing container to download the data to. The ProcessingOutput parameter configures a path in your processing container from which to upload data, and where in Amazon S3 to upload that data to. For both ProcessingInput and ProcessingOutput, the path in the processing container must begin with /opt/ml/processing/.

For example, you might create a processing job with one ProcessingInput parameter that downloads data from s3://your-data-bucket/path/to/input/csv/data into /opt/ml/processing/csv in your processing container, and a ProcessingOutput parameter that uploads data from /opt/ml/processing/processed_csv to s3://your-data-bucket/path/to/output/csv/data. Your processing job would read the input data, and write output data to /opt/ml/processing/processed_csv. Then it uploads the data written to this path to the specified Amazon S3 output location.

Important

Symbolic links (symlinks) can not be used to upload output data to Amazon S3. Symlinks are not followed when uploading output data.

Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

How Amazon SageMaker Processing Runs Your Processing Container Image

How Amazon SageMaker Processing Provides Logs and Metrics for Your Processing Container