Data Privacy in Amazon SageMaker

Amazon SageMaker collects aggregate information about the use of Amazon-owned and open source libraries used during training. SageMaker uses this aggregate metadata to improve services and customer experience.

The following sections provide explanations for the type of metadata that SageMaker collects and how to opt out of metadata collection.

Types of information collected

Usage Information

Metadata from Amazon-owned and open source libraries that are used with SageMaker training, such as those used for distributed training, compilation, and quantization.

Errors

Errors from unexpected behavior, including failures, crashes, and cascades, as well as failures that result from interacting with the SageMaker training platform.

How to opt out of metadata collection

You can opt out of sharing aggregated metadata with SageMaker training when creating a training job using the CreateTrainingJob API. If you are using the console to create training jobs, metadata collection is disabled by default.

Important

You must choose to opt out of metadata collection for each training job that you submit. You must also choose to opt out in an API call as shown in the following examples. You cannot choose to opt out inside a training script.

The following sections show how you can opt out of metadata collection using the Amazon CLI, the Amazon SDK for Python (Boto3), or the SageMaker Python SDK.

Opt out of metadata collection using the Amazon Command Line Interface (Amazon CLI)

To opt out of metadata collection using the Amazon CLI, set the environment variable OPT_OUT_TRACKING to 1 in the create-training-job command as shown in the following code example.

aws sagemaker create-training-job \
    --training-job-name your_job_name \
    --algorithm-specification AlgorithmName=your_algorithm_name,TrainingInputMode=File \
    --role-arn your_arn \
    --output-data-config S3OutputPath=s3://bucket-name/key-name-prefix \
    --resource-config InstanceType=ml.c5.xlarge,InstanceCount=1,VolumeSizeInGB=10 \
    --stopping-condition MaxRuntimeInSeconds=100 \
    --environment OPT_OUT_TRACKING=1

Opt out of metadata collection using the Amazon SDK for Python (Boto3)

To opt out of metadata collection using the SDK for Python (Boto3), set the environment variable OPT_OUT_TRACKING to 1 in the create_training_job API as shown in the following code example.

import boto3

boto3.client('sagemaker').create_training_job(
    TrainingJobName='your_training_job',
    AlgorithmSpecification={
        'AlgorithmName': 'your_algorithm_name',
        'TrainingInputMode': 'File',
    },
    RoleArn='your_arn',
    OutputDataConfig={
        'S3OutputPath': 's3://bucket-name/key-name-prefix',
    },
    ResourceConfig={
        'InstanceType': 'ml.m4.xlarge',
        'InstanceCount': 1,
        'VolumeSizeInGB': 123,
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 123,
    },
    # Opt out of metadata collection for this training job.
    Environment={
        'OPT_OUT_TRACKING': '1'
    },
)
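
Because the opt-out has to be repeated for every training job, it can help to add the variable in one place before each request is submitted. The following is a minimal sketch and not part of the SageMaker API: with_opt_out is a hypothetical helper that copies a dictionary of create_training_job keyword arguments and sets OPT_OUT_TRACKING in its Environment map, keeping any variables that are already present.

```python
import copy


def with_opt_out(request):
    """Return a copy of create_training_job kwargs with metadata collection opted out.

    `request` is a plain dict of keyword arguments; the input is not modified.
    """
    updated = copy.deepcopy(request)
    # Keep any environment variables the caller already set.
    environment = dict(updated.get("Environment") or {})
    # Environment values are passed as strings.
    environment["OPT_OUT_TRACKING"] = "1"
    updated["Environment"] = environment
    return updated
```

The result could then be passed on as boto3.client('sagemaker').create_training_job(**with_opt_out(request)), so that no job is submitted without the variable.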

Opt out of metadata collection using the SageMaker Python SDK

To opt out of metadata collection using the SageMaker Python SDK, set the environment variable OPT_OUT_TRACKING to 1 inside a SageMaker estimator as shown in the following code example.

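from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri='path_to_container',
    role='rolearn',
    instance_count=1,
    instance_type='ml.c5.xlarge',
    # Opt out of metadata collection for jobs launched by this estimator.
    environment={
        'OPT_OUT_TRACKING': '1'
    },
)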

Opt out of metadata collection account-wide

If you want to opt out of metadata collection for several accounts, you can set an environment variable to opt out of tracking account-wide. You must use the SageMaker Python SDK to opt out of metadata collection at an account level.

The following configuration example shows how to opt out of tracking account-wide.

SchemaVersion: '1.0'
SageMaker:
  TrainingJob:
    Environment:
      'OPT_OUT_TRACKING': '1'

For more information about how to opt out of tracking account-wide, see Configuring and using defaults with the SageMaker Python SDK.
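
If a defaults configuration file already exists, the opt-out setting has to be merged into it without discarding other defaults. The sketch below assumes the file has already been parsed into nested dictionaries (for example with a YAML library, which is not shown). The helper name add_opt_out_default is hypothetical, and the key path mirrors the configuration example above.

```python
import copy


def add_opt_out_default(config):
    """Return a copy of a parsed defaults config with OPT_OUT_TRACKING set to '1'."""
    merged = copy.deepcopy(config) if config else {}
    # The schema version matches the one used in the configuration example.
    merged.setdefault("SchemaVersion", "1.0")
    node = merged
    # Walk (and create if missing) SageMaker -> TrainingJob -> Environment.
    for key in ("SageMaker", "TrainingJob", "Environment"):
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node["OPT_OUT_TRACKING"] = "1"
    return merged
```

The returned dictionary can be written back to the configuration file with the same library that parsed it. Other defaults, such as existing environment variables, are left unchanged.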

Additional information

If your downstream service depends on SageMaker training

If you operate a service that relies on SageMaker training, we highly recommend that you inform your customers about aggregate metadata collection in the SageMaker training platform and give them the choice to opt out. Alternatively, you can opt out of metadata collection on behalf of your customers.
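
A service that opts out on behalf of its customers may want to confirm that a submitted job actually carries the setting. The following sketch assumes that the response from the Boto3 describe_training_job call includes the Environment map that was passed at creation time. The function name is_opted_out is hypothetical.

```python
def is_opted_out(description):
    """Return True if a describe_training_job response has OPT_OUT_TRACKING set to '1'."""
    # Jobs created without environment variables may have no Environment key.
    environment = description.get("Environment") or {}
    return environment.get("OPT_OUT_TRACKING") == "1"
```

The description argument would typically come from boto3.client('sagemaker').describe_training_job(TrainingJobName='your_training_job').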

If you are a client or a customer of a service that uses SageMaker training

If you are a client or customer of a service that uses SageMaker training, use your preferred method from the How to opt out of metadata collection section to opt out of metadata collection.