

# Built-in SageMaker AI Algorithms for Computer Vision
<a name="algorithms-vision"></a>

SageMaker AI provides image processing algorithms for computer vision tasks such as image classification, object detection, and semantic segmentation.
+ [Image Classification - MXNet](image-classification.md)—uses example data with answers (referred to as a *supervised algorithm*). Use this algorithm to classify images.
+ [Image Classification - TensorFlow](image-classification-tensorflow.md)—uses pretrained TensorFlow Hub models to fine-tune for specific tasks (referred to as a *supervised algorithm*). Use this algorithm to classify images.
+ [Object Detection - MXNet](object-detection.md)—detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene.
+ [Object Detection - TensorFlow](object-detection-tensorflow.md)—detects bounding boxes and object labels in an image. It is a supervised learning algorithm that supports transfer learning with available pretrained TensorFlow models.
+ [Semantic Segmentation Algorithm](semantic-segmentation.md)—provides a fine-grained, pixel-level approach to developing computer vision applications.


| Algorithm name | Channel name | Training input mode | File type | Instance class | Parallelizable | 
| --- | --- | --- | --- | --- | --- | 
| Image Classification - MXNet | train and validation, (optionally) train\_lst, validation\_lst, and model | File or Pipe | RecordIO or image files (.jpg or .png) | GPU | Yes | 
| Image Classification - TensorFlow | training and validation | File | image files (.jpg, .jpeg, or .png) | CPU or GPU | Yes (only across multiple GPUs on a single instance) | 
| Object Detection | train and validation, (optionally) train\_annotation, validation\_annotation, and model | File or Pipe | RecordIO or image files (.jpg or .png) | GPU | Yes | 
| Object Detection - TensorFlow | training and validation | File | image files (.jpg, .jpeg, or .png) | GPU | Yes (only across multiple GPUs on a single instance) | 
| Semantic Segmentation | train and validation, train\_annotation, validation\_annotation, and (optionally) label\_map and model | File or Pipe | Image files | GPU (single instance only) | No | 

# Image Classification - MXNet
<a name="image-classification"></a>

The Amazon SageMaker image classification algorithm is a supervised learning algorithm that supports multi-label classification. It takes an image as input and outputs one or more labels assigned to that image. It uses a convolutional neural network that can be trained from scratch, or trained using transfer learning when a large number of training images are not available.

The recommended input format for the Amazon SageMaker AI image classification algorithms is Apache MXNet [RecordIO](https://mxnet.apache.org/api/faq/recordio). However, you can also use raw images in .jpg or .png format. Refer to [this discussion](https://mxnet.apache.org/api/architecture/note_data_loading) for a broad overview of efficient data preparation and loading for machine learning systems. 

**Note**  
To maintain better interoperability with existing deep learning frameworks, this differs from the protobuf data formats commonly used by other Amazon SageMaker AI algorithms.

For more information on convolutional networks, see: 
+ [Deep residual learning for image recognition](https://arxiv.org/abs/1512.03385) Kaiming He, et al., 2016 IEEE Conference on Computer Vision and Pattern Recognition
+ [ImageNet image database](http://www.image-net.org/)
+ [Image classification with Gluon-CV and MXNet](https://gluon-cv.mxnet.io/build/examples_classification/index.html)

**Topics**
+ [Input/Output Interface for the Image Classification Algorithm](#IC-inputoutput)
+ [EC2 Instance Recommendation for the Image Classification Algorithm](#IC-instances)
+ [Image Classification Sample Notebooks](#IC-sample-notebooks)
+ [How Image Classification Works](IC-HowItWorks.md)
+ [Image Classification Hyperparameters](IC-Hyperparameter.md)
+ [Tune an Image Classification Model](IC-tuning.md)

## Input/Output Interface for the Image Classification Algorithm
<a name="IC-inputoutput"></a>

The SageMaker AI Image Classification algorithm supports both RecordIO (`application/x-recordio`) and image (`image/png`, `image/jpeg`, and `application/x-image`) content types for training in file mode, and supports the RecordIO (`application/x-recordio`) content type for training in pipe mode. However, you can also train in pipe mode using the image files (`image/png`, `image/jpeg`, and `application/x-image`), without creating RecordIO files, by using the augmented manifest format.

Distributed training is supported for file mode and pipe mode. When using the RecordIO content type in pipe mode, you must set the `S3DataDistributionType` of the `S3DataSource` to `FullyReplicated`. The algorithm supports a fully replicated model where your data is copied onto each machine.
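
As a minimal sketch (bucket name and key are placeholders), the `train` channel entry in a `CreateTrainingJob` request for RecordIO input in pipe mode might look like this, with `S3DataDistributionType` set to `FullyReplicated` as required:

```python
# Hypothetical channel configuration for a CreateTrainingJob request.
# The S3 URI is a placeholder; substitute your own bucket and key.
train_channel = {
    "ChannelName": "train",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://amzn-s3-demo-bucket/train/train.rec",
            # Required for RecordIO content type in pipe mode:
            "S3DataDistributionType": "FullyReplicated",
        }
    },
    "ContentType": "application/x-recordio",
    "InputMode": "Pipe",
}
```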

The algorithm supports `image/png`, `image/jpeg`, and `application/x-image` for inference.

### Train with RecordIO Format
<a name="IC-recordio-training"></a>

If you use the RecordIO format for training, specify both `train` and `validation` channels as values for the `InputDataConfig` parameter of the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Specify one RecordIO (`.rec`) file in the `train` channel and one RecordIO file in the `validation` channel. Set the content type for both channels to `application/x-recordio`. 

### Train with Image Format
<a name="IC-image-training"></a>

If you use the Image format for training, specify `train`, `validation`, `train_lst`, and `validation_lst` channels as values for the `InputDataConfig` parameter of the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Specify the individual image data (`.jpg` or `.png` files) for the `train` and `validation` channels. Specify one `.lst` file in each of the `train_lst` and `validation_lst` channels. Set the content type for all four channels to `application/x-image`. 

**Note**  
SageMaker AI reads the training and validation data separately from different channels, so you must store the training and validation data in different folders.

A `.lst` file is a tab-separated file with three columns that contains a list of image files. The first column specifies the image index, the second column specifies the class label index for the image, and the third column specifies the relative path of the image file. The image index in the first column must be unique across all of the images. The set of class label indices are numbered successively and the numbering should start with 0. For example, 0 for the cat class, 1 for the dog class, and so on for additional classes. 

 The following is an example of a `.lst` file: 

```
5      1   your_image_directory/train_img_dog1.jpg
1000   0   your_image_directory/train_img_cat1.jpg
22     1   your_image_directory/train_img_dog2.jpg
```

For example, if your training images are stored in `s3://<your_bucket>/train/class_dog`, `s3://<your_bucket>/train/class_cat`, and so on, specify the path for your `train` channel as `s3://<your_bucket>/train`, which is the top-level directory for your data. In the `.lst` file, specify the relative path for an individual file named `train_image_dog1.jpg` in the `class_dog` class directory as `class_dog/train_image_dog1.jpg`. You can also store all your image files under one subdirectory inside the `train` directory. In that case, use that subdirectory for the relative path. For example, `s3://<your_bucket>/train/your_image_directory`. 
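
As a rough illustration of the `.lst` layout described above (the function name and directory structure are assumptions, not part of the SageMaker tooling), such a file could be generated from class subdirectories like this:

```python
import os

def write_lst(image_root, class_dirs, lst_path):
    """Write a tab-separated .lst file: image index, class label, relative path.

    class_dirs maps a subdirectory name (relative to image_root) to its
    numeric class label; labels must start at 0 and be consecutive.
    """
    index = 0
    with open(lst_path, "w") as f:
        # Iterate classes in label order so output is deterministic.
        for dirname, label in sorted(class_dirs.items(), key=lambda kv: kv[1]):
            for fname in sorted(os.listdir(os.path.join(image_root, dirname))):
                if fname.lower().endswith((".jpg", ".png")):
                    f.write(f"{index}\t{label}\t{dirname}/{fname}\n")
                    index += 1
    return index  # total number of images listed
```

A sequential counter keeps the first-column index unique across all images, as the format requires.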

### Train with Augmented Manifest Image Format
<a name="IC-augmented-manifest-training"></a>

The augmented manifest format enables you to train in pipe mode using image files without creating RecordIO files. Specify both `train` and `validation` channels as values for the `InputDataConfig` parameter of the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. To use this format, generate an S3 manifest file that contains the list of images and their corresponding annotations. The manifest file must be in [JSON Lines](http://jsonlines.org/) format, in which each line represents one sample. Images are specified using the `'source-ref'` tag, which points to the S3 location of the image. Annotations are provided under the `"AttributeNames"` parameter value as specified in the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. The manifest can also contain additional metadata under the `metadata` tag, but the algorithm ignores it. In the following example, the `"AttributeNames"` are contained in the list of image and annotation references `["source-ref", "class"]`. The corresponding label value is `"0"` for the first image and `"1"` for the second image:

```
{"source-ref":"s3://image/filename1.jpg", "class":"0"}
{"source-ref":"s3://image/filename2.jpg", "class":"1", "class-metadata": {"class-name": "cat", "type" : "groundtruth/image-classification"}}
```

The order of `"AttributeNames"` in the input files matters when training the Image Classification algorithm. It accepts piped data in a specific order, with `image` first, followed by `label`. So the `"AttributeNames"` in this example are provided with `"source-ref"` first, followed by `"class"`. When using the Image Classification algorithm with the augmented manifest format, the value of the `RecordWrapperType` parameter must be `"RecordIO"`.
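
A minimal sketch of generating such a manifest (the function name and inputs are illustrative) that preserves the required `source-ref`-before-`class` ordering:

```python
import json

def write_manifest(samples, path):
    """Write an augmented manifest in JSON Lines format.

    Each sample is an (s3_uri, label) pair. "source-ref" is written before
    "class" so the piped order matches what the algorithm expects.
    """
    with open(path, "w") as f:
        for s3_uri, label in samples:
            # Python dicts preserve insertion order, so the key order
            # in the JSON output is deterministic.
            f.write(json.dumps({"source-ref": s3_uri, "class": str(label)}) + "\n")
```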

Multi-label training is also supported by specifying a JSON array of values. The `num_classes` hyperparameter must be set to match the total number of classes. There are two valid label formats: multi-hot and class-id. 

In the multi-hot format, each label is a multi-hot encoded vector of all classes, where each class takes the value of 0 or 1. In the following example, there are three classes. The first image is labeled with classes 0 and 2, while the second image is labeled with class 2 only: 

```
{"image-ref": "s3://amzn-s3-demo-bucket/sample01/image1.jpg", "class": "[1, 0, 1]"}
{"image-ref": "s3://amzn-s3-demo-bucket/sample02/image2.jpg", "class": "[0, 0, 1]"}
```

In the class-id format, each label is a list of the class ids, from [0, `num_classes`), which apply to the data point. The previous example would instead look like this:

```
{"image-ref": "s3://amzn-s3-demo-bucket/sample01/image1.jpg", "class": "[0, 2]"}
{"image-ref": "s3://amzn-s3-demo-bucket/sample02/image2.jpg", "class": "[2]"}
```

The multi-hot format is the default, but can be set explicitly in the content type with the `label-format` parameter: `"application/x-recordio; label-format=multi-hot"`. The class-id format, which is the format output by Ground Truth, must be set explicitly: `"application/x-recordio; label-format=class-id"`.
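
To make the relationship between the two label formats concrete, here is a small helper (names are illustrative, not part of any SageMaker API) that converts a class-id list into the equivalent multi-hot vector:

```python
def class_ids_to_multi_hot(class_ids, num_classes):
    """Convert a class-id label list to a multi-hot vector of length num_classes."""
    vec = [0] * num_classes
    for cid in class_ids:
        # Class ids must lie in [0, num_classes), matching the format spec.
        if not 0 <= cid < num_classes:
            raise ValueError(f"class id {cid} outside [0, {num_classes})")
        vec[cid] = 1
    return vec
```

For example, with three classes, the class-id label `[0, 2]` corresponds to the multi-hot label `[1, 0, 1]`.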

For more information on augmented manifest files, see [Augmented Manifest Files for Training Jobs](augmented-manifest.md).

### Incremental Training
<a name="IC-incremental-training"></a>

You can also seed the training of a new model with the artifacts from a model that you trained previously with SageMaker AI. Incremental training saves training time when you want to train a new model with the same or similar data. SageMaker AI image classification models can be seeded only with another built-in image classification model trained in SageMaker AI.

To use a pretrained model, in the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request, specify the `ChannelName` as "model" in the `InputDataConfig` parameter. Set the `ContentType` for the model channel to `application/x-sagemaker-model`. The input hyperparameters of both the new model and the pretrained model that you upload to the model channel must have the same settings for the `num_layers`, `image_shape` and `num_classes` input parameters. These parameters define the network architecture. For the pretrained model file, use the compressed model artifacts (in .tar.gz format) output by SageMaker AI. You can use either RecordIO or image formats for input data.
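
Following the description above, the model channel might be configured like this (a sketch only; the S3 path is a placeholder for your previous training job's output):

```python
# Hypothetical model-channel entry for incremental training.
# The S3 URI is a placeholder pointing at a previous job's model artifact.
model_channel = {
    "ChannelName": "model",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://amzn-s3-demo-bucket/previous-job/output/model.tar.gz",
            "S3DataDistributionType": "FullyReplicated",
        }
    },
    # Required content type for the model channel:
    "ContentType": "application/x-sagemaker-model",
    "InputMode": "File",
}
```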

### Inference with the Image Classification Algorithm
<a name="IC-inference"></a>

The generated models can be hosted for inference and support encoded `.jpg` and `.png` image formats as `image/png`, `image/jpeg`, and `application/x-image` content types. The input image is resized automatically. The output is the probability values for all classes encoded in JSON format, or in [JSON Lines text format](http://jsonlines.org/) for batch transform. The image classification model processes a single image per request and so outputs only one line in the JSON or JSON Lines format. The following is an example of a response in JSON Lines format:

```
accept: application/jsonlines

{"prediction": [prob_0, prob_1, prob_2, prob_3, ...]}
```
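
A minimal sketch of consuming such a response line and selecting the most likely class (the function and class names are hypothetical):

```python
import json

def top_class(response_line, class_names):
    """Return (name, probability) of the highest-probability class.

    response_line is one JSON Lines record of the form
    {"prediction": [prob_0, prob_1, ...]}, with one probability per class.
    """
    probs = json.loads(response_line)["prediction"]
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]
```

For example, `top_class('{"prediction": [0.1, 0.7, 0.2]}', ["cat", "dog", "fish"])` returns `("dog", 0.7)`.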

For more details on training and inference, see the image classification sample notebook instances referenced in the introduction.

## EC2 Instance Recommendation for the Image Classification Algorithm
<a name="IC-instances"></a>

For image classification, we support P2, P3, G4dn, and G5 instances. We recommend using GPU instances with more memory for training with large batch sizes. You can also run the algorithm on multi-GPU and multi-machine settings for distributed training. Both CPU (such as C4) and GPU (P2, P3, G4dn, or G5) instances can be used for inference.

## Image Classification Sample Notebooks
<a name="IC-sample-notebooks"></a>

For a sample notebook that uses the SageMaker AI image classification algorithm, see [Build and Register an MXNet Image Classification Model via SageMaker Pipelines](https://github.com/aws-samples/amazon-sagemaker-pipelines-mxnet-image-classification/blob/main/image-classification-sagemaker-pipelines.ipynb). For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see [Amazon SageMaker notebook instances](nbi.md). After you have created a notebook instance and opened it, select the **SageMaker AI Examples** tab to see a list of all the SageMaker AI samples. The example image classification notebooks are located in the **Introduction to Amazon algorithms** section. To open a notebook, choose its **Use** tab and select **Create copy**.

# How Image Classification Works
<a name="IC-HowItWorks"></a>

The image classification algorithm takes an image as input and classifies it into one of the output categories. Deep learning has revolutionized the image classification domain and has achieved great performance. Various deep learning networks, such as [ResNet](https://arxiv.org/abs/1512.03385), [DenseNet](https://arxiv.org/abs/1608.06993), and [Inception](https://arxiv.org/pdf/1409.4842.pdf), have been developed to be highly accurate for image classification. At the same time, there have been efforts to collect the labeled image data that is essential for training these networks. [ImageNet](https://www.image-net.org/) is one such large dataset, with more than 11 million images in about 11,000 categories. Once a network is trained with ImageNet data, it can then generalize to other datasets as well through simple re-adjustment or fine-tuning. In this transfer learning approach, a network is initialized with weights (in this example, trained on ImageNet), which can be later fine-tuned for an image classification task on a different dataset. 

Image classification in Amazon SageMaker AI can be run in two modes: full training and transfer learning. In full training mode, the network is initialized with random weights and trained on user data from scratch. In transfer learning mode, the network is initialized with pre-trained weights and just the top fully connected layer is initialized with random weights. Then, the whole network is fine-tuned with new data. In this mode, training can be achieved even with a smaller dataset. This is because the network is already trained and therefore can be used in cases without sufficient training data.

# Image Classification Hyperparameters
<a name="IC-Hyperparameter"></a>

Hyperparameters are parameters that are set before a machine learning model begins learning. The following hyperparameters are supported by the Amazon SageMaker AI built-in Image Classification algorithm. See [Tune an Image Classification Model](IC-tuning.md) for information on image classification hyperparameter tuning. 


| Parameter Name | Description | 
| --- | --- | 
| num\_classes | Number of output classes. This parameter defines the dimensions of the network output and is typically set to the number of classes in the dataset. Besides multi-class classification, multi-label classification is supported too. Refer to [Input/Output Interface for the Image Classification Algorithm](image-classification.md#IC-inputoutput) for details on how to work with multi-label classification with augmented manifest files.  **Required** Valid values: positive integer  | 
| num\_training\_samples | Number of training examples in the input dataset. If there is a mismatch between this value and the number of samples in the training set, then the behavior of the `lr_scheduler_step` parameter is undefined and distributed training accuracy might be affected. **Required** Valid values: positive integer  | 
| augmentation\_type |  Data augmentation type. The input images can be augmented in multiple ways as specified below. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/IC-Hyperparameter.html) **Optional**  Valid values: `crop`, `crop_color`, or `crop_color_transform`. Default value: no default value  | 
| beta\_1 | The beta1 for `adam`, that is, the exponential decay rate for the first moment estimates. **Optional**  Valid values: float. Range in [0, 1]. Default value: 0.9 | 
| beta\_2 | The beta2 for `adam`, that is, the exponential decay rate for the second moment estimates. **Optional**  Valid values: float. Range in [0, 1]. Default value: 0.999 | 
| checkpoint\_frequency | Period to store model parameters (in number of epochs). Note that all checkpoint files are saved as part of the final model file "model.tar.gz" and uploaded to S3 at the specified model location. This increases the size of the model file proportionally to the number of checkpoints saved during training. **Optional** Valid values: positive integer no greater than `epochs`. Default value: no default value (saves a checkpoint at the epoch that has the best validation accuracy) | 
| early\_stopping | `True` to use early stopping logic during training. `False` not to use it. **Optional** Valid values: `True` or `False` Default value: `False` | 
| early\_stopping\_min\_epochs | The minimum number of epochs that must be run before the early stopping logic can be invoked. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 10 | 
| early\_stopping\_patience | The number of epochs to wait before ending training if no improvement is made in the relevant metric. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 5 | 
| early\_stopping\_tolerance | Relative tolerance to measure an improvement in the accuracy validation metric. If the ratio of the improvement in accuracy divided by the previous best accuracy is smaller than the `early_stopping_tolerance` value set, early stopping considers there is no improvement. It is used only when `early_stopping` = `True`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.0 | 
| epochs | Number of training epochs. **Optional** Valid values: positive integer Default value: 30 | 
| eps | The epsilon for `adam` and `rmsprop`. It is usually set to a small value to avoid division by 0. **Optional** Valid values: float. Range in [0, 1]. Default value: 1e-8 | 
| gamma | The gamma for `rmsprop`, the decay factor for the moving average of the squared gradient. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.9 | 
| image\_shape | The input image dimensions, which is the same size as the input layer of the network. The format is defined as '`num_channels`, height, width'. The image dimension can take on any value as the network can handle varied dimensions of the input. However, there may be memory constraints if a larger image dimension is used. Pretrained models can use only a fixed 224 x 224 image size. Typical image dimensions for image classification are '3,224,224'. This is similar to the ImageNet dataset.  For training, if any input image is smaller than this parameter in any dimension, training fails. If an image is larger, a portion of the image is cropped, with the cropped area specified by this parameter. If the hyperparameter `augmentation_type` is set, a random crop is taken; otherwise, a central crop is taken.  At inference, input images are resized to the `image_shape` that was used during training. Aspect ratio is not preserved, and images are not cropped. **Optional** Valid values: string Default value: '3,224,224' | 
| kv\_store |  Weight update synchronization mode during distributed training. The weight updates can be applied either synchronously or asynchronously across machines. Synchronous updates typically provide better accuracy than asynchronous updates but can be slower. See distributed training in MXNet for more details. This parameter is not applicable to single-machine training. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/IC-Hyperparameter.html) **Optional** Valid values: `dist_sync` or `dist_async` Default value: no default value  | 
| learning\_rate | Initial learning rate. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.1 | 
| lr\_scheduler\_factor | The ratio by which to reduce the learning rate, used in conjunction with the `lr_scheduler_step` parameter, defined as `lr_new = lr_old * lr_scheduler_factor`. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.1 | 
| lr\_scheduler\_step | The epochs at which to reduce the learning rate. As explained in the `lr_scheduler_factor` parameter, the learning rate is reduced by `lr_scheduler_factor` at these epochs. For example, if the value is set to "10, 20", then the learning rate is reduced by `lr_scheduler_factor` after the 10th epoch and again by `lr_scheduler_factor` after the 20th epoch. The epochs are delimited by ",". **Optional** Valid values: string Default value: no default value | 
| mini\_batch\_size | The batch size for training. In a single-machine multi-GPU setting, each GPU handles `mini_batch_size`/`num_gpu` training samples. For multi-machine training in `dist_sync` mode, the actual batch size is `mini_batch_size` multiplied by the number of machines. See the MXNet docs for more details. **Optional** Valid values: positive integer Default value: 32 | 
| momentum | The momentum for `sgd` and `nag`, ignored for other optimizers. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.9 | 
| multi\_label |  Flag to use for multi-label classification, where each sample can be assigned multiple labels. Average accuracy across all classes is logged. **Optional** Valid values: 0 or 1 Default value: 0  | 
| num\_layers | Number of layers for the network. For data with large image size (for example, 224x224 - like ImageNet), we suggest selecting the number of layers from the set [18, 34, 50, 101, 152, 200]. For data with small image size (for example, 28x28 - like CIFAR), we suggest selecting the number of layers from the set [20, 32, 44, 56, 110]. The number of layers in each set is based on the ResNet paper. For transfer learning, the number of layers defines the architecture of the base network and hence can only be selected from the set [18, 34, 50, 101, 152, 200]. **Optional** Valid values: positive integer in [18, 34, 50, 101, 152, 200] or [20, 32, 44, 56, 110] Default value: 152 | 
| optimizer | The optimizer type. For more details on the parameters for the optimizers, refer to MXNet's API. **Optional** Valid values: One of `sgd`, `adam`, `rmsprop`, or `nag`. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/IC-Hyperparameter.html) Default value: `sgd` | 
| precision\_dtype | The precision of the weights used for training. The algorithm can use either single precision (`float32`) or half precision (`float16`) for the weights. Using half precision for weights results in reduced memory consumption. **Optional** Valid values: `float32` or `float16` Default value: `float32` | 
| resize | The number of pixels in the shortest side of an image after resizing it for training. If the parameter is not set, then the training data is used without resizing. The parameter should be larger than both the width and height components of `image_shape` to prevent training failure. **Required** when using image content types. **Optional** when using the RecordIO content type. Valid values: positive integer Default value: no default value  | 
| top\_k | Reports the top-k accuracy during training. This parameter has to be greater than 1, since the top-1 training accuracy is the same as the regular training accuracy that has already been reported. **Optional** Valid values: positive integer larger than 1. Default value: no default value | 
| use\_pretrained\_model | Flag to use a pretrained model for training. If set to 1, then the pretrained model with the corresponding number of layers is loaded and used for training. Only the top FC layer is reinitialized with random weights. Otherwise, the network is trained from scratch. **Optional** Valid values: 0 or 1 Default value: 0 | 
| use\_weighted\_loss |  Flag to use weighted cross-entropy loss for multi-label classification (used only when `multi_label` = 1), where the weights are calculated based on the distribution of classes. **Optional** Valid values: 0 or 1 Default value: 0  | 
| weight\_decay | The weight decay coefficient for `sgd` and `nag`, ignored for other optimizers. **Optional** Valid values: float. Range in [0, 1]. Default value: 0.0001 | 
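
To illustrate how `lr_scheduler_step` and `lr_scheduler_factor` interact (a sketch of the described behavior, not the algorithm's internal code), the effective learning rate at a given epoch could be computed as:

```python
def scheduled_lr(initial_lr, factor, steps, epoch):
    """Learning rate after applying `factor` at each epoch listed in `steps`.

    steps is the comma-delimited `lr_scheduler_step` string, e.g. "10,20".
    """
    boundaries = [int(s) for s in steps.split(",")]
    # Count how many scheduled reductions have occurred by this epoch.
    reductions = sum(1 for b in boundaries if epoch >= b)
    return initial_lr * (factor ** reductions)
```

With `lr_scheduler_step="10,20"` and `lr_scheduler_factor=0.1`, an initial rate of 0.1 becomes 0.01 after the 10th epoch and 0.001 after the 20th.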

# Tune an Image Classification Model
<a name="IC-tuning"></a>

*Automatic model tuning*, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric. You choose the objective metric from the metrics that the algorithm computes. Automatic model tuning searches the hyperparameters chosen to find the combination of values that result in the model that optimizes the objective metric.

For more information about model tuning, see [Automatic model tuning with SageMaker AI](automatic-model-tuning.md).

## Metrics Computed by the Image Classification Algorithm
<a name="IC-metrics"></a>

The image classification algorithm is a supervised algorithm. It reports an accuracy metric that is computed during training. When tuning the model, choose this metric as the objective metric.


| Metric Name | Description | Optimization Direction | 
| --- | --- | --- | 
| validation:accuracy | The ratio of the number of correct predictions to the total number of predictions made. | Maximize | 

## Tunable Image Classification Hyperparameters
<a name="IC-tunable-hyperparameters"></a>

Tune an image classification model with the following hyperparameters. The hyperparameters that have the greatest impact on image classification objective metrics are: `mini_batch_size`, `learning_rate`, and `optimizer`. Tune the optimizer-related hyperparameters, such as `momentum`, `weight_decay`, `beta_1`, `beta_2`, `eps`, and `gamma`, based on the selected `optimizer`. For example, use `beta_1` and `beta_2` only when `adam` is the `optimizer`.

For more information about which hyperparameters are used in each optimizer, see [Image Classification Hyperparameters](IC-Hyperparameter.md).


| Parameter Name | Parameter Type | Recommended Ranges | 
| --- | --- | --- | 
| beta\_1 | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.999 | 
| beta\_2 | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.999 | 
| eps | ContinuousParameterRanges | MinValue: 1e-8, MaxValue: 1.0 | 
| gamma | ContinuousParameterRanges | MinValue: 1e-8, MaxValue: 0.999 | 
| learning\_rate | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.5 | 
| mini\_batch\_size | IntegerParameterRanges | MinValue: 8, MaxValue: 512 | 
| momentum | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.999 | 
| optimizer | CategoricalParameterRanges | ['sgd', 'adam', 'rmsprop', 'nag'] | 
| weight\_decay | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.999 | 
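
The recommended ranges above can be expressed as plain data before being passed to automatic model tuning (a sketch only; the SageMaker Python SDK wraps these in `ContinuousParameter`, `IntegerParameter`, and `CategoricalParameter` objects, which this example deliberately omits):

```python
# Ranges mirroring the table above, kept as simple tuples for illustration.
hyperparameter_ranges = {
    "learning_rate": ("continuous", 1e-6, 0.5),
    "mini_batch_size": ("integer", 8, 512),
    "momentum": ("continuous", 0.0, 0.999),
    "weight_decay": ("continuous", 0.0, 0.999),
    "optimizer": ("categorical", ["sgd", "adam", "rmsprop", "nag"]),
}
```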

# Image Classification - TensorFlow
<a name="image-classification-tensorflow"></a>

The Amazon SageMaker Image Classification - TensorFlow algorithm is a supervised learning algorithm that supports transfer learning with many pretrained models from the [TensorFlow Hub](https://tfhub.dev/s?fine-tunable=yes&module-type=image-classification&subtype=module,placeholder&tf-version=tf2). Use transfer learning to fine-tune one of the available pretrained models on your own dataset, even if a large amount of image data is not available. The image classification algorithm takes an image as input and outputs a probability for each provided class label. Training datasets must consist of images in .jpg, .jpeg, or .png format. This page includes information about Amazon EC2 instance recommendations and sample notebooks for Image Classification - TensorFlow.

**Topics**
+ [How to use the SageMaker Image Classification - TensorFlow algorithm](IC-TF-how-to-use.md)
+ [Input and output interface for the Image Classification - TensorFlow algorithm](IC-TF-inputoutput.md)
+ [Amazon EC2 instance recommendation for the Image Classification - TensorFlow algorithm](#IC-TF-instances)
+ [Image Classification - TensorFlow sample notebooks](#IC-TF-sample-notebooks)
+ [How Image Classification - TensorFlow Works](IC-TF-HowItWorks.md)
+ [TensorFlow Hub Models](IC-TF-Models.md)
+ [Image Classification - TensorFlow Hyperparameters](IC-TF-Hyperparameter.md)
+ [Tune an Image Classification - TensorFlow model](IC-TF-tuning.md)

# How to use the SageMaker Image Classification - TensorFlow algorithm
<a name="IC-TF-how-to-use"></a>

You can use Image Classification - TensorFlow as an Amazon SageMaker AI built-in algorithm. The following section describes how to use Image Classification - TensorFlow with the SageMaker AI Python SDK. For information on how to use Image Classification - TensorFlow from the Amazon SageMaker Studio Classic UI, see [SageMaker JumpStart pretrained models](studio-jumpstart.md).

The Image Classification - TensorFlow algorithm supports transfer learning using any of the compatible pretrained TensorFlow Hub models. For a list of all available pretrained models, see [TensorFlow Hub Models](IC-TF-Models.md). Every pretrained model has a unique `model_id`. The following example uses MobileNet V2 1.00 224 (`model_id`: `tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4`) to fine-tune on a custom dataset. The pretrained models are all pre-downloaded from the TensorFlow Hub and stored in Amazon S3 buckets so that training jobs can run in network isolation. Use these pre-generated model training artifacts to construct a SageMaker AI Estimator.

First, retrieve the Docker image URI, training script URI, and pretrained model URI. Then, change the hyperparameters as you see fit. You can see a Python dictionary of all available hyperparameters and their default values with `hyperparameters.retrieve_default`. For more information, see [Image Classification - TensorFlow Hyperparameters](IC-TF-Hyperparameter.md). Use these values to construct a SageMaker AI Estimator.

**Note**  
Default hyperparameter values are different for different models. For larger models, the default batch size is smaller and the `train_only_top_layer` hyperparameter is set to `"True"`.

This example uses the [tf_flowers](https://www.tensorflow.org/datasets/catalog/tf_flowers) dataset, which contains five classes of flower images. We pre-downloaded the dataset from TensorFlow under the Apache 2.0 license and made it available in Amazon S3. To fine-tune your model, call `.fit` using the Amazon S3 location of your training dataset.

```
import sagemaker
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.estimator import Estimator

sess = sagemaker.Session()
aws_region = sess.boto_region_name
aws_role = sagemaker.get_execution_role()

model_id, model_version = "tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4", "*"
training_instance_type = "ml.p3.2xlarge"

# Retrieve the Docker image
train_image_uri = image_uris.retrieve(
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
    region=None,
    framework=None,
)

# Retrieve the training script
train_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="training")

# Retrieve the pretrained model tarball for transfer learning
train_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="training")

# Retrieve the default hyperparameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["epochs"] = "5"

# The sample training data is available in the following S3 bucket
training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
training_data_prefix = "training-datasets/tf_flowers/"

training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}"

output_bucket = sess.default_bucket()
output_prefix = "jumpstart-example-ic-training"
s3_output_location = f"s3://{output_bucket}/{output_prefix}/output"

# Create SageMaker Estimator instance
tf_ic_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
)

# Use the S3 path of the training data to launch a SageMaker training job
tf_ic_estimator.fit({"training": training_dataset_s3_path}, logs=True)
```

# Input and output interface for the Image Classification - TensorFlow algorithm
<a name="IC-TF-inputoutput"></a>

Each of the pretrained models listed in TensorFlow Hub Models can be fine-tuned to any dataset with any number of image classes. Be mindful of how to format your training data for input to the Image Classification - TensorFlow model.
+ **Training data input format:** Your training data should be a directory with as many subdirectories as the number of classes. Each subdirectory should contain images belonging to that class in .jpg, .jpeg, or .png format.

The following is an example of an input directory structure. This example dataset has two classes: `roses` and `dandelion`. The image files in each class folder can have any name. The input directory should be hosted in an Amazon S3 bucket with a path similar to the following: `s3://bucket_name/input_directory/`. Note that the trailing `/` is required.

```
input_directory
    |--roses
        |--abc.jpg
        |--def.jpg
    |--dandelion
        |--ghi.jpg
        |--jkl.jpg
```

Trained models output label mapping files that map class folder names to the indices in the list of output class probabilities. This mapping is in alphabetical order. In the preceding example, the `dandelion` class is index 0 and the `roses` class is index 1. 
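
The alphabetical mapping can be reproduced locally. The following sketch (the folder names are taken from the example directory structure above) derives the same class-to-index mapping from the class folder names:

```
# Derive the class-to-index mapping from class folder names:
# folders are sorted alphabetically, and each index is the folder's
# position in the sorted list.
class_folders = ["roses", "dandelion"]

label_mapping = {name: index for index, name in enumerate(sorted(class_folders))}

print(label_mapping)  # {'dandelion': 0, 'roses': 1}
```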

After training, you have a fine-tuned model that you can further train using incremental training or deploy for inference. The Image Classification - TensorFlow algorithm automatically adds a pre-processing and post-processing signature to the fine-tuned model so that it can take in images as input and return class probabilities. The file mapping class indices to class labels is saved along with the models. 

## Incremental training
<a name="IC-TF-incremental-training"></a>

You can seed the training of a new model with artifacts from a model that you trained previously with SageMaker AI. Incremental training saves training time when you want to train a new model with the same or similar data.

**Note**  
You can only seed a SageMaker Image Classification - TensorFlow model with another Image Classification - TensorFlow model trained in SageMaker AI. 

You can use any dataset for incremental training, as long as the set of classes remains the same. The incremental training step is similar to the fine-tuning step, but instead of starting with a pretrained model, you start with an existing fine-tuned model. For an example of incremental training with the SageMaker AI Image Classification - TensorFlow algorithm, see the [Introduction to SageMaker TensorFlow - Image Classification](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/image_classification_tensorflow/Amazon_TensorFlow_Image_Classification.ipynb) sample notebook.

## Inference with the Image Classification - TensorFlow algorithm
<a name="IC-TF-inference"></a>

You can host the fine-tuned model that results from your TensorFlow Image Classification training for inference. Any input image for inference must be in `.jpg`, `.jpeg`, or `.png` format and be content type `application/x-image`. The Image Classification - TensorFlow algorithm resizes input images automatically. 

Running inference results in probability values, class labels for all classes, and the predicted label corresponding to the class index with the highest probability encoded in JSON format. The Image Classification - TensorFlow model processes a single image per request and outputs only one line. The following is an example of a JSON format response:

```
accept: application/json;verbose

 {"probabilities": [prob_0, prob_1, prob_2, ...],
  "labels":        [label_0, label_1, label_2, ...],
  "predicted_label": predicted_label}
```

If `accept` is set to `application/json`, then the model only outputs probabilities. For more information on training and inference with the Image Classification - TensorFlow algorithm, see the [Introduction to SageMaker TensorFlow - Image Classification](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/image_classification_tensorflow/Amazon_TensorFlow_Image_Classification.ipynb) sample notebook.
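
As a sketch of consuming the verbose response, the following parses a hypothetical JSON payload (the probability and label values are made up for illustration) and confirms that the predicted label corresponds to the class with the highest probability:

```
import json

# Hypothetical response body in the application/json;verbose format described above.
response_body = """
{"probabilities": [0.05, 0.95],
 "labels": ["dandelion", "roses"],
 "predicted_label": "roses"}
"""

result = json.loads(response_body)

# The predicted label corresponds to the class index with the highest probability.
best_index = max(range(len(result["probabilities"])), key=result["probabilities"].__getitem__)
assert result["labels"][best_index] == result["predicted_label"]
print(result["predicted_label"])  # roses
```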

## Amazon EC2 instance recommendation for the Image Classification - TensorFlow algorithm
<a name="IC-TF-instances"></a>

The Image Classification - TensorFlow algorithm supports all CPU and GPU instances for training, including:
+ `ml.p2.xlarge`
+ `ml.p2.16xlarge`
+ `ml.p3.2xlarge`
+ `ml.p3.16xlarge`
+ `ml.g4dn.xlarge`
+ `ml.g4dn.16xlarge`
+ `ml.g5.xlarge`
+ `ml.g5.48xlarge`

We recommend GPU instances with more memory for training with large batch sizes. Both CPU (such as M5) and GPU (P2, P3, G4dn, or G5) instances can be used for inference.

## Image Classification - TensorFlow sample notebooks
<a name="IC-TF-sample-notebooks"></a>

For more information about how to use the SageMaker Image Classification - TensorFlow algorithm for transfer learning on a custom dataset, see the [Introduction to SageMaker TensorFlow - Image Classification](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/image_classification_tensorflow/Amazon_TensorFlow_Image_Classification.ipynb) notebook.

For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see [Amazon SageMaker notebook instances](nbi.md). After you have created a notebook instance and opened it, select the **SageMaker AI Examples** tab to see a list of all the SageMaker AI samples. To open a notebook, choose its **Use** tab and choose **Create copy**.

# How Image Classification - TensorFlow Works
<a name="IC-TF-HowItWorks"></a>

The Image Classification - TensorFlow algorithm takes an image as input and classifies it into one of the output class labels. Various deep learning networks such as MobileNet, ResNet, Inception, and EfficientNet are highly accurate for image classification. There are also deep learning networks that are trained on large image datasets, such as ImageNet, which has over 11 million images and almost 11,000 classes. After a network is trained with ImageNet data, you can then fine-tune the network on a dataset with a particular focus to perform more specific classification tasks. The Amazon SageMaker Image Classification - TensorFlow algorithm supports transfer learning on many pretrained models that are available in the TensorFlow Hub.

A classification layer sized to the number of class labels in your training data is attached to the pretrained TensorFlow Hub model of your choice. The classification layer consists of a dropout layer followed by a dense (fully connected) layer with an L2 regularizer, initialized with random weights. The model has hyperparameters for the dropout rate of the dropout layer and the L2 regularization factor for the dense layer. You can then fine-tune either the entire network (including the pretrained model) or only the top classification layer on new training data. With this method of transfer learning, training with smaller datasets is possible.

# TensorFlow Hub Models
<a name="IC-TF-Models"></a>

The following pretrained models are available to use for transfer learning with the Image Classification - TensorFlow algorithm. 

These models vary significantly in size, number of model parameters, training time, and inference latency for any given dataset. The best model for your use case depends on the complexity of your fine-tuning dataset and any requirements that you have for training time, inference latency, or model accuracy.


| Model Name | `model_id` | Source | 
| --- | --- | --- | 
| MobileNet V2 1.00 224 | `tensorflow-ic-imagenet-mobilenet-v2-100-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/classification/4) | 
| MobileNet V2 0.75 224 | `tensorflow-ic-imagenet-mobilenet-v2-075-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v2_075_224/classification/4) | 
| MobileNet V2 0.50 224 | `tensorflow-ic-imagenet-mobilenet-v2-050-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v2_050_224/classification/4) | 
| MobileNet V2 0.35 224 | `tensorflow-ic-imagenet-mobilenet-v2-035-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v2_035_224/classification/4) | 
| MobileNet V2 1.40 224 | `tensorflow-ic-imagenet-mobilenet-v2-140-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v2_140_224/classification/4) | 
| MobileNet V2 1.30 224 | `tensorflow-ic-imagenet-mobilenet-v2-130-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/4) | 
| MobileNet V2 | `tensorflow-ic-tf2-preview-mobilenet-v2-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/4) | 
| Inception V3 | `tensorflow-ic-imagenet-inception-v3-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/inception_v3/classification/4) | 
| Inception V2 | `tensorflow-ic-imagenet-inception-v2-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/inception_v2/classification/4) | 
| Inception V1 | `tensorflow-ic-imagenet-inception-v1-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/inception_v1/classification/4) | 
| Inception V3 Preview | `tensorflow-ic-tf2-preview-inception-v3-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/tf2-preview/inception_v3/classification/4) | 
| Inception ResNet V2 | `tensorflow-ic-imagenet-inception-resnet-v2-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/inception_resnet_v2/classification/4) | 
| ResNet V2 50 | `tensorflow-ic-imagenet-resnet-v2-50-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/resnet_v2_50/classification/4) | 
| ResNet V2 101 | `tensorflow-ic-imagenet-resnet-v2-101-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/resnet_v2_101/classification/4) | 
| ResNet V2 152 | `tensorflow-ic-imagenet-resnet-v2-152-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/resnet_v2_152/classification/4) | 
| ResNet V1 50 | `tensorflow-ic-imagenet-resnet-v1-50-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/resnet_v1_50/classification/4) | 
| ResNet V1 101 | `tensorflow-ic-imagenet-resnet-v1-101-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/resnet_v1_101/classification/4) | 
| ResNet V1 152 | `tensorflow-ic-imagenet-resnet-v1-152-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/resnet_v1_152/classification/4) | 
| ResNet 50 | `tensorflow-ic-imagenet-resnet-50-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/resnet_50/classification/1) | 
| EfficientNet B0 | `tensorflow-ic-efficientnet-b0-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/efficientnet/b0/classification/1) | 
| EfficientNet B1 | `tensorflow-ic-efficientnet-b1-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/efficientnet/b1/classification/1) | 
| EfficientNet B2 | `tensorflow-ic-efficientnet-b2-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/efficientnet/b2/classification/1) | 
| EfficientNet B3 | `tensorflow-ic-efficientnet-b3-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/efficientnet/b3/classification/1) | 
| EfficientNet B4 | `tensorflow-ic-efficientnet-b4-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/efficientnet/b4/classification/1) | 
| EfficientNet B5 | `tensorflow-ic-efficientnet-b5-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/efficientnet/b5/classification/1) | 
| EfficientNet B6 | `tensorflow-ic-efficientnet-b6-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/efficientnet/b6/classification/1) | 
| EfficientNet B7 | `tensorflow-ic-efficientnet-b7-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/efficientnet/b7/classification/1) | 
| EfficientNet B0 Lite | `tensorflow-ic-efficientnet-lite0-classification-2` | [TensorFlow Hub link](https://tfhub.dev/tensorflow/efficientnet/lite0/classification/2) | 
| EfficientNet B1 Lite | `tensorflow-ic-efficientnet-lite1-classification-2` | [TensorFlow Hub link](https://tfhub.dev/tensorflow/efficientnet/lite1/classification/2) | 
| EfficientNet B2 Lite | `tensorflow-ic-efficientnet-lite2-classification-2` | [TensorFlow Hub link](https://tfhub.dev/tensorflow/efficientnet/lite2/classification/2) | 
| EfficientNet B3 Lite | `tensorflow-ic-efficientnet-lite3-classification-2` | [TensorFlow Hub link](https://tfhub.dev/tensorflow/efficientnet/lite3/classification/2) | 
| EfficientNet B4 Lite | `tensorflow-ic-efficientnet-lite4-classification-2` | [TensorFlow Hub link](https://tfhub.dev/tensorflow/efficientnet/lite4/classification/2) | 
| MobileNet V1 1.00 224 | `tensorflow-ic-imagenet-mobilenet-v1-100-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_100_224/classification/4) | 
| MobileNet V1 1.00 192 | `tensorflow-ic-imagenet-mobilenet-v1-100-192-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_100_192/classification/4) | 
| MobileNet V1 1.00 160 | `tensorflow-ic-imagenet-mobilenet-v1-100-160-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_100_160/classification/4) | 
| MobileNet V1 1.00 128 | `tensorflow-ic-imagenet-mobilenet-v1-100-128-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_100_128/classification/4) | 
| MobileNet V1 0.75 224 | `tensorflow-ic-imagenet-mobilenet-v1-075-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_075_224/classification/4) | 
| MobileNet V1 0.75 192 | `tensorflow-ic-imagenet-mobilenet-v1-075-192-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_075_192/classification/4) | 
| MobileNet V1 0.75 160 | `tensorflow-ic-imagenet-mobilenet-v1-075-160-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_075_160/classification/4) | 
| MobileNet V1 0.75 128 | `tensorflow-ic-imagenet-mobilenet-v1-075-128-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_075_128/classification/4) | 
| MobileNet V1 0.50 224 | `tensorflow-ic-imagenet-mobilenet-v1-050-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_050_224/classification/4) | 
| MobileNet V1 0.50 192 | `tensorflow-ic-imagenet-mobilenet-v1-050-192-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_050_192/classification/4) | 
| MobileNet V1 0.50 160 | `tensorflow-ic-imagenet-mobilenet-v1-050-160-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_050_160/classification/4) | 
| MobileNet V1 0.50 128 | `tensorflow-ic-imagenet-mobilenet-v1-050-128-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_050_128/classification/4) | 
| MobileNet V1 0.25 224 | `tensorflow-ic-imagenet-mobilenet-v1-025-224-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_025_224/classification/4) | 
| MobileNet V1 0.25 192 | `tensorflow-ic-imagenet-mobilenet-v1-025-192-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_025_192/classification/4) | 
| MobileNet V1 0.25 160 | `tensorflow-ic-imagenet-mobilenet-v1-025-160-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_025_160/classification/4) | 
| MobileNet V1 0.25 128 | `tensorflow-ic-imagenet-mobilenet-v1-025-128-classification-4` | [TensorFlow Hub link](https://tfhub.dev/google/imagenet/mobilenet_v1_025_128/classification/4) | 
| BiT-S R50x1 | `tensorflow-ic-bit-s-r50x1-ilsvrc2012-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/s-r50x1/ilsvrc2012_classification/1) | 
| BiT-S R50x3 | `tensorflow-ic-bit-s-r50x3-ilsvrc2012-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/s-r50x3/ilsvrc2012_classification/1) | 
| BiT-S R101x1 | `tensorflow-ic-bit-s-r101x1-ilsvrc2012-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/s-r101x1/ilsvrc2012_classification/1) | 
| BiT-S R101x3 | `tensorflow-ic-bit-s-r101x3-ilsvrc2012-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/s-r101x3/ilsvrc2012_classification/1) | 
| BiT-M R50x1 | `tensorflow-ic-bit-m-r50x1-ilsvrc2012-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/m-r50x1/ilsvrc2012_classification/1) | 
| BiT-M R50x3 | `tensorflow-ic-bit-m-r50x3-ilsvrc2012-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/m-r50x3/ilsvrc2012_classification/1) | 
| BiT-M R101x1 | `tensorflow-ic-bit-m-r101x1-ilsvrc2012-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/m-r101x1/ilsvrc2012_classification/1) | 
| BiT-M R101x3 | `tensorflow-ic-bit-m-r101x3-ilsvrc2012-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/m-r101x3/ilsvrc2012_classification/1) | 
| BiT-M R50x1 ImageNet-21k | `tensorflow-ic-bit-m-r50x1-imagenet21k-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/m-r50x1/imagenet21k_classification/1) | 
| BiT-M R50x3 ImageNet-21k | `tensorflow-ic-bit-m-r50x3-imagenet21k-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/m-r50x3/imagenet21k_classification/1) | 
| BiT-M R101x1 ImageNet-21k | `tensorflow-ic-bit-m-r101x1-imagenet21k-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/m-r101x1/imagenet21k_classification/1) | 
| BiT-M R101x3 ImageNet-21k | `tensorflow-ic-bit-m-r101x3-imagenet21k-classification-1` | [TensorFlow Hub link](https://tfhub.dev/google/bit/m-r101x3/imagenet21k_classification/1) | 

# Image Classification - TensorFlow Hyperparameters
<a name="IC-TF-Hyperparameter"></a>

Hyperparameters are parameters that are set before a machine learning model begins learning. The following hyperparameters are supported by the Amazon SageMaker AI built-in Image Classification - TensorFlow algorithm. See [Tune an Image Classification - TensorFlow model](IC-TF-tuning.md) for information on hyperparameter tuning. 


| Parameter Name | Description | 
| --- | --- | 
| augmentation |  Set to `"True"` to apply `augmentation_random_flip`, `augmentation_random_rotation`, and `augmentation_random_zoom` to the training data.  Valid values: string, either: (`"True"` or `"False"`). Default value: `"False"`.  | 
| augmentation\_random\_flip |  Indicates which flip mode to use for data augmentation when `augmentation` is set to `"True"`. For more information, see [RandomFlip](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomFlip) in the TensorFlow documentation. Valid values: string, any of the following: (`"horizontal_and_vertical"`, `"vertical"`, or `"None"`). Default value: `"horizontal_and_vertical"`.  | 
| augmentation\_random\_rotation |  Indicates how much rotation to use for data augmentation when `augmentation` is set to `"True"`. Values represent a fraction of 2π. Positive values rotate counterclockwise while negative values rotate clockwise. `0` means no rotation. For more information, see [RandomRotation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomRotation) in the TensorFlow documentation. Valid values: float, range: [`-1.0`, `1.0`]. Default value: `0.2`.  | 
| augmentation\_random\_zoom |  Indicates how much vertical zoom to use for data augmentation when `augmentation` is set to `"True"`. Positive values zoom out while negative values zoom in. `0` means no zoom. For more information, see [RandomZoom](https://www.tensorflow.org/api_docs/python/tf/keras/layers/RandomZoom) in the TensorFlow documentation. Valid values: float, range: [`-1.0`, `1.0`]. Default value: `0.1`.  | 
| batch\_size |  The batch size for training. For training on instances with multiple GPUs, this batch size is used across the GPUs.  Valid values: positive integer. Default value: `32`.  | 
| beta\_1 |  The beta1 for the `"adam"` optimizer. Represents the exponential decay rate for the first moment estimates. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.9`.  | 
| beta\_2 |  The beta2 for the `"adam"` optimizer. Represents the exponential decay rate for the second moment estimates. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.999`.  | 
| binary\_mode |  When `binary_mode` is set to `"True"`, the model returns a single probability number for the positive class and can use additional `eval_metric` options. Use only for binary classification problems. Valid values: string, either: (`"True"` or `"False"`). Default value: `"False"`.  | 
| dropout\_rate | The dropout rate for the dropout layer in the top classification layer. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.2`. | 
| early\_stopping |  Set to `"True"` to use early stopping logic during training. If `"False"`, early stopping is not used. Valid values: string, either: (`"True"` or `"False"`). Default value: `"False"`.  | 
| early\_stopping\_min\_delta | The minimum change needed to qualify as an improvement. An absolute change less than the value of `early_stopping_min_delta` does not qualify as improvement. Used only when `early_stopping` is set to `"True"`. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.0`. | 
| early\_stopping\_patience |  The number of epochs to continue training with no improvement. Used only when `early_stopping` is set to `"True"`. Valid values: positive integer. Default value: `5`.  | 
| epochs |  The number of training epochs. Valid values: positive integer. Default value: `3`.  | 
| epsilon |  The epsilon for `"adam"`, `"rmsprop"`, `"adadelta"`, and `"adagrad"` optimizers. Usually set to a small value to avoid division by 0. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `1e-7`.  | 
| eval\_metric |  If `binary_mode` is set to `"False"`, `eval_metric` can only be `"accuracy"`. If `binary_mode` is `"True"`, select any of the valid values. For more information, see [Metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics) in the TensorFlow documentation. Valid values: string, any of the following: (`"accuracy"`, `"precision"`, `"recall"`, `"auc"`, or `"prc"`). Default value: `"accuracy"`.  | 
| image\_resize\_interpolation |  Indicates the interpolation method used when resizing images. For more information, see [image.resize](https://www.tensorflow.org/api_docs/python/tf/image/resize) in the TensorFlow documentation. Valid values: string, any of the following: (`"bilinear"`, `"nearest"`, `"bicubic"`, `"area"`, `"lanczos3"`, `"lanczos5"`, `"gaussian"`, or `"mitchellcubic"`). Default value: `"bilinear"`.  | 
| initial\_accumulator\_value |  The starting value for the accumulators, or the per-parameter momentum values, for the `"adagrad"` optimizer. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.0001`.  | 
| label\_smoothing |  Indicates how much to relax the confidence on label values. For example, if `label_smoothing` is `0.1`, then non-target labels are `0.1/num_classes` and target labels are `0.9+0.1/num_classes`.  Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.1`.  | 
| learning\_rate | The optimizer learning rate. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.001`. | 
| momentum |  The momentum for `"sgd"`, `"nesterov"`, and `"rmsprop"` optimizers. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.9`.  | 
| optimizer |  The optimizer type. For more information, see [Optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) in the TensorFlow documentation. Valid values: string, any of the following: (`"adam"`, `"sgd"`, `"nesterov"`, `"rmsprop"`, `"adagrad"`, or `"adadelta"`). Default value: `"adam"`.  | 
| regularizers\_l2 |  The L2 regularization factor for the dense layer in the classification layer.  Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.0001`.  | 
| reinitialize\_top\_layer |  If set to `"Auto"`, the top classification layer parameters are re-initialized during fine-tuning. For incremental training, top classification layer parameters are not re-initialized unless set to `"True"`. Valid values: string, any of the following: (`"Auto"`, `"True"`, or `"False"`). Default value: `"Auto"`.  | 
| rho |  The discounting factor for the gradient of the `"adadelta"` and `"rmsprop"` optimizers. Ignored for other optimizers.  Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.95`.  | 
| train\_only\_top\_layer |  If `"True"`, only the top classification layer parameters are fine-tuned. If `"False"`, all model parameters are fine-tuned. Valid values: string, either: (`"True"` or `"False"`). Default value: `"False"`.  | 
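
The `label_smoothing` arithmetic described above can be checked with a short example. With `label_smoothing = 0.1` and five classes (as in the tf_flowers dataset), non-target labels become 0.1/5 = 0.02 and the target label becomes 0.9 + 0.02 = 0.92:

```
label_smoothing = 0.1
num_classes = 5

non_target = label_smoothing / num_classes       # 0.1 / 5 = 0.02
target = (1 - label_smoothing) + non_target      # 0.9 + 0.02 = 0.92

# The smoothed one-hot vector still sums to 1.
smoothed = [target] + [non_target] * (num_classes - 1)
print(round(target, 6), round(non_target, 6), round(sum(smoothed), 6))  # 0.92 0.02 1.0
```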

# Tune an Image Classification - TensorFlow model
<a name="IC-TF-tuning"></a>

*Automatic model tuning*, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric. You choose the objective metric from the metrics that the algorithm computes. Automatic model tuning searches the hyperparameters chosen to find the combination of values that result in the model that optimizes the objective metric.

For more information about model tuning, see [Automatic model tuning with SageMaker AI](automatic-model-tuning.md).

## Metrics computed by the Image Classification - TensorFlow algorithm
<a name="IC-TF-metrics"></a>

The image classification algorithm is a supervised algorithm. It reports an accuracy metric that is computed during training. When tuning the model, choose this metric as the objective metric.


| Metric Name | Description | Optimization Direction | 
| --- | --- | --- | 
| validation:accuracy | The ratio of the number of correct predictions to the total number of predictions made. | Maximize | 

## Tunable Image Classification - TensorFlow hyperparameters
<a name="IC-TF-tunable-hyperparameters"></a>

Tune an image classification model with the following hyperparameters. The hyperparameters that have the greatest impact on image classification objective metrics are: `batch_size`, `learning_rate`, and `optimizer`. Tune the optimizer-related hyperparameters, such as `momentum`, `regularizers_l2`, `beta_1`, `beta_2`, and `eps` based on the selected `optimizer`. For example, use `beta_1` and `beta_2` only when `adam` is the `optimizer`.

For more information about which hyperparameters are used for each `optimizer`, see [Image Classification - TensorFlow Hyperparameters](IC-TF-Hyperparameter.md).


| Parameter Name | Parameter Type | Recommended Ranges | 
| --- | --- | --- | 
| batch\_size | IntegerParameterRanges | MinValue: 8, MaxValue: 512 | 
| beta\_1 | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.999 | 
| beta\_2 | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.999 | 
| eps | ContinuousParameterRanges | MinValue: 1e-8, MaxValue: 1.0 | 
| learning\_rate | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.5 | 
| momentum | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.999 | 
| optimizer | CategoricalParameterRanges | ['sgd', 'adam', 'rmsprop', 'nesterov', 'adagrad', 'adadelta'] | 
| regularizers\_l2 | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.999 | 
| train\_only\_top\_layer | CategoricalParameterRanges | ['True', 'False'] | 

# Object Detection - MXNet
<a name="object-detection"></a>

The Amazon SageMaker AI Object Detection - MXNet algorithm detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene. The object is categorized into one of the classes in a specified collection with a confidence score that it belongs to the class. Its location and scale in the image are indicated by a rectangular bounding box. It uses the [Single Shot multibox Detector (SSD)](https://arxiv.org/pdf/1512.02325.pdf) framework and supports two base networks: [VGG](https://arxiv.org/pdf/1409.1556.pdf) and [ResNet](https://arxiv.org/pdf/1603.05027.pdf). The network can be trained from scratch, or trained with models that have been pre-trained on the [ImageNet](http://www.image-net.org/) dataset.

**Topics**
+ [Input/Output Interface for the Object Detection Algorithm](#object-detection-inputoutput)
+ [EC2 Instance Recommendation for the Object Detection Algorithm](#object-detection-instances)
+ [Object Detection Sample Notebooks](#object-detection-sample-notebooks)
+ [How Object Detection Works](algo-object-detection-tech-notes.md)
+ [Object Detection Hyperparameters](object-detection-api-config.md)
+ [Tune an Object Detection Model](object-detection-tuning.md)
+ [Object Detection Request and Response Formats](object-detection-in-formats.md)

## Input/Output Interface for the Object Detection Algorithm
<a name="object-detection-inputoutput"></a>

The SageMaker AI Object Detection algorithm supports both RecordIO (`application/x-recordio`) and image (`image/png`, `image/jpeg`, and `application/x-image`) content types for training in file mode, and supports RecordIO (`application/x-recordio`) for training in pipe mode. However, you can also train in pipe mode using image files (`image/png`, `image/jpeg`, and `application/x-image`), without creating RecordIO files, by using the augmented manifest format. The recommended input format for the Amazon SageMaker AI object detection algorithms is [Apache MXNet RecordIO](https://mxnet.apache.org/api/architecture/note_data_loading). However, you can also use raw images in .jpg or .png format. The algorithm supports only `application/x-image` for inference.

**Note**  
To maintain better interoperability with existing deep learning frameworks, this differs from the protobuf data formats commonly used by other Amazon SageMaker AI algorithms.

See the [Object Detection Sample Notebooks](#object-detection-sample-notebooks) for more details on data formats.

### Train with the RecordIO Format
<a name="object-detection-recordio-training"></a>

If you use the RecordIO format for training, specify both train and validation channels as values for the `InputDataConfig` parameter of the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Specify one RecordIO (.rec) file in the train channel and one RecordIO file in the validation channel. Set the content type for both channels to `application/x-recordio`. An example of how to generate a RecordIO file can be found in the object detection sample notebook. You can also use tools from [MXNet's GluonCV](https://gluon-cv.mxnet.io/build/examples_datasets/recordio.html) to generate RecordIO files for popular datasets like the [PASCAL Visual Object Classes](http://host.robots.ox.ac.uk/pascal/VOC/) and [Common Objects in Context (COCO)](http://cocodataset.org/#home).

### Train with the Image Format
<a name="object-detection-image-training"></a>

If you use the image format for training, specify `train`, `validation`, `train_annotation`, and `validation_annotation` channels as values for the `InputDataConfig` parameter of the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Specify the individual image data (.jpg or .png) files for the train and validation channels. For annotation data, you can use the JSON format. Specify the corresponding .json files in the `train_annotation` and `validation_annotation` channels. Set the content type for all four channels to `image/png` or `image/jpeg` based on the image type. You can also use the content type `application/x-image` when your dataset contains both .jpg and .png images. The following is an example of a .json file.

```
{
   "file": "your_image_directory/sample_image1.jpg",
   "image_size": [
      {
         "width": 500,
         "height": 400,
         "depth": 3
      }
   ],
   "annotations": [
      {
         "class_id": 0,
         "left": 111,
         "top": 134,
         "width": 61,
         "height": 128
      },
      {
         "class_id": 0,
         "left": 161,
         "top": 250,
         "width": 79,
         "height": 143
      },
      {
         "class_id": 1,
         "left": 101,
         "top": 185,
         "width": 42,
         "height": 130
      }
   ],
   "categories": [
      {
         "class_id": 0,
         "name": "dog"
      },
      {
         "class_id": 1,
         "name": "cat"
      }
   ]
}
```

Each image needs a .json file for annotation, and the .json file should have the same name as the corresponding image. The name of the above .json file should be "sample_image1.json". There are four properties in the annotation .json file. The property "file" specifies the relative path of the image file. For example, if your training images and corresponding .json files are stored in s3://*your_bucket*/train/sample_image and s3://*your_bucket*/train_annotation, specify the path for your train and train_annotation channels as s3://*your_bucket*/train and s3://*your_bucket*/train_annotation, respectively. 

In the .json file, the relative path for an image named sample_image1.jpg should be sample_image/sample_image1.jpg. The `"image_size"` property specifies the overall image dimensions. The SageMaker AI object detection algorithm currently supports only 3-channel images. The `"annotations"` property specifies the categories and bounding boxes for objects within the image. Each object is annotated by a `"class_id"` index and by four bounding box values (`"left"`, `"top"`, `"width"`, `"height"`). The `"left"` (x-coordinate) and `"top"` (y-coordinate) values represent the upper-left corner of the bounding box, and the `"width"` and `"height"` values represent its dimensions. The origin (0, 0) is the upper-left corner of the entire image. If you have multiple objects within one image, all the annotations should be included in a single .json file. The `"categories"` property stores the mapping between the class index and class name. The class indices should be numbered successively, starting with 0. The `"categories"` property is optional in the annotation .json file.

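The (left, top, width, height) encoding described above converts to corner coordinates with simple arithmetic. A small illustrative sketch (the helper name `to_corners` is ours, not part of the algorithm):

```python
def to_corners(annotation):
    """Convert a (left, top, width, height) annotation, as used in the
    training .json files, to (xmin, ymin, xmax, ymax) pixel corners."""
    xmin = annotation["left"]
    ymin = annotation["top"]
    xmax = xmin + annotation["width"]
    ymax = ymin + annotation["height"]
    return xmin, ymin, xmax, ymax

# First annotation from the example file above: a dog at (111, 134), 61 x 128
print(to_corners({"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128}))
# → (111, 134, 172, 262)
```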
### Train with Augmented Manifest Image Format
<a name="object-detection-augmented-manifest-training"></a>

The augmented manifest format enables you to train in pipe mode using image files without creating RecordIO files. You need to specify both train and validation channels as values for the `InputDataConfig` parameter of the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. To use this format, generate an S3 manifest file that contains the list of images and their corresponding annotations. The manifest file should be in [JSON Lines](http://jsonlines.org/) format, in which each line represents one sample. The images are specified using the `"source-ref"` tag, which points to the S3 location of the image. The annotations are provided under the `"AttributeNames"` parameter value as specified in the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Each line can also contain additional metadata under the `metadata` tag, but this is ignored by the algorithm. In the following example, the `"AttributeNames"` are contained in the list `["source-ref", "bounding-box"]`:

```
{"source-ref": "s3://your_bucket/image1.jpg", "bounding-box":{"image_size":[{ "width": 500, "height": 400, "depth":3}], "annotations":[{"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128}, {"class_id": 5, "left": 161, "top": 250, "width": 80, "height": 50}]}, "bounding-box-metadata":{"class-map":{"0": "dog", "5": "horse"}, "type": "groundtruth/object-detection"}}
{"source-ref": "s3://your_bucket/image2.jpg", "bounding-box":{"image_size":[{ "width": 400, "height": 300, "depth":3}], "annotations":[{"class_id": 1, "left": 100, "top": 120, "width": 43, "height": 78}]}, "bounding-box-metadata":{"class-map":{"1": "cat"}, "type": "groundtruth/object-detection"}}
```

The order of `"AttributeNames"` in the input files matters when training the Object Detection algorithm. It accepts piped data in a specific order, with `image` first, followed by `annotations`. So the "AttributeNames" in this example are provided with `"source-ref"` first, followed by `"bounding-box"`. When using Object Detection with Augmented Manifest, the value of parameter `RecordWrapperType` must be set as `"RecordIO"`.

For more information on augmented manifest files, see [Augmented Manifest Files for Training Jobs](augmented-manifest.md).

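A manifest like the one above can be produced with a few lines of standard Python, since JSON Lines is just one JSON object per line. A minimal sketch (the bucket name and annotation values are illustrative):

```python
import json

# One record per image; structure mirrors the augmented manifest example above
samples = [
    {
        "source-ref": "s3://your_bucket/image1.jpg",
        "bounding-box": {
            "image_size": [{"width": 500, "height": 400, "depth": 3}],
            "annotations": [
                {"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128},
            ],
        },
    },
]

# JSON Lines: one JSON object per line, with no enclosing array
with open("train.manifest", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```

Upload the resulting file to Amazon S3 and point the train channel at it, remembering that `"source-ref"` must come first in `"AttributeNames"`.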
### Incremental Training
<a name="object-detection-incremental-training"></a>

You can also seed the training of a new model with the artifacts from a model that you trained previously with SageMaker AI. Incremental training saves training time when you want to train a new model with the same or similar data. SageMaker AI object detection models can be seeded only with another built-in object detection model trained in SageMaker AI.

To use a pretrained model, in the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request, specify the `ChannelName` as "model" in the `InputDataConfig` parameter. Set the `ContentType` for the model channel to `application/x-sagemaker-model`. The input hyperparameters of both the new model and the pretrained model that you upload to the model channel must have the same settings for the `base_network` and `num_classes` input parameters. These parameters define the network architecture. For the pretrained model file, use the compressed model artifacts (in .tar.gz format) output by SageMaker AI. You can use either RecordIO or image formats for input data.

For more information on incremental training and for instructions on how to use it, see [Use Incremental Training in Amazon SageMaker AI](incremental-training.md). 

## EC2 Instance Recommendation for the Object Detection Algorithm
<a name="object-detection-instances"></a>

The object detection algorithm supports the P2, P3, G4dn, and G5 GPU instance families. We recommend using GPU instances with more memory for training with large batch sizes. You can run the object detection algorithm in multi-GPU and multi-machine settings for distributed training.

You can use both CPU (such as C5 and M5) and GPU (such as P3 and G4dn) instances for inference.

## Object Detection Sample Notebooks
<a name="object-detection-sample-notebooks"></a>

For a sample notebook that shows how to use the SageMaker AI Object Detection algorithm to train and host a model on the [Caltech Birds (CUB 200 2011)](http://www.vision.caltech.edu/datasets/cub_200_2011/) dataset using the Single Shot multibox Detector algorithm, see [Amazon SageMaker AI Object Detection for Bird Species](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/object_detection_birds/object_detection_birds.html). For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see [Amazon SageMaker notebook instances](nbi.md). After you have created a notebook instance and opened it, select the **SageMaker AI Examples** tab to see a list of all the SageMaker AI samples. The object detection example notebook is located in the **Introduction to Amazon Algorithms** section. To open a notebook, choose its **Use** tab and select **Create copy**.

For more information about the Amazon SageMaker AI Object Detection algorithm, see the following blog posts:
+ [Training the Amazon SageMaker AI object detection model and running it on Amazon IoT Greengrass – Part 1 of 3: Preparing training data](https://www.amazonaws.cn/blogs/iot/sagemaker-object-detection-greengrass-part-1-of-3/)
+ [Training the Amazon SageMaker AI object detection model and running it on Amazon IoT Greengrass – Part 2 of 3: Training a custom object detection model](https://www.amazonaws.cn/blogs/iot/sagemaker-object-detection-greengrass-part-2-of-3/)
+ [Training the Amazon SageMaker AI object detection model and running it on Amazon IoT Greengrass – Part 3 of 3: Deploying to the edge](https://www.amazonaws.cn/blogs/iot/sagemaker-object-detection-greengrass-part-3-of-3/)

# How Object Detection Works
<a name="algo-object-detection-tech-notes"></a>

The object detection algorithm identifies and locates all instances of objects in an image from a known collection of object categories. The algorithm takes an image as input and outputs the category that the object belongs to, along with a confidence score that it belongs to the category. The algorithm also predicts the object's location and scale with a rectangular bounding box. Amazon SageMaker AI Object Detection uses the [Single Shot multibox Detector (SSD)](https://arxiv.org/pdf/1512.02325.pdf) algorithm that takes a convolutional neural network (CNN) pretrained for classification task as the base network. SSD uses the output of intermediate layers as features for detection. 

Various CNNs such as [VGG](https://arxiv.org/pdf/1409.1556.pdf) and [ResNet](https://arxiv.org/pdf/1603.05027.pdf) have achieved great performance on the image classification task. Object detection in Amazon SageMaker AI supports both VGG-16 and ResNet-50 as a base network for SSD. The algorithm can be trained in full training mode or in transfer learning mode. In full training mode, the base network is initialized with random weights and then trained on user data. In transfer learning mode, the base network weights are loaded from pretrained models.

The object detection algorithm uses standard data augmentation operations, such as flip, rescale, and jitter, on the fly internally to help avoid overfitting.

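For intuition about why augmentation must touch the labels too: a horizontal flip of the image also has to remap every bounding box. A toy sketch in plain Python (this is an illustration of the idea, not the algorithm's internal implementation):

```python
def hflip_box(box, image_width):
    """Remap a (left, top, width, height) box after flipping the image
    horizontally: the new left edge is measured from the old right edge."""
    left, top, width, height = box
    new_left = image_width - (left + width)
    return (new_left, top, width, height)

# A 61 x 128 box at x=111 in a 500-pixel-wide image moves to x=328 after the flip
print(hflip_box((111, 134, 61, 128), image_width=500))
# → (328, 134, 61, 128)
```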
# Object Detection Hyperparameters
<a name="object-detection-api-config"></a>

In the [CreateTrainingJob](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request, you specify the training algorithm that you want to use. You can also specify algorithm-specific hyperparameters that are used to help estimate the parameters of the model from a training dataset. The following table lists the hyperparameters provided by Amazon SageMaker AI for training the object detection algorithm. For more information about how object detection training works, see [How Object Detection Works](algo-object-detection-tech-notes.md).


| Parameter Name | Description | 
| --- | --- | 
| num_classes |  The number of output classes. This parameter defines the dimensions of the network output and is typically set to the number of classes in the dataset. **Required** Valid values: positive integer  | 
| num_training_samples |  The number of training examples in the input dataset.  If there is a mismatch between this value and the number of samples in the training set, then the behavior of the `lr_scheduler_step` parameter will be undefined and distributed training accuracy may be affected.  **Required** Valid values: positive integer  | 
| base_network |  The base network architecture to use. **Optional** Valid values: 'vgg-16' or 'resnet-50' Default value: 'vgg-16'  | 
| early_stopping |  `True` to use early stopping logic during training. `False` not to use it. **Optional** Valid values: `True` or `False` Default value: `False`  | 
| early_stopping_min_epochs |  The minimum number of epochs that must be run before the early stopping logic can be invoked. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 10  | 
| early_stopping_patience |  The number of epochs to wait before ending training if no improvement, as defined by the `early_stopping_tolerance` hyperparameter, is made in the relevant metric. It is used only when `early_stopping` = `True`. **Optional** Valid values: positive integer Default value: 5  | 
| early_stopping_tolerance |  The tolerance value that the relative improvement in `validation:mAP`, the mean average precision (mAP), is required to exceed to avoid early stopping. If the ratio of the change in the mAP divided by the previous best mAP is smaller than the `early_stopping_tolerance` value set, early stopping considers that there is no improvement. It is used only when `early_stopping` = `True`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.0  | 
| image_shape |  The image size for input images. We rescale the input image to a square image with this size. We recommend using 300 and 512 for better performance. **Optional** Valid values: positive integer ≥300 Default: 300  | 
| epochs |  The number of training epochs.  **Optional** Valid values: positive integer Default: 30  | 
| freeze_layer_pattern |  The regular expression (regex) for freezing layers in the base network. For example, if we set `freeze_layer_pattern` = `"^(conv1_\|conv2_).*"`, then any layers with a name that contains `"conv1_"` or `"conv2_"` are frozen, which means that the weights for these layers are not updated during training. The layer names can be found in the network symbol files [vgg16-symbol.json](http://data.mxnet.io/models/imagenet/vgg/vgg16-symbol.json) and [resnet-50-symbol.json](http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50-symbol.json). Freezing a layer means that its weights cannot be modified further. This can reduce training time significantly in exchange for modest losses in accuracy. This technique is commonly used in transfer learning where the lower layers in the base network do not need to be retrained. **Optional** Valid values: string Default: No layers frozen.  | 
| kv_store |  The weight update synchronization mode used for distributed training. The weights can be updated either synchronously or asynchronously across machines. Synchronous updates typically provide better accuracy than asynchronous updates but can be slower. See the [Distributed Training](https://mxnet.apache.org/api/faq/distributed_training) MXNet tutorial for details.  This parameter is not applicable to single machine training.  **Optional** Valid values: `'dist_sync'` or `'dist_async'` [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/object-detection-api-config.html) Default: -  | 
| label_width |  The force padding label width used to sync across training and validation data. For example, if one image in the data contains at most 10 objects, and each object's annotation is specified with 5 numbers, [class_id, left, top, width, height], then the `label_width` should be no smaller than (10\*5 + header information length). The header information length is usually 2. We recommend using a slightly larger `label_width` for the training, such as 60 for this example. **Optional** Valid values: Positive integer large enough to accommodate the largest annotation information length in the data. Default: 350  | 
| learning_rate |  The initial learning rate. **Optional** Valid values: float in (0, 1] Default: 0.001  | 
| lr_scheduler_factor |  The ratio by which to reduce the learning rate. Used in conjunction with the `lr_scheduler_step` parameter, defined as `lr_new` = `lr_old` \* `lr_scheduler_factor`. **Optional** Valid values: float in (0, 1) Default: 0.1  | 
| lr_scheduler_step |  The epochs at which to reduce the learning rate. The learning rate is reduced by `lr_scheduler_factor` at epochs listed in a comma-delimited string: "epoch1, epoch2, ...". For example, if the value is set to "10, 20" and the `lr_scheduler_factor` is set to 1/2, then the learning rate is halved after the 10th epoch and then halved again after the 20th epoch. **Optional** Valid values: string Default: empty string  | 
| mini_batch_size |  The batch size for training. In a single-machine multi-GPU setting, each GPU handles `mini_batch_size`/`num_gpu` training samples. For multi-machine training in `dist_sync` mode, the actual batch size is `mini_batch_size` \* the number of machines. A large `mini_batch_size` usually leads to faster training, but it may cause out-of-memory problems. The memory usage is related to `mini_batch_size`, `image_shape`, and `base_network` architecture. For example, on a single p3.2xlarge instance, the largest `mini_batch_size` without an out-of-memory error is 32 with the `base_network` set to "resnet-50" and an `image_shape` of 300. With the same instance, you can use 64 as the `mini_batch_size` with the base network `vgg-16` and an `image_shape` of 300. **Optional** Valid values: positive integer Default: 32  | 
| momentum |  The momentum for `sgd`. Ignored for other optimizers. **Optional** Valid values: float in (0, 1] Default: 0.9  | 
| nms_threshold |  The non-maximum suppression threshold. **Optional** Valid values: float in (0, 1] Default: 0.45  | 
| optimizer |  The optimizer types. For details on optimizer values, see [MXNet's API](https://mxnet.apache.org/api/python/docs/api/). **Optional** Valid values: ['sgd', 'adam', 'rmsprop', 'adadelta'] Default: 'sgd'  | 
| overlap_threshold |  The evaluation overlap threshold. **Optional** Valid values: float in (0, 1] Default: 0.5  | 
| use_pretrained_model |  Indicates whether to use a pre-trained model for training. If set to 1, then the pre-trained model with the corresponding architecture is loaded and used for training. Otherwise, the network is trained from scratch. **Optional** Valid values: 0 or 1 Default: 1  | 
| weight_decay |  The weight decay coefficient for `sgd` and `rmsprop`. Ignored for other optimizers. **Optional** Valid values: float in (0, 1) Default: 0.0005  | 

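The interaction between `lr_scheduler_step` and `lr_scheduler_factor` can be sketched in plain Python (an illustration of the schedule described in the table above, not the algorithm's internal code):

```python
def learning_rate_at(epoch, base_lr, scheduler_step, factor):
    """Return the learning rate in effect at a given epoch.

    scheduler_step is the comma-delimited string from the hyperparameter,
    e.g. "10, 20"; the rate is multiplied by `factor` at each listed epoch.
    """
    steps = [int(s) for s in scheduler_step.split(",") if s.strip()]
    lr = base_lr
    for step in steps:
        if epoch >= step:
            lr *= factor
    return lr

# With lr_scheduler_step="10, 20" and lr_scheduler_factor=0.5, the initial
# rate of 0.001 is halved after epoch 10 and halved again after epoch 20.
print(learning_rate_at(5, 0.001, "10, 20", 0.5))   # → 0.001
print(learning_rate_at(15, 0.001, "10, 20", 0.5))  # → 0.0005
print(learning_rate_at(25, 0.001, "10, 20", 0.5))  # → 0.00025
```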
# Tune an Object Detection Model
<a name="object-detection-tuning"></a>

*Automatic model tuning*, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric. You choose the objective metric from the metrics that the algorithm computes. Automatic model tuning searches the hyperparameters chosen to find the combination of values that result in the model that optimizes the objective metric.

For more information about model tuning, see [Automatic model tuning with SageMaker AI](automatic-model-tuning.md).

## Metrics Computed by the Object Detection Algorithm
<a name="object-detection-metrics"></a>

The object detection algorithm reports on a single metric during training: `validation:mAP`. When tuning a model, choose this metric as the objective metric.


| Metric Name | Description | Optimization Direction | 
| --- | --- | --- | 
| validation:mAP |  Mean Average Precision (mAP) computed on the validation set.  |  Maximize  | 



## Tunable Object Detection Hyperparameters
<a name="object-detection-tunable-hyperparameters"></a>

Tune the Amazon SageMaker AI object detection model with the following hyperparameters. The hyperparameters that have the greatest impact on the object detection objective metric are: `mini_batch_size`, `learning_rate`, and `optimizer`.


| Parameter Name | Parameter Type | Recommended Ranges | 
| --- | --- | --- | 
| learning_rate |  ContinuousParameterRanges  |  MinValue: 1e-6, MaxValue: 0.5  | 
| mini_batch_size |  IntegerParameterRanges  |  MinValue: 8, MaxValue: 64  | 
| momentum |  ContinuousParameterRanges  |  MinValue: 0.0, MaxValue: 0.999  | 
| optimizer |  CategoricalParameterRanges  |  ['sgd', 'adam', 'rmsprop', 'adadelta']  | 
| weight_decay |  ContinuousParameterRanges  |  MinValue: 0.0, MaxValue: 0.999  | 

# Object Detection Request and Response Formats
<a name="object-detection-in-formats"></a>

The following page describes the inference request and response formats for the Amazon SageMaker AI Object Detection - MXNet model.

## Request Format
<a name="object-detection-json"></a>

Query a trained model by using the model's endpoint. The endpoint takes .jpg and .png image formats with `image/jpeg` and `image/png` content-types.

## Response Formats
<a name="object-detection-recordio"></a>

The response is the class index with a confidence score and bounding box coordinates for all objects within the image, encoded in JSON format. The following is an example of a response in .json format:

```
{"prediction":[
  [4.0, 0.86419455409049988, 0.3088374733924866, 0.07030484080314636, 0.7110607028007507, 0.9345266819000244],
  [0.0, 0.73376623392105103, 0.5714187026023865, 0.40427327156066895, 0.827075183391571, 0.9712159633636475],
  [4.0, 0.32643985450267792, 0.3677481412887573, 0.034883320331573486, 0.6318609714508057, 0.5967587828636169],
  [8.0, 0.22552496790885925, 0.6152569651603699, 0.5722782611846924, 0.882301390171051, 0.8985623121261597],
  [3.0, 0.42260299175977707, 0.019305512309074402, 0.08386176824569702, 0.39093565940856934, 0.9574796557426453]
]}
```

Each row in this .json file contains an array that represents a detected object. Each of these object arrays consists of a list of six numbers. The first number is the predicted class label. The second number is the associated confidence score for the detection. The last four numbers represent the bounding box coordinates [xmin, ymin, xmax, ymax]. These output bounding box corner indices are normalized by the overall image size. Note that this encoding differs from the one used by the input .json format. For example, in the first entry of the detection result, 0.3088374733924866 is the left coordinate (x-coordinate of the upper-left corner) of the bounding box as a ratio of the overall image width, 0.07030484080314636 is the top coordinate (y-coordinate of the upper-left corner) as a ratio of the overall image height, 0.7110607028007507 is the right coordinate (x-coordinate of the lower-right corner) as a ratio of the overall image width, and 0.9345266819000244 is the bottom coordinate (y-coordinate of the lower-right corner) as a ratio of the overall image height. 

To avoid unreliable detection results, you might want to filter out the detection results with low confidence scores. In the [object detection sample notebook](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/object_detection_birds/object_detection_birds.ipynb), we provide examples of scripts that use a threshold to remove low confidence detections and to plot bounding boxes on the original images.

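Putting the response format together: the following short sketch denormalizes the boxes to pixel coordinates and drops low-confidence detections (the helper name and the 0.5 threshold are our illustrative choices, not part of the API):

```python
def parse_detections(response, image_width, image_height, threshold=0.5):
    """Convert [class_id, score, xmin, ymin, xmax, ymax] rows with
    normalized coordinates into pixel-space boxes above a score threshold."""
    results = []
    for class_id, score, xmin, ymin, xmax, ymax in response["prediction"]:
        if score < threshold:
            continue  # drop unreliable detections
        results.append({
            "class_id": int(class_id),
            "score": score,
            "box": (xmin * image_width, ymin * image_height,
                    xmax * image_width, ymax * image_height),
        })
    return results

# Abbreviated response in the format shown above, for a 500 x 400 image
response = {"prediction": [
    [4.0, 0.86, 0.30, 0.07, 0.71, 0.93],
    [4.0, 0.32, 0.36, 0.03, 0.63, 0.59],  # below threshold; filtered out
]}
print(parse_detections(response, image_width=500, image_height=400))
```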
For batch transform, the response is in JSON format, identical to the JSON format described above. The detection results for each image are represented as a JSON file. For example:

```
{"prediction": [[label_id, confidence_score, xmin, ymin, xmax, ymax], [label_id, confidence_score, xmin, ymin, xmax, ymax]]}
```

For more details on training and inference, see the [Object Detection Sample Notebooks](object-detection.md#object-detection-sample-notebooks).

## OUTPUT: JSON Response Format
<a name="object-detection-output-json"></a>

If you set the accept type to `application/json;annotation=1`, the model returns results in the same annotation format as the training input:

```
{
   "image_size": [
      {
         "width": 500,
         "height": 400,
         "depth": 3
      }
   ],
   "annotations": [
      {
         "class_id": 0,
         "score": 0.943,
         "left": 111,
         "top": 134,
         "width": 61,
         "height": 128
      },
      {
         "class_id": 0,
         "score": 0.0013,
         "left": 161,
         "top": 250,
         "width": 79,
         "height": 143
      },
      {
         "class_id": 1,
         "score": 0.0133,
         "left": 101,
         "top": 185,
         "width": 42,
         "height": 130
      }
   ]
}
```

# Object Detection - TensorFlow
<a name="object-detection-tensorflow"></a>

The Amazon SageMaker AI Object Detection - TensorFlow algorithm is a supervised learning algorithm that supports transfer learning with many pretrained models from the [TensorFlow Model Garden](https://github.com/tensorflow/models). Use transfer learning to fine-tune one of the available pretrained models on your own dataset, even if a large amount of image data is not available. The object detection algorithm takes an image as input and outputs a list of bounding boxes. Training datasets must consist of images in `.jpg`, `.jpeg`, or `.png` format. This page includes information about Amazon EC2 instance recommendations and sample notebooks for Object Detection - TensorFlow.

**Topics**
+ [How to use the SageMaker AI Object Detection - TensorFlow algorithm](object-detection-tensorflow-how-to-use.md)
+ [Input and output interface for the Object Detection - TensorFlow algorithm](object-detection-tensorflow-inputoutput.md)
+ [Amazon EC2 instance recommendation for the Object Detection - TensorFlow algorithm](#object-detection-tensorflow-instances)
+ [Object Detection - TensorFlow sample notebooks](#object-detection-tensorflow-sample-notebooks)
+ [How Object Detection - TensorFlow Works](object-detection-tensorflow-HowItWorks.md)
+ [TensorFlow Models](object-detection-tensorflow-Models.md)
+ [Object Detection - TensorFlow Hyperparameters](object-detection-tensorflow-Hyperparameter.md)
+ [Tune an Object Detection - TensorFlow model](object-detection-tensorflow-tuning.md)

# How to use the SageMaker AI Object Detection - TensorFlow algorithm
<a name="object-detection-tensorflow-how-to-use"></a>

You can use Object Detection - TensorFlow as an Amazon SageMaker AI built-in algorithm. The following section describes how to use Object Detection - TensorFlow with the SageMaker AI Python SDK. For information on how to use Object Detection - TensorFlow from the Amazon SageMaker Studio Classic UI, see [SageMaker JumpStart pretrained models](studio-jumpstart.md).

The Object Detection - TensorFlow algorithm supports transfer learning using any of the compatible pretrained TensorFlow models. For a list of all available pretrained models, see [TensorFlow Models](object-detection-tensorflow-Models.md). Every pretrained model has a unique `model_id`. The following example uses ResNet50 (`model_id`: `tensorflow-od1-ssd-resnet50-v1-fpn-640x640-coco17-tpu-8`) to fine-tune on a custom dataset. The pretrained models are all pre-downloaded from the TensorFlow Hub and stored in Amazon S3 buckets so that training jobs can run in network isolation. Use these pre-generated model training artifacts to construct a SageMaker AI Estimator.

First, retrieve the Docker image URI, training script URI, and pretrained model URI. Then, change the hyperparameters as you see fit. You can see a Python dictionary of all available hyperparameters and their default values with `hyperparameters.retrieve_default`. For more information, see [Object Detection - TensorFlow Hyperparameters](object-detection-tensorflow-Hyperparameter.md). Use these values to construct a SageMaker AI Estimator.

**Note**  
Default hyperparameter values are different for different models. For example, for larger models, the default number of epochs is smaller. 

This example uses the [PennFudanPed](https://www.cis.upenn.edu/~jshi/ped_html/#pub1) dataset, which contains images of pedestrians in the street. We pre-downloaded the dataset and made it available in Amazon S3. To fine-tune your model, call `.fit` using the Amazon S3 location of your training dataset.

```
import sagemaker
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.estimator import Estimator

# Session, Region, and execution role used later in this example
sess = sagemaker.Session()
aws_region = sess.boto_region_name
aws_role = sagemaker.get_execution_role()

model_id, model_version = "tensorflow-od1-ssd-resnet50-v1-fpn-640x640-coco17-tpu-8", "*"
training_instance_type = "ml.p3.2xlarge"

# Retrieve the Docker image
train_image_uri = image_uris.retrieve(
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
    region=None,
    framework=None,
)

# Retrieve the training script
train_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="training")

# Retrieve the pretrained model tarball for transfer learning
train_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="training")

# Retrieve the default hyperparameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["epochs"] = "5"

# Sample training data is available in this bucket
training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
training_data_prefix = "training-datasets/PennFudanPed_COCO_format/"

training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}"

output_bucket = sess.default_bucket()
output_prefix = "jumpstart-example-od-training"
s3_output_location = f"s3://{output_bucket}/{output_prefix}/output"

# Create an Estimator instance
tf_od_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
)

# Launch a training job
tf_od_estimator.fit({"training": training_dataset_s3_path}, logs=True)
```

For more information about how to use the SageMaker AI Object Detection - TensorFlow algorithm for transfer learning on a custom dataset, see the [Introduction to SageMaker TensorFlow - Object Detection](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/object_detection_tensorflow/Amazon_Tensorflow_Object_Detection.ipynb) notebook.

# Input and output interface for the Object Detection - TensorFlow algorithm
<a name="object-detection-tensorflow-inputoutput"></a>

Each of the pretrained models listed in [TensorFlow Models](object-detection-tensorflow-Models.md) can be fine-tuned to any dataset with any number of image classes. Be mindful of how to format your training data for input to the Object Detection - TensorFlow model.
+ **Training data input format:** Your training data should be a directory with an `images` subdirectory and an `annotations.json` file. 

The following is an example of an input directory structure. The input directory should be hosted in an Amazon S3 bucket with a path similar to the following: `s3://bucket_name/input_directory/`. Note that the trailing `/` is required.

```
input_directory
    |--images
        |--abc.png
        |--def.png
    |--annotations.json
```

The `annotations.json` file should contain information for bounding boxes and their class labels in the form of a dictionary with `"images"` and `"annotations"` keys. The value for the `"images"` key should be a list of dictionaries. There should be one dictionary for each image with the following information: `{"file_name": image_name, "height": height, "width": width, "id": image_id}`. The value for the `"annotations"` key should also be a list of dictionaries. There should be one dictionary for each bounding box with the following information: `{"image_id": image_id, "bbox": [xmin, ymin, xmax, ymax], "category_id": bbox_label}`.
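As an illustrative sketch of the format just described, an `annotations.json` for the two example images above could be built like the following (the image sizes, boxes, and class labels are made up for illustration):

```python
import json

# Hypothetical sizes for the two images in the example directory
images = [
    {"file_name": "abc.png", "height": 500, "width": 400, "id": 0},
    {"file_name": "def.png", "height": 400, "width": 400, "id": 1},
]

# One dictionary per bounding box: [xmin, ymin, xmax, ymax] plus a class label
annotations = [
    {"image_id": 0, "bbox": [10, 10, 120, 200], "category_id": 0},
    {"image_id": 0, "bbox": [150, 30, 380, 290], "category_id": 1},
    {"image_id": 1, "bbox": [5, 5, 100, 100], "category_id": 0},
]

annotations_json = json.dumps({"images": images, "annotations": annotations})
# In practice, write this string to input_directory/annotations.json
```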

After training, a label mapping file and trained model are saved to your Amazon S3 bucket.

## Incremental training
<a name="object-detection-tensorflow-incremental-training"></a>

You can seed the training of a new model with artifacts from a model that you trained previously with SageMaker AI. Incremental training saves training time when you want to train a new model with the same or similar data.

**Note**  
You can only seed a SageMaker AI Object Detection - TensorFlow model with another Object Detection - TensorFlow model trained in SageMaker AI. 

You can use any dataset for incremental training, as long as the set of classes remains the same. The incremental training step is similar to the fine-tuning step, but instead of starting with a pretrained model, you start with an existing fine-tuned model. For more information about how to use incremental training with the SageMaker AI Object Detection - TensorFlow algorithm, see the [Introduction to SageMaker TensorFlow - Object Detection](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/object_detection_tensorflow/Amazon_Tensorflow_Object_Detection.ipynb) notebook.

## Inference with the Object Detection - TensorFlow algorithm
<a name="object-detection-tensorflow-inference"></a>

You can host the fine-tuned model that results from your TensorFlow Object Detection training for inference. Any input image for inference must be in `.jpg`, `.jpeg`, or `.png` format and be content type `application/x-image`. The Object Detection - TensorFlow algorithm resizes input images automatically. 

Running inference results in bounding boxes, predicted classes, and the scores of each prediction encoded in JSON format. The Object Detection - TensorFlow model processes a single image per request and outputs only one line. The following is an example of a JSON format response:

```
accept: application/json;verbose

{"normalized_boxes":[[xmin1, xmax1, ymin1, ymax1],....], 
    "classes":[classidx1, class_idx2,...], 
    "scores":[score_1, score_2,...], 
    "labels": [label1, label2, ...], 
    "tensorflow_model_output":<original output of the model>}
```

If `accept` is set to `application/json`, then the model only outputs normalized boxes, classes, and scores. 
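As a sketch of how a client might post-process the non-verbose `application/json` response, the following assumes the `[xmin, xmax, ymin, ymax]` box ordering shown above and a hypothetical 640x480 input image:

```python
import json

# A hypothetical non-verbose response containing one detection
response_body = json.dumps({
    "normalized_boxes": [[0.1, 0.5, 0.2, 0.8]],  # [xmin, xmax, ymin, ymax]
    "classes": [0],
    "scores": [0.97],
})

prediction = json.loads(response_body)
width, height = 640, 480  # size of the original input image

# Scale the normalized coordinates back to pixel coordinates
pixel_boxes = []
for xmin, xmax, ymin, ymax in prediction["normalized_boxes"]:
    pixel_boxes.append([xmin * width, xmax * width, ymin * height, ymax * height])
```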

## Amazon EC2 instance recommendation for the Object Detection - TensorFlow algorithm
<a name="object-detection-tensorflow-instances"></a>

The Object Detection - TensorFlow algorithm supports all GPU instances for training, including:
+ `ml.p2.xlarge`
+ `ml.p2.16xlarge`
+ `ml.p3.2xlarge`
+ `ml.p3.16xlarge`

We recommend GPU instances with more memory for training with large batch sizes. Both CPU (such as M5) and GPU (P2 or P3) instances can be used for inference. For a comprehensive list of SageMaker training and inference instances across AWS Regions, see [Amazon SageMaker Pricing](https://www.amazonaws.cn/sagemaker/pricing/).

## Object Detection - TensorFlow sample notebooks
<a name="object-detection-tensorflow-sample-notebooks"></a>

For more information about how to use the SageMaker AI Object Detection - TensorFlow algorithm for transfer learning on a custom dataset, see the [Introduction to SageMaker TensorFlow - Object Detection](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/object_detection_tensorflow/Amazon_Tensorflow_Object_Detection.ipynb) notebook.

For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see [Amazon SageMaker notebook instances](nbi.md). After you have created a notebook instance and opened it, select the **SageMaker AI Examples** tab to see a list of all the SageMaker AI samples. To open a notebook, choose its **Use** tab and choose **Create copy**.

# How Object Detection - TensorFlow Works
<a name="object-detection-tensorflow-HowItWorks"></a>

The Object Detection - TensorFlow algorithm takes an image as input and predicts bounding boxes and object labels. Various deep learning networks such as MobileNet, ResNet, Inception, and EfficientNet are highly accurate for object detection. There are also deep learning networks that are trained on large image datasets, such as Common Objects in Context (COCO), which has 328,000 images. After a network is trained with COCO data, you can then fine-tune the network on a dataset with a particular focus to perform more specific object detection tasks. The Amazon SageMaker AI Object Detection - TensorFlow algorithm supports transfer learning on many pretrained models that are available in the TensorFlow Model Garden.

Based on the number of class labels in your training data, an object detection layer is attached to the pretrained TensorFlow model of your choice. You can then fine-tune either the entire network (including the pretrained model) or only the top classification layer on new training data. With this method of transfer learning, training with smaller datasets is possible.

# TensorFlow Models
<a name="object-detection-tensorflow-Models"></a>

The following pretrained models are available to use for transfer learning with the Object Detection - TensorFlow algorithm. 

These models vary significantly in size, number of model parameters, training time, and inference latency for any given dataset. The best model for your use case depends on the complexity of your fine-tuning dataset and any requirements that you have on training time, inference latency, or model accuracy.


| Model Name | `model_id` | Source | 
| --- | --- | --- | 
| ResNet50 V1 FPN 640 | `tensorflow-od1-ssd-resnet50-v1-fpn-640x640-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 
| EfficientDet D0 512 | `tensorflow-od1-ssd-efficientdet-d0-512x512-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d0_coco17_tpu-32.tar.gz) | 
| EfficientDet D1 640 | `tensorflow-od1-ssd-efficientdet-d1-640x640-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz) | 
| EfficientDet D2 768 | `tensorflow-od1-ssd-efficientdet-d2-768x768-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d2_coco17_tpu-32.tar.gz) | 
| EfficientDet D3 896 | `tensorflow-od1-ssd-efficientdet-d3-896x896-coco17-tpu-32` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d3_coco17_tpu-32.tar.gz) | 
| MobileNet V1 FPN 640 | `tensorflow-od1-ssd-mobilenet-v1-fpn-640x640-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 
| MobileNet V2 FPNLite 320 | `tensorflow-od1-ssd-mobilenet-v2-fpnlite-320x320-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz) | 
| MobileNet V2 FPNLite 640 | `tensorflow-od1-ssd-mobilenet-v2-fpnlite-640x640-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz) | 
| ResNet50 V1 FPN 1024 | `tensorflow-od1-ssd-resnet50-v1-fpn-1024x1024-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_1024x1024_coco17_tpu-8.tar.gz) | 
| ResNet101 V1 FPN 640 | `tensorflow-od1-ssd-resnet101-v1-fpn-640x640-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 
| ResNet101 V1 FPN 1024 | `tensorflow-od1-ssd-resnet101-v1-fpn-1024x1024-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet101_v1_fpn_1024x1024_coco17_tpu-8.tar.gz) | 
| ResNet152 V1 FPN 640 | `tensorflow-od1-ssd-resnet152-v1-fpn-640x640-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet152_v1_fpn_640x640_coco17_tpu-8.tar.gz) | 
| ResNet152 V1 FPN 1024 | `tensorflow-od1-ssd-resnet152-v1-fpn-1024x1024-coco17-tpu-8` | [TensorFlow Model Garden link](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8.tar.gz) | 

# Object Detection - TensorFlow Hyperparameters
<a name="object-detection-tensorflow-Hyperparameter"></a>

Hyperparameters are parameters that are set before a machine learning model begins learning. The following hyperparameters are supported by the Amazon SageMaker AI built-in Object Detection - TensorFlow algorithm. See [Tune an Object Detection - TensorFlow model](object-detection-tensorflow-tuning.md) for information on hyperparameter tuning. 


| Parameter Name | Description | 
| --- | --- | 
| batch\_size |  The batch size for training.  Valid values: positive integer. Default value: `3`.  | 
| beta\_1 |  The beta1 for the `"adam"` optimizer. Represents the exponential decay rate for the first moment estimates. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.9`.  | 
| beta\_2 |  The beta2 for the `"adam"` optimizer. Represents the exponential decay rate for the second moment estimates. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.999`.  | 
| early\_stopping |  Set to `"True"` to use early stopping logic during training. If `"False"`, early stopping is not used. Valid values: string, either: (`"True"` or `"False"`). Default value: `"False"`.  | 
| early\_stopping\_min\_delta |  The minimum change needed to qualify as an improvement. An absolute change less than the value of `early_stopping_min_delta` does not qualify as improvement. Used only when `early_stopping` is set to `"True"`. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.0`.  | 
| early\_stopping\_patience |  The number of epochs to continue training with no improvement. Used only when `early_stopping` is set to `"True"`. Valid values: positive integer. Default value: `5`.  | 
| epochs |  The number of training epochs. Valid values: positive integer. Default value: `5` for smaller models, `1` for larger models.  | 
| epsilon |  The epsilon for `"adam"`, `"rmsprop"`, `"adadelta"`, and `"adagrad"` optimizers. Usually set to a small value to avoid division by 0. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `1e-7`.  | 
| initial\_accumulator\_value |  The starting value for the accumulators, or the per-parameter momentum values, for the `"adagrad"` optimizer. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.1`.  | 
| learning\_rate |  The optimizer learning rate. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.001`.  | 
| momentum |  The momentum for the `"sgd"` and `"nesterov"` optimizers. Ignored for other optimizers. Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.9`.  | 
| optimizer |  The optimizer type. For more information, see [Optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) in the TensorFlow documentation. Valid values: string, any of the following: (`"adam"`, `"sgd"`, `"nesterov"`, `"rmsprop"`, `"adagrad"`, `"adadelta"`). Default value: `"adam"`.  | 
| reinitialize\_top\_layer |  If set to `"Auto"`, the top classification layer parameters are re-initialized during fine-tuning. For incremental training, top classification layer parameters are not re-initialized unless set to `"True"`. Valid values: string, any of the following: (`"Auto"`, `"True"` or `"False"`). Default value: `"Auto"`.  | 
| rho |  The discounting factor for the gradient of the `"adadelta"` and `"rmsprop"` optimizers. Ignored for other optimizers.  Valid values: float, range: [`0.0`, `1.0`]. Default value: `0.95`.  | 
| train\_only\_on\_top\_layer |  If `"True"`, only the top classification layer parameters are fine-tuned. If `"False"`, all model parameters are fine-tuned. Valid values: string, either: (`"True"` or `"False"`). Default value: `"False"`.  | 

# Tune an Object Detection - TensorFlow model
<a name="object-detection-tensorflow-tuning"></a>

*Automatic model tuning*, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric. You choose the objective metric from the metrics that the algorithm computes. Automatic model tuning searches the hyperparameters chosen to find the combination of values that result in the model that optimizes the objective metric.

For more information about model tuning, see [Automatic model tuning with SageMaker AI](automatic-model-tuning.md).

## Metrics computed by the Object Detection - TensorFlow algorithm
<a name="object-detection-tensorflow-metrics"></a>

Refer to the following chart to find which metrics are computed by the Object Detection - TensorFlow algorithm.


| Metric Name | Description | Optimization Direction | Regex Pattern | 
| --- | --- | --- | --- | 
| validation:localization\_loss | The localization loss for box prediction. | Minimize | `Val_localization=([0-9\\.]+)` | 

## Tunable Object Detection - TensorFlow hyperparameters
<a name="object-detection-tensorflow-tunable-hyperparameters"></a>

Tune an object detection model with the following hyperparameters. The hyperparameters that have the greatest impact on object detection objective metrics are: `batch_size`, `learning_rate`, and `optimizer`. Tune the optimizer-related hyperparameters, such as `momentum`, `regularizers_l2`, `beta_1`, `beta_2`, and `eps` based on the selected `optimizer`. For example, use `beta_1` and `beta_2` only when `adam` is the `optimizer`.

For more information about which hyperparameters are used for each `optimizer`, see [Object Detection - TensorFlow Hyperparameters](object-detection-tensorflow-Hyperparameter.md).


| Parameter Name | Parameter Type | Recommended Ranges | 
| --- | --- | --- | 
| batch\_size | IntegerParameterRanges | MinValue: 8, MaxValue: 512 | 
| beta\_1 | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.999 | 
| beta\_2 | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.999 | 
| eps | ContinuousParameterRanges | MinValue: 1e-8, MaxValue: 1.0 | 
| learning\_rate | ContinuousParameterRanges | MinValue: 1e-6, MaxValue: 0.5 | 
| momentum | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.999 | 
| optimizer | CategoricalParameterRanges | ['sgd', 'adam', 'rmsprop', 'nesterov', 'adagrad', 'adadelta'] | 
| regularizers\_l2 | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.999 | 
| train\_only\_on\_top\_layer | CategoricalParameterRanges | ['True', 'False'] | 
| initial\_accumulator\_value | ContinuousParameterRanges | MinValue: 0.0, MaxValue: 0.999 | 

# Semantic Segmentation Algorithm
<a name="semantic-segmentation"></a>

The SageMaker AI semantic segmentation algorithm provides a fine-grained, pixel-level approach to developing computer vision applications. It tags every pixel in an image with a class label from a predefined set of classes. Tagging is fundamental for understanding scenes, which is critical to an increasing number of computer vision applications, such as self-driving vehicles, medical imaging diagnostics, and robot sensing. 

For comparison, the SageMaker AI [Image Classification - MXNet](image-classification.md) is a supervised learning algorithm that analyzes only whole images, classifying them into one of multiple output categories. The [Object Detection - MXNet](object-detection.md) is a supervised learning algorithm that detects and classifies all instances of an object in an image. It indicates the location and scale of each object in the image with a rectangular bounding box. 

Because the semantic segmentation algorithm classifies every pixel in an image, it also provides information about the shapes of the objects contained in the image. The segmentation output is represented as a grayscale image, called a *segmentation mask*. A segmentation mask is a grayscale image with the same shape as the input image.

The SageMaker AI semantic segmentation algorithm is built using the [MXNet Gluon framework and the Gluon CV toolkit](https://github.com/dmlc/gluon-cv). It provides you with a choice of three built-in algorithms to train a deep neural network. You can use the [Fully-Convolutional Network (FCN) algorithm ](https://arxiv.org/abs/1605.06211), [Pyramid Scene Parsing (PSP) algorithm](https://arxiv.org/abs/1612.01105), or [DeepLabV3](https://arxiv.org/abs/1706.05587). 

Each of the three algorithms has two distinct components: 
+ The *backbone* (or *encoder*)—A network that produces reliable activation maps of features.
+ The *decoder*—A network that constructs the segmentation mask from the encoded activation maps.

You also have a choice of backbones for the FCN, PSP, and DeepLabV3 algorithms: [ResNet50 or ResNet101](https://arxiv.org/abs/1512.03385). These backbones include pretrained artifacts that were originally trained on the [ImageNet](http://www.image-net.org/) classification task. You can fine-tune these backbones for segmentation using your own data. Or, you can initialize and train these networks from scratch using only your own data. The decoders are never pretrained. 

To deploy the trained model for inference, use the SageMaker AI hosting service. During inference, you can request the segmentation mask either as a PNG image or as a set of probabilities for each class for each pixel. You can use these masks as part of a larger pipeline that includes additional downstream image processing or other applications.

**Topics**
+ [Semantic Segmentation Sample Notebooks](#semantic-segmentation-sample-notebooks)
+ [Input/Output Interface for the Semantic Segmentation Algorithm](#semantic-segmentation-inputoutput)
+ [EC2 Instance Recommendation for the Semantic Segmentation Algorithm](#semantic-segmentation-instances)
+ [Semantic Segmentation Hyperparameters](segmentation-hyperparameters.md)
+ [Tuning a Semantic Segmentation Model](semantic-segmentation-tuning.md)

## Semantic Segmentation Sample Notebooks
<a name="semantic-segmentation-sample-notebooks"></a>

For a sample Jupyter notebook that uses the SageMaker AI semantic segmentation algorithm to train a model and deploy it to perform inferences, see the [Semantic Segmentation Example](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/semantic_segmentation_pascalvoc/semantic_segmentation_pascalvoc.html). For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker AI, see [Amazon SageMaker notebook instances](nbi.md). 

To see a list of all of the SageMaker AI samples, create and open a notebook instance, and choose the **SageMaker AI Examples** tab. The example semantic segmentation notebooks are located under **Introduction to Amazon algorithms**. To open a notebook, choose its **Use** tab, and choose **Create copy**.

## Input/Output Interface for the Semantic Segmentation Algorithm
<a name="semantic-segmentation-inputoutput"></a>

SageMaker AI semantic segmentation expects the customer's training dataset to be on [Amazon Simple Storage Service (Amazon S3)](https://aws.amazon.com/s3/). Once trained, it produces the resulting model artifacts on Amazon S3. The input interface format for the SageMaker AI semantic segmentation is similar to that of most standardized semantic segmentation benchmarking datasets. The dataset in Amazon S3 is expected to be presented in two channels, one for `train` and one for `validation` using four directories, two for images and two for annotations. Annotations are expected to be uncompressed PNG images. The dataset might also have a label map that describes how the annotation mappings are established. If not, the algorithm uses a default. It also supports the augmented manifest image format (`application/x-image`) for training in Pipe input mode straight from Amazon S3. For inference, an endpoint accepts images with an `image/jpeg` content type. 

### How Training Works
<a name="semantic-segmentation-inputoutput-training"></a>

The training data is split into four directories: `train`, `train_annotation`, `validation`, and `validation_annotation`. There is a channel for each of these directories. The dataset is also expected to have one `label_map.json` file per annotation channel, for `train_annotation` and `validation_annotation` respectively. If you don't provide these JSON files, SageMaker AI uses a default label map.

The dataset specifying these files should look similar to the following example:

```
s3://bucket_name
    |
    |- train
                 |
                 | - 0000.jpg
                 | - coffee.jpg
    |- validation
                 |
                 | - 00a0.jpg
                 | - banana.jpg
    |- train_annotation
                 |
                 | - 0000.png
                 | - coffee.png
    |- validation_annotation
                 |
                 | - 00a0.png
                 | - banana.png
    |- label_map
                 | - train_label_map.json
                 | - validation_label_map.json
```

Every JPG image in the `train` and `validation` directories has a corresponding PNG label image with the same name in the `train_annotation` and `validation_annotation` directories. This naming convention helps the algorithm associate each label with its corresponding image during training. The `train`, `train_annotation`, `validation`, and `validation_annotation` channels are mandatory. The annotations are single-channel PNG images. The format works as long as the metadata (modes) in the image helps the algorithm read the annotation images into single-channel 8-bit unsigned integers. For more information on our support for modes, see the [Python Image Library documentation](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#modes). We recommend using the 8-bit paletted `P` mode. 
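Before uploading a dataset, it can be worth verifying that this naming convention holds. The following sketch builds a throwaway `train`/`train_annotation` pair in a temporary directory (the file names are hypothetical) and confirms that every image has a same-named mask:

```python
import tempfile
from pathlib import Path

# Build a throwaway copy of the expected layout to illustrate the pairing check
root = Path(tempfile.mkdtemp())
(root / "train").mkdir()
(root / "train_annotation").mkdir()
for name in ("0000", "coffee"):
    (root / "train" / f"{name}.jpg").touch()
    (root / "train_annotation" / f"{name}.png").touch()

# Every JPG must have a PNG annotation with the same base name
images = {p.stem for p in (root / "train").glob("*.jpg")}
masks = {p.stem for p in (root / "train_annotation").glob("*.png")}
unmatched = images - masks  # empty when the convention holds
```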

When using modes, each annotation pixel is encoded as a simple 8-bit integer. To get from this encoding to actual labels, the algorithm uses one mapping file per channel, called the *label map*. The label map is used to map the values in the annotation image to actual label indices. In the default label map, which is used if you don't provide one, the pixel values in an annotation matrix (image) directly index the labels. These images can be grayscale PNG files or 8-bit indexed PNG files. The label map file for the unscaled default case is the following: 

```
{
  "scale": "1"
}
```

To provide some contrast for viewing, some annotation software scales the label images by a constant amount. To support this, the SageMaker AI semantic segmentation algorithm provides a rescaling option to scale down the values to actual label values. When scaling down doesn't produce an integer, the algorithm defaults to the greatest integer less than or equal to the scaled value. The following code shows how to set the scale value to rescale the label values:

```
{
  "scale": "3"
}
```

The following example shows how this `"scale"` value is used to rescale the `encoded_label` values of the input annotation image when they are mapped to the `mapped_label` values to be used in training. The label values in the input annotation image are 0, 3, 6, with scale 3, so they are mapped to 0, 1, 2 for training:

```
encoded_label = [0, 3, 6]
mapped_label = [0, 1, 2]
```
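The rescaling above can be reproduced in a couple of lines; taking the floor of the divided value matches the greatest-integer behavior described earlier:

```python
import math

scale = 3
encoded_label = [0, 3, 6]  # pixel values in the annotation image

# Divide each encoded value by the scale; floor handles values
# that don't divide evenly
mapped_label = [math.floor(v / scale) for v in encoded_label]
```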

In some cases, you might need to specify a particular color mapping for each class. Use the `map` option in the label mapping, as shown in the following example of a `label_map` file:

```
{
    "map": {
        "0": 5,
        "1": 0,
        "2": 2
    }
}
```

The label mapping for this example is:

```
encoded_label = [0, 5, 2]
mapped_label = [1, 0, 2]
```
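To sanity-check a custom `map` before training, you can invert it and apply it to the encoded pixel values; this sketch reproduces the mapping shown above:

```python
# The "map" option from the label_map file above: mapped label -> encoded value
label_map = {"0": 5, "1": 0, "2": 2}

# Invert the mapping so encoded pixel values can be looked up directly
encoded_to_mapped = {encoded: int(mapped) for mapped, encoded in label_map.items()}

encoded_label = [0, 5, 2]
mapped_label = [encoded_to_mapped[v] for v in encoded_label]
```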

With label mappings, you can use different annotation systems and annotation software to obtain data without a lot of preprocessing. You can provide one label map per channel. The files for a label map in the `label_map` channel must follow the naming conventions of the four-directory structure shown above. If you don't provide a label map, the algorithm assumes a scale of 1 (the default).

### Training with the Augmented Manifest Format
<a name="semantic-segmentation-inputoutput-training-augmented-manifest"></a>

The augmented manifest format enables you to do training in Pipe mode using image files without needing to create RecordIO files. The augmented manifest file contains data objects and should be in [JSON Lines](http://jsonlines.org/) format, as described in the [https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Each line in the manifest is an entry containing the Amazon S3 URI for the image and the URI for the annotation image.

Each JSON object in the manifest file must contain a `source-ref` key. The `source-ref` key should contain the value of the Amazon S3 URI to the image. The labels are provided under the `AttributeNames` parameter value as specified in the [https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. The manifest can also contain additional metadata under a metadata tag, but the algorithm ignores it. In the example below, the `AttributeNames` are contained in the list of image and annotation references `["source-ref", "city-streets-ref"]`. These names must have `-ref` appended to them. When using the Semantic Segmentation algorithm with an augmented manifest, the value of the `RecordWrapperType` parameter must be `"RecordIO"` and the value of the `ContentType` parameter must be `application/x-recordio`.

```
{"source-ref": "S3 bucket location", "city-streets-ref": "S3 bucket location", "city-streets-metadata": {"job-name": "label-city-streets", }}
```
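A manifest like this can be generated with a few lines of Python; the bucket names and the `city-streets-ref` attribute below are hypothetical placeholders for your own:

```python
import json

# Hypothetical image/annotation pairs; real entries would use your own S3 URIs
pairs = [
    ("s3://my-bucket/images/0001.jpg", "s3://my-bucket/masks/0001.png"),
    ("s3://my-bucket/images/0002.jpg", "s3://my-bucket/masks/0002.png"),
]

# One JSON object per line; data-bearing attribute names must end in "-ref"
manifest_lines = [
    json.dumps({"source-ref": image_uri, "city-streets-ref": mask_uri})
    for image_uri, mask_uri in pairs
]
manifest = "\n".join(manifest_lines)
# In practice, upload this string to Amazon S3 as your augmented manifest file
```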

For more information on augmented manifest files, see [Augmented Manifest Files for Training Jobs](augmented-manifest.md).

### Incremental Training
<a name="semantic-segmentation-inputoutput-incremental-training"></a>

You can also seed the training of a new model with a model that you trained previously using SageMaker AI. This incremental training saves training time when you want to train a new model with the same or similar data. Currently, incremental training is supported only for models trained with the built-in SageMaker AI Semantic Segmentation.

To use your own pre-trained model, specify the `ChannelName` as "model" in the `InputDataConfig` for the [https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request. Set the `ContentType` for the model channel to `application/x-sagemaker-model`. The `backbone`, `algorithm`, `crop_size`, and `num_classes` input parameters that define the network architecture must be consistently specified in the input hyperparameters of the new model and the pre-trained model that you upload to the model channel. For the pretrained model file, you can use the compressed (.tar.gz) artifacts from SageMaker AI outputs. You can only use image formats for input data. For more information on incremental training and for instructions on how to use it, see [Use Incremental Training in Amazon SageMaker AI](incremental-training.md). 

### Produce Inferences
<a name="semantic-segmentation-inputoutput-inference"></a>

To query a trained model that is deployed to an endpoint, you need to provide an image and an `AcceptType` that denotes the type of output required. The endpoint takes JPEG images with an `image/jpeg` content type. If you request an `AcceptType` of `image/png`, the algorithm outputs a PNG file with a segmentation mask in the same format as the labels themselves. If you request an accept type of `application/x-recordio-protobuf`, the algorithm returns class probabilities encoded in recordio-protobuf format. The latter format outputs a 3D tensor whose third dimension is the same size as the number of classes; each entry denotes the probability of a class label for a given pixel.

## EC2 Instance Recommendation for the Semantic Segmentation Algorithm
<a name="semantic-segmentation-instances"></a>

The SageMaker AI semantic segmentation algorithm only supports GPU instances for training, and we recommend using GPU instances with more memory for training with large batch sizes. The algorithm can be trained using P2, P3, G4dn, or G5 instances in single machine configurations.

For inference, you can use CPU instances (such as C5 and M5), GPU instances (such as P3 and G4dn), or both. For information about the instance types that provide varying combinations of CPU, GPU, memory, and networking capacity for inference, see [Amazon SageMaker AI ML Instance Types](https://aws.amazon.com/sagemaker/pricing/instance-types/).

# Semantic Segmentation Hyperparameters
<a name="segmentation-hyperparameters"></a>

The following tables list the hyperparameters supported by the Amazon SageMaker AI semantic segmentation algorithm for network architecture, data inputs, and training. You specify Semantic Segmentation for training in the `AlgorithmName` of the [https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html](https://docs.amazonaws.cn/sagemaker/latest/APIReference/API_CreateTrainingJob.html) request.

**Network Architecture Hyperparameters**


| Parameter Name | Description | 
| --- | --- | 
| backbone |  The backbone to use for the algorithm's encoder component. **Optional** Valid values: `resnet-50`, `resnet-101`  Default value: `resnet-50`  | 
| use\_pretrained\_model |  Whether a pretrained model is to be used for the backbone. **Optional** Valid values: `True`, `False` Default value: `True`  | 
| algorithm |  The algorithm to use for semantic segmentation.  **Optional** Valid values: [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/segmentation-hyperparameters.html) Default value: `fcn`  | 

**Data Hyperparameters**


| Parameter Name | Description | 
| --- | --- | 
| num\_classes |  The number of classes to segment. **Required** Valid values: 2 ≤ positive integer ≤ 254  | 
| num\_training\_samples |  The number of samples in the training data. The algorithm uses this value to set up the learning rate scheduler. **Required** Valid values: positive integer  | 
| base\_size |  Defines how images are rescaled before cropping. Images are rescaled such that the long side length is set to `base_size` multiplied by a random number from 0.5 to 2.0, and the short side is computed to preserve the aspect ratio. **Optional** Valid values: positive integer > 16 Default value: 520  | 
| crop\_size |  The image size for input during training. We randomly rescale the input image based on `base_size`, and then take a random square crop with side length equal to `crop_size`. The `crop_size` will be automatically rounded up to multiples of 8. **Optional** Valid values: positive integer > 16 Default value: 240  | 
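
The rescale-then-crop arithmetic described by `base_size` and `crop_size` can be sketched as follows. The algorithm's internal implementation is not published, so this only illustrates the documented size computations.

```python
import math
import random

# Sketch of the documented preprocessing arithmetic: the long side is set to
# base_size times a random factor in [0.5, 2.0], and crop_size is rounded up
# to a multiple of 8.
def sample_sizes(base_size=520, crop_size=240, rng=random.Random(0)):
    scale = rng.uniform(0.5, 2.0)                   # random factor from 0.5 to 2.0
    long_side = int(base_size * scale)              # long side after rescaling
    effective_crop = math.ceil(crop_size / 8) * 8   # rounded up to a multiple of 8
    return long_side, effective_crop
```

With the defaults, the long side falls between 260 and 1040 pixels, and the default `crop_size` of 240 is already a multiple of 8, so it is used unchanged.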

**Training Hyperparameters**


| Parameter Name | Description | 
| --- | --- | 
| early\_stopping |  Whether to use early stopping logic during training. **Optional** Valid values: `True`, `False` Default value: `False`  | 
| early\_stopping\_min\_epochs |  The minimum number of epochs that must be run. **Optional** Valid values: integer Default value: 5  | 
| early\_stopping\_patience |  The number of epochs that meet the tolerance for lower performance before the algorithm enforces an early stop. **Optional** Valid values: integer Default value: 4  | 
| early\_stopping\_tolerance |  If the relative improvement of the score of the training job, the mIOU, is smaller than this value, early stopping considers the epoch as not improved. This is used only when `early_stopping` = `True`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.0  | 
| epochs |  The number of epochs with which to train. **Optional** Valid values: positive integer Default value: 10  | 
| gamma1 |  The decay factor for the moving average of the squared gradient for `rmsprop`. Used only for `rmsprop`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.9  | 
| gamma2 |  The momentum factor for `rmsprop`. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.9  | 
| learning\_rate |  The initial learning rate.  **Optional** Valid values: 0 < float ≤ 1 Default value: 0.001  | 
| lr\_scheduler |  The shape of the learning rate schedule that controls its decrease over time. **Optional** Valid values:  [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/segmentation-hyperparameters.html) Default value: `poly`  | 
| lr\_scheduler\_factor |  If `lr_scheduler` is set to `step`, the ratio by which to reduce (multiply) the `learning_rate` after each of the epochs specified by the `lr_scheduler_step`. Otherwise, ignored. **Optional** Valid values: 0 ≤ float ≤ 1 Default value: 0.1  | 
| lr\_scheduler\_step |  A comma delimited list of the epochs after which the `learning_rate` is reduced (multiplied) by an `lr_scheduler_factor`. For example, if the value is set to `"10, 20"`, then the `learning_rate` is reduced by `lr_scheduler_factor` after the 10th epoch and again by this factor after the 20th epoch. **Conditionally Required** if `lr_scheduler` is set to `step`. Otherwise, ignored. Valid values: string Default value: (No default, as the value is required when used.)  | 
| mini\_batch\_size |  The batch size for training. Using a large `mini_batch_size` usually results in faster training, but it might cause you to run out of memory. Memory usage is affected by the values of the `mini_batch_size` and `image_shape` parameters, and the backbone architecture. **Optional** Valid values: positive integer  Default value: 16  | 
| momentum |  The momentum for the `sgd` optimizer. When you use other optimizers, the semantic segmentation algorithm ignores this parameter. **Optional** Valid values: 0 < float ≤ 1 Default value: 0.9  | 
| optimizer |  The type of optimizer. For more information about an optimizer, choose the appropriate link: [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/segmentation-hyperparameters.html) **Optional** Valid values: `adam`, `adagrad`, `nag`, `rmsprop`, `sgd`  Default value: `sgd`  | 
| syncbn |  If set to `True`, the batch normalization mean and variance are computed over all the samples processed across the GPUs. **Optional**  Valid values: `True`, `False`  Default value: `False`  | 
| validation\_mini\_batch\_size |  The batch size for validation. A large `mini_batch_size` usually results in faster training, but it might cause you to run out of memory. Memory usage is affected by the values of the `mini_batch_size` and `image_shape` parameters, and the backbone architecture.  [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/segmentation-hyperparameters.html) **Optional** Valid values: positive integer Default value: 16  | 
| weight\_decay |  The weight decay coefficient for the `sgd` optimizer. When you use other optimizers, the algorithm ignores this parameter.  **Optional** Valid values: 0 < float < 1 Default value: 0.0001  | 
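
The `step` schedule described by `lr_scheduler_step` and `lr_scheduler_factor` can be sketched as a small function. This mirrors the documented behavior (reduce by the factor after each listed epoch), not the algorithm's internal code.

```python
# Sketch of the "step" learning rate schedule: the rate is multiplied by
# lr_scheduler_factor after each epoch listed in lr_scheduler_step.
def step_lr(epoch, learning_rate=0.001, factor=0.1, steps="10, 20"):
    """Learning rate in effect at a given epoch under the step schedule."""
    boundaries = [int(s) for s in steps.split(",")]
    lr = learning_rate
    for boundary in boundaries:
        if epoch >= boundary:
            lr *= factor  # reduced once per boundary that has passed
    return lr
```

For example, with the defaults the rate is 0.001 through epoch 9, 0.0001 from epoch 10, and 0.00001 from epoch 20 onward.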

# Tuning a Semantic Segmentation Model
<a name="semantic-segmentation-tuning"></a>

*Automatic model tuning*, also known as hyperparameter tuning, finds the best version of a model by running many jobs that test a range of hyperparameters on your dataset. You choose the tunable hyperparameters, a range of values for each, and an objective metric. You choose the objective metric from the metrics that the algorithm computes. Automatic model tuning searches the hyperparameters chosen to find the combination of values that result in the model that optimizes the objective metric.

## Metrics Computed by the Semantic Segmentation Algorithm
<a name="semantic-segmentation-metrics"></a>

The semantic segmentation algorithm reports two validation metrics. When tuning hyperparameter values, choose one of these metrics as the objective.


| Metric Name | Description | Optimization Direction | 
| --- | --- | --- | 
| validation:mIOU |  The area of the intersection of the predicted segmentation and the ground truth divided by the area of union between them for images in the validation set. Also known as the Jaccard Index.  |  Maximize  | 
| validation:pixel\_accuracy | The percentage of pixels that are correctly classified in images from the validation set. |  Maximize  | 
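
As an illustrative computation (not the algorithm's internal implementation), mIOU on flattened integer label masks can be sketched as:

```python
# Sketch of the mIOU (Jaccard index) metric: per-class intersection over
# union, averaged over the classes that appear in either mask.
def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes present in either mask."""
    ious = []
    for c in range(num_classes):
        pred_c = {i for i, p in enumerate(pred) if p == c}
        truth_c = {i for i, t in enumerate(truth) if t == c}
        union = pred_c | truth_c
        if union:  # skip classes absent from both masks
            ious.append(len(pred_c & truth_c) / len(union))
    return sum(ious) / len(ious)
```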

## Tunable Semantic Segmentation Hyperparameters
<a name="semantic-segmentation-tunable-hyperparameters"></a>

You can tune the following hyperparameters for the semantic segmentation algorithm.


| Parameter Name | Parameter Type | Recommended Ranges | 
| --- | --- | --- | 
| learning\_rate |  ContinuousParameterRanges  |  MinValue: 1e-4, MaxValue: 1e-1  | 
| mini\_batch\_size |  IntegerParameterRanges  |  MinValue: 1, MaxValue: 128  | 
| momentum |  ContinuousParameterRanges  |  MinValue: 0.9, MaxValue: 0.999  | 
| optimizer |  CategoricalParameterRanges  |  ['sgd', 'adam', 'adadelta']  | 
| weight\_decay |  ContinuousParameterRanges  |  MinValue: 1e-5, MaxValue: 1e-3  | 
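
As a sketch, the recommended ranges above can be expressed in the `ParameterRanges` shape used by the `CreateHyperParameterTuningJob` API (with `validation:mIOU` as the objective metric); treat this as an illustration of the table, not a complete tuning job definition.

```python
# Sketch of ParameterRanges built from the recommended ranges above.
def build_parameter_ranges():
    return {
        "ContinuousParameterRanges": [
            {"Name": "learning_rate", "MinValue": "1e-4", "MaxValue": "1e-1"},
            {"Name": "momentum", "MinValue": "0.9", "MaxValue": "0.999"},
            {"Name": "weight_decay", "MinValue": "1e-5", "MaxValue": "1e-3"},
        ],
        "IntegerParameterRanges": [
            {"Name": "mini_batch_size", "MinValue": "1", "MaxValue": "128"},
        ],
        "CategoricalParameterRanges": [
            {"Name": "optimizer", "Values": ["sgd", "adam", "adadelta"]},
        ],
    }
```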