Object Detection - MXNet - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Object Detection - MXNet

The Amazon SageMaker Object Detection - MXNet algorithm detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene. The object is categorized into one of the classes in a specified collection with a confidence score that it belongs to the class. Its location and scale in the image are indicated by a rectangular bounding box. It uses the Single Shot multibox Detector (SSD) framework and supports two base networks: VGG and ResNet. The network can be trained from scratch, or trained with models that have been pre-trained on the ImageNet dataset.

Input/Output Interface for the Object Detection Algorithm

The SageMaker Object Detection algorithm supports both RecordIO (application/x-recordio) and image (image/png, image/jpeg, and application/x-image) content types for training in file mode and supports RecordIO (application/x-recordio) for training in pipe mode. However you can also train in pipe mode using the image files (image/png, image/jpeg, and application/x-image), without creating RecordIO files, by using the augmented manifest format. The recommended input format for the Amazon SageMaker object detection algorithms is Apache MXNet RecordIO. However, you can also use raw images in .jpg or .png format. The algorithm supports only application/x-image for inference.


To maintain better interoperability with existing deep learning frameworks, this differs from the protobuf data formats commonly used by other Amazon SageMaker algorithms.

See the Object Detection Sample Notebooks for more details on data formats.

Train with the RecordIO Format

If you use the RecordIO format for training, specify both train and validation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. Specify one RecordIO (.rec) file in the train channel and one RecordIO file in the validation channel. Set the content type for both channels to application/x-recordio. An example of how to generate RecordIO file can be found in the object detection sample notebook. You can also use tools from the MXNet's GluonCV to generate RecordIO files for popular datasets like the PASCAL Visual Object Classes and Common Objects in Context (COCO).

Train with the Image Format

If you use the image format for training, specify train, validation, train_annotation, and validation_annotation channels as values for the InputDataConfig parameter of CreateTrainingJob request. Specify the individual image data (.jpg or .png) files for the train and validation channels. For annotation data, you can use the JSON format. Specify the corresponding .json files in the train_annotation and validation_annotation channels. Set the content type for all four channels to image/png or image/jpeg based on the image type. You can also use the content type application/x-image when your dataset contains both .jpg and .png images. The following is an example of a .json file.

{ "file": "your_image_directory/sample_image1.jpg", "image_size": [ { "width": 500, "height": 400, "depth": 3 } ], "annotations": [ { "class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128 }, { "class_id": 0, "left": 161, "top": 250, "width": 79, "height": 143 }, { "class_id": 1, "left": 101, "top": 185, "width": 42, "height": 130 } ], "categories": [ { "class_id": 0, "name": "dog" }, { "class_id": 1, "name": "cat" } ] }

Each image needs a .json file for annotation, and the .json file should have the same name as the corresponding image. The name of above .json file should be "sample_image1.json". There are four properties in the annotation .json file. The property "file" specifies the relative path of the image file. For example, if your training images and corresponding .json files are stored in s3://your_bucket/train/sample_image and s3://your_bucket/train_annotation, specify the path for your train and train_annotation channels as s3://your_bucket/train and s3://your_bucket/train_annotation, respectively.

In the .json file, the relative path for an image named sample_image1.jpg should be sample_image/sample_image1.jpg. The "image_size" property specifies the overall image dimensions. The SageMaker object detection algorithm currently only supports 3-channel images. The "annotations" property specifies the categories and bounding boxes for objects within the image. Each object is annotated by a "class_id" index and by four bounding box coordinates ("left", "top", "width", "height"). The "left" (x-coordinate) and "top" (y-coordinate) values represent the upper-left corner of the bounding box. The "width" (x-coordinate) and "height" (y-coordinate) values represent the dimensions of the bounding box. The origin (0, 0) is the upper-left corner of the entire image. If you have multiple objects within one image, all the annotations should be included in a single .json file. The "categories" property stores the mapping between the class index and class name. The class indices should be numbered successively and the numbering should start with 0. The "categories" property is optional for the annotation .json file

Train with Augmented Manifest Image Format

The augmented manifest format enables you to do training in pipe mode using image files without needing to create RecordIO files. You need to specify both train and validation channels as values for the InputDataConfig parameter of the CreateTrainingJob request. While using the format, an S3 manifest file needs to be generated that contains the list of images and their corresponding annotations. The manifest file format should be in JSON Lines format in which each line represents one sample. The images are specified using the 'source-ref' tag that points to the S3 location of the image. The annotations are provided under the "AttributeNames" parameter value as specified in the CreateTrainingJob request. It can also contain additional metadata under the metadata tag, but these are ignored by the algorithm. In the following example, the "AttributeNames are contained in the list ["source-ref", "bounding-box"]:

{"source-ref": "s3://your_bucket/image1.jpg", "bounding-box":{"image_size":[{ "width": 500, "height": 400, "depth":3}], "annotations":[{"class_id": 0, "left": 111, "top": 134, "width": 61, "height": 128}, {"class_id": 5, "left": 161, "top": 250, "width": 80, "height": 50}]}, "bounding-box-metadata":{"class-map":{"0": "dog", "5": "horse"}, "type": "groundtruth/object-detection"}} {"source-ref": "s3://your_bucket/image2.jpg", "bounding-box":{"image_size":[{ "width": 400, "height": 300, "depth":3}], "annotations":[{"class_id": 1, "left": 100, "top": 120, "width": 43, "height": 78}]}, "bounding-box-metadata":{"class-map":{"1": "cat"}, "type": "groundtruth/object-detection"}}

The order of "AttributeNames" in the input files matters when training the Object Detection algorithm. It accepts piped data in a specific order, with image first, followed by annotations. So the "AttributeNames" in this example are provided with "source-ref" first, followed by "bounding-box". When using Object Detection with Augmented Manifest, the value of parameter RecordWrapperType must be set as "RecordIO".

For more information on augmented manifest files, see Provide Dataset Metadata to Training Jobs with an Augmented Manifest File.

Incremental Training

You can also seed the training of a new model with the artifacts from a model that you trained previously with SageMaker. Incremental training saves training time when you want to train a new model with the same or similar data. SageMaker object detection models can be seeded only with another built-in object detection model trained in SageMaker.

To use a pretrained model, in the CreateTrainingJob request, specify the ChannelName as "model" in the InputDataConfig parameter. Set the ContentType for the model channel to application/x-sagemaker-model. The input hyperparameters of both the new model and the pretrained model that you upload to the model channel must have the same settings for the base_network and num_classes input parameters. These parameters define the network architecture. For the pretrained model file, use the compressed model artifacts (in .tar.gz format) output by SageMaker. You can use either RecordIO or image formats for input data.

For more information on incremental training and for instructions on how to use it, see Use Incremental Training in Amazon SageMaker.

EC2 Instance Recommendation for the Object Detection Algorithm

The object detection algorithm supports P2, P3, G4dn, and G5 GPU instance families. We recommend using GPU instances with more memory for training with large batch sizes. You can run the object detection algorithm on multi-GPU and mult-machine settings for distributed training.

You can use both CPU (such as C5 and M5) and GPU (such as P3 and G4dn) instances for inference.

Object Detection Sample Notebooks

For a sample notebook that shows how to use the SageMaker Object Detection algorithm to train and host a model on the

Caltech Birds (CUB 200 2011) dataset using the Single Shot multibox Detector algorithm, see Amazon SageMaker Object Detection for Bird Species. For instructions how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see Amazon SageMaker Notebook Instances. Once you have created a notebook instance and opened it, select the SageMaker Examples tab to see a list of all the SageMaker samples. The object detection example notebook using the Object Detection algorithm is located in the Introduction to Amazon Algorithms section. To open a notebook, click on its Use tab and select Create copy.

For more information about the Amazon SageMaker Object Detection algorithm, see the following blog posts: