Object2Vec Hyperparameters

In the CreateTrainingJob request, you specify the training algorithm. You can also specify algorithm-specific hyperparameters as string-to-string maps. The following table lists the hyperparameters for the Object2Vec training algorithm.
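Because hyperparameters are passed as a string-to-string map, numeric and Boolean values need to be serialized to strings before they go into the request. The following is a minimal sketch of that step in plain Python. The helper name to_string_map and the sample values are assumptions made for illustration and are not part of any SageMaker SDK.

```python
def to_string_map(hyperparameters):
    """Convert every hyperparameter value to a string (hypothetical helper)."""
    return {name: str(value) for name, value in hyperparameters.items()}


# Sample values chosen for illustration. Of these, only enc0_max_seq_len and
# enc0_vocab_size are listed as required in the table below.
hyperparameters = to_string_map({
    "enc0_max_seq_len": 50,
    "enc0_vocab_size": 30000,
    "enc0_network": "hcnn",
    "epochs": 20,
    "tied_token_embedding_weight": True,
})
```

The resulting dictionary contains only string keys and string values, which is the shape that a string-to-string map expects.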

Parameter Name Description
enc0_max_seq_len

The maximum sequence length for the enc0 encoder.

Required

Valid values: 1 ≤ integer ≤ 5000

enc0_vocab_size

The vocabulary size of enc0 tokens.

Required

Valid values: 2 ≤ integer ≤ 3000000

bucket_width

The allowed difference between data sequence length when bucketing is enabled. To enable bucketing, specify a non-zero value for this parameter.

Optional

Valid values: 0 ≤ integer ≤ 100

Default value: 0 (no bucketing)

comparator_list

A list used to customize how two embeddings are compared. The Object2Vec comparator operator layer takes the encodings from both encoders as inputs and outputs a single vector. This vector is a concatenation of subvectors. The string values passed to comparator_list, and the order in which they are passed, determine how these subvectors are assembled. For example, if comparator_list="hadamard, concat", the comparator operator constructs the vector by concatenating the Hadamard product of the two encodings and the concatenation of the two encodings. If, on the other hand, comparator_list="hadamard", the comparator operator constructs the vector as only the Hadamard product of the two encodings.

Optional

Valid values: A string that contains any combination of the names of the three binary operators: hadamard, concat, or abs_diff. The Object2Vec algorithm currently requires that the two vector encodings have the same dimension. These operators produce the subvectors as follows:

  • hadamard: Constructs a vector as the Hadamard (element-wise) product of two encodings.

  • concat: Constructs a vector as the concatenation of two encodings.

  • abs_diff: Constructs a vector as the absolute difference between two encodings.

Default value: "hadamard, concat, abs_diff"
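The way comparator_list assembles subvectors can be sketched in plain Python. This is an illustration of the three operators as described above, not the algorithm's actual implementation. The function name compare and the use of Python lists for encodings are assumptions.

```python
def compare(enc0, enc1, comparator_list="hadamard, concat, abs_diff"):
    """Build the comparator output by concatenating one subvector per operator."""
    if len(enc0) != len(enc1):
        raise ValueError("both encodings must have the same dimension")
    operators = {
        # Element-wise product of the two encodings.
        "hadamard": lambda a, b: [x * y for x, y in zip(a, b)],
        # The two encodings placed end to end.
        "concat": lambda a, b: list(a) + list(b),
        # Element-wise absolute difference of the two encodings.
        "abs_diff": lambda a, b: [abs(x - y) for x, y in zip(a, b)],
    }
    output = []
    # Subvectors are appended in the order the operator names are listed.
    for name in comparator_list.split(","):
        output.extend(operators[name.strip()](enc0, enc1))
    return output


vector = compare([1.0, 2.0], [3.0, -1.0], "hadamard, concat")
# vector == [3.0, -2.0, 1.0, 2.0, 3.0, -1.0]
```

With encodings of dimension d, hadamard and abs_diff each contribute d values and concat contributes 2d, so the default setting yields a vector of length 4d in this sketch.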

dropout

The dropout probability for network layers. Dropout is a form of regularization used in neural networks that reduces overfitting by trimming codependent neurons.

Optional

Valid values: 0.0 ≤ float ≤ 1.0

Default value: 0.0

early_stopping_patience

The number of consecutive epochs without improvement allowed before early stopping is applied. Improvement is defined by the early_stopping_tolerance hyperparameter.

Optional

Valid values: 1 ≤ integer ≤ 5

Default value: 3

early_stopping_tolerance

The minimum reduction in the loss function that the algorithm must achieve between consecutive epochs to count as an improvement. If the reduction stays below this value for the number of consecutive epochs specified in the early_stopping_patience hyperparameter, training stops early.

Optional

Valid values: 0.000001 ≤ float ≤ 0.1

Default value: 0.01
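The interaction between early_stopping_patience and early_stopping_tolerance can be sketched as follows. This is one plausible reading of the two descriptions, assuming that improvement is measured against the previous epoch's loss. The function name stopping_epoch and the sample losses are hypothetical, and the algorithm's internal criterion may differ.

```python
def stopping_epoch(losses, patience=3, tolerance=0.01):
    """Return the 1-based epoch at which training would stop, or None."""
    stalled = 0  # consecutive epochs whose loss reduction was below tolerance
    for epoch in range(1, len(losses)):
        reduction = losses[epoch - 1] - losses[epoch]
        stalled = stalled + 1 if reduction < tolerance else 0
        if stalled >= patience:
            return epoch + 1
    return None


# The loss drops by 0.2, then by less than 0.01 for three epochs in a row.
stop_at = stopping_epoch([1.0, 0.8, 0.795, 0.79, 0.789])
# stop_at == 5
```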

enc_dim

The dimension of the output of the embedding layer.

Optional

Valid values: 4 ≤ integer ≤ 10000

Default value: 4096

enc0_network

The network model for the enc0 encoder.

Optional

Valid values: hcnn, bilstm, or pooled_embedding

  • hcnn: A hierarchical convolutional neural network.

  • bilstm: A bidirectional long short-term memory network (biLSTM), in which the signal propagates backward and forward in time. This is an appropriate recurrent neural network (RNN) architecture for sequential learning tasks.

  • pooled_embedding: Averages the embeddings of all of the tokens in the input.

Default value: hcnn
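As an illustration of the pooled_embedding option, the sketch below averages a list of token embeddings into a single vector. It assumes every token embedding has the same dimension and ignores details such as padding, which the description above does not cover. The function name is hypothetical.

```python
def pooled_embedding(token_embeddings):
    """Average the token embeddings of one input, dimension by dimension."""
    count = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(vec[i] for vec in token_embeddings) / count for i in range(dim)]


pooled = pooled_embedding([[1.0, 2.0], [3.0, 4.0]])
# pooled == [2.0, 3.0]
```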

enc0_cnn_filter_width

The filter width of the convolutional neural network (CNN) enc0 encoder.

Conditional

Valid values: 1 ≤ integer ≤ 9

Default value: 3

enc0_freeze_pretrained_embedding

Whether to freeze enc0 pretrained embedding weights.

Conditional

Valid values: True or False

Default value: True

enc0_layers

The number of layers in the enc0 encoder.

Conditional

Valid values: auto or 1 ≤ integer ≤ 4

  • For hcnn, auto means 4.

  • For bilstm, auto means 1.

  • For pooled_embedding, auto ignores the number of layers.

Default value: auto
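The meaning of auto for each network type can be written as a small lookup. This is a sketch for illustration only. The names AUTO_LAYERS and resolve_layers are assumptions, and None stands in for "the number of layers is ignored".

```python
# What "auto" resolves to for each network type, per the list above.
AUTO_LAYERS = {"hcnn": 4, "bilstm": 1, "pooled_embedding": None}


def resolve_layers(network, layers="auto"):
    """Return the layer count implied by the layers setting for a network type."""
    if layers == "auto":
        return AUTO_LAYERS[network]
    value = int(layers)
    if not 1 <= value <= 4:
        raise ValueError("layers must be 'auto' or an integer from 1 to 4")
    return value
```

For example, resolve_layers("hcnn") returns 4, and resolve_layers("bilstm", "2") returns 2.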

enc0_pretrained_embedding_file

The filename of the pretrained enc0 token embedding file in the auxiliary data channel.

Conditional

Valid values: String with alphanumeric characters, underscore, or period. [A-Za-z0-9\.\_]

Default value: "" (empty string)
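A file name can be checked against the allowed character set before a training job is submitted. The sketch below uses Python's re module and assumes the whole string must consist of the listed characters. The empty string is accepted because it is the default. The function name and the sample file names are hypothetical.

```python
import re

# Alphanumeric characters, underscore, or period, repeated zero or more times.
FILENAME_PATTERN = re.compile(r"[A-Za-z0-9._]*")


def is_valid_filename(name):
    """Return True if name uses only the allowed characters."""
    return FILENAME_PATTERN.fullmatch(name) is not None
```

With this check, is_valid_filename("embeddings_300d.txt") is True, while is_valid_filename("data/embeddings.txt") is False because of the slash.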

enc0_token_embedding_dim

The output dimension of the enc0 token embedding layer.

Conditional

Valid values: 2 ≤ integer ≤ 1000

Default value: 300

enc0_vocab_file

The vocabulary file for mapping pretrained enc0 token embedding vectors to numerical vocabulary IDs.

Conditional

Valid values: String with alphanumeric characters, underscore, or period. [A-Za-z0-9\.\_]

Default value: "" (empty string)

enc1_network

The network model for the enc1 encoder. If you want the enc1 encoder to use the same network model as enc0, including the hyperparameter values, set the value to enc0.

Note

Even when the enc0 and enc1 encoder networks have symmetric architectures, you can't share parameter values for these networks.

Optional

Valid values: enc0, hcnn, bilstm, or pooled_embedding

  • enc0: The network model for the enc0 encoder.

  • hcnn: A hierarchical convolutional neural network.

  • bilstm: A bidirectional LSTM, in which the signal propagates backward and forward in time. This is an appropriate recurrent neural network (RNN) architecture for sequential learning tasks.

  • pooled_embedding: Averages the embeddings of all of the tokens in the input.

Default value: enc0

enc1_cnn_filter_width

The filter width of the CNN enc1 encoder.

Conditional

Valid values: 1 ≤ integer ≤ 9

Default value: 3

enc1_freeze_pretrained_embedding

Whether to freeze enc1 pretrained embedding weights.

Conditional

Valid values: True or False

Default value: True

enc1_layers

The number of layers in the enc1 encoder.

Conditional

Valid values: auto or 1 ≤ integer ≤ 4

  • For hcnn, auto means 4.

  • For bilstm, auto means 1.

  • For pooled_embedding, auto ignores the number of layers.

Default value: auto

enc1_max_seq_len

The maximum sequence length for the enc1 encoder.

Conditional

Valid values: 1 ≤ integer ≤ 5000

enc1_pretrained_embedding_file

The name of the enc1 pretrained token embedding file in the auxiliary data channel.

Conditional

Valid values: String with alphanumeric characters, underscore, or period. [A-Za-z0-9\.\_]

Default value: "" (empty string)

enc1_token_embedding_dim

The output dimension of the enc1 token embedding layer.

Conditional

Valid values: 2 ≤ integer ≤ 1000

Default value: 300

enc1_vocab_file

The vocabulary file for mapping pretrained enc1 token embeddings to vocabulary IDs.

Conditional

Valid values: String with alphanumeric characters, underscore, or period. [A-Za-z0-9\.\_]

Default value: "" (empty string)

enc1_vocab_size

The vocabulary size of enc1 tokens.

Conditional

Valid values: 2 ≤ integer ≤ 3000000

epochs

The number of epochs to run for training.

Optional

Valid values: 1 ≤ integer ≤ 100

Default value: 30

learning_rate

The learning rate for training.

Optional

Valid values: 1.0E-6 ≤ float ≤ 1.0

Default value: 0.0004

mini_batch_size

The size of the batches that the dataset is split into for the optimizer during training.

Optional

Valid values: 1 ≤ integer ≤ 10000

Default value: 32

mlp_activation

The type of activation function for the multilayer perceptron (MLP) layer.

Optional

Valid values: tanh, relu, or linear

  • tanh: Hyperbolic tangent

  • relu: Rectified linear unit (ReLU)

  • linear: Linear function

Default value: linear
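For reference, the three activation functions can be written as scalar Python functions. This is a sketch of the standard mathematical definitions. The ACTIVATIONS lookup is a hypothetical name, not an identifier used by the algorithm.

```python
import math

# Scalar versions of the three activation choices.
ACTIVATIONS = {
    "tanh": math.tanh,                # hyperbolic tangent, output in (-1, 1)
    "relu": lambda x: max(0.0, x),    # zero for negative inputs, identity otherwise
    "linear": lambda x: x,            # identity, leaves the input unchanged
}

outputs = {name: fn(-2.0) for name, fn in ACTIVATIONS.items()}
# outputs["relu"] == 0.0 and outputs["linear"] == -2.0
```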

mlp_dim

The dimension of the output from MLP layers.

Optional

Valid values: 2 ≤ integer ≤ 10000

Default value: 512

mlp_layers

The number of MLP layers in the network.

Optional

Valid values: 0 ≤ integer ≤ 10

Default value: 2

negative_sampling_rate

The ratio of negative samples, generated to assist in training the algorithm, to positive samples that are provided by users. Negative samples represent data that is unlikely to occur in reality and are labelled negatively for training. They facilitate training a model to discriminate between the positive samples observed and the negative samples that are not. To specify the ratio of negative samples to positive samples used for training, set the value to a positive integer. For example, if you train the algorithm on input data in which all of the samples are positive and set negative_sampling_rate to 2, the Object2Vec algorithm internally generates two negative samples per positive sample. If you don't want to generate or use negative samples during training, set the value to 0.

Optional

Valid values: 0 ≤ integer

Default value: 0 (off)
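The effect of negative_sampling_rate on the size of the training set can be sketched as follows. How Object2Vec actually draws negative samples is not described here, so the random pairing below is an assumption made purely for illustration, as are the function name and the (left, right, label) tuple layout.

```python
import random


def add_negative_samples(positive_pairs, negative_sampling_rate, seed=0):
    """Return labeled samples: each positive pair plus generated negatives."""
    rng = random.Random(seed)
    candidates = [right for _, right in positive_pairs]
    samples = [(left, right, 1) for left, right in positive_pairs]
    for left, _ in positive_pairs:
        for _ in range(negative_sampling_rate):
            # Assumption: pair the left item with a randomly chosen right item.
            # A real sampler would likely avoid reproducing a known positive pair.
            samples.append((left, rng.choice(candidates), 0))
    return samples


pairs = [("user_a", "item_1"), ("user_b", "item_2"), ("user_c", "item_3")]
samples = add_negative_samples(pairs, negative_sampling_rate=2)
# Three positive samples plus two negatives per positive: nine samples in total.
```

Setting negative_sampling_rate to 0 in this sketch returns only the positive samples.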

num_classes

The number of classes for classification training. Amazon SageMaker ignores this hyperparameter for regression problems.

Optional

Valid values: 2 ≤ integer ≤ 30

Default value: 2

optimizer

The optimizer type.

Optional

Valid values: adadelta, adagrad, adam, sgd, or rmsprop.

Default value: adam

output_layer

The type of output layer, which specifies whether the task is regression or classification.

Optional

Valid values: softmax or mean_squared_error

  • softmax: The Softmax function used for classification.

  • mean_squared_error: The MSE used for regression.

Default value: softmax
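The two output options correspond to standard functions, sketched below in plain Python. These are the textbook definitions rather than the algorithm's internal code, and the function names are chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


probabilities = softmax([0.0, 0.0])               # [0.5, 0.5]
error = mean_squared_error([1.0, 2.0], [1.0, 4.0])  # 2.0
```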

tied_token_embedding_weight

Whether to use a shared embedding layer for both encoders. If the inputs to both encoders use the same token-level units, use a shared token embedding layer. For example, for a collection of documents, if one encoder encodes sentences and another encodes whole documents, you can use a shared token embedding layer. That's because both sentences and documents are composed of word tokens from the same vocabulary.

Optional

Valid values: True or False

Default value: False

token_embedding_storage_type

The mode of gradient update used during training. In dense mode, the optimizer calculates the full gradient matrix for the token embedding layer, even if most rows of the gradient are zero-valued. In sparse mode, the optimizer stores only the rows of the gradient that are actually used in the mini-batch. To have the algorithm perform lazy gradient updates, which calculate gradients only for the non-zero rows and speed up training, specify row_sparse. Setting the value to row_sparse constrains the values available for other hyperparameters, as follows:

  • The optimizer hyperparameter must be set to adam, adagrad, or sgd. Otherwise, the algorithm throws a CustomerValueError.

  • The algorithm automatically disables bucketing, setting the bucket_width hyperparameter to 0.

Optional

Valid values: dense or row_sparse

Default value: dense
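The row_sparse constraints listed above can be checked on the client side before a job is submitted. The sketch below is a hypothetical pre-flight helper, not part of SageMaker. It raises Python's built-in ValueError as a stand-in for the CustomerValueError that the algorithm throws, and it mirrors the automatic reset of bucket_width.

```python
ROW_SPARSE_OPTIMIZERS = {"adam", "adagrad", "sgd"}


def apply_row_sparse_constraints(hyperparameters):
    """Return a copy of the string-to-string map with row_sparse rules applied."""
    resolved = dict(hyperparameters)
    if resolved.get("token_embedding_storage_type", "dense") != "row_sparse":
        return resolved
    # adam is the documented default when no optimizer is given.
    optimizer = resolved.get("optimizer", "adam")
    if optimizer not in ROW_SPARSE_OPTIMIZERS:
        raise ValueError(
            f"optimizer '{optimizer}' is not supported with row_sparse"
        )
    # Bucketing is disabled automatically in row_sparse mode.
    resolved["bucket_width"] = "0"
    return resolved


checked = apply_row_sparse_constraints({
    "token_embedding_storage_type": "row_sparse",
    "optimizer": "adagrad",
    "bucket_width": "10",
})
# checked["bucket_width"] == "0"
```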

weight_decay

The weight decay parameter used for optimization.

Optional

Valid values: 0 ≤ float ≤ 10000

Default value: 0 (no decay)