
Sequence-to-Sequence Hyperparameters

The following hyperparameters are supported by the Amazon SageMaker Sequence to Sequence algorithm. For each hyperparameter, the list below gives a description, whether it is required or optional, its valid values, and its default value.
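
For context, the following is a minimal sketch of passing these hyperparameters to the built-in algorithm with the SageMaker Python SDK (version 2). The role ARN, S3 paths, instance type, and hyperparameter values are placeholders chosen for illustration, not recommended settings.

    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    # Placeholder session, role, and S3 location; replace with your own.
    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/ExampleSageMakerRole"
    output_path = "s3://example-bucket/seq2seq/output"

    # Resolve the built-in seq2seq container image for the current Region.
    container = image_uris.retrieve("seq2seq", session.boto_region_name)

    estimator = Estimator(
        image_uri=container,
        role=role,
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        output_path=output_path,
        sagemaker_session=session,
    )

    # Illustrative hyperparameter choices drawn from the list below.
    estimator.set_hyperparameters(
        batch_size=64,
        optimized_metric="bleu",
        bleu_sample_size=1000,
        max_seq_len_source=100,
        max_seq_len_target=100,
        num_layers_encoder=2,
        num_layers_decoder=2,
        rnn_num_hidden=1024,
        learning_rate=0.0003,
    )

    # estimator.fit(...)  # supply the algorithm's training data channels here
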
batch_size

Mini batch size for gradient descent.

Optional

Valid values: positive integer

Default value: 64

beam_size

Length of the beam for beam search. Used during training to compute bleu, and used during inference.

Optional

Valid values: positive integer

Default value: 5

bleu_sample_size

Number of instances to pick from the validation dataset to decode and compute the bleu score during training. Set to -1 to use the full validation set (if bleu is chosen as the optimized_metric).

Optional

Valid values: integer

Default value: 0

bucket_width

Returns (source, target) buckets up to (max_seq_len_source, max_seq_len_target). The longer side of the data uses steps of bucket_width, while the shorter side uses steps scaled down by the average target/source length ratio. If one side reaches its maximum length before the other, the width of the extra buckets on that side is fixed to that side's maximum length.

Optional

Valid values: positive integer

Default value: 10
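
The interaction of bucket_width with the maximum sequence lengths can be easier to see in code. The following is a rough sketch of the bucketing idea described above, assuming the source side is the longer one; it is an illustration only, not the built-in algorithm's exact implementation.

    def illustrative_buckets(max_len_source, max_len_target, bucket_width, target_source_ratio):
        """Rough sketch of (source, target) bucket boundaries, per the description above."""
        # Assumes the source side is longer, so it steps by bucket_width while the
        # target step is scaled by the average target/source length ratio.
        source_step = bucket_width
        target_step = max(1, round(bucket_width * target_source_ratio))
        buckets = []
        src, tgt = source_step, target_step
        while src <= max_len_source or tgt <= max_len_target:
            # Once one side reaches its maximum, it stays pinned at that maximum.
            buckets.append((min(src, max_len_source), min(tgt, max_len_target)))
            src += source_step
            tgt += target_step
        return buckets

    print(illustrative_buckets(100, 100, 10, 0.8))
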

bucketing_enabled

Set to false to disable bucketing and instead unroll to the maximum length.

Optional

Valid values: true or false

Default value: true

checkpoint_frequency_num_batches

Checkpoint and evaluate every x batches. This checkpointing hyperparameter is passed to SageMaker's seq2seq algorithm for early stopping and for retrieving the best model. The algorithm's checkpointing runs locally in the algorithm's training container and is not compatible with SageMaker checkpointing. The algorithm temporarily saves checkpoints to a local path and stores the best model artifact to the model output path in S3 after the training job has stopped.

Optional

Valid values: positive integer

Default value: 1000

checkpoint_threshold

Maximum number of checkpoints the model is allowed to go without improving in optimized_metric on the validation dataset before training is stopped. This checkpointing hyperparameter is passed to SageMaker's seq2seq algorithm for early stopping and for retrieving the best model. The algorithm's checkpointing runs locally in the algorithm's training container and is not compatible with SageMaker checkpointing. The algorithm temporarily saves checkpoints to a local path and stores the best model artifact to the model output path in S3 after the training job has stopped.

Optional

Valid values: positive integer

Default value: 3
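
Taken together, checkpoint_frequency_num_batches and checkpoint_threshold describe a patience-style early-stopping loop. The sketch below illustrates that logic with stubbed-out training and evaluation steps, assuming a higher optimized_metric is better; it is not the algorithm's actual code.

    import random

    checkpoint_frequency_num_batches = 1000
    checkpoint_threshold = 3

    def train_on_batch(step):
        pass  # placeholder for one gradient update

    def evaluate_on_validation():
        return random.random()  # placeholder validation metric (higher = better here)

    best_metric = float("-inf")
    checkpoints_without_improvement = 0

    for step in range(1, 20001):
        train_on_batch(step)
        if step % checkpoint_frequency_num_batches == 0:
            metric = evaluate_on_validation()
            # (a checkpoint would be saved to the local path here)
            if metric > best_metric:
                best_metric = metric
                checkpoints_without_improvement = 0
            else:
                checkpoints_without_improvement += 1
            if checkpoints_without_improvement >= checkpoint_threshold:
                break  # early stop: too many checkpoints without improvement
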

clip_gradient

Clip absolute gradient values greater than this value. Set to a negative value to disable clipping.

Optional

Valid values: float

Default value: 1

cnn_activation_type

The cnn activation type to be used.

Optional

Valid values: String. One of glu, relu, softrelu, sigmoid, or tanh.

Default value: glu

cnn_hidden_dropout

Dropout probability for dropout between convolutional layers.

Optional

Valid values: Float. Range in [0,1].

Default value: 0

cnn_kernel_width_decoder

Kernel width for the cnn decoder.

Optional

Valid values: positive integer

Default value: 5

cnn_kernel_width_encoder

Kernel width for the cnn encoder.

Optional

Valid values: positive integer

Default value: 3

cnn_num_hidden

Number of cnn hidden units for encoder and decoder.

Optional

Valid values: positive integer

Default value: 512

decoder_type

Decoder type.

Optional

Valid values: String. Either rnn or cnn.

Default value: rnn

embed_dropout_source

Dropout probability for source side embeddings.

Optional

Valid values: Float. Range in [0,1].

Default value: 0

embed_dropout_target

Dropout probability for target side embeddings.

Optional

Valid values: Float. Range in [0,1].

Default value: 0

encoder_type

Encoder type. The rnn architecture is based on the attention mechanism of Bahdanau et al., and the cnn architecture is based on Gehring et al.

Optional

Valid values: String. Either rnn or cnn.

Default value: rnn

fixed_rate_lr_half_life

Half-life of the learning rate, in number of checkpoints, for the fixed_rate_* schedulers.

Optional

Valid values: positive integer

Default value: 10

learning_rate

Initial learning rate.

Optional

Valid values: float

Default value: 0.0003

loss_type

Loss function for training.

Optional

Valid values: String. cross-entropy

Default value: cross-entropy

lr_scheduler_type

Learning rate scheduler type. plateau_reduce reduces the learning rate whenever optimized_metric plateaus on the validation data. inv_t is inverse time decay: learning_rate/(1 + decay_rate*t).

Optional

Valid values: String. One of plateau_reduce, fixed_rate_inv_t, or fixed_rate_inv_sqrt_t.

Default value: plateau_reduce
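
As a quick illustration of the inverse-time-decay schedulers, the snippet below computes the decayed learning rate for a few checkpoint steps. The decay_rate value is an assumed placeholder for illustration; this page does not document how the decay rate is derived.

    # Illustrative inverse time decay: lr_t = learning_rate / (1 + decay_rate * t)
    learning_rate = 0.0003
    decay_rate = 0.05  # assumed placeholder value

    for t in range(5):
        lr_t = learning_rate / (1 + decay_rate * t)
        print(f"step {t}: learning rate = {lr_t:.6f}")
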

max_num_batches

Maximum number of updates/batches to process. -1 for infinite.

Optional

Valid values: integer

Default value: -1

max_num_epochs

Maximum number of epochs to pass through the training data before fitting is stopped. If this parameter is passed, training continues until this number of epochs even if validation accuracy is not improving. Ignored if not passed.

Optional

Valid values: positive integer

Default value: none

max_seq_len_source

Maximum length of the source sequence. Sequences longer than this length are truncated to this length.

Optional

Valid values: positive integer

Default value: 100

max_seq_len_target

Maximum length of the target sequence. Sequences longer than this length are truncated to this length.

Optional

Valid values: positive integer

Default value: 100

min_num_epochs

Minimum number of epochs the training must run before it can be stopped by early-stopping conditions.

Optional

Valid values: positive integer

Default value: 0

momentum

Momentum constant used for sgd. Don't pass this parameter if you are using adam or rmsprop.

Optional

Valid values: float

Default value: none

num_embed_source

Embedding size for source tokens.

Optional

Valid values: positive integer

Default value: 512

num_embed_target

Embedding size for target tokens.

Optional

Valid values: positive integer

Default value: 512

num_layers_decoder

Number of layers for the decoder rnn or cnn.

Optional

Valid values: positive integer

Default value: 1

num_layers_encoder

Number of layers for the encoder rnn or cnn.

Optional

Valid values: positive integer

Default value: 1

optimized_metric

Metric to optimize with early stopping.

Optional

Valid values: String. One of perplexity, accuracy, or bleu.

Default value: perplexity

optimizer_type

The optimizer to use.

Optional

Valid values: String. One of adam, sgd, or rmsprop.

Default value: adam

plateau_reduce_lr_factor

Factor to multiply learning rate with (for plateau_reduce).

Optional

Valid values: float

Default value: 0.5

plateau_reduce_lr_threshold

For the plateau_reduce scheduler, multiply the learning rate by the reduce factor if optimized_metric hasn't improved for this many checkpoints.

Optional

Valid values: positive integer

Default value: 3
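
The two plateau_reduce_* parameters work together: after plateau_reduce_lr_threshold checkpoints without improvement, the learning rate is multiplied by plateau_reduce_lr_factor. The sketch below illustrates that interaction with made-up validation metrics (higher assumed better); it is not the scheduler's actual implementation.

    plateau_reduce_lr_factor = 0.5
    plateau_reduce_lr_threshold = 3

    learning_rate = 0.0003
    best_metric = float("-inf")
    checkpoints_without_improvement = 0

    validation_metrics = [0.20, 0.25, 0.26, 0.26, 0.26, 0.26, 0.27]  # per checkpoint

    for checkpoint, metric in enumerate(validation_metrics, start=1):
        if metric > best_metric:
            best_metric = metric
            checkpoints_without_improvement = 0
        else:
            checkpoints_without_improvement += 1
        if checkpoints_without_improvement >= plateau_reduce_lr_threshold:
            learning_rate *= plateau_reduce_lr_factor  # reduce on plateau
            checkpoints_without_improvement = 0
        print(f"checkpoint {checkpoint}: lr = {learning_rate}")
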

rnn_attention_in_upper_layers

Pass the attention to the upper layers of the rnn, as in the Google NMT paper. Applicable only if more than one layer is used.

Optional

Valid values: boolean (true or false)

Default value: true

rnn_attention_num_hidden

Number of hidden units for attention layers. Defaults to rnn_num_hidden.

Optional

Valid values: positive integer

Default value: rnn_num_hidden

rnn_attention_type

Attention model for encoders. mlp refers to concat, and bilinear refers to general, from the Luong et al. paper.

Optional

Valid values: String. One of dot, fixed, mlp, or bilinear.

Default value: mlp

rnn_cell_type

Specific type of rnn architecture.

Optional

Valid values: String. Either lstm or gru.

Default value: lstm

rnn_decoder_state_init

How to initialize rnn decoder states from encoders.

Optional

Valid values: String. One of last, avg, or zero.

Default value: last

rnn_first_residual_layer

First rnn layer to have a residual connection. Applicable only if the number of layers in the encoder or decoder is more than 1.

Optional

Valid values: positive integer

Default value: 2

rnn_num_hidden

The number of rnn hidden units for the encoder and decoder. This must be a multiple of 2 because the algorithm uses a bidirectional Long Short-Term Memory (LSTM) by default.

Optional

Valid values: positive even integer

Default value: 1024
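
A likely reason for the multiple-of-2 requirement is that a bidirectional encoder splits its hidden units across the forward and backward directions; this is an assumption offered for illustration, not something this page states.

    # Illustrative check only: assumes the bidirectional LSTM splits
    # rnn_num_hidden evenly across the forward and backward directions.
    rnn_num_hidden = 1024
    assert rnn_num_hidden % 2 == 0, "rnn_num_hidden must be even"
    per_direction = rnn_num_hidden // 2  # 512 hidden units per direction
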

rnn_residual_connections

Add residual connections to the stacked rnn. The number of layers should be more than 1.

Optional

Valid values: boolean (true or false)

Default value: false

rnn_decoder_hidden_dropout

Dropout probability for the hidden state that combines the context with the rnn hidden state in the decoder.

Optional

Valid values: Float. Range in [0,1].

Default value: 0

training_metric

Metric to track during training on the validation data.

Optional

Valid values: String. Either perplexity or accuracy.

Default value: perplexity

weight_decay

Weight decay constant.

Optional

Valid values: float

Default value: 0

weight_init_scale

Weight initialization scale (for uniform and xavier initialization).

Optional

Valid values: float

Default value: 2.34

weight_init_type

Type of weight initialization.

Optional

Valid values: String. Either uniform or xavier.

Default value: xavier

xavier_factor_type

Xavier factor type.

Optional

Valid values: String. One of in, out, or avg.

Default value: in