Text Classification - TensorFlow Hyperparameters - Amazon SageMaker

Text Classification - TensorFlow Hyperparameters

Hyperparameters are parameters that are set before a machine learning model begins learning. The following hyperparameters are supported by the Amazon SageMaker built-in Text Classification - TensorFlow algorithm. See Tune a Text Classification - TensorFlow model for information on hyperparameter tuning.
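The following sketch illustrates how these values are typically supplied. It merges a few overrides into the defaults documented below. It assumes that hyperparameters reach the training job as strings, as in the HyperParameters string-to-string map of a SageMaker CreateTrainingJob request. The DEFAULTS dictionary and the build_hyperparameters helper are hypothetical and are not part of any SageMaker API.

```python
# Hypothetical: documented defaults, written as the string values a
# SageMaker training job receives (HyperParameters is a string-to-string map).
DEFAULTS = {
    "batch_size": "32",
    "beta_1": "0.9",
    "beta_2": "0.999",
    "dropout_rate": "0.2",
    "early_stopping": "False",
    "early_stopping_min_delta": "0.0",
    "early_stopping_patience": "5",
    "epochs": "10",
    "epsilon": "1e-7",
    "initial_accumulator_value": "0.0001",
    "learning_rate": "0.001",
    "momentum": "0.9",
    "optimizer": "adam",
    "regularizers_l2": "0.0001",
    "reinitialize_top_layer": "Auto",
    "rho": "0.95",
    "train_only_on_top_layer": "False",
    "validation_split_ratio": "0.2",
    "warmup_steps_fraction": "0.1",
}


def build_hyperparameters(overrides):
    """Hypothetical helper: merge user overrides into the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unsupported hyperparameters: {sorted(unknown)}")
    merged = dict(DEFAULTS)
    for name, value in overrides.items():
        # str(True) is "True", matching the documented string values.
        merged[name] = str(value)
    return merged


hp = build_hyperparameters({"epochs": 5, "early_stopping": True})
```

Unspecified parameters keep their documented defaults, so passing only the overrides to a training job is normally enough. The full merge is shown here only to make the string form of every value visible.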

Parameter Name Description
batch_size

The batch size for training. For training on instances with multiple GPUs, this batch size is used across the GPUs.

Valid values: positive integer.

Default value: 32.

beta_1

The beta1 for the "adam" and "adamw" optimizers. Represents the exponential decay rate for the first moment estimates. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.9.

beta_2

The beta2 for the "adam" and "adamw" optimizers. Represents the exponential decay rate for the second moment estimates. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.999.
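The two decay rates are easiest to see in a single Adam update. The following sketch is the textbook Adam rule (Kingma and Ba) for one scalar parameter in plain Python. It is not the TensorFlow implementation, which may apply epsilon slightly differently. The function name adam_step is hypothetical.

```python
import math


def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One textbook Adam update for a scalar parameter.

    m and v are the first and second moment estimates; t is the 1-based step.
    """
    m = beta_1 * m + (1 - beta_1) * grad          # beta_1: decay of first moment
    v = beta_2 * v + (1 - beta_2) * grad * grad   # beta_2: decay of second moment
    m_hat = m / (1 - beta_1 ** t)                 # bias-corrected estimates
    v_hat = v / (1 - beta_2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


p, m, v = adam_step(1.0, 2.0, 0.0, 0.0, t=1)
```

Because of the bias correction, the first step moves the parameter by roughly learning_rate regardless of the gradient's magnitude. Larger beta_1 and beta_2 make m and v forget older gradients more slowly.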

dropout_rate

The dropout rate for the dropout layer in the top classification layer. Used only when reinitialize_top_layer is set to "True".

Valid values: float, range: [0.0, 1.0].

Default value: 0.2.
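The following is a minimal sketch of inverted dropout, the scheme used by the Keras Dropout layer. It assumes the top layer's dropout behaves the same way. During training, each activation is zeroed with probability dropout_rate and the survivors are scaled by 1 / (1 - dropout_rate). At inference, the input passes through unchanged. The function name dropout is hypothetical.

```python
import random


def dropout(activations, rate=0.2, training=True, seed=None):
    """Inverted dropout: zero each activation with probability `rate` and
    scale survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    if rate >= 1.0:
        # Every unit is dropped; avoid dividing by zero in the scale factor.
        return [0.0] * len(activations)
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]
```

With the default rate of 0.2, kept activations are multiplied by 1.25. The average over many units therefore stays close to the undropped value.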

early_stopping

Set to "True" to use early stopping logic during training. If "False", early stopping is not used.

Valid values: string, either: ("True" or "False").

Default value: "False".

early_stopping_min_delta

The minimum change needed to qualify as an improvement. An absolute change less than the value of early_stopping_min_delta does not qualify as an improvement. Used only when early_stopping is set to "True".

Valid values: float, range: [0.0, 1.0].

Default value: 0.0.

early_stopping_patience

The number of epochs to continue training with no improvement before training stops. Used only when early_stopping is set to "True".

Valid values: positive integer.

Default value: 5.
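The following sketch shows how early_stopping_min_delta and early_stopping_patience interact. It assumes the monitored quantity is a validation loss, because the page does not say which metric is monitored. Like Keras EarlyStopping, it counts an epoch as an improvement only when the loss drops by more than min_delta below the best value so far. The function name early_stop_epoch is hypothetical.

```python
def early_stop_epoch(val_losses, min_delta=0.0, patience=5):
    """Return the 0-based epoch at which training would stop, or None.

    An epoch is an improvement only if the loss falls more than min_delta
    below the best loss seen so far; `patience` non-improving epochs in a
    row trigger the stop.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


stop = early_stop_epoch([1.0, 0.8, 0.79, 0.795, 0.8, 0.81], min_delta=0.05, patience=3)
```

In this example the drop from 0.8 to 0.79 is smaller than min_delta, so it does not reset the counter. Training stops at epoch 4, after three epochs without a qualifying improvement.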

epochs

The number of training epochs.

Valid values: positive integer.

Default value: 10.

epsilon

The epsilon for "adam", "rmsprop", "adadelta", and "adagrad" optimizers. Usually set to a small value to avoid division by 0. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 1e-7.

initial_accumulator_value

The starting value for the accumulators, or the per-parameter momentum values, for the "adagrad" optimizer. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.0001.
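The following sketch is the textbook Adagrad update for one scalar parameter. It shows where initial_accumulator_value and epsilon enter. The accumulator starts at initial_accumulator_value, grows by the squared gradient at every step, and divides the learning rate. The function name adagrad_step is hypothetical, and TensorFlow's implementation may differ in detail.

```python
import math


def adagrad_step(param, grad, accumulator, learning_rate=0.001, epsilon=1e-7):
    """One Adagrad update for a scalar parameter.

    `accumulator` should start at initial_accumulator_value.
    """
    accumulator = accumulator + grad * grad
    # epsilon keeps the denominator away from zero.
    param = param - learning_rate * grad / (math.sqrt(accumulator) + epsilon)
    return param, accumulator


p, acc = adagrad_step(1.0, 3.0, accumulator=0.0001)
```

The accumulator only grows, so the effective step size learning_rate / sqrt(accumulator) shrinks as training proceeds.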

learning_rate

The optimizer learning rate.

Valid values: float, range: [0.0, 1.0].

Default value: 0.001.

momentum

The momentum for the "sgd" and "nesterov" optimizers. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.9.
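The following sketch shows how momentum enters classical and Nesterov momentum SGD. It follows the update rules given in the Keras SGD documentation. It assumes the "nesterov" optimizer here corresponds to SGD with Nesterov momentum enabled. The function name sgd_momentum_step is hypothetical.

```python
def sgd_momentum_step(param, grad, velocity, learning_rate=0.001,
                      momentum=0.9, nesterov=False):
    """One SGD update with classical or Nesterov momentum (Keras-style rules)."""
    velocity = momentum * velocity - learning_rate * grad
    if nesterov:
        # Nesterov: apply the gradient step plus a look-ahead along the new velocity.
        param = param + momentum * velocity - learning_rate * grad
    else:
        param = param + velocity
    return param, velocity


p, v = sgd_momentum_step(0.0, 1.0, velocity=0.0)
```

With a constant gradient g, the velocity approaches -learning_rate * g / (1 - momentum). The default momentum of 0.9 therefore makes the steady-state step about ten times the learning rate.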

optimizer

The optimizer type. For more information, see Optimizers in the TensorFlow documentation.

Valid values: string, any of the following: ("adamw", "adam", "sgd", "nesterov", "rmsprop", "adagrad", "adadelta").

Default value: "adam".
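Several hyperparameters on this page are ignored unless a particular optimizer is selected. The following sketch encodes those rules as the parameter descriptions state them and reports which supplied values would be ignored. The names OPTIMIZER_PARAMS and ignored_hyperparameters are hypothetical.

```python
# Optimizer-specific hyperparameters each optimizer reads, per the descriptions
# on this page (learning_rate applies to every optimizer and is not listed).
OPTIMIZER_PARAMS = {
    "adam": {"beta_1", "beta_2", "epsilon"},
    "adamw": {"beta_1", "beta_2", "warmup_steps_fraction"},
    "sgd": {"momentum"},
    "nesterov": {"momentum"},
    "rmsprop": {"epsilon", "rho"},
    "adagrad": {"epsilon", "initial_accumulator_value"},
    "adadelta": {"epsilon", "rho"},
}
OPTIMIZER_SPECIFIC = set().union(*OPTIMIZER_PARAMS.values())


def ignored_hyperparameters(optimizer, hyperparameters):
    """Return the optimizer-specific names in `hyperparameters` that `optimizer` ignores."""
    if optimizer not in OPTIMIZER_PARAMS:
        raise ValueError(f"unsupported optimizer: {optimizer!r}")
    used = OPTIMIZER_PARAMS[optimizer]
    return sorted(
        name for name in hyperparameters
        if name in OPTIMIZER_SPECIFIC and name not in used
    )


print(ignored_hyperparameters("sgd", {"momentum": "0.8", "rho": "0.9", "epochs": "5"}))
```

The mapping follows the descriptions literally. For example, epsilon is listed for "adam" but not for "adamw".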

regularizers_l2

The L2 regularization factor for the dense layer in the classification layer. Used only when reinitialize_top_layer is set to "True".

Valid values: float, range: [0.0, 1.0].

Default value: 0.0001.
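The following is a minimal sketch of the L2 penalty. It assumes the penalty takes the form used by Keras regularizers.L2: the factor multiplies the sum of squared weights, and the result is added to the training loss. This page does not state whether the bias is also penalized, so the sketch uses a plain list of weights. Both function names are hypothetical.

```python
def l2_penalty(weights, regularizers_l2=0.0001):
    """Term added to the loss: regularizers_l2 * sum(w ** 2)."""
    return regularizers_l2 * sum(w * w for w in weights)


def l2_gradient(weight, regularizers_l2=0.0001):
    """Extra gradient the penalty contributes to a single weight."""
    return 2.0 * regularizers_l2 * weight


penalty = l2_penalty([1.0, 2.0, -2.0])
```

The gradient term is proportional to the weight itself. It pulls large weights toward zero harder than small ones.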

reinitialize_top_layer

If set to "Auto", the top classification layer parameters are re-initialized during fine-tuning. For incremental training, top classification layer parameters are not re-initialized unless set to "True".

Valid values: string, any of the following: ("Auto", "True" or "False").

Default value: "Auto".
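The following decision function expresses one reading of the reinitialize_top_layer description. It is a hypothetical helper that restates the text above. It assumes "False" means the top layer is never re-initialized, which the page does not spell out.

```python
def should_reinitialize_top_layer(reinitialize_top_layer, incremental_training):
    """Hypothetical reading of the reinitialize_top_layer description."""
    if reinitialize_top_layer == "True":
        return True
    if reinitialize_top_layer == "False":
        return False
    if reinitialize_top_layer == "Auto":
        # Fine-tuning: re-initialize. Incremental training: keep the trained top layer.
        return not incremental_training
    raise ValueError(f"invalid value: {reinitialize_top_layer!r}")
```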

rho

The discounting factor for the gradient of the "adadelta" and "rmsprop" optimizers. Ignored for other optimizers.

Valid values: float, range: [0.0, 1.0].

Default value: 0.95.
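The following sketch is the textbook (non-centered) RMSprop update for one scalar parameter. It shows rho as the decay rate of a running average of squared gradients, with epsilon in the denominator. The function name rmsprop_step is hypothetical, and TensorFlow's implementation may differ in detail. The "adadelta" optimizer also uses rho as a decay rate, but in a different update rule that is not shown here.

```python
import math


def rmsprop_step(param, grad, mean_square, learning_rate=0.001,
                 rho=0.95, epsilon=1e-7):
    """One RMSprop update; `mean_square` is the running average of grad ** 2."""
    mean_square = rho * mean_square + (1 - rho) * grad * grad
    param = param - learning_rate * grad / (math.sqrt(mean_square) + epsilon)
    return param, mean_square


p, ms = rmsprop_step(0.0, 2.0, mean_square=0.0)
```

The running average has an effective memory of about 1 / (1 - rho) = 20 steps at the default value. Because it starts at zero, the earliest updates are larger than learning_rate.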

train_only_on_top_layer

If "True", only the top classification layer parameters are fine-tuned. If "False", all model parameters are fine-tuned.

Valid values: string, either: ("True" or "False").

Default value: "False".

validation_split_ratio

The fraction of training data to randomly split to create validation data. Only used if validation data is not provided through the validation channel.

Valid values: float, range: [0.0, 1.0].

Default value: 0.2.
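The following is a minimal sketch of a random train/validation split controlled by validation_split_ratio. The seeded shuffle and the rounding of the validation count down are assumptions, not documented behavior. The function name split_train_validation is hypothetical.

```python
import random


def split_train_validation(examples, validation_split_ratio=0.2, seed=0):
    """Randomly hold out a fraction of the training examples for validation."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(examples) * validation_split_ratio)  # rounds down
    val_idx = set(indices[:n_val])
    train = [ex for i, ex in enumerate(examples) if i not in val_idx]
    val = [ex for i, ex in enumerate(examples) if i in val_idx]
    return train, val


train, val = split_train_validation(list(range(100)), 0.2, seed=1)
```

Fixing the seed makes the split reproducible between runs. Changing it draws a different validation subset of the same size.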

warmup_steps_fraction

The fraction of the total number of gradient update steps over which the learning rate increases from 0 to the initial learning rate as a warmup. Used only with the "adamw" optimizer.

Valid values: float, range: [0.0, 1.0].

Default value: 0.1.
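The following sketch shows a linear warmup controlled by warmup_steps_fraction. The page describes only the warmup phase. Two further choices are assumptions made for illustration: the learning rate is held constant after warmup, and the total number of update steps is epochs times the number of batches per epoch. Both function names are hypothetical.

```python
import math


def total_update_steps(num_examples, batch_size=32, epochs=10):
    """Hypothetical: one gradient update per batch, including a final partial batch."""
    return epochs * math.ceil(num_examples / batch_size)


def warmup_learning_rate(step, total_steps, learning_rate=0.001,
                         warmup_steps_fraction=0.1):
    """Learning rate at a 0-based update step under linear warmup."""
    warmup_steps = int(total_steps * warmup_steps_fraction)
    if step < warmup_steps:
        # Linear ramp from 0 at step 0 toward learning_rate at step warmup_steps.
        return learning_rate * step / warmup_steps
    # Assumption: hold the learning rate constant once warmup ends.
    return learning_rate


total = total_update_steps(1000)  # 10 epochs * 32 batches = 320 steps
lr_mid_warmup = warmup_learning_rate(16, total)
```

With 320 total steps and the default fraction of 0.1, the first 32 steps ramp the learning rate up. Step 16 uses half of the initial learning rate.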