K-Means Hyperparameters

In the CreateTrainingJob request, you specify the training algorithm that you want to use. You can also specify algorithm-specific hyperparameters as string-to-string maps. The following table lists the hyperparameters for the k-means training algorithm provided by Amazon SageMaker AI. For more information about how k-means clustering works, see How K-Means Clustering Works.
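As a rough illustration of the string-to-string map described above, the following Python sketch assembles hyperparameters for a k-means job. The helper name `build_kmeans_hyperparameters` and the example values (784 features, 10 clusters) are hypothetical; only the parameter names come from the table in this section. It assumes that list-valued settings such as `eval_metrics` are passed as JSON text.

```python
import json


def build_kmeans_hyperparameters(feature_dim, k, **optional):
    """Return a string-to-string map of k-means hyperparameters.

    Hypothetical helper: it checks only the two required values and
    converts everything else to strings.
    """
    if int(feature_dim) <= 0 or int(k) <= 0:
        raise ValueError("feature_dim and k must be positive integers")

    hyperparameters = {"feature_dim": str(feature_dim), "k": str(k)}
    for name, value in optional.items():
        if isinstance(value, (list, tuple)):
            # List-valued settings (for example eval_metrics) become JSON text.
            hyperparameters[name] = json.dumps(list(value))
        else:
            hyperparameters[name] = str(value)
    return hyperparameters


# Illustrative values only.
hyperparameters = build_kmeans_hyperparameters(
    feature_dim=784,
    k=10,
    epochs=1,
    init_method="kmeans++",
    eval_metrics=["msd", "ssd"],
    mini_batch_size=5000,
)
```

A dictionary like this is the kind of value that would be supplied in the `HyperParameters` field of a `CreateTrainingJob` request; the sketch itself does not call any AWS API.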

Parameter Name	Description
`feature_dim`	The number of features in the input data. Required. Valid values: Positive integer.
`k`	The number of required clusters. Required. Valid values: Positive integer.
`epochs`	The number of passes done over the training data. Optional. Valid values: Positive integer. Default value: 1
`eval_metrics`	A JSON list of metric types used to report a score for the model. Allowed values are `msd` for Mean Square Deviation and `ssd` for Sum of Square Distance. If test data is provided, the score is reported for each of the requested metrics. Optional. Valid values: Either `[\"msd\"]` or `[\"ssd\"]` or `[\"msd\",\"ssd\"]`. Default value: `[\"msd\"]`
`extra_center_factor`	While it runs, the algorithm creates K = `k` * `extra_center_factor` centers, then reduces the number of centers from K to `k` when finalizing the model. Optional. Valid values: Either a positive integer or `auto`. Default value: `auto`
`half_life_time_size`	Used to determine the weight given to an observation when computing a cluster mean. This weight decays exponentially as more points are observed. When a point is first observed, it is assigned a weight of 1 when computing the cluster mean. The decay constant for the exponential decay function is chosen so that after observing `half_life_time_size` points, its weight is 1/2. If set to 0, there is no decay. Optional. Valid values: Non-negative integer. Default value: 0
`init_method`	The method by which the algorithm chooses the initial cluster centers. The standard k-means approach chooses them at random. The alternative k-means++ method chooses the first cluster center at random, then spreads out the remaining initial centers by selecting each one with a probability proportional to the square of the distance of the remaining data points from the existing centers. Optional. Valid values: Either `random` or `kmeans++`. Default value: `random`
`local_lloyd_init_method`	The initialization method for Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Optional. Valid values: Either `random` or `kmeans++`. Default value: `kmeans++`
`local_lloyd_max_iter`	The maximum number of iterations for Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Optional. Valid values: Positive integer. Default value: 300
`local_lloyd_num_trials`	The number of times Lloyd's expectation-maximization (EM) procedure is run when building the final model containing `k` centers; the run with the least loss is kept. Optional. Valid values: Either a positive integer or `auto`. Default value: `auto`
`local_lloyd_tol`	The tolerance for change in loss for early stopping of Lloyd's expectation-maximization (EM) procedure used to build the final model containing `k` centers. Optional. Valid values: Float in the range [0, 1]. Default value: 0.0001
`mini_batch_size`	The number of observations per mini-batch for the data iterator. Optional. Valid values: Positive integer. Default value: 5000
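
Two of the descriptions above, `half_life_time_size` and the `kmeans++` option of `init_method`, can be sketched in plain Python. This is a minimal illustration of the behavior as the table describes it, not the SageMaker implementation; the function names `observation_weight`, `squared_distance`, and `kmeans_pp_init` and the sample points are hypothetical.

```python
import random


def observation_weight(points_since_observed, half_life_time_size):
    """Weight of an observation after further points have been seen.

    Assumes a pure exponential decay that starts at 1 and reaches 1/2
    after `half_life_time_size` further points; 0 means no decay.
    """
    if half_life_time_size == 0:
        return 1.0
    return 0.5 ** (points_since_observed / half_life_time_size)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """Pick k initial centers in the k-means++ style."""
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest existing center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # All points coincide with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
        else:
            # Sample the next center with probability proportional to d2.
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Illustrative data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = kmeans_pp_init(points, k=2, rng=random.Random(0))
```

Because a point that is already a center has zero squared distance to itself, it receives zero weight and is not drawn again, which is what spreads the initial centers apart.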
