k-NN Hyperparameters - Amazon SageMaker
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

k-NN Hyperparameters

Parameter Name Description
feature_dim

The number of features in the input data.

Required

Valid values: positive integer.

k

The number of nearest neighbors.

Required

Valid values: positive integer

predictor_type

The type of inference to use on the data labels.

Required

Valid values: classifier for classification or regressor for regression.

sample_size

The number of data points to be sampled from the training data set.

Required

Valid values: positive integer

dimension_reduction_target

The target dimension to reduce to.

Required when you specify the dimension_reduction_type parameter.

Valid values: positive integer greater than 0 and less than feature_dim.

dimension_reduction_type

The type of dimension reduction method.

Optional

Valid values: sign for random projection or fjlt for the fast Johnson-Lindenstrauss transform.

Default value: No dimension reduction

faiss_index_ivf_nlists

The number of centroids to construct in the index when index_type is faiss.IVFFlat or faiss.IVFPQ.

Optional

Valid values: positive integer

Default value: auto, which resolves to sqrt(sample_size).

faiss_index_pq_m

The number of vector sub-components to construct in the index when index_type is set to faiss.IVFPQ.

The FaceBook AI Similarity Search (FAISS) library requires that the value of faiss_index_pq_m is a divisor of the data dimension. If faiss_index_pq_m is not a divisor of the data dimension, we increase the data dimension to smallest integer divisible by faiss_index_pq_m. If no dimension reduction is applied, the algorithm adds a padding of zeros. If dimension reduction is applied, the algorithm increase the value of the dimension_reduction_target hyper-parameter.

Optional

Valid values: One of the following positive integers: 1, 2, 3, 4, 8, 12, 16, 20, 24, 28, 32, 40, 48, 56, 64, 96

index_metric

The metric to measure the distance between points when finding nearest neighbors. When training with index_type set to faiss.IVFPQ, the INNER_PRODUCT distance and COSINE similarity are not supported.

Optional

Valid values: L2 for Euclidean-distance, INNER_PRODUCT for inner-product distance, COSINE for cosine similarity.

Default value: L2

index_type

The type of index.

Optional

Valid values: faiss.Flat, faiss.IVFFlat, faiss.IVFPQ.

Default values: faiss.Flat

mini_batch_size

The number of observations per mini-batch for the data iterator.

Optional

Valid values: positive integer

Default value: 5000