Unsupervised Built-in SageMaker Algorithms

Amazon SageMaker provides several built-in algorithms for unsupervised learning tasks such as clustering, dimensionality reduction, pattern recognition, and anomaly detection.

  • IP Insights—learns the usage patterns for IPv4 addresses. It is designed to capture associations between IPv4 addresses and various entities, such as user IDs or account numbers.

  • K-Means Algorithm—finds discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups.

  • Principal Component Analysis (PCA) Algorithm—reduces the dimensionality (number of features) within a dataset by projecting data points onto the first few principal components. The objective is to retain as much information or variation as possible. For mathematicians, principal components are eigenvectors of the data's covariance matrix.

  • Random Cut Forest (RCF) Algorithm—detects anomalous data points within a data set that diverge from otherwise well-structured or patterned data.
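
To make two of these descriptions concrete, here is a minimal pure-Python sketch of the textbook ideas behind k-means (Lloyd's algorithm) and PCA (the leading eigenvector of the covariance matrix, found here by power iteration). It is an illustration under stated assumptions, not the implementation the SageMaker built-in algorithms use; the function names, the fixed iteration counts, and the sample points are all hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    `centroids` holds the caller-supplied starting centroids (an assumption
    made to keep the sketch deterministic).
    """
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments


def first_principal_component(points, iterations=100):
    """Unit-length leading eigenvector of the sample covariance matrix."""
    n, d = len(points), len(points[0])
    means = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - means[i] for i in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the starting vector is not orthogonal to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical sample data: two well-separated groups lying near the line y = x.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
direction = first_principal_component(points)
```

On this sample, `kmeans` places the first three points in one group and the last three in the other, and `direction` points roughly along the diagonal, which is the axis of greatest variation. Projecting each centered point onto `direction` would reduce the data from two features to one.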

Algorithm name    | Channel name                       | Training input mode | File type                | Instance class                                              | Parallelizable
------------------|------------------------------------|---------------------|--------------------------|-------------------------------------------------------------|---------------
IP Insights       | train and (optionally) validation  | File                | CSV                      | CPU or GPU                                                  | Yes
K-Means           | train and (optionally) test        | File or Pipe        | recordIO-protobuf or CSV | CPU or GPU (single GPU device on one or more instances)     | No
PCA               | train and (optionally) test        | File or Pipe        | recordIO-protobuf or CSV | GPU or CPU                                                  | Yes
Random Cut Forest | train and (optionally) test        | File or Pipe        | recordIO-protobuf or CSV | CPU                                                         | Yes
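
The table can also be transcribed into a small lookup structure, which is handy for checking a planned training setup before launching a job. The sketch below is only an illustration: the dictionary layout and the `supports` helper are hypothetical and are not part of any SageMaker API, and the labels are copied from the table rather than from real content-type strings.

```python
# Transcription of the table above. The structure and names are illustrative
# assumptions, not identifiers from the SageMaker SDK.
ALGORITHMS = {
    "IP Insights": {
        "channels": ("train", "validation"),  # second channel is optional
        "input_modes": {"File"},
        "file_types": {"CSV"},
        "instance_classes": {"CPU", "GPU"},
        "parallelizable": True,
    },
    "K-Means": {
        "channels": ("train", "test"),
        "input_modes": {"File", "Pipe"},
        "file_types": {"recordIO-protobuf", "CSV"},
        "instance_classes": {"CPU", "GPU"},
        "parallelizable": False,
    },
    "PCA": {
        "channels": ("train", "test"),
        "input_modes": {"File", "Pipe"},
        "file_types": {"recordIO-protobuf", "CSV"},
        "instance_classes": {"CPU", "GPU"},
        "parallelizable": True,
    },
    "Random Cut Forest": {
        "channels": ("train", "test"),
        "input_modes": {"File", "Pipe"},
        "file_types": {"recordIO-protobuf", "CSV"},
        "instance_classes": {"CPU"},
        "parallelizable": True,
    },
}


def supports(algorithm, input_mode, file_type, instance_class):
    """Return True if the table lists this combination for the algorithm."""
    spec = ALGORITHMS[algorithm]
    return (
        input_mode in spec["input_modes"]
        and file_type in spec["file_types"]
        and instance_class in spec["instance_classes"]
    )
```

For example, `supports("IP Insights", "Pipe", "CSV", "CPU")` is false because the table lists only File mode for IP Insights, while `supports("PCA", "Pipe", "recordIO-protobuf", "GPU")` is true.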