
Built-in Algorithms

Overview

Amazon SageMaker provides optimized built-in algorithms for common ML tasks, packaged as prebuilt training containers: you select an algorithm image and set hyperparameters instead of writing training code.

Supervised Learning

XGBoost

Gradient boosting for classification and regression.

| Parameter | Description |
| --- | --- |
| num_round | Number of boosting rounds |
| max_depth | Maximum tree depth |
| eta | Learning rate |
| objective | Loss function |

from sagemaker import image_uris
from sagemaker.estimator import Estimator

# Assumes a region and an execution role are already defined, e.g.:
#   role = sagemaker.get_execution_role()
#   region = sagemaker.Session().boto_region_name

# Retrieve the managed XGBoost container image for this region
xgb_image = image_uris.retrieve("xgboost", region, version="1.5-1")

xgb = Estimator(
    image_uri=xgb_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={
        "objective": "binary:logistic",  # binary classification loss
        "num_round": 100,                # boosting rounds
        "max_depth": 5                   # maximum tree depth
    }
)

Linear Learner

Linear models for classification and regression.

  • Supports L1/L2 regularization
  • Automatic model tuning
  • Built-in normalization
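
A configuration sketch, reusing the image_uris/Estimator setup and the region and role variables from the XGBoost example above. The hyperparameter names (predictor_type, l1, wd, normalize_data) follow the Linear Learner documentation; the values are illustrative.

ll_image = image_uris.retrieve("linear-learner", region, version="1")

linear = Estimator(
    image_uri=ll_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={
        "predictor_type": "binary_classifier",  # or "regressor" / "multiclass_classifier"
        "l1": 0.01,               # L1 regularization strength
        "wd": 0.01,               # weight decay (L2 regularization)
        "normalize_data": "true"  # built-in feature normalization
    }
)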

K-Nearest Neighbors (KNN)

Classification and regression based on similarity.

  • Supports different distance metrics
  • Index-based for fast inference
  • Good for recommendation systems
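
A minimal sketch along the same lines (same imports and variables as the XGBoost example); k, sample_size, predictor_type, and feature_dim are the core KNN hyperparameters, with illustrative values.

knn_image = image_uris.retrieve("knn", region, version="1")

knn = Estimator(
    image_uri=knn_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={
        "k": 10,                         # number of neighbors to consult
        "sample_size": 200000,           # records sampled to build the index
        "predictor_type": "classifier",  # or "regressor"
        "feature_dim": 50                # number of input features
    }
)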

Unsupervised Learning

K-Means

Clustering algorithm.

| Parameter | Description |
| --- | --- |
| k | Number of clusters |
| init_method | Initialization method (random or kmeans++) |
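
The two parameters above map directly onto hyperparameters; a sketch using the same Estimator pattern as the earlier examples (feature_dim is also required by the algorithm):

kmeans_image = image_uris.retrieve("kmeans", region, version="1")

kmeans = Estimator(
    image_uri=kmeans_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={
        "k": 8,                     # number of clusters
        "init_method": "kmeans++",  # or "random"
        "feature_dim": 50           # number of input features
    }
)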

Principal Component Analysis (PCA)

Dimensionality reduction.

  • Regular mode: covariance matrix
  • Randomized mode: for large datasets
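
The mode is selected through the algorithm_mode hyperparameter. A hedged sketch, again reusing the setup from the XGBoost example:

pca_image = image_uris.retrieve("pca", region, version="1")

pca = Estimator(
    image_uri=pca_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={
        "num_components": 10,            # dimensions to keep
        "algorithm_mode": "randomized",  # "regular" for smaller datasets
        "feature_dim": 784,              # input dimensionality
        "subtract_mean": "true"          # center features before projection
    }
)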

Random Cut Forest

Anomaly detection.

  • Unsupervised anomaly scoring
  • Real-time and batch inference
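
A training sketch under the same assumptions as the earlier examples; num_trees and num_samples_per_tree are the main knobs, and higher scores indicate more anomalous records.

rcf_image = image_uris.retrieve("randomcutforest", region, version="1")

rcf = Estimator(
    image_uri=rcf_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={
        "num_trees": 100,             # trees in the forest
        "num_samples_per_tree": 256,  # reservoir size per tree
        "feature_dim": 1              # e.g. a univariate time series
    }
)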

Computer Vision

Image Classification

CNN-based image classification.

  • Transfer learning with pretrained models
  • Multi-GPU training support
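
Transfer learning is enabled through the use_pretrained_model hyperparameter, and training requires a GPU instance. A sketch with illustrative values, reusing the earlier setup:

ic_image = image_uris.retrieve("image-classification", region, version="1")

ic = Estimator(
    image_uri=ic_image,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # GPU instance
    hyperparameters={
        "num_layers": 18,            # ResNet depth
        "use_pretrained_model": 1,   # fine-tune pretrained weights
        "num_classes": 10,
        "num_training_samples": 50000,
        "epochs": 10
    }
)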

Object Detection

Detects objects in images and returns their class labels and bounding boxes.

  • SSD (Single Shot Detector)
  • Faster R-CNN
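
A hedged sketch for the SSD-based configuration (hyperparameter names per the algorithm documentation, values illustrative, setup reused from the XGBoost example):

od_image = image_uris.retrieve("object-detection", region, version="1")

od = Estimator(
    image_uri=od_image,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",    # GPU required for training
    hyperparameters={
        "base_network": "resnet-50",  # backbone feature extractor
        "use_pretrained_model": 1,    # transfer learning
        "num_classes": 20,
        "num_training_samples": 16000,
        "epochs": 30
    }
)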

Semantic Segmentation

Pixel-level classification.

  • FCN (Fully Convolutional Network)
  • PSP (Pyramid Scene Parsing)
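
The algorithm hyperparameter chooses between the FCN and PSP decoders, and backbone selects the encoder. A sketch under the same assumptions as above (names and values illustrative):

ss_image = image_uris.retrieve("semantic-segmentation", region, version="1")

ss = Estimator(
    image_uri=ss_image,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # GPU training
    hyperparameters={
        "algorithm": "fcn",       # or "psp"
        "backbone": "resnet-50",  # encoder backbone
        "num_classes": 21,
        "num_training_samples": 1464,
        "epochs": 10
    }
)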

NLP

BlazingText

Word embeddings and text classification.

| Mode | Use Case |
| --- | --- |
| Word2Vec | Word embeddings |
| Text Classification | Supervised classification |
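
The mode hyperparameter switches between Word2Vec training (skipgram, cbow, batch_skipgram) and supervised text classification. A sketch with illustrative values, reusing the setup from the XGBoost example:

bt_image = image_uris.retrieve("blazingtext", region, version="1")

blazingtext = Estimator(
    image_uri=bt_image,
    role=role,
    instance_count=1,
    instance_type="ml.c5.4xlarge",
    hyperparameters={
        "mode": "supervised",  # text classification; "skipgram" or "cbow" for Word2Vec
        "epochs": 10,
        "min_count": 2,        # ignore words that appear fewer times than this
        "vector_dim": 100      # embedding dimension
    }
)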

Sequence-to-Sequence

Encoder-decoder architecture for machine translation and text summarization.

Algorithm Selection Guide

graph TD
    A[Problem Type] --> B{Supervised?}
    B -->|Yes| C{Target Type}
    B -->|No| D{Goal}
    C -->|Continuous| E[XGBoost, Linear Learner]
    C -->|Categorical| F[XGBoost, Linear Learner, KNN]
    D -->|Clustering| G[K-Means]
    D -->|Anomaly| H[Random Cut Forest]
    D -->|Dimensionality Reduction| I[PCA]

Exam Tips

!!! warning "Algorithm Selection"
    - XGBoost: most versatile, start here for tabular data
    - Linear Learner: sparse data, need interpretability
    - Random Cut Forest: streaming anomaly detection
    - BlazingText: fast text classification