Summary

This week, we will continue our discussion of supervised learning, switching our focus from regression to classification. We will again use \(k\)-nearest neighbors (KNN), this time applied to the classification task.

Learning Objectives

After completing this week, you are expected to be able to:

  • Use sklearn's DummyClassifier as a baseline for comparison.
  • Calculate simple metrics to evaluate predictions from learned classifiers.
  • Differentiate between regression and classification tasks.
  • Use \(k\)-nearest neighbors to make predictions for preprocessed data.
  • Understand how conditional probabilities relate to classification.
  • Estimate and calculate conditional probabilities.
  • Use \(k\)-nearest neighbors to estimate conditional probabilities.
  • Use sklearn features such as Pipeline, ColumnTransformer, SimpleImputer, StandardScaler, and OneHotEncoder to perform preprocessing while avoiding data leakage (a sketch combining these appears after this list).
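
Below is a minimal sketch of how these pieces can fit together. The dataset, column names, and hyperparameters are hypothetical placeholders, not taken from the course materials:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric features (with missing values) and
# one categorical feature.
df = pd.DataFrame({
    "age":    [25, 32, 47, 51, None, 38, 29, 60],
    "income": [40, 55, 80, 72, 61, None, 45, 90],
    "city":   ["A", "B", "A", "C", "B", "A", "C", "B"],
    "bought": [0, 0, 1, 1, 1, 0, 0, 1],
})
X, y = df.drop(columns="bought"), df["bought"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Preprocessing lives inside the pipeline, so imputation and scaling are
# fit on the training split only -- this is what avoids data leakage.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Baseline: always predict the most frequent class in the training data.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))

# KNN classifier on the preprocessed features.
knn = Pipeline([("prep", preprocess),
                ("knn", KNeighborsClassifier(n_neighbors=3))])
knn.fit(X_train, y_train)
print("KNN accuracy:", accuracy_score(y_test, knn.predict(X_test)))

# predict_proba gives KNN's estimate of the conditional probability
# P(y = c | x): the fraction of the k nearest neighbors in class c.
print(knn.predict_proba(X_test))
```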

Topics

  • Probability
    • Conditional Probability
    • Bayes’ Theorem
  • Baseline Methods
    • DummyClassifier
  • K-Nearest Neighbors (KNN) Classification
    • KNeighborsClassifier
  • Supervised Learning Metrics
    • Classification
      • Accuracy
      • Misclassification
  • Preprocessing
    • Numeric Scaling
    • One-Hot Encoding
    • Dummy Variables and Encoding
    • Missing Data and Imputation
    • Data Leakage
  • sklearn API and Pipelines
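
For reference, the probability and metric topics above come down to a few short formulas. The notation here (e.g., \(N_k(x)\) for the set of the \(k\) nearest training neighbors of \(x\)) is our own choice, not fixed by the course:

\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \quad (\text{Bayes' theorem, } P(B) > 0),
\]
\[
\hat{P}(y = c \mid x) = \frac{1}{k} \sum_{i \in N_k(x)} \mathbf{1}[y_i = c] \quad (\text{KNN's conditional probability estimate}),
\]
\[
\text{accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[\hat{y}_i = y_i], \qquad \text{misclassification rate} = 1 - \text{accuracy}.
\]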

Activities