Summary

This week, we will begin our discussion of supervised learning, focusing on the regression task. We will introduce one of the foundational models of machine learning: \(k\)-nearest neighbors (KNN). Using KNN as our example regression model, we will also look at data splitting and begin discussing overfitting and generalization.

Learning Objectives

After completing this week, you are expected to be able to:

  • Use sklearn's DummyRegressor as a baseline for comparison.
  • Calculate simple metrics to evaluate predictions from learned regressors.
  • Use \(k\)-nearest neighbors (KNN) to make predictions.
  • Split data into train, validation, and test sets.
  • Modify a tuning parameter to control the flexibility of a model.
  • Avoid overfitting by tuning a model through the use of a validation set (see the sketch after this list).
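
A minimal end-to-end sketch of this workflow is shown below, assuming scikit-learn's DummyRegressor, KNeighborsRegressor, train_test_split, and mean_squared_error; the data is made up to stand in for a real dataset, and the specific values of \(k\) tried are arbitrary:

    import numpy as np
    from sklearn.dummy import DummyRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor

    # Made-up 1-D regression data standing in for a real dataset.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

    # Split into train (60%), validation (20%), and test (20%) sets.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, random_state=0)
    X_valid, X_test, y_valid, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=0)

    # Baseline: always predict the mean of the training targets.
    baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

    # Tune k (which controls the model's flexibility) on the validation set.
    best_k, best_rmse = None, float("inf")
    for k in (1, 3, 5, 10, 25, 50):
        knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
        rmse = mean_squared_error(y_valid, knn.predict(X_valid)) ** 0.5
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse

    # Evaluate the baseline and the tuned model once on the held-out test set.
    final = KNeighborsRegressor(n_neighbors=best_k).fit(X_train, y_train)
    base_rmse = mean_squared_error(y_test, baseline.predict(X_test)) ** 0.5
    test_rmse = mean_squared_error(y_test, final.predict(X_test)) ** 0.5
    print(f"baseline test RMSE: {base_rmse:.3f}")
    print(f"KNN (k={best_k}) test RMSE: {test_rmse:.3f}")

Note that the test set is consulted only once, after tuning is finished; choosing \(k\) on the test set itself would give an optimistic estimate of generalization.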

Topics

  • Baseline Methods
    • DummyRegressor
  • Supervised Learning Metrics (see the example after this list)
    • Regression
      • Root Mean Square Error (RMSE)
      • Mean Absolute Error (MAE)
      • Mean Absolute Percentage Error (MAPE)
      • Coefficient of Determination (\(R^2\))
      • Max Error
  • K-Nearest Neighbors (KNN) Regression
    • KNeighborsRegressor
  • Generalization
    • Overfitting
    • Underfitting
    • Generalization Gap
  • Train, Test, and Validation Datasets
    • train_test_split
  • Model Tuning
    • Tuning Parameters
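
As a small illustration of the regression metrics listed above, the snippet below computes each one with functions from sklearn.metrics; the true values and predictions are made up purely for illustration:

    import numpy as np
    from sklearn.metrics import (mean_absolute_error,
                                 mean_absolute_percentage_error,
                                 mean_squared_error, r2_score, max_error)

    # Made-up true values and predictions, purely for illustration.
    y_true = np.array([3.0, 5.0, 2.5, 7.0])
    y_pred = np.array([2.5, 5.0, 4.0, 8.0])

    # RMSE is the square root of the mean squared error.
    print("RMSE:", mean_squared_error(y_true, y_pred) ** 0.5)
    print("MAE: ", mean_absolute_error(y_true, y_pred))
    # sklearn's MAPE is returned as a fraction, not multiplied by 100.
    print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))
    print("R^2: ", r2_score(y_true, y_pred))
    print("Max error:", max_error(y_true, y_pred))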

Activities