Summary

This week, we will begin our discussion of supervised learning, focusing on the regression task. We will introduce one of the foundational models of machine learning: \(k\)-nearest neighbors (KNN). Using KNN as our example regression model, we will also look at data splitting and begin discussing overfitting and generalization.

Learning Objectives

After completing this week, you are expected to be able to:

  • Use sklearn's DummyRegressor as a baseline for comparison.
  • Calculate simple metrics to evaluate predictions from learned regressors.
  • Use \(k\)-nearest neighbors (KNN) to make predictions.
  • Split data into train, validation, and test sets.
  • Modify a tuning parameter to control the flexibility of a model.
  • Avoid overfitting by tuning a model through the use of a validation set (see the sketch after this list).
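
A minimal end-to-end sketch of this workflow is shown below, assuming scikit-learn's DummyRegressor, KNeighborsRegressor, train_test_split, and mean_squared_error; the data is made up to stand in for a real dataset, and the specific values of \(k\) tried are arbitrary:

    import numpy as np
    from sklearn.dummy import DummyRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor

    # Made-up 1-D regression data standing in for a real dataset.
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

    # Split into train (60%), validation (20%), and test (20%) sets.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, random_state=0)
    X_valid, X_test, y_valid, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=0)

    # Baseline: always predict the mean of the training targets.
    baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)

    # Tune k (which controls the model's flexibility) on the validation set.
    best_k, best_rmse = None, float("inf")
    for k in (1, 3, 5, 10, 25, 50):
        knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
        rmse = mean_squared_error(y_valid, knn.predict(X_valid)) ** 0.5
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse

    # Evaluate the baseline and the tuned model once on the held-out test set.
    final = KNeighborsRegressor(n_neighbors=best_k).fit(X_train, y_train)
    base_rmse = mean_squared_error(y_test, baseline.predict(X_test)) ** 0.5
    test_rmse = mean_squared_error(y_test, final.predict(X_test)) ** 0.5
    print(f"baseline test RMSE: {base_rmse:.3f}")
    print(f"KNN (k={best_k}) test RMSE: {test_rmse:.3f}")

Note that the test set is consulted only once, after tuning is finished; choosing \(k\) on the test set itself would give an optimistic estimate of generalization.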

Topics

  • Baseline Methods
    • DummyRegressor
  • Supervised Learning Metrics (see the example after this list)
    • Regression
      • Root Mean Square Error (RMSE)
      • Mean Absolute Error (MAE)
      • Mean Absolute Percentage Error (MAPE)
      • Coefficient of Determination (\(R^2\))
      • Max Error
  • K-Nearest Neighbors (KNN) Regression
    • KNeighborsRegressor
  • Generalization
    • Overfitting
    • Underfitting
    • Generalization Gap
  • Train, Test, and Validation Datasets
    • train_test_split
  • Model Tuning
    • Tuning Parameters
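
As a small illustration of the regression metrics listed above, the snippet below computes each one with functions from sklearn.metrics; the true values and predictions are made up purely for illustration:

    import numpy as np
    from sklearn.metrics import (mean_absolute_error,
                                 mean_absolute_percentage_error,
                                 mean_squared_error, r2_score, max_error)

    # Made-up true values and predictions, purely for illustration.
    y_true = np.array([3.0, 5.0, 2.5, 7.0])
    y_pred = np.array([2.5, 5.0, 4.0, 8.0])

    # RMSE is the square root of the mean squared error.
    print("RMSE:", mean_squared_error(y_true, y_pred) ** 0.5)
    print("MAE: ", mean_absolute_error(y_true, y_pred))
    # sklearn's MAPE is returned as a fraction, not multiplied by 100.
    print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))
    print("R^2: ", r2_score(y_true, y_pred))
    print("Max error:", max_error(y_true, y_pred))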

Activities