- Start: Monday, September 1
- End: Saturday, September 6
Summary
This week, we will begin our discussion of supervised learning, focusing on the regression task. We will introduce one of the foundational models of machine learning: \(k\)-nearest neighbors (KNN). Using KNN as our example regression model, we will also look at data splitting and begin discussing overfitting and generalization.
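As a minimal preview, here is a sketch of KNN regression using scikit-learn's `KNeighborsRegressor`; the noisy sine-wave data is hypothetical and only for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data: a noisy sine wave over one feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, size=100)

# A KNN regressor predicts by averaging the targets of the
# n_neighbors closest training points to the query point.
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X, y)
print(knn.predict([[2.5]]))  # should land near sin(2.5) ≈ 0.6
```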
Learning Objectives
After completing this week, you are expected to be able to:
- Use the sklearn `DummyRegressor` as a baseline for comparison.
- Calculate simple metrics to evaluate predictions from learned regressors (see the sketch after this list).
- Use \(k\)-nearest neighbors to make predictions.
- Split data into train, validation, and test sets.
- Modify a tuning parameter to control the flexibility of a model.
- Avoid overfitting by tuning a model using a validation set.
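For reference, these are the standard definitions of the metrics covered this week, where \(y_i\) is a true target, \(\hat{y}_i\) the corresponding prediction, \(\bar{y}\) the mean of the true targets, and \(n\) the number of examples:

\[
\begin{aligned}
\text{RMSE} &= \sqrt{\tfrac{1}{n}\textstyle\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}, &
\text{MAE} &= \tfrac{1}{n}\textstyle\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \\
\text{MAPE} &= \tfrac{1}{n}\textstyle\sum_{i=1}^{n}\frac{\lvert y_i - \hat{y}_i\rvert}{\lvert y_i\rvert}, &
R^2 &= 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \\
\text{Max Error} &= \max_i \lvert y_i - \hat{y}_i\rvert.
\end{aligned}
\]

Below is a minimal sketch, on hypothetical synthetic data, of computing these metrics for a `DummyRegressor` baseline and a KNN regressor. RMSE is taken as the square root of `mean_squared_error` so the code does not depend on a particular scikit-learn version, and note that `mean_absolute_percentage_error` returns a fraction rather than a percentage.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_absolute_percentage_error, r2_score, max_error)

# Hypothetical synthetic data; targets are shifted away from zero
# so that MAPE stays well-defined.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = 5 + np.sin(X).ravel() + rng.normal(0, 0.2, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# The baseline ignores the features and always predicts the training mean.
models = {
    "baseline": DummyRegressor(strategy="mean").fit(X_train, y_train),
    "knn": KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train),
}

for name, model in models.items():
    pred = model.predict(X_test)
    print(f"{name:8s} "
          f"RMSE={np.sqrt(mean_squared_error(y_test, pred)):.3f}  "
          f"MAE={mean_absolute_error(y_test, pred):.3f}  "
          f"MAPE={mean_absolute_percentage_error(y_test, pred):.3f}  "
          f"R^2={r2_score(y_test, pred):.3f}  "
          f"max={max_error(y_test, pred):.3f}")
```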
Topics
- Baseline Methods
  - `DummyRegressor`
- Supervised Learning Metrics
  - Regression
    - Root Mean Square Error (RMSE)
    - Mean Absolute Error (MAE)
    - Mean Absolute Percentage Error (MAPE)
    - Coefficient of Determination (\(R^2\))
    - Max Error
- Regression
  - K-Nearest Neighbors (KNN) Regression
    - `KNeighborsRegressor`
- Generalization
  - Overfitting
  - Underfitting
  - Generalization Gap
- Train, Test, and Validation Datasets
  - `train_test_split`
- Model Tuning
  - Tuning Parameters (see the sketch after this list)
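Tying these topics together, here is a sketch, again on hypothetical synthetic data, of tuning the `n_neighbors` parameter of KNN regression with a validation set: two calls to `train_test_split` produce train, validation, and test sets, candidate values of \(k\) are compared on validation error, and the held-out test set is used exactly once at the end.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical synthetic data, for illustration only.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=400)

# Two splits yield three sets: 60% train, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=2)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=2)

def rmse(model, X, y):
    return np.sqrt(mean_squared_error(y, model.predict(X)))

# Small k gives a flexible model (risk of overfitting); large k gives a
# rigid one (risk of underfitting). Choose k by validation error alone.
best_k, best_rmse = None, np.inf
for k in [1, 3, 5, 10, 25, 50, 100]:
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    train_err, val_err = rmse(model, X_train, y_train), rmse(model, X_val, y_val)
    # The difference between train and validation error illustrates the
    # generalization gap, which is largest for the most flexible models.
    print(f"k={k:3d}  train RMSE={train_err:.3f}  val RMSE={val_err:.3f}")
    if val_err < best_rmse:
        best_k, best_rmse = k, val_err

# The test set is touched exactly once, to report the final error.
final = KNeighborsRegressor(n_neighbors=best_k).fit(X_train, y_train)
print(f"best k={best_k}  test RMSE={rmse(final, X_test, y_test):.3f}")
```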