Course Content
Weekly Objectives, Topics, and Activities
This page tracks the content for each week of the course, in reverse chronological order.
Week 11
- Start: Monday, March 31
- End: Friday, April 4
Summary
This week, we will look at two extensions of decision trees: random forests and boosted models. These are both ensemble methods that combine the predictions of many trees to improve model performance.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand how averaging the predictions from many trees (for example, using a random forest) can improve model performance.
- Use a random forest to perform regression and classification.
- Use boosting to perform regression and classification.
Topics
- Ensemble Methods
- Random Forests
- Boosted Models
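As a quick preview of these topics, here is a minimal `sklearn` sketch of both ensemble methods; the synthetic data and parameter values are illustrative assumptions, not course materials.

```python
# Minimal sketch of this week's ensemble methods on synthetic data.
# All data and parameter values here are illustrative only.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Random forest: average the predictions of many decorrelated trees.
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Boosting: fit trees sequentially, each one correcting the errors of
# the ensemble built so far.
gb = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, random_state=42)
gb.fit(X_train, y_train)

print(rf.score(X_test, y_test), gb.score(X_test, y_test))  # test R^2 for each
```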
Activities
Homework 08 and Lab 08 coming soon!
Week 10
- Start: Monday, March 24
- End: Friday, March 28
Summary
This week, we will prepare for Exam 02! No new content will be introduced. Instead, we will review and practice for the exam.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand the policies and procedures for Exam 02.
Topics
- No new topics this week!
Activities
Week 09
- Start: Monday, March 17
- End: Friday, March 21
Summary
It's Spring Break! 🎉🎉🎉
Week 08
- Start: Monday, March 10
- End: Friday, March 14
Summary
This week, we will discuss linear regression, our first parametric model for regression. We will also introduce regularization as a method to control the complexity of (linear) models.
Learning Objectives
After completing this week, you are expected to be able to:
- Differentiate between parametric and nonparametric regression.
- Use `sklearn` to fit linear regression models and make predictions for unseen data.
- Preprocess data to add polynomial and interaction terms for use in linear models.
- Understand what makes linear models linear and how both linear regression and logistic regression are linear models.
- Understand how the ridge and lasso constraints lead to shrunken and sparse estimates.
- Use ridge regression to perform regression and classification.
- Use lasso to perform regression and classification.
Topics
- Linear Regression
- Parametric Models
- Polynomial and Interaction Terms
- Regularization
- Lasso
- Ridge
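For a rough sense of how these topics fit together in `sklearn`, here is a hedged sketch; the synthetic data, the polynomial degree, and the penalty strengths (`alpha`) are illustrative assumptions.

```python
# Sketch: polynomial terms feeding plain and penalized linear models.
# Synthetic data; the degree and alpha values are illustrative only.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.5, size=100)

# Unpenalized linear regression on the expanded (polynomial) features.
lr = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False), LinearRegression()
).fit(X, y)

# Ridge shrinks coefficients; lasso can set some exactly to zero
# (sparse estimates). alpha controls the penalty strength.
ridge = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False), StandardScaler(), Ridge(alpha=1.0)
).fit(X, y)
lasso = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False), StandardScaler(), Lasso(alpha=0.1)
).fit(X, y)

print(lr.score(X, y), ridge.score(X, y), lasso.score(X, y))
```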
Activities
Week 07
- Start: Monday, March 3
- End: Friday, March 7
Summary
This week, we will discuss binary classification in depth, in particular metrics for evaluating binary classification models. We will also take a look at a linear model for classification: logistic regression, a parametric model. We will begin to compare and contrast parametric and nonparametric methods.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand the definitions of false positives, false negatives, and related metrics.
- Calculate metrics specific to binary classification.
- Evaluate models for binary classification.
- Differentiate between parametric and nonparametric regression.
- Estimate conditional probabilities with logistic regression.
- Use `sklearn` to fit logistic regression models and make predictions for unseen data.
- Preprocess data to add polynomial and interaction terms for use in linear models.
Topics
- Binary Classification
- Binary Classification Metrics
- Model Evaluation
- Logistic Regression
- Parametric versus Nonparametric Models
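Here is a minimal sketch of these ideas in `sklearn`; the synthetic data and specific metric calls are illustrative assumptions, not course materials.

```python
# Sketch: logistic regression and binary classification metrics on
# synthetic data. All values here are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Estimated conditional probabilities P(Y = 1 | X = x) for unseen data.
probs = clf.predict_proba(X_test)[:, 1]
print(probs[:3])

# The flattened 2x2 confusion matrix gives the four basic counts.
y_pred = clf.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
```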
Activities
Week 06
- Start: Monday, February 24
- End: Friday, February 28
Summary
This week, we will introduce another nonparametric method for supervised learning: decision trees.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand how decision trees differ from KNN when determining similarity of data.
- Find and evaluate decision tree splits for regression.
- Find and evaluate decision tree splits for classification.
- Use decision trees to make predictions for regression tasks using `sklearn`.
- Use decision trees to make predictions for classification tasks using `sklearn`.
- Use decision trees to estimate conditional probabilities for classification tasks using `sklearn`.
- Tune the parameters of decision trees to avoid overfitting.
Topics
- Regression Trees
- Classification Trees
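As a preview, here is a minimal `sklearn` sketch of a classification tree; the synthetic data and the tuning values are illustrative assumptions only.

```python
# Sketch: a classification tree with a depth limit to curb overfitting.
# Synthetic data; max_depth and min_samples_split values are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=42)
tree.fit(X_train, y_train)

print(tree.predict(X_test[:5]))        # hard class predictions
print(tree.predict_proba(X_test[:5]))  # estimated conditional probabilities
print(tree.score(X_test, y_test))      # accuracy on unseen data
```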
Activities
Week 05
- Start: Monday, February 17
- End: Friday, February 21
Summary
This week, we will prepare for Exam 01! No new content will be introduced. Instead, we will review and practice for the exam.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand the policies and procedures for Exam 01.
Topics
- No new topics this week!
Activities
Week 04
- Start: Monday, February 10
- End: Friday, February 14
Summary
This week, we will focus on model selection and related theory. We will introduce cross-validation as an important generic technique for model selection and hyperparameter tuning. We will also look at some theory related to generalization including the bias-variance tradeoff, model flexibility, and overfitting.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand how model flexibility relates to the bias-variance tradeoff and thus model performance.
- Tune models by manipulating their flexibility through the use of a tuning parameter to find a model that generalizes well.
- Avoid overfitting by selecting a model of appropriate flexibility through the use of a validation set or cross-validation.
- Use `GridSearchCV` to tune models with cross-validation.
Topics
- Generalization
- Model Flexibility
- Overfitting
- Bias-Variance Tradeoff
- Cross-Validation
- \(k\)-Fold Cross-Validation
- `GridSearchCV`
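Here is a minimal sketch of `GridSearchCV` tuning the flexibility of a KNN model with \(k\)-fold cross-validation; the synthetic data and the candidate `n_neighbors` values are illustrative assumptions.

```python
# Sketch: k-fold cross-validation over a grid of tuning parameter values.
# Synthetic data; the candidate n_neighbors values are illustrative only.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=42)

# Smaller n_neighbors means a more flexible model; 5-fold cross-validation
# estimates how well each candidate generalizes.
grid = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [1, 5, 10, 25, 50]},
    cv=5,
)
grid.fit(X, y)

print(grid.best_params_, grid.best_score_)
```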
Activities
Week 03
- Start: Monday, February 3
- End: Friday, February 7
Summary
This week, we will continue our discussion of supervised learning, switching our focus to the classification task. To start, we'll review some basic probability, before using \(k\)-nearest neighbors (KNN) for classification tasks.
Learning Objectives
After completing this week, you are expected to be able to:
- Use the `sklearn` `DummyClassifier` as a baseline for comparison.
- Calculate simple metrics to evaluate predictions from learned classifiers.
- Differentiate between regression and classification tasks.
- Use \(k\)-nearest neighbors to make predictions for preprocessed data.
- Understand how conditional probabilities relate to classifications.
- Estimate and calculate conditional probabilities.
- Use \(k\)-nearest neighbors to estimate conditional probabilities.
- Use `sklearn` features such as `Pipeline`, `ColumnTransformer`, `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` to perform preprocessing while avoiding data leakage.
Topics
- Probability
- Conditional Probability
- Bayes' Theorem
- K-Nearest Neighbors (KNN) Classification
- `KNeighborsClassifier`
- Supervised Learning Metrics
- Classification
- Accuracy
- Misclassification
- Preprocessing
- Numeric Scaling
- One-Hot Encoding
- Dummy Variables and Encoding
- Missing Data and Imputation
- Data Leakage
- `sklearn` API and Pipelines
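To give a sense of how these pieces combine, here is a hedged sketch of a leakage-free preprocessing pipeline; the tiny data frame and its column names are made up for illustration.

```python
# Sketch: preprocessing inside a Pipeline so imputation, scaling, and
# encoding are learned from training data only, avoiding data leakage.
# The data frame and its column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 58, 37],
    "color": ["red", "blue", "red", "green", "blue", "red"],
})
y = [0, 1, 0, 1, 1, 0]

# Numeric columns: impute missing values, then scale.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Route each column type to the appropriate preprocessing.
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
model.fit(df, y)
print(model.predict_proba(df))  # estimated conditional probabilities
```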
Activities
Week 02
- Start: Monday, January 27
- End: Friday, January 31
Summary
This week, we will begin our discussion of supervised learning, focusing on the regression task. We will introduce one of the foundational methods of machine learning: \(k\)-nearest neighbors (KNN). With KNN as an example of a model used for the regression task, we will also look at data splitting and begin discussing overfitting and generalization.
Learning Objectives
After completing this week, you are expected to be able to:
- Differentiate between supervised, unsupervised, and reinforcement learning.
- Identify regression and classification tasks.
- Use the `sklearn` `DummyRegressor` as a baseline for comparison.
- Calculate simple metrics to evaluate predictions from learned regressors.
- Use \(k\)-nearest neighbors to make predictions.
- Split data into train, validation, and test sets.
- Modify a tuning parameter to control the flexibility of a model.
- Avoid overfitting by tuning a model through the use of a validation set.
Topics
- Machine Learning Paradigms and Tasks
- Supervised Learning
- Classification
- Regression
- Unsupervised Learning
- Density Estimation
- Clustering
- Anomaly Detection
- Dimension Reduction
- Reinforcement Learning
- Supervised Learning
- Baseline Methods
- `DummyRegressor`
- Supervised Learning Metrics
- Regression
- Root Mean Square Error (RMSE)
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)
- Coefficient of Determination (\(R^2\))
- Max Error
- K-Nearest Neighbors (KNN) Regression
- `KNeighborsRegressor`
- Generalization
- Overfitting
- Underfitting
- Generalization Gap
- Train, Test, and Validation Datasets
- `train_test_split`
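Here is a minimal sketch of the week's workflow; the synthetic data and the candidate values of \(k\) are illustrative assumptions, not course materials.

```python
# Sketch: train/validation/test splits, a baseline, and tuning KNN
# flexibility on a validation set. Synthetic data; k values illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=300, n_features=3, noise=15, random_state=42)

# First carve off a test set, then split the rest into train and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, random_state=42)

def rmse(model, X_eval, y_eval):
    return np.sqrt(mean_squared_error(y_eval, model.predict(X_eval)))

# Baseline: always predict the mean of the training targets.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
print("baseline validation RMSE:", rmse(baseline, X_val, y_val))

# Choose k by validation RMSE, then report performance on the test set.
best_k = min(
    [1, 5, 10, 25],
    key=lambda k: rmse(KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train), X_val, y_val),
)
final = KNeighborsRegressor(n_neighbors=best_k).fit(X_train, y_train)
print("test RMSE with k =", best_k, ":", rmse(final, X_test, y_test))
```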
Activities
Week 01
- Start: Tuesday, January 21
- End: Friday, January 24
Summary
Welcome to CS 307! This week, you will become familiar with the course policies and set up your machine to complete homework and labs.
Learning Objectives
After completing this week, you are expected to be able to:
Topics
- What is Machine Learning?
- Computing Setup
- CS 307 Course Policies