Course Content

Weekly Objectives, Topics, and Activities

This page tracks the content for each week of the course, in reverse chronological order.

Week 15

Start: Monday, April 28
End: Friday, May 2

Summary

This week, we will introduce neural networks and deep learning using PyTorch.

Learning Objectives

After completing this week, you are expected to be able to:

Train neural networks using PyTorch.
Evaluate neural networks using PyTorch.

Topics

Neural Networks
Deep Learning
PyTorch

Activities

Week 14

Start: Monday, April 21
End: Friday, April 25

Summary

This week, we will prepare for Exam 03! No new content will be introduced. Instead we will review and practice for the exam.

Learning Objectives

After completing this week, you are expected to be able to:

Understand the policies and procedures for Exam 03.

Topics

No new topics this week!

Activities

Assessment: Practice Exam 03

Week 13

Start: Monday, April 14
End: Friday, April 18

Summary

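This week, we will introduce unsupervised learning, including dimension reduction, clustering, density estimation, and outlier detection. We will also look at classifier calibration and quantile regression.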

Learning Objectives

After completing this week, you are expected to be able to:

Understand the difference between supervised and unsupervised machine learning tasks.
Identify supervised and unsupervised machine learning tasks.
Understand and identify unsupervised learning subtasks: dimension reduction, clustering, density estimation, and outlier detection.
Use principal components analysis (PCA) for dimension reduction.
Use k-means and other methods for clustering.
Use kernel density estimation and mixture models for density estimation.
Use isolation forest for outlier detection.
Use CalibratedClassifierCV to calibrate classifiers.
Use quantile regression (via QuantileRegressor and HistGradientBoostingRegressor) to estimate conditional quantiles and make prediction intervals.

Topics

Unsupervised Learning
- Dimension Reduction
  - PCA
- Clustering
  - k-Means
  - Agglomerative Clustering
  - DBSCAN
- Density Estimation
  - Kernel Density Estimation
  - Gaussian Mixture Models
- Outlier Detection
  - Isolation Forest
Classifier Calibration
- Brier Score
- Log Loss
- CalibratedClassifierCV
  - Platt Scaling
  - Isotonic Regression
Quantile Regression
- QuantileRegressor
- HistGradientBoostingRegressor
- Pinball Loss
- Prediction Intervals
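
The three losses named above (Brier score, log loss, and pinball loss) are short formulas. The sketch below uses only the Python standard library to show the arithmetic; the function names and toy data are illustrative assumptions, not course code, and in practice the corresponding sklearn metrics would be used.

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred):
    """Average negative log-likelihood; assumes every probability is strictly in (0, 1)."""
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred))
    return -total / len(y_true)

def pinball_loss(y_true, q_pred, alpha):
    """Loss for an estimated alpha-quantile: under- and over-prediction are weighted unequally."""
    total = 0.0
    for y, q in zip(y_true, q_pred):
        diff = y - q
        total += alpha * diff if diff >= 0 else (alpha - 1) * diff
    return total / len(y_true)

# Toy data (made up for illustration).
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.5]
print(brier_score(labels, probs))
print(log_loss(labels, probs))

targets = [10.0, 12.0, 15.0]
quantile_preds = [11.0, 11.0, 11.0]
print(pinball_loss(targets, quantile_preds, alpha=0.9))
print(pinball_loss(targets, quantile_preds, alpha=0.5))  # half of the mean absolute error
```

With alpha = 0.5 the pinball loss reduces to half the mean absolute error, which is why the median minimizes absolute error. A prediction interval can be formed from two quantile estimates, for example the 0.05 and 0.95 quantiles.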

Activities

Week 12

Start: Monday, April 7
End: Friday, April 11

Summary

This week, we will introduce generative models for classification, with an emphasis on Naive Bayes.

Learning Objectives

After completing this week, you are expected to be able to:

Understand the difference between discriminative and generative models.
Use Naive Bayes models for the classification task.
Use linear discriminant analysis (LDA) for the classification task.
Use quadratic discriminant analysis (QDA) for the classification task.

Topics

Generative Models
- Naive Bayes
- Linear Discriminant Analysis (LDA)
- Quadratic Discriminant Analysis (QDA)
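
As a rough illustration of what "generative" means here, the sketch below fits a Gaussian Naive Bayes model by hand: it estimates a prior and per-feature normal distributions for each class, then applies Bayes' theorem to get class probabilities. The function names and toy data are assumptions made for this example; it is a minimal sketch, not the sklearn implementation.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a prior, per-feature means, and per-feature variances for each class."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Assumes no feature is constant within a class (variance must be > 0).
        variances = [sum((v - m) ** 2 for v in col) / n
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_proba(model, x):
    """Posterior class probabilities via Bayes' theorem, treating features as independent."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                      for xi, m, var in zip(x, means, variances))
        log_joint[label] = math.log(prior) + log_lik
    # Subtract the largest value before exponentiating for numerical stability.
    top = max(log_joint.values())
    unnormalized = {label: math.exp(v - top) for label, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {label: v / total for label, v in unnormalized.items()}

# Toy data (made up for illustration): two well-separated classes.
X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (3.0, 4.0), (3.2, 3.8), (2.8, 4.2)]
y = [0, 0, 0, 1, 1, 1]
nb_model = fit_gaussian_nb(X, y)
print(predict_proba(nb_model, (1.0, 2.0)))
```

A discriminative model such as logistic regression would instead model the class probabilities directly, without estimating how the features are distributed within each class.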

Activities

Week 11

Start: Monday, March 31
End: Friday, April 4

Summary

This week, we will look at two extensions of decision trees: random forests and boosted models. These are both ensemble methods that combine the predictions of many trees to improve model performance.

Learning Objectives

After completing this week, you are expected to be able to:

Understand how averaging the predictions from many trees (for example using a random forest) can improve model performance.
Use a random forest to perform regression and classification.
Use boosting to perform regression and classification.

Topics

Ensemble Methods
- Random Forests
- Boosted Models

Activities

Week 10

Start: Monday, March 24
End: Friday, March 28

Summary

This week, we will prepare for Exam 02! No new content will be introduced. Instead we will review and practice for the exam.

Learning Objectives

After completing this week, you are expected to be able to:

Understand the policies and procedures for Exam 02.

Topics

No new topics this week!

Activities

Assessment: Practice Exam 02

Week 09

Start: Monday, March 17
End: Friday, March 21

Summary

It’s Spring Break! 🎉🎉🎉

Week 08

Start: Monday, March 10
End: Friday, March 14

Summary

This week, we will discuss linear regression, our first parametric model for regression. We will also introduce regularization as a method to control the complexity of (linear) models.

Learning Objectives

After completing this week, you are expected to be able to:

Differentiate between parametric and nonparametric regression.
Use sklearn to fit linear regression models and make predictions for unseen data.
Preprocess data to add polynomial and interaction terms for use in linear models.
Understand what makes linear models linear and how both linear regression and logistic regression are linear models.
Understand how the ridge and lasso constraints lead to shrunken and sparse estimates.
Use ridge regression to perform regression and classification.
Use lasso to perform regression and classification.

Topics

Linear Regression
- Parametric Models
- Polynomial and Interaction Terms
Regularization
- Lasso
- Ridge
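
To make "shrunken" and "sparse" concrete, the sketch below solves ridge and lasso in closed form for a single centered predictor with no intercept. The objective scaling written in the docstring is an assumption chosen to keep the formulas simple; sklearn scales its penalty differently, so penalty values do not map one-to-one. Function names and data are illustrative only.

```python
def soft_threshold(z, lam):
    """Move z toward zero by lam, and return exactly zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def one_predictor_fits(x, y, lam):
    """Slope estimates for centered x and y with no intercept.

    Assumed objectives (b is the slope):
      ridge: 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2
      lasso: 0.5 * sum((y - b*x)^2) + lam * |b|
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    ols = sxy / sxx                          # no penalty
    ridge = sxy / (sxx + lam)                # shrinks toward zero, never exactly zero
    lasso = soft_threshold(sxy, lam) / sxx   # can be exactly zero
    return ols, ridge, lasso

# Toy data (made up for illustration); the unpenalized slope is 0.5.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-1.0, -0.5, 0.0, 0.5, 1.0]
for lam in (0.0, 2.0, 6.0):
    print(lam, one_predictor_fits(x, y, lam))
```

As the penalty grows, the ridge slope keeps shrinking but stays nonzero, while the lasso slope reaches exactly zero once the penalty exceeds the predictor's covariance with the response. That is the sense in which lasso estimates are sparse.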

Activities

Week 07

Start: Monday, March 3
End: Friday, March 7

Summary

This week, we will discuss binary classification in depth, in particular metrics for evaluating binary classification models. We will also take a look at a linear model for classification, logistic regression, which is a parametric model. We will begin to compare and contrast parametric and nonparametric methods.

Learning Objectives

After completing this week, you are expected to be able to:

Understand the definitions of false positives, false negatives, and related metrics.
Calculate metrics specific to binary classification.
Evaluate models for binary classification.
Differentiate between parametric and nonparametric regression.
Estimate conditional probabilities with logistic regression.
Use sklearn to fit logistic regression models and make predictions for unseen data.
Preprocess data to add polynomial and interaction terms for use in linear models.

Topics

Binary Classification
Binary Classification Metrics
Model Evaluation
Logistic Regression
Parametric versus Nonparametric Models
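
The binary classification metrics above all derive from four counts: true positives, false positives, true negatives, and false negatives. The following is a minimal standard-library sketch of that arithmetic; the function names and toy labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, and false negatives."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def binary_metrics(tp, fp, tn, fn):
    """Common metrics; assumes each denominator is nonzero."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),    # share of predicted positives that are correct
        "recall": tp / (tp + fn),       # share of actual positives found (sensitivity)
        "specificity": tn / (tn + fp),  # share of actual negatives found
    }

# Toy labels (made up for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)                   # (2, 1, 5, 2)
print(binary_metrics(*counts))
```

Here accuracy is 0.7 even though only half of the actual positives are found, which is one reason accuracy alone can be misleading for binary classification.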

Activities

Week 06

Start: Monday, February 24
End: Friday, February 28

Summary

This week, we will introduce another nonparametric method for supervised learning: decision trees.

Learning Objectives

After completing this week, you are expected to be able to:

Understand how decision trees differ from KNN when determining similarity of data.
Find and evaluate decision tree splits for regression.
Find and evaluate decision tree splits for classification.
Use decision trees to make predictions for regression tasks using sklearn.
Use decision trees to make predictions for classification tasks using sklearn.
Use decision trees to estimate conditional probabilities for classification tasks using sklearn.
Tune the parameters of decision trees to avoid overfitting.

Topics

Regression Trees
Classification Trees
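
Finding a regression tree split amounts to trying each candidate threshold and keeping the one that leaves the smallest total squared error in the two resulting groups. The sketch below does this for a single feature, using midpoints between sorted feature values as candidates. The names and data are assumptions for illustration, and classification trees would score splits with an impurity measure such as Gini instead of squared error.

```python
def sse(values):
    """Sum of squared errors around the mean, which is the prediction a leaf would make."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the single-feature split with the lowest SSE."""
    pairs = sorted(zip(x, y))
    best_threshold, best_total = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold, best_total = threshold, total
    return best_threshold, best_total

# Toy data (made up for illustration): the response jumps between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.0, 20.0, 21.0, 19.0]
print(best_split(x, y))  # threshold is 6.5
```

A full tree repeats this search within each resulting group, and tuning parameters such as the maximum depth limit how far that recursion goes.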

Activities

Week 05

Start: Monday, February 17
End: Friday, February 21

Summary

This week, we will prepare for Exam 01! No new content will be introduced. Instead we will review and practice for the exam.

Learning Objectives

After completing this week, you are expected to be able to:

Understand the policies and procedures for Exam 01.

Topics

No new topics this week!

Activities

Assessment: Practice Exam 01

Week 04

Start: Monday, February 10
End: Friday, February 14

Summary

This week, we will focus on model selection and related theory. We will introduce cross-validation as an important generic technique for model selection and hyperparameter tuning. We will also look at some theory related to generalization including the bias-variance tradeoff, model flexibility, and overfitting.

Learning Objectives

After completing this week, you are expected to be able to:

Understand how model flexibility relates to the bias-variance tradeoff and thus model performance.
Tune models by manipulating their flexibility through the use of a tuning parameter to find a model that generalizes well.
Avoid overfitting by selecting a model of appropriate flexibility through the use of a validation set or cross-validation.
Use GridSearchCV to tune models with cross-validation.

Topics

Generalization
- Model Flexibility
- Overfitting
- Bias-Variance Tradeoff
Cross-Validation
- \(k\)-Fold Cross-Validation
- GridSearchCV
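
The index bookkeeping behind k-fold cross-validation can be sketched in a few lines. The version below does not shuffle and uses a hypothetical helper name; it is meant only to show how each observation lands in exactly one validation fold.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold, without shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train, validation in k_fold_indices(n_samples=7, n_folds=3):
    print("train:", train, "validation:", validation)
```

Tuning with cross-validation then means repeating this for every candidate tuning parameter value: fit on each training part, score on the matching validation part, average the scores, and keep the candidate with the best average. GridSearchCV automates that loop.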

Activities

Week 03

Start: Monday, February 3
End: Friday, February 7

Summary

This week, we will continue our discussion of supervised learning, switching our focus to the classification task. To start, we’ll review some basic probability, before using \(k\)-nearest neighbors (KNN) for classification tasks.

Learning Objectives

After completing this week, you are expected to be able to:

Use sklearn DummyClassifier as a baseline for comparison.
Calculate simple metrics to evaluate predictions from learned classifiers.
Differentiate between regression and classification tasks.
Use k-nearest neighbors to make predictions for pre-processed data.
Understand how conditional probabilities relate to classifications.
Estimate and calculate conditional probabilities.
Use k-nearest neighbors to estimate conditional probabilities.
Use sklearn features such as Pipeline, ColumnTransformer, SimpleImputer, StandardScaler, and OneHotEncoder to perform preprocessing while avoiding data leakage.

Topics

Probability
- Conditional Probability
- Bayes’ Theorem
K-Nearest Neighbors (KNN) Classification
- KNeighborsClassifier
Supervised Learning Metrics
- Classification
  - Accuracy
  - Misclassification
Preprocessing
- Numeric Scaling
- One-Hot Encoding
- Dummy Variables and Encoding
- Missing Data and Imputation
- Data Leakage
sklearn API and Pipelines
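
One way to see how KNN estimates conditional probabilities: the estimated probability of a class at a new point is the share of its k nearest training points that carry that label, and the classification is the class with the largest share. The sketch below uses Euclidean distance on already-numeric features; the function names and data are illustrative assumptions, not course code.

```python
import math
from collections import Counter

def knn_predict_proba(X_train, y_train, x_new, k):
    """Estimate P(Y = c | X = x_new) as the share of the k nearest neighbors labeled c."""
    by_distance = sorted(zip(X_train, y_train),
                         key=lambda pair: math.dist(pair[0], x_new))
    nearest_labels = [label for _, label in by_distance[:k]]
    counts = Counter(nearest_labels)
    return {label: counts[label] / k for label in sorted(set(y_train))}

def knn_classify(X_train, y_train, x_new, k):
    """Classify to the label with the highest estimated probability."""
    proba = knn_predict_proba(X_train, y_train, x_new, k)
    return max(proba, key=proba.get)

# Toy data (made up for illustration).
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
y_train = ["a", "a", "a", "b", "b", "b"]
x_new = (2.5, 2.0)
print(knn_predict_proba(X_train, y_train, x_new, k=5))  # {'a': 0.6, 'b': 0.4}
print(knn_classify(X_train, y_train, x_new, k=5))       # a
```

Because the distances depend on the scale of each feature, numeric scaling should be learned from the training data only and then applied to new data, which is what placing the scaler inside a Pipeline accomplishes.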

Activities

Week 02

Start: Monday, January 27
End: Friday, January 31

Summary

This week, we will begin our discussion of supervised learning, focusing on the regression task. We will introduce one of the foundational methods of machine learning: \(k\)-nearest neighbors (KNN). With KNN as an example of a model used for the regression task, we will also look at data splitting and begin discussing overfitting and generalization.

Learning Objectives

After completing this week, you are expected to be able to:

Differentiate between supervised, unsupervised, and reinforcement learning.
Identify regression and classification tasks.
Use sklearn DummyRegressor as a baseline for comparison.
Calculate simple metrics to evaluate predictions from learned regressors.
Use k-nearest neighbors to make predictions.
Split data into train, validation, and test sets.
Modify a tuning parameter to control the flexibility of a model.
Avoid overfitting by tuning a model through the use of a validation set.

Topics

Machine Learning Paradigms and Tasks
- Supervised Learning
  - Classification
  - Regression
- Unsupervised Learning
  - Density Estimation
  - Clustering
  - Anomaly Detection
  - Dimension Reduction
- Reinforcement Learning
Baseline Methods
- DummyRegressor
Supervised Learning Metrics
- Regression
  - Root Mean Square Error (RMSE)
  - Mean Absolute Error (MAE)
  - Mean Absolute Percentage Error (MAPE)
  - Coefficient of Determination (\(R^2\))
  - Max Error
K-Nearest Neighbors (KNN) Regression
- KNeighborsRegressor
Generalization
- Overfitting
- Underfitting
- Generalization Gap
Train, Test, and Validation Datasets
- train_test_split
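
The regression metrics listed above are simple summaries of the differences between true and predicted values. The sketch below computes them with the Python standard library only, on made-up data, to show the arithmetic; the function names are assumptions for this example, and in the course these would normally come from sklearn.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; assumes no true value is zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def max_error(y_true, y_pred):
    """Largest absolute error."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy data (made up for illustration).
y_true = [2.0, 4.0, 5.0, 8.0]
y_pred = [3.0, 4.0, 4.0, 6.0]
print(rmse(y_true, y_pred))       # sqrt(1.5), about 1.2247
print(mae(y_true, y_pred))        # 1.0
print(mape(y_true, y_pred))       # 0.2375
print(max_error(y_true, y_pred))  # 2.0
print(r_squared(y_true, y_pred))
```

A baseline that always predicts the mean of the training responses gives a reference value for each metric, which is the role DummyRegressor plays.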

Activities

Remember, homework and lab are released at the start of discussion on Friday. The homework and lab shown here are the ones related to this week's content. The homework and lab due this week are the previous homework and lab. Recall that deadlines for all assessments can be found on the homepage.

Week 01

Start: Tuesday, January 21
End: Friday, January 24

Summary

Welcome to CS 307! This week, you will become familiar with the course policies and set up your machine to complete homework and labs.

Learning Objectives

After completing this week, you are expected to be able to:

Understand the syllabus of the course.
Understand the objectives of the course.
Communicate with the course staff.
Use Python, Jupyter, and VSCode to produce code for homework, labs and exams.
Use PrairieLearn to complete homework, lab models, and exams.
Use Canvas to complete lab reports.

Topics

What is Machine Learning?
Computing Setup
CS 307 Course Policies