Course Content

Weekly Objectives, Topics, and Activities

This page tracks the content for each week of the course.

Week 01

  • Start: Monday, August 25
  • End: Saturday, August 30

Summary

Welcome to CS 307! This week, you will become familiar with the course policies and set up your machine to complete homework and labs.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand the syllabus of the course.
  • Understand the objectives of the course.
  • Communicate with the course staff.
  • Use Python, Jupyter, and VSCode to produce code for homework, labs, and exams.
  • Use PrairieLearn to complete homework, labs, and exams.
  • Differentiate between supervised, unsupervised, and reinforcement learning.
  • Identify regression and classification tasks.

Topics

  • What is Machine Learning?
  • Computing Setup
  • CS 307 Course Policies
  • Machine Learning Paradigms and Tasks
    • Supervised Learning
      • Classification
      • Regression
    • Unsupervised Learning
      • Density Estimation
      • Clustering
      • Anomaly Detection
      • Dimension Reduction
    • Reinforcement Learning
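
As a quick smoke test of your computing setup, a minimal sketch like the following can be run in Jupyter or VSCode (this assumes the standard scientific Python stack; see the course setup instructions for the definitive package list):

```python
# Smoke test: verify that the core packages import and print versions.
import sys

import numpy as np
import pandas as pd
import sklearn

print(f"Python:  {sys.version.split()[0]}")
print(f"numpy:   {np.__version__}")
print(f"pandas:  {pd.__version__}")
print(f"sklearn: {sklearn.__version__}")
```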

Activities

Week 02

  • Start: Monday, September 1
  • End: Saturday, September 6

Summary

This week, we will begin our discussion of supervised learning, focusing on the regression task. We will introduce one of the foundational models of machine learning: \(k\)-nearest neighbors (KNN). With KNN as an example of a model used for the regression task, we will also look at data splitting and begin discussing overfitting and generalization.

Learning Objectives

After completing this week, you are expected to be able to:

  • Use sklearn DummyRegressor as a baseline for comparison.
  • Calculate simple metrics to evaluate predictions from learned regressors.
  • Use \(k\)-nearest neighbors to make predictions.
  • Split data into train, validation, and test sets.
  • Modify a tuning parameter to control the flexibility of a model.
  • Avoid overfitting by tuning a model through the use of a validation set.
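
To make the first two objectives concrete, here is a minimal sketch, on synthetic data with illustrative values, of fitting a sklearn DummyRegressor baseline and computing this week's regression metrics:

```python
# A minimal sketch: a DummyRegressor baseline and this week's
# regression metrics, on synthetic data (all values illustrative).
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import (
    max_error,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

rng = np.random.default_rng(42)
X = rng.uniform(1, 10, size=(200, 1))
y = 2.0 * X.ravel() + rng.normal(0, 1, size=200)

# DummyRegressor(strategy="mean") predicts the mean of y seen during
# fitting for every input; any useful model should beat it.
baseline = DummyRegressor(strategy="mean").fit(X, y)
pred = baseline.predict(X)

print("RMSE:     ", np.sqrt(mean_squared_error(y, pred)))
print("MAE:      ", mean_absolute_error(y, pred))
print("MAPE:     ", mean_absolute_percentage_error(y, pred))
print("R^2:      ", r2_score(y, pred))
print("Max error:", max_error(y, pred))
```

Evaluated on its own training data, the mean baseline has \(R^2 \approx 0\) by construction, which is exactly the reference point other models are compared against.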

Topics

  • Baseline Methods
    • DummyRegressor
  • Supervised Learning Metrics
    • Regression
      • Root Mean Square Error (RMSE)
      • Mean Absolute Error (MAE)
      • Mean Absolute Percentage Error (MAPE)
      • Coefficient of Determination (\(R^2\))
      • Max Error
  • K-Nearest Neighbors (KNN) Regression
    • KNeighborsRegressor
  • Generalization
    • Overfitting
    • Underfitting
    • Generalization Gap
  • Train, Test, and Validation Datasets
    • train_test_split
  • Model Tuning
    • Tuning Parameters
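
A hedged sketch of the tuning workflow named above, again on synthetic data: split into train, validation, and test sets, vary the number of neighbors \(k\) to control flexibility, and pick the \(k\) with the best validation RMSE:

```python
# A minimal sketch of tuning KNN regression with a validation set.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.2, size=300)

# Carve off a test set first, then split the rest into train/validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=1
)

def rmse(model, X, y):
    return np.sqrt(mean_squared_error(y, model.predict(X)))

# Small k -> flexible model, risk of overfitting; large k -> inflexible
# model, risk of underfitting. The difference between train and
# validation error is one view of the generalization gap.
val_rmse = {}
for k in [1, 5, 10, 25, 50, 100]:
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    val_rmse[k] = rmse(knn, X_val, y_val)
    print(f"k={k:3d}  train RMSE={rmse(knn, X_train, y_train):.3f}  "
          f"val RMSE={val_rmse[k]:.3f}")

best_k = min(val_rmse, key=val_rmse.get)
final = KNeighborsRegressor(n_neighbors=best_k).fit(X_trainval, y_trainval)
print(f"best k={best_k}  test RMSE={rmse(final, X_test, y_test):.3f}")
```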

Activities

Week 03

  • Start: Monday, September 8
  • End: Saturday, September 13

Summary

This week, we will continue our discussion of supervised learning, switching our focus to the classification task. We will again focus on \(k\)-nearest neighbors (KNN), but now for the classification task.

Learning Objectives

After completing this week, you are expected to be able to:

  • Use sklearn DummyClassifier as a baseline for comparison.
  • Calculate simple metrics to evaluate predictions from learned classifiers.
  • Differentiate between regression and classification tasks.
  • Use \(k\)-nearest neighbors to make predictions for pre-processed data.
  • Understand how conditional probabilities relate to classifications.
  • Estimate and calculate conditional probabilities.
  • Use \(k\)-nearest neighbors to estimate conditional probabilities.
  • Use sklearn features such as Pipeline, ColumnTransformer, SimpleImputer, StandardScaler, and OneHotEncoder to perform preprocessing while avoiding data leakage.
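
For reference, classification this week is framed through conditional probability: a classifier predicts the label \(c\) with the largest estimated probability given the features, and Bayes' theorem relates the two directions of conditioning:

\[
\hat{y}(x) = \underset{c}{\arg\max}\; P(Y = c \mid X = x),
\qquad
P(Y = c \mid X = x) = \frac{P(X = x \mid Y = c)\, P(Y = c)}{P(X = x)}.
\]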

Topics

  • Probability
    • Conditional Probability
    • Bayes’ Theorem
  • Baseline Methods
    • DummyClassifier
  • K-Nearest Neighbors (KNN) Classification
    • KNeighborsClassifier
  • Supervised Learning Metrics
    • Classification
      • Accuracy
      • Misclassification
  • Preprocessing
    • Numeric Scaling
    • One-Hot Encoding
    • Dummy Variables and Encoding
    • Missing Data and Imputation
    • Data Leakage
  • sklearn API and Pipelines
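
A hedged sketch of how these pieces fit together, on a small synthetic dataset with illustrative column names. Wrapping preprocessing and the classifier in a single Pipeline means imputation, scaling, and encoding are fit only on training data, which avoids data leakage:

```python
# A minimal sketch: preprocessing + KNN classification in one Pipeline
# (synthetic data; column names illustrative).
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(60_000, 15_000, n),
    "city": rng.choice(["urban", "suburban", "rural"], n),
})
X.loc[rng.choice(n, 20, replace=False), "age"] = np.nan  # missing data
y = (X["income"] + rng.normal(0, 10_000, n) > 60_000).astype(int)

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # numeric scaling
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("knn", KNeighborsClassifier(n_neighbors=10)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)
model.fit(X_train, y_train)  # preprocessing is fit on training data only

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
# predict_proba is KNN's estimate of P(Y = c | X = x): the fraction of
# the k nearest training neighbors carrying each label.
print(model.predict_proba(X_test[:5]))
```

Because the imputer, scaler, and encoder live inside the pipeline, any later cross-validation or grid search over the whole pipeline refits them on each training fold automatically.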

Activities

Week 04

  • Start: Monday, September 15
  • End: Saturday, September 20

Summary

This week, we will focus on model selection and related theory. We will introduce cross-validation as an important and generic technique for model selection and hyperparameter tuning. We will also look at some theory related to generalization, including the bias-variance tradeoff, model flexibility, and overfitting.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand how model flexibility relates to the bias-variance tradeoff and thus model performance.
  • Tune models by manipulating their flexibility through the use of a tuning parameter to find a model that generalizes well.
  • Avoid overfitting by selecting a model of appropriate flexibility through the use of a validation set or cross-validation.
  • Use GridSearchCV to tune models or pipelines with cross-validation.

Topics

  • Generalization
    • Model Flexibility
    • Overfitting
    • Bias-Variance Tradeoff
  • Cross-Validation
    • \(k\)-Fold Cross-Validation
    • GridSearchCV
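
A minimal sketch of this workflow on synthetic data, with an illustrative parameter grid: GridSearchCV runs \(k\)-fold cross-validation for every candidate value of the tuning parameter, then refits the best model on the full training set:

```python
# A minimal sketch: tuning the number of neighbors with 5-fold
# cross-validation via GridSearchCV (synthetic data, illustrative grid).
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.2, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

grid = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [1, 5, 10, 25, 50, 100]},
    cv=5,                                   # 5-fold cross-validation
    scoring="neg_root_mean_squared_error",  # sklearn maximizes scores
)
grid.fit(X_train, y_train)

print("best k: ", grid.best_params_["n_neighbors"])
print("CV RMSE:", -grid.best_score_)
# After the search, grid is refit on all of X_train with the best k,
# so it can be scored directly on the held-out test set.
print("test RMSE:", -grid.score(X_test, y_test))
```

Compared to a single validation split, \(k\)-fold cross-validation reuses every training observation for both fitting and validation, at the cost of fitting each candidate model \(k\) times.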

Activities

Week 05

Week 06

Week 07

Week 08

Week 09

Week 10

Week 11

Week 12

Week 13

Week 14

Week 15

Week 16

Week 17