Content

Weekly Objectives, Topics, and Activities

Week 01

  • Start: Monday, August 26
  • End: Friday, August 30

Summary

Welcome to CS 307! This week, you will become familiar with the course policies, and we will get your machine set up to complete homework and labs.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand the syllabus of the course.
  • Understand the objectives of the course.
  • Communicate with the course staff.
  • Use Python, Jupyter, and VSCode to produce code for homework, labs, and quizzes.
  • Use PrairieLearn to complete homework, lab models, and quizzes.
  • Use Canvas to complete lab reports.

Topics

  • Big Questions in ML
  • Compute Setup
  • CS 307 Course Policy
    • Lab Policy
    • Homework Policy

Activities

This section will list the course activities for each week. Said another way, this section will list the things you need to do! Note that the assessments listed are not due during this week, but are related to the content of the week. Deadlines can be found on PrairieLearn (homework and lab models), PrairieTest (quizzes), and Canvas (lab reports). Alternatively, the home page will always display all upcoming deadlines.

Remember, Homework 00 and Lab 00 are not part of your grade; they are important, risk-free practice.

Week 02

  • Start: Tuesday, September 3
  • End: Friday, September 6

Summary

In Week 02 we will begin our discussion of supervised learning, covering both the regression and classification tasks. We'll start by introducing two very basic baseline methods and defining simple metrics for assessing supervised learning methods. Then we will look at one of the foundational methods of machine learning: k-nearest neighbors. We will also introduce data splitting and overfitting. Short code sketches of these ideas appear after the objective and topic lists below.

Learning Objectives

After completing this week, you are expected to be able to:

  • Differentiate between supervised, unsupervised, and reinforcement learning.
  • Identify regression and classification tasks.
  • Use sklearn baseline models DummyClassifier and DummyRegressor (see the sketch after this list).
  • Calculate simple metrics to evaluate predictions from regression and classification methods.
  • Differentiate between regression and classification tasks.
  • Use k-nearest neighbors to make predictions for pre-processed data.
  • Understand how conditional probabilities relate to classification decisions.
  • Estimate and calculate conditional probabilities.
  • Use k-nearest neighbors to estimate conditional probabilities.
  • Split data into train, validation, and test sets.
  • Modify a tuning parameter to control the flexibility of a model.
  • Avoid overfitting by selecting a model through the use of a validation set.
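
To make the baseline and metrics objectives concrete, here is a minimal sketch using DummyRegressor; the synthetic data and settings are illustrative assumptions, not course materials. DummyClassifier plays the same role for classification (for example, with strategy="most_frequent").

    import numpy as np
    from sklearn.dummy import DummyRegressor
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    # Illustrative synthetic regression data
    rng = np.random.default_rng(42)
    X = rng.uniform(0, 10, size=(100, 1))
    y = 2 * X.ravel() + rng.normal(0, 1, size=100)

    # A baseline that ignores X and always predicts the mean of y
    baseline = DummyRegressor(strategy="mean")
    baseline.fit(X, y)
    pred = baseline.predict(X)

    # Simple regression metrics: RMSE = sqrt(MSE), MAE = mean absolute error
    rmse = mean_squared_error(y, pred) ** 0.5
    mae = mean_absolute_error(y, pred)
    print(f"baseline RMSE: {rmse:.3f}, MAE: {mae:.3f}")

A baseline like this sets the floor that any real model should beat.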

Topics

  • Machine Learning Tasks
    • Supervised Learning
      • Classification
      • Regression
    • Unsupervised Learning
      • Density Estimation
      • Clustering
      • Novelty and Outlier Detection
      • Dimension Reduction
    • Reinforcement Learning
  • Baseline Methods
    • DummyClassifier
    • DummyRegressor
  • Supervised Learning Metrics
    • Regression
      • Root Mean Square Error (RMSE)
      • Mean Absolute Error (MAE)
      • Mean Absolute Percentage Error (MAPE)
      • Coefficient of Determination (\(R^2\))
      • Max Error
    • Classification
      • Accuracy
      • Misclassification
  • K-Nearest Neighbors (KNN) Regression
    • KNeighborsRegressor
  • K-Nearest Neighbors (KNN) Classification
    • KNeighborsClassifier
  • Overfitting
  • Train, Test, and Validation Datasets
    • train_test_split
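
As a rough sketch of how the topics above fit together, the following tunes the flexibility parameter k for KNeighborsRegressor using a validation set; the data and split proportions are illustrative assumptions.

    import numpy as np
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor

    # Illustrative synthetic data
    rng = np.random.default_rng(307)
    X = rng.uniform(0, 10, size=(200, 1))
    y = np.sin(X.ravel()) + rng.normal(0, 0.25, size=200)

    # Split into train, validation, and test sets (60/20/20)
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

    # Small k risks overfitting; large k risks underfitting
    for k in [1, 5, 25]:
        knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
        val_mae = mean_absolute_error(y_val, knn.predict(X_val))
        print(f"k={k}: validation MAE = {val_mae:.3f}")

The test set stays untouched until a final model is chosen.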

Activities

Week 03

  • Start: Monday, September 9
  • End: Friday, September 13

Summary

In Week 03 we will look at the bigger picture and focus on selecting tuning parameters and preprocessing data, especially heterogeneous data stored in Pandas DataFrame objects. We'll also discuss overfitting, generalization, and a concept related to both: the bias-variance tradeoff.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand how model flexibility relates to the bias-variance tradeoff and thus model performance.
  • Tune models by manipulating their flexibility through the use of a tuning parameter to find a model that generalizes well.
  • Avoid overfitting by selecting a model of appropriate flexibility through the use of a validation set or cross-validation.
  • Use sklearn features such as Pipeline, ColumnTransformer, SimpleImputer, StandardScaler, OneHotEncoder and others to perform reproducible preprocessing.
  • Use GridSearchCV to tune models (select appropriate values of tuning parameters) with cross-validation; see the sketch after this list.
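
Here is one minimal way these pieces can fit together; the tiny DataFrame, column names, and grid of k values are invented for illustration.

    import numpy as np
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Illustrative heterogeneous data with a missing numeric value
    df = pd.DataFrame({
        "sqft": [1200, 1500, np.nan, 2000, 900, 1750],
        "color": ["red", "blue", "red", "green", "blue", "red"],
        "price": [100, 150, 120, 210, 80, 190],
    })
    X, y = df[["sqft", "color"]], df["price"]

    # Reproducible preprocessing: impute and scale numeric, one-hot encode categorical
    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())]), ["sqft"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    ])
    pipe = Pipeline([("preprocess", preprocess), ("knn", KNeighborsRegressor())])

    # Cross-validated tuning of k through the full pipeline
    grid = GridSearchCV(pipe, param_grid={"knn__n_neighbors": [1, 2, 3]}, cv=2)
    grid.fit(X, y)
    print(grid.best_params_)

Because preprocessing lives inside the pipeline, GridSearchCV refits it on each training fold, which avoids leaking information from the validation folds.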

Topics

  • Bias-Variance Tradeoff
  • Generalization
  • Cross-Validation
  • Preprocessing
  • sklearn API and Pipelines

Activities

More activities coming soon! Notes will be released on Tuesday. Videos will be released on Wednesday.

Week 04

  • Start: Monday, September 16
  • End: Friday, September 20

Summary

In Week 04 we will introduce another nonparametric method for supervised learning: decision trees.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand how decision trees differ from KNN when determining similarity of data.
  • Find and evaluate decision tree splits for regression.
  • Find and evaluate decision tree splits for classification.
  • Use decision trees to make predictions for regression tasks using sklearn.
  • Use decision trees to make predictions for classification tasks using sklearn.
  • Use decision trees to estimate conditional probabilities for classification tasks using sklearn.
  • Tune the parameters of decision trees to avoid overfitting; see the sketch after this list.
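
As a sketch of controlling tree flexibility (the synthetic data and depth grid are illustrative assumptions):

    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Illustrative synthetic classification data
    X, y = make_classification(n_samples=300, n_features=4, random_state=307)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

    # max_depth limits flexibility; None lets the tree grow until its leaves are pure
    for depth in [1, 3, None]:
        tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_train, y_train)
        acc = accuracy_score(y_val, tree.predict(X_val))
        print(f"max_depth={depth}: validation accuracy = {acc:.3f}")

    # Trees estimate conditional probabilities from class proportions in leaves
    tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_train, y_train)
    print(tree.predict_proba(X_val[:3]))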

Topics

  • Regression Trees
  • Classification Trees

Activities

Week 05

  • Start: Monday, September 23
  • End: Friday, September 27

Summary

In Week 05 we will prepare for Quiz 01!

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand the policies and procedures for Quiz 01.

Topics

  • No new topics this week!

Activities

Week 06

  • Start: Monday, September 30
  • End: Friday, October 4

Summary

In Week 06 you will take Quiz 01!

Topics

  • No new topics this week!

Activities

Week 07

  • Start: Monday, October 7
  • End: Friday, October 11

Summary

In Week 07 we will introduce linear models for classification and regression, both parametric methods. We will begin to compare and contrast parametric and nonparametric methods.

Learning Objectives

After completing this week, you are expected to be able to:

  • Differentiate between parametric and nonparametric regression.
  • Use sklearn to fit linear regression models and make predictions for unseen data.
  • Estimate conditional probabilities with logistic regression.
  • Use sklearn to fit logistic regression models and make predictions for unseen data; see the sketch after this list.
  • Preprocess data to add polynomial and interaction terms for use in linear models.
  • Understand what makes linear models linear and how both linear regression and logistic regression are linear models.
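
A minimal sketch of both linear models on invented synthetic data; note how adding polynomial terms changes the features, not the linearity of the model in its parameters.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Illustrative data with a quadratic trend
    rng = np.random.default_rng(307)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = 1 + 2 * X.ravel() - 0.5 * X.ravel() ** 2 + rng.normal(0, 0.5, size=100)

    # Linear regression on polynomial features: still linear in the parameters
    poly_reg = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    poly_reg.fit(X, y)
    print(poly_reg.predict([[1.0]]))

    # Logistic regression estimates conditional probabilities P(Y = 1 | X)
    y_binary = (X.ravel() > 0).astype(int)
    clf = LogisticRegression().fit(X, y_binary)
    print(clf.predict_proba([[0.5]]))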

Topics

  • Linear Models
    • Linear Regression
    • Logistic Regression
  • Parametric versus Nonparametric Models

Activities

Week 08

  • Start: Monday, October 14
  • End: Friday, October 18

Summary

In Week 08 we will modify previously seen methods to potentially improve their performance. We’ll look at ensembles of trees and add regularization to regression.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand how the ridge and lasso constraints lead to shrunken and sparse estimates.
  • Use ridge regression to perform regression and classification.
  • Use lasso to perform regression and classification.
  • Understand how averaging the predictions from many trees (for example using a random forest) can improve model performance.
  • Use a random forest to perform regression and classification.
  • Use boosting to perform regression and classification; a sketch of these methods follows this list.
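
A regression-only sketch of these methods on invented data (sklearn has classification analogues such as RidgeClassifier, LogisticRegression with an L1 penalty, RandomForestClassifier, and GradientBoostingClassifier); the alpha values are illustrative.

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.linear_model import Lasso, Ridge

    # Illustrative synthetic data where only a few features matter
    X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                           noise=5, random_state=307)

    # Regularized linear models: alpha controls the penalty strength
    ridge = Ridge(alpha=1.0).fit(X, y)  # shrinks coefficients toward zero
    lasso = Lasso(alpha=1.0).fit(X, y)  # can set some coefficients exactly to zero
    print("nonzero lasso coefficients:", (lasso.coef_ != 0).sum())

    # Ensembles of trees: averaging many trees (random forest) or
    # adding trees sequentially to correct earlier errors (boosting)
    rf = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)
    gb = GradientBoostingRegressor(random_state=1).fit(X, y)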

Topics

  • Ensemble Methods
    • Random Forests
    • Boosted Models
  • Regularization
    • Lasso
    • Ridge

Activities

Week 09

  • Start: Monday, October 21
  • End: Friday, October 25

Summary

In Week 09 we will discuss binary classification in depth, in particular, metrics for evaluating binary classification models.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand the definitions of false positives, false negatives, and related metrics.
  • Calculate metrics specific to binary classification; see the sketch after this list.
  • Evaluate models for binary classification.
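
A minimal sketch of the core binary classification metrics, using invented labels and predictions:

    from sklearn.metrics import confusion_matrix, precision_score, recall_score

    # Illustrative true labels and predictions for a binary task
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

    # Rows are true classes, columns are predicted classes:
    # [[TN, FP],
    #  [FN, TP]]
    print(confusion_matrix(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
    print("recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)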

Topics

  • Binary Classification
  • Binary Classification Metrics
  • Model Evaluation

Activities

Week 10

  • Start: Monday, October 28
  • End: Friday, November 1

Summary

In Week 10 we will prepare for Quiz 02!

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand the policies and procedures for Quiz 02.

Topics

  • No new topics this week!

Activities

Week 11

  • Start: Monday, November 4
  • End: Friday, November 8

Summary

In Week 11 you will take Quiz 02!

Topics

  • No new topics this week!

Week 14

  • Start: Monday, November 25
  • End: Friday, November 29

Summary

In Week 14 you will do nothing! It is Fall Break!

Topics

  • No new topics this week!

Week 15

  • Start: Monday, December 2
  • End: Friday, December 6

Summary

In Week 15 you will first take Quiz 03. Then we will introduce generative models for classification, with an emphasis on Naive Bayes. Lastly, we will survey unsupervised learning, looking at a variety of methods for its related subtasks: dimension reduction, clustering, density estimation, and outlier detection.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand the difference between discriminative and generative models.
  • Use Naive Bayes models for the classification task; see the sketch after this list.
  • Understand the difference between supervised and unsupervised machine learning tasks.
  • Identify supervised and unsupervised machine learning tasks.
  • Understand and identify unsupervised learning subtasks: dimension reduction, clustering, density estimation, and outlier detection.
  • Use principal components analysis (PCA) for dimension reduction.
  • Use k-means and other methods for clustering; a clustering sketch follows the topic list below.
  • Use kernel density estimation and mixture models for density estimation.
  • Use one-class SVM and isolation forest for outlier detection.
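
A minimal sketch of a generative classifier, using GaussianNB on invented synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    # Illustrative synthetic data
    X, y = make_classification(n_samples=300, n_features=4, random_state=307)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # A generative model: estimate P(X | Y) per class, then combine with
    # the class priors P(Y) via Bayes' rule to get P(Y | X)
    nb = GaussianNB().fit(X_train, y_train)
    print("test accuracy:", nb.score(X_test, y_test))
    print(nb.predict_proba(X_test[:3]))  # estimated conditional probabilities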

Topics

  • Generative Models
    • Naive Bayes
    • Linear Discriminant Analysis (LDA)
    • Quadratic Discriminant Analysis (QDA)
  • Unsupervised Learning
    • Dimension Reduction
      • PCA
    • Clustering
      • k-Means
      • Agglomerative Clustering
      • DBSCAN
    • Density Estimation
      • Kernel Density Estimation
      • Gaussian Mixture Models
    • Outlier Detection
      • One-Class SVM
      • Isolation Forest
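
A minimal sketch combining two of the unsupervised subtasks above; the blob data and the choice of k = 3 are illustrative assumptions.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.decomposition import PCA

    # Illustrative unlabeled data: three blobs in five dimensions
    X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=307)

    # Dimension reduction: project onto the top two principal components
    X_2d = PCA(n_components=2).fit_transform(X)

    # Clustering: k-means with k = 3 (k is chosen by us, not learned from labels)
    labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X_2d)
    print(labels[:10])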

Activities

Week 16

  • Start: Monday, December 9
  • End: Wednesday, December 11

Summary

In Week 16 we will introduce neural networks and deep learning using PyTorch.

Learning Objectives

After completing this week, you are expected to be able to:

  • Train neural networks using PyTorch.
  • Evaluate neural networks using PyTorch; a minimal training-loop sketch follows this list.
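
A minimal sketch of a PyTorch training loop on invented synthetic data; the architecture, learning rate, and epoch count are illustrative assumptions.

    import torch
    import torch.nn as nn

    # Illustrative synthetic regression data
    torch.manual_seed(307)
    X = torch.rand(100, 1) * 10
    y = 2 * X + torch.randn(100, 1)

    # A small fully connected network
    model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    # Basic training loop: forward pass, loss, backward pass, parameter update
    for epoch in range(200):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()

    # Evaluate in eval mode without tracking gradients
    model.eval()
    with torch.no_grad():
        print("final training MSE:", loss_fn(model(X), y).item())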

Topics

  • Neural Networks
  • Deep Learning
  • PyTorch

Activities

Additional Reading and Resources