Course Content

Weekly Objectives, Topics, and Activities

This page tracks the content for each week of the course, in reverse chronological order.

Week 07

  • Start: Monday, March 3
  • End: Friday, March 7

Summary

This week, we will discuss binary classification in depth, in particular the metrics used to evaluate binary classification models. We will also take a look at a linear model for classification: logistic regression, a parametric model. We will begin to compare and contrast parametric and nonparametric methods.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand the definitions of false positives, false negatives, and related metrics.
  • Calculate metrics specific to binary classification.
  • Evaluate models for binary classification.
  • Differentiate between parametric and nonparametric methods.
  • Estimate conditional probabilities with logistic regression.
  • Use sklearn to fit logistic regression models and make predictions for unseen data.
  • Preprocess data to add polynomial and interaction terms for use in linear models (see the sketch after this list).
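
As a preview of this week's sklearn workflow, here is a minimal sketch of fitting logistic regression with polynomial and interaction terms, then computing binary classification metrics. The synthetic data, the degree value, and the metric choices are illustrative assumptions, not course materials.

```python
# Minimal sketch: logistic regression with polynomial and interaction terms.
# The synthetic data and degree=2 are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# illustrative two-feature binary classification data
rng = np.random.default_rng(307)
X = rng.normal(size=(500, 2))
y = (X[:, 0] ** 2 + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=307)

# degree=2 adds squared terms and the x1 * x2 interaction before the linear model
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LogisticRegression(),
)
model.fit(X_train, y_train)

# estimated conditional probabilities P(Y = 1 | X = x) for unseen data
print(model.predict_proba(X_test[:5])[:, 1])

# binary classification metrics computed from the 0/1 predictions
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))  # rows: actual 0/1; columns: predicted 0/1
print(precision_score(y_test, y_pred), recall_score(y_test, y_pred))
```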

Topics

  • Binary Classification
  • Binary Classification Metrics
  • Model Evaluation
  • Logistic Regression
  • Parametric versus Nonparametric Models

Activities

Homework 06 and Lab 06 coming soon!

Week 06

  • Start: Monday, February 24
  • End: Friday, February 28

Summary

This week, we will introduce another nonparametric method for supervised learning: decision trees.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand how decision trees differ from KNN when determining similarity of data.
  • Find and evaluate decision tree splits for regression.
  • Find and evaluate decision tree splits for classification.
  • Use decision trees to make predictions for regression tasks using sklearn.
  • Use decision trees to make predictions for classification tasks using sklearn.
  • Use decision trees to estimate conditional probabilities for classification tasks using sklearn.
  • Tune the parameters of decision trees to avoid overfitting (see the sketch after this list).
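
For a first look at the API this week's objectives reference, here is a minimal sketch of decision trees for both tasks in sklearn. The synthetic data and the max_depth value are illustrative assumptions.

```python
# Minimal sketch: decision trees for regression and classification in sklearn.
# The synthetic data and max_depth=3 are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(307)
X = rng.uniform(-3, 3, size=(400, 1))
y_reg = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=400)  # regression target
y_clf = (X[:, 0] > 0).astype(int)                          # classification target

X_train, X_test, yr_train, yr_test, yc_train, yc_test = train_test_split(
    X, y_reg, y_clf, random_state=307
)

# limiting max_depth is one way to control flexibility and avoid overfitting
reg = DecisionTreeRegressor(max_depth=3).fit(X_train, yr_train)
clf = DecisionTreeClassifier(max_depth=3).fit(X_train, yc_train)

print(reg.predict(X_test[:3]))        # regression predictions
print(clf.predict(X_test[:3]))        # class predictions
print(clf.predict_proba(X_test[:3]))  # estimated conditional probabilities
```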

Topics

  • Regression Trees
  • Classification Trees

Activities

Week 05

  • Start: Monday, February 17
  • End: Friday, February 21

Summary

This week, we will prepare for Exam 01! No new content will be introduced. Instead, we will review and practice for the exam.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand the policies and procedures for Exam 01.

Topics

  • No new topics this week!

Activities

Week 04

  • Start: Monday, February 10
  • End: Friday, February 14

Summary

This week, we will focus on model selection and related theory. We will introduce cross-validation as an important generic technique for model selection and hyperparameter tuning. We will also look at some theory related to generalization, including the bias-variance tradeoff, model flexibility, and overfitting.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand how model flexibility relates to the bias-variance tradeoff and thus model performance.
  • Tune models by manipulating their flexibility through the use of a tuning parameter to find a model that generalizes well.
  • Avoid overfitting by selecting a model of appropriate flexibility through the use of a validation set or cross-validation.
  • Use GridSearchCV to tune models with cross-validation (see the sketch after this list).
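
Here is a minimal sketch of tuning a flexibility parameter with GridSearchCV. The choice of a KNN estimator, the synthetic data, and the candidate values of \(k\) are illustrative assumptions.

```python
# Minimal sketch: tuning a flexibility parameter with 5-fold cross-validation.
# The KNN estimator, synthetic data, and candidate k values are assumptions.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(307)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.25, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=307)

# small n_neighbors = flexible model (low bias, high variance); large = rigid
param_grid = {"n_neighbors": [1, 5, 10, 25, 50]}
grid = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)           # the k selected by cross-validation
print(grid.score(X_test, y_test))  # performance estimate on held-out data
```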

Topics

  • Generalization
    • Model Flexibility
    • Overfitting
    • Bias-Variance Tradeoff
  • Cross-Validation
    • \(k\)-Fold Cross-Validation
    • GridSearchCV

Activities

Week 03

  • Start: Monday, February 3
  • End: Friday, February 7

Summary

This week, we will continue our discussion of supervised learning, switching our focus to the classification task. To start, we’ll review some basic probability before using \(k\)-nearest neighbors (KNN) for classification tasks.

Learning Objectives

After completing this week, you are expected to be able to:

  • Use sklearn DummyClassifier as a baseline for comparison.
  • Calculate simple metrics to evaluate predictions from learned classifiers.
  • Differentiate between regression and classification tasks.
  • Use \(k\)-nearest neighbors to make predictions for preprocessed data.
  • Understand how conditional probabilities relate to classifications.
  • Estimate and calculate conditional probabilities.
  • Use \(k\)-nearest neighbors to estimate conditional probabilities.
  • Use sklearn features such as Pipeline, ColumnTransformer, SimpleImputer, StandardScaler, and OneHotEncoder to perform preprocessing while avoiding data leakage (see the sketch after this list).
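
Here is a minimal sketch that combines the sklearn pieces named above: preprocessing inside a Pipeline, feeding a KNN classifier that estimates conditional probabilities. The toy DataFrame, column names, and parameter values are illustrative assumptions.

```python
# Minimal sketch: preprocessing inside a Pipeline to avoid data leakage,
# feeding a KNN classifier. The toy DataFrame and column names are assumptions.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# toy data: a numeric feature with a missing value and a categorical feature
X = pd.DataFrame({
    "age": [22, 35, np.nan, 41, 29, 58, 33, 47],
    "color": ["red", "blue", "red", "green", "blue", "red", "green", "blue"],
})
y = pd.Series([0, 1, 0, 1, 0, 1, 0, 1])

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

# because imputation, scaling, and encoding live inside the pipeline, their
# statistics are learned from training data only, which prevents leakage
model = Pipeline([
    ("preprocess", preprocess),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
model.fit(X, y)

print(model.predict_proba(X))               # estimated conditional probabilities
print(accuracy_score(y, model.predict(X)))  # accuracy (on training data here)
```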

Topics

  • Probability
    • Conditional Probability
    • Bayes’ Theorem
  • K-Nearest Neighbors (KNN) Classification
    • KNeighborsClassifier
  • Supervised Learning Metrics
    • Classification
      • Accuracy
      • Misclassification
  • Preprocessing
    • Numeric Scaling
    • One-Hot Encoding
    • Dummy Variables and Encoding
    • Missing Data and Imputation
    • Data Leakage
  • sklearn API and Pipelines

Activities

Week 02

  • Start: Monday, January 27
  • End: Friday, January 31

Summary

This week, we will begin our discussion of supervised learning, focusing on the regression task. We will introduce one of the foundational methods of machine learning: \(k\)-nearest neighbors (KNN). With KNN as an example of a model used for the regression task, we will also look at data splitting and begin discussing overfitting and generalization.

Learning Objectives

After completing this week, you are expected to be able to:

  • Differentiate between supervised, unsupervised, and reinforcement learning.
  • Identify regression and classification tasks.
  • Use sklearn DummyRegressor as a baseline for comparison.
  • Calculate simple metrics to evaluate predictions from learned regressors.
  • Use \(k\)-nearest neighbors to make predictions.
  • Split data into train, validation, and test sets.
  • Modify a tuning parameter to control the flexibility of a model.
  • Avoid overfitting by tuning a model through the use of a validation set (see the sketch after this list).
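
Here is a minimal sketch of the workflow these objectives describe: splitting data, fitting a DummyRegressor baseline, and tuning KNN regression with a validation set. The synthetic data, the candidate values of \(k\), and the use of RMSE are illustrative assumptions.

```python
# Minimal sketch: tuning KNN regression with a validation set, compared to a
# DummyRegressor baseline. Synthetic data and candidate k values are assumptions.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(307)
X = rng.uniform(0, 10, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.25, size=400)

# two splits: first carve off a test set, then split train versus validation
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, random_state=307)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, random_state=307
)

def rmse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))

# baseline: always predict the mean of the training targets
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
print("baseline val RMSE:", rmse(y_val, baseline.predict(X_val)))

# n_neighbors is the tuning parameter that controls flexibility
for k in [1, 5, 25, 100]:
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    print(f"k = {k:3d}, val RMSE:", rmse(y_val, knn.predict(X_val)))
```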

Topics

  • Machine Learning Paradigms and Tasks
    • Supervised Learning
      • Classification
      • Regression
    • Unsupervised Learning
      • Density Estimation
      • Clustering
      • Anomaly Detection
      • Dimension Reduction
    • Reinforcement Learning
  • Baseline Methods
    • DummyRegressor
  • Supervised Learning Metrics
    • Regression
      • Root Mean Square Error (RMSE)
      • Mean Absolute Error (MAE)
      • Mean Absolute Percentage Error (MAPE)
      • Coefficient of Determination (\(R^2\))
      • Max Error
  • K-Nearest Neighbors (KNN) Regression
    • KNeighborsRegressor
  • Generalization
    • Overfitting
    • Underfitting
    • Generalization Gap
  • Train, Test, and Validation Datasets
    • train_test_split

Activities

Remember, homework and lab are released at the start of discussion on Friday. The homework and lab shown here are those related to the content for the week; the homework and lab due this week are the previous week's homework and lab. Recall that deadlines for all assessments can be found on the homepage.

Week 01

  • Start: Tuesday, January 21
  • End: Friday, January 24

Summary

Welcome to CS 307! This week, you will become familiar with the course policies and set up your machine to complete homework and labs.

Learning Objectives

After completing this week, you are expected to be able to:

  • Understand the syllabus of the course.
  • Understand the objectives of the course.
  • Communicate with the course staff.
  • Use Python, Jupyter, and VSCode to produce code for homework, labs, and exams.
  • Use PrairieLearn to complete homework, lab models, and exams.
  • Use Canvas to complete lab reports.

Topics

  • What is Machine Learning?
  • Computing Setup
  • CS 307 Course Policies

Activities