Course Content

CS 307 content will largely be structured around three quizzes, with each quiz corresponding to three modules of content. Additionally, we will have an introductory module and a concluding module.

Introduction

The first group of modules (containing only one module) will serve as an introduction to CS 307.

Module 00

In this module, you will become familiar with the course and we will get your machine set up to complete homework and labs. We will then overview the fundamental machine learning tasks, introduce two very basic methods, and define basic metrics for assessing supervised learning methods.

Topics

  • Machine Learning Tasks
    • Supervised Learning
      • Classification
      • Regression
    • Unsupervised Learning
      • Density Estimation
      • Clustering
      • Novelty and Outlier Detection
      • Dimension Reduction
    • Reinforcement Learning
  • Baseline Methods
    • DummyClassifier
    • DummyRegressor
  • Supervised Learning Metrics
    • Regression
      • Root Mean Square Error (RMSE)
      • Mean Absolute Error (MAE)
      • Mean Absolute Percentage Error (MAPE)
      • Coefficient of Determination (\(R^2\))
      • Max Error
    • Classification
      • Accuracy
      • Misclassification

Learning Objectives

After completing this module, you are expected to be able to:

  • Understand the syllabus of the course.
  • Understand the objectives of the course.
  • Communicate with the course staff.
  • Use Python, Jupyter, and VSCode to produce code for labs, quizzes, and MPs.
  • Use PrairieLearn to complete homework, lab models, and MPs.
  • Use Canvas to complete lab reports.
  • Differentiate between supervised, unsupervised, and reinforcement learning.
  • Identify regression and classification tasks.
  • Use sklearn baseline models DummyClassifier and DummyRegressor.
  • Calculate metrics to evaluate predictions from regression and classification methods.
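The baseline models and metrics above can be sketched with sklearn. This is a minimal illustration on a hypothetical toy dataset (the data values are made up for demonstration):

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

# hypothetical toy data; dummy models ignore the features entirely
X = np.arange(10).reshape(-1, 1)
y_reg = np.arange(1.0, 11.0)            # regression targets 1, 2, ..., 10
y_clf = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])  # classification labels

# DummyRegressor with strategy="mean" always predicts the training mean
reg = DummyRegressor(strategy="mean").fit(X, y_reg)
rmse = np.sqrt(mean_squared_error(y_reg, reg.predict(X)))

# DummyClassifier with strategy="most_frequent" always predicts the majority class
clf = DummyClassifier(strategy="most_frequent").fit(X, y_clf)
acc = accuracy_score(y_clf, clf.predict(X))
```

Baselines like these set the bar that any real model should beat: here the dummy regressor's RMSE is just the standard deviation of the targets, and the dummy classifier's accuracy is the proportion of the majority class.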

Slides, Scribbles, and Readings

Quiz 01

Quiz 01 will focus on two foundational nonparametric supervised learning methods: k-nearest neighbors (KNN) and decision trees. Both methods can be used for classification and regression tasks. While discussing these methods, we will introduce the notion of generalization, as well as tools for model selection such as cross-validation.

Module 01

In Module 01 we will begin discussing supervised learning, both the regression and classification tasks. We will look at one of the foundational methods of machine learning: k-nearest neighbors. We will also introduce data splitting and overfitting.

Topics

  • K-Nearest Neighbors (KNN) Regression
    • KNeighborsRegressor
  • K-Nearest Neighbors (KNN) Classification
    • KNeighborsClassifier
  • Overfitting
  • Train, Test, and Validation Datasets
    • train_test_split
  • Object-Oriented Programming (OOP) in Python

Learning Objectives

  • Differentiate between regression and classification tasks.
  • Use k-nearest neighbors to make predictions for pre-processed data.
  • Understand how conditional probabilities relate to classifications.
  • Estimate and calculate conditional probabilities.
  • Use k-nearest neighbors to estimate conditional probabilities.
  • Split data into train, validation, and test sets.
  • Modify a tuning parameter to control the flexibility of a model.
  • Avoid overfitting by selecting a model through the use of a validation set.
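The split-then-tune workflow above can be sketched with sklearn. The synthetic data, the seed, and the candidate values of k below are illustrative, not part of the course materials:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# hypothetical synthetic regression data: noisy sine curve
rng = np.random.default_rng(307)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# hold out a test set, then carve a validation set out of the remainder
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=1
)

# k is the tuning parameter: small k is flexible (risks overfitting),
# large k is rigid (risks underfitting); pick k by validation performance
best_k, best_score = None, -np.inf
for k in [1, 5, 25, 100]:
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    score = model.score(X_val, y_val)  # R^2 on the validation set
    if score > best_score:
        best_k, best_score = k, score
```

The test set is touched only once, at the very end, to report an honest estimate of how the chosen model generalizes.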

Slides, Scribbles, and Readings

Module 02

In Module 02 we will look at the bigger picture and focus on selecting tuning parameters and preprocessing data, especially heterogeneous data stored in Pandas DataFrame objects. We’ll also discuss overfitting, generalization, and a concept related to both: the bias-variance tradeoff.

Topics

  • Bias-Variance Tradeoff
  • Generalization
  • Cross-Validation
  • Preprocessing
  • sklearn API and Pipelines

Learning Objectives

  • Understand how model flexibility relates to the bias-variance tradeoff and thus model performance.
  • Tune models by manipulating their flexibility through the use of a tuning parameter to find a model that generalizes well.
  • Avoid overfitting by selecting a model of appropriate flexibility through the use of a validation set or cross-validation.
  • Use sklearn features such as Pipeline, ColumnTransformer, SimpleImputer, StandardScaler, OneHotEncoder and others to perform reproducible preprocessing.
  • Use GridSearchCV to tune models (select appropriate values of tuning parameters) with cross-validation.
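Putting the pieces above together, a pipeline can bundle preprocessing with a model so that cross-validation tunes everything reproducibly. The tiny DataFrame below is hypothetical and exists only to show the wiring:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# hypothetical heterogeneous data: one numeric (with a missing value)
# and one categorical feature
df = pd.DataFrame({
    "size": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0],
    "color": ["red", "blue", "red", "blue", "red", "blue", "red", "blue"],
})
y = np.arange(1.0, 9.0)

# numeric columns: impute missing values, then standardize;
# categorical columns: one-hot encode
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()),
                      ("scale", StandardScaler())]), ["size"]),
    ("cat", OneHotEncoder(), ["color"]),
])
pipe = Pipeline([("pre", preprocess), ("knn", KNeighborsRegressor())])

# GridSearchCV refits the whole pipeline on each fold, so preprocessing
# is learned from training folds only (no data leakage)
grid = GridSearchCV(pipe, {"knn__n_neighbors": [1, 3, 5]}, cv=4)
grid.fit(df, y)
```

Because the imputer, scaler, and encoder live inside the pipeline, their statistics are recomputed on each cross-validation training fold rather than on the full dataset.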

Slides, Scribbles, and Readings

Module 03

In Module 03 we will introduce another nonparametric method for supervised learning: decision trees.

Topics

  • Regression Trees
  • Classification Trees

Learning Objectives

  • Understand how decision trees differ from KNN when determining closeness of data.
  • Find and evaluate decision tree splits for regression.
  • Find and evaluate decision tree splits for classification.
  • Use decision trees to make predictions for regression tasks using sklearn.
  • Use decision trees to make predictions for classification tasks using sklearn.
  • Use decision trees to estimate conditional probabilities for classification tasks using sklearn.
  • Tune the parameters of decision trees to avoid overfitting.
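A short sketch of fitting a decision tree with sklearn, on a made-up one-dimensional dataset where the class simply depends on whether x exceeds 5:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# hypothetical toy data: perfectly separable at x = 5.5
X = np.arange(10, dtype=float).reshape(-1, 1)
y = (X.ravel() > 5).astype(int)

# max_depth is a tuning parameter that limits flexibility to avoid overfitting
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
preds = tree.predict(X)

# leaf class proportions serve as estimated conditional probabilities
proba = tree.predict_proba([[9.0]])
```

On data this clean a single split suffices; on real data, parameters such as `max_depth` and `min_samples_split` would be tuned with a validation set or cross-validation.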

Slides, Scribbles, and Readings

Quiz 02

Quiz 02 will introduce linear methods for classification and regression, which will also present an opportunity to differentiate parametric and nonparametric methods. Then, we’ll modify existing methods that we have seen through the use of regularization and ensembles. Lastly, we’ll spend some time thinking about the specifics of evaluating binary classification, and some other miscellaneous practical concerns.

Module 04

In Module 04 we will introduce linear models for classification and regression, both parametric methods. We will begin to compare and contrast parametric and nonparametric methods.

Topics

  • Linear Models
    • Linear Regression
    • Logistic Regression
  • Parametric versus Nonparametric Models

Learning Objectives

  • Differentiate between parametric and nonparametric regression.
  • Use sklearn to fit linear regression models and make predictions for unseen data.
  • Estimate conditional probabilities with logistic regression.
  • Use sklearn to fit logistic regression models and make predictions for unseen data.
  • Preprocess data to add polynomial and interaction terms for use in linear models.
  • Understand what makes linear models linear and how both linear regression and logistic regression are linear models.
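The ideas above can be sketched with sklearn on hypothetical noiseless data: a linear model fit on polynomial features is still linear in its parameters, and logistic regression estimates conditional probabilities:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# hypothetical data: a quadratic regression target and a threshold-based label
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y_reg = 2 * X.ravel() ** 2 + 1
y_clf = (X.ravel() > 0).astype(int)

# adding polynomial terms changes the features, not the model:
# the model remains linear in its coefficients
poly_reg = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_reg.fit(X, y_reg)

# logistic regression models P(y = 1 | x) with a linear function of x
# passed through the logistic (sigmoid) function
log_reg = LogisticRegression().fit(X, y_clf)
p = log_reg.predict_proba([[2.5]])[0, 1]  # estimated P(y = 1 | x = 2.5)
```

Unlike KNN and trees, both models here are parametric: after fitting, the data can be discarded and predictions depend only on a fixed set of coefficients.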

Slides, Scribbles, and Readings

Module 05

In Module 05 we will modify previously seen methods to potentially improve their performance. We’ll look at ensembles of trees and add regularization to regression.

Topics

  • Ensemble Methods
    • Random Forests
    • Boosted Models
  • Regularization
    • Lasso
    • Ridge

Learning Objectives

  • Understand how the ridge and lasso constraints lead to shrunken and sparse estimates.
  • Use ridge regression to perform regression and classification.
  • Use lasso to perform regression and classification.
  • Understand how averaging the predictions from many trees (for example a random forest) can improve model performance.
  • Use a random forest to perform regression and classification.
  • Use boosting to perform regression and classification.
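The regularized and ensemble regressors above can be sketched with sklearn. The synthetic data, seed, and penalty strengths below are illustrative assumptions, not course values:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge

# hypothetical data: only the first of five features carries signal
rng = np.random.default_rng(307)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

# ridge shrinks all coefficients toward zero but rarely to exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)

# lasso can set coefficients exactly to zero, performing feature selection
lasso = Lasso(alpha=0.5).fit(X, y)

# a random forest averages many decorrelated trees to reduce variance
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
```

Increasing `alpha` strengthens the penalty in either regularized model; for the forest, averaging more trees stabilizes predictions without increasing the risk of overfitting the way a single deep tree would.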

Slides, Scribbles, and Readings

Module 06

Coming soon!

Topics

  • Binary Classification
  • Model Evaluation
  • Practical Considerations

Learning Objectives

Coming soon!

Slides, Scribbles, and Readings

Coming soon!

Quiz 03

Coming soon!

Module 07

Coming soon!

Topics

  • Generative Models

Learning Objectives

Coming soon!

Slides, Scribbles, and Readings

Coming soon!

Module 08

Coming soon!

Topics

  • Unsupervised Learning

Learning Objectives

Coming soon!

Slides, Scribbles, and Readings

Coming soon!

Module 09

Coming soon!

Topics

  • Neural Networks

Learning Objectives

Coming soon!

Slides, Scribbles, and Readings

Coming soon!

Conclusion

Coming soon!

Module 10

Coming soon!

Topics

  • Deep Learning

Learning Objectives

Coming soon!

Slides, Scribbles, and Readings

Coming soon!