Course Content
Weekly Objectives, Topics, and Activities
This page tracks the content for each week of the course, in reverse chronological order.
Week 07
- Start: Monday, March 3
- End: Friday, March 7
Summary
This week, we will discuss binary classification in depth, in particular the metrics used to evaluate binary classification models. We will also take a look at a linear model for classification: logistic regression, a parametric model. We will begin to compare and contrast parametric and nonparametric methods.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand the definitions of false positives, false negatives, and related metrics.
- Calculate metrics specific to binary classification.
- Evaluate models for binary classification.
- Differentiate between parametric and nonparametric regression.
- Estimate conditional probabilities with logistic regression.
- Use `sklearn` to fit logistic regression models and make predictions for unseen data.
- Preprocess data to add polynomial and interaction terms for use in linear models (see the sketch after this list).
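As a preview of this workflow, here is a minimal sketch in `sklearn`; the toy data, seed, and pipeline settings are illustrative assumptions, not the course's official code:
```python
# Minimal sketch: logistic regression on polynomial and interaction features.
# The toy data and pipeline settings below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(307)
X = rng.normal(size=(200, 2))            # two numeric features
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # label driven by an interaction

# PolynomialFeatures adds squared and interaction terms before the linear model.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LogisticRegression(),
)
model.fit(X, y)

X_new = rng.normal(size=(3, 2))
print(model.predict(X_new))        # hard 0/1 predictions for unseen data
print(model.predict_proba(X_new))  # estimated conditional probabilities
```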
Topics
- Binary Classification
- Binary Classification Metrics
- Model Evaluation
- Logistic Regression
- Parametric versus Nonparametric Models
Activities
Homework 06 and Lab 06 coming soon!
Week 06
- Start: Monday, February 24
- End: Friday, February 28
Summary
This week, we will introduce another nonparametric method for supervised learning: decision trees.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand how decision trees differ from KNN when determining similarity of data.
- Find and evaluate decision tree splits for regression.
- Find and evaluate decision tree splits for classification.
- Use decision trees to make predictions for regression tasks using `sklearn`.
- Use decision trees to make predictions for classification tasks using `sklearn`.
- Use decision trees to estimate conditional probabilities for classification tasks using `sklearn`.
- Tune the parameters of decision trees to avoid overfitting (see the sketch after this list).
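As a preview, here is a minimal sketch of these `sklearn` calls; the toy data and `max_depth` value are illustrative assumptions, not the course's official code:
```python
# Minimal sketch: decision trees for regression and classification in sklearn.
# The toy data and max_depth value below are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(307)
X = rng.uniform(0, 10, size=(300, 1))
y_reg = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)  # regression target
y_clf = (X[:, 0] > 5).astype(int)                          # classification target

# max_depth caps tree flexibility, one way to guard against overfitting.
reg = DecisionTreeRegressor(max_depth=3).fit(X, y_reg)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y_clf)

X_new = np.array([[2.5], [7.5]])
print(reg.predict(X_new))        # regression predictions
print(clf.predict(X_new))        # class predictions
print(clf.predict_proba(X_new))  # estimated conditional probabilities
```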
Topics
- Regression Trees
- Classification Trees
Activities
Week 05
- Start: Monday, February 17
- End: Friday, February 21
Summary
This week, we will prepare for Exam 01! No new content will be introduced. Instead, we will review and practice for the exam.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand the policies and procedures for Exam 01.
Topics
- No new topics this week!
Activities
Week 04
- Start: Monday, February 10
- End: Friday, February 14
Summary
This week, we will focus on model selection and related theory. We will introduce cross-validation as an important generic technique for model selection and hyperparameter tuning. We will also look at some theory related to generalization, including the bias-variance tradeoff, model flexibility, and overfitting.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand how model flexibility relates to the bias-variance tradeoff and thus model performance.
- Tune models by manipulating their flexibility through the use of a tuning parameter to find a model that generalizes well.
- Avoid overfitting by selecting a model of appropriate flexibility through the use of a validation set or cross-validation.
- Use `GridSearchCV` to tune models with cross-validation (see the sketch after this list).
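As a preview, here is a minimal sketch of `GridSearchCV` tuning the flexibility of a KNN regressor; the toy data and candidate grid are illustrative assumptions, not the course's official code:
```python
# Minimal sketch: tuning KNN flexibility (n_neighbors) with GridSearchCV.
# The toy data and candidate grid below are illustrative assumptions.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(307)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

# Small k -> flexible (low bias, high variance); large k -> rigid.
param_grid = {"n_neighbors": [1, 5, 10, 25, 50]}
grid = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5)  # 5-fold CV
grid.fit(X, y)

print(grid.best_params_)  # the k that generalized best across the folds
print(grid.best_score_)   # its mean cross-validated score (R^2 by default)
```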
Topics
- Generalization
- Model Flexibility
- Overfitting
- Bias-Variance Tradeoff
- Cross-Validation
  - \(k\)-Fold Cross-Validation
  - `GridSearchCV`
Activities
Week 03
- Start: Monday, February 3
- End: Friday, February 7
Summary
This week, we will continue our discussion of supervised learning, switching our focus to the classification task. To start, we’ll review some basic probability before using \(k\)-nearest neighbors (KNN) for classification tasks.
Learning Objectives
After completing this week, you are expected to be able to:
- Use `sklearn` `DummyClassifier` as a baseline for comparison.
- Calculate simple metrics to evaluate predictions from learned classifiers.
- Differentiate between regression and classification tasks.
- Use k-nearest neighbors to make predictions for pre-processed data.
- Understand how conditional probabilities relate to classifications.
- Estimate and calculate conditional probabilities.
- Use k-nearest neighbors to estimate conditional probabilities.
- Use `sklearn` features such as `Pipeline`, `ColumnTransformer`, `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` to perform preprocessing while avoiding data leakage (see the sketch after this list).
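As a preview, here is a minimal sketch of such a leakage-free preprocessing pipeline; the toy data and column choices are illustrative assumptions, not the course's official code:
```python
# Minimal sketch: imputation, scaling, and one-hot encoding inside a Pipeline,
# so all preprocessing is learned from training data only (no data leakage).
# The toy data and column choices below are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({"age": [25, 32, np.nan, 51, 40, 29],
                  "city": ["a", "b", "a", "c", "b", "a"]})
y = np.array([0, 1, 0, 1, 1, 0])

numeric = Pipeline([("impute", SimpleImputer(strategy="mean")),
                    ("scale", StandardScaler())])
preprocess = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

model = Pipeline([("preprocess", preprocess),
                  ("knn", KNeighborsClassifier(n_neighbors=3))])
model.fit(X, y)
print(model.predict_proba(X))  # estimated conditional probabilities
```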
Topics
- Probability
- Conditional Probability
- Bayes’ Theorem
- K-Nearest Neighbors (KNN) Classification
  - `KNeighborsClassifier`
- Supervised Learning Metrics
  - Classification
    - Accuracy
    - Misclassification
- Preprocessing
  - Numeric Scaling
  - One-Hot Encoding
  - Dummy Variables and Encoding
  - Missing Data and Imputation
  - Data Leakage
- `sklearn` API and Pipelines
Activities
Week 02
- Start: Monday, January 27
- End: Friday, January 31
Summary
This week, we will begin our discussion of supervised learning, focusing on the regression task. We will introduce one of the foundational methods of machine learning: \(k\)-nearest neighbors (KNN). With KNN as an example of a model used for the regression task, we will also look at data splitting and begin discussing overfitting and generalization.
Learning Objectives
After completing this week, you are expected to be able to:
- Differentiate between supervised, unsupervised, and reinforcement learning.
- Identify regression and classification tasks.
- Use `sklearn` `DummyRegressor` as a baseline for comparison.
- Calculate simple metrics to evaluate predictions from learned regressors.
- Use k-nearest neighbors to make predictions.
- Split data into train, validation, and test sets.
- Modify a tuning parameter to control the flexibility of a model.
- Avoid overfitting by tuning a model through the use of a validation set (see the sketch after this list).
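As a preview, here is a minimal sketch of this workflow: a baseline, a train/validation/test split, and validation of KNN across several values of the tuning parameter \(k\). The toy data and candidate values are illustrative assumptions, not the course's official code:
```python
# Minimal sketch: DummyRegressor baseline, data splitting, and tuning KNN's
# flexibility on a validation set. Toy data and k values are assumptions.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(307)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)

# Carve off a test set first, then split the rest into train and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=1)

baseline = DummyRegressor().fit(X_train, y_train)  # predicts the training mean
print("baseline RMSE:",
      np.sqrt(mean_squared_error(y_val, baseline.predict(X_val))))

# Smaller k means a more flexible model; keep the k with the best validation RMSE.
for k in [1, 5, 25, 100]:
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    print(k, np.sqrt(mean_squared_error(y_val, knn.predict(X_val))))
```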
Topics
- Machine Learning Paradigms and Tasks
  - Supervised Learning
    - Classification
    - Regression
  - Unsupervised Learning
    - Density Estimation
    - Clustering
    - Anomaly Detection
    - Dimension Reduction
  - Reinforcement Learning
- Supervised Learning
- Baseline Methods
  - `DummyRegressor`
- Supervised Learning Metrics
  - Regression
    - Root Mean Square Error (RMSE)
    - Mean Absolute Error (MAE)
    - Mean Absolute Percentage Error (MAPE)
    - Coefficient of Determination (\(R^2\))
    - Max Error
- K-Nearest Neighbors (KNN) Regression
  - `KNeighborsRegressor`
- Generalization
  - Overfitting
  - Underfitting
  - Generalization Gap
- Train, Test, and Validation Datasets
  - `train_test_split`
Activities
Week 01
- Start: Tuesday, January 21
- End: Friday, January 24
Summary
Welcome to CS 307! This week, you will become familiar with the course policies and set up your machine to complete homework and labs.
Learning Objectives
After completing this week, you are expected to be able to:
Topics
- What is Machine Learning?
- Computing Setup
- CS 307 Course Policies