Content
Weekly Objectives, Topics, and Activities
Week 01
- Start: Monday, August 26
- End: Friday, August 30
Summary
Welcome to CS 307! This week, you will become familiar with the course policies, and we will get your machine set up to complete homework and labs.
Learning Objectives
After completing this week, you are expected to be able to:
Topics
- Big Questions in ML
- Compute Setup
- CS 307 Course Policy
- Lab Policy
- Homework Policy
Activities
- Reading: Getting Started
- Reading: Syllabus
- Reading: Computing Setup Guide
- Lecture Video: Welcome!
- In this video, Dave will guide you through everything you need to know about CS 307.
- Lecture Video: Computer Scientist Explains Machine Learning in 5 Levels of Difficulty
- In this video, Hilary Mason, an early and now long-time data science advocate, explains machine learning in a manner perfectly suited to kick off your CS 307 journey. Hilary subtly introduces several broad themes that we will discuss throughout the course. You don’t need to learn any material from this video. It should actually leave you with more questions than answers! We’ll start to answer those questions next week!
- Tutorial Video: Homework in CS 307
- This video will guide you through the homework process that you will use in CS 307.
- Tutorial Video: Lab in CS 307
- This video will guide you through the lab process that you will use in CS 307.
- Assessment: Homework 00
- Assessment: Lab 00
Week 02
- Start: Tuesday, September 3
- End: Friday, September 6
Summary
In Week 02 we will begin our discussion of supervised learning, covering both the regression and classification tasks. We’ll start by introducing two very basic methods and defining basic metrics for assessing supervised learning methods. Then we will look at one of the foundational methods of machine learning: k-nearest neighbors. We will also introduce data splitting and overfitting.
Learning Objectives
After completing this week, you are expected to be able to:
- Differentiate between supervised, unsupervised, and reinforcement learning.
- Identify regression and classification tasks.
- Use sklearn baseline models DummyClassifier and DummyRegressor (see the sketch after this list).
- Calculate simple metrics to evaluate predictions from regression and classification methods.
- Differentiate between regression and classification tasks.
- Use k-nearest neighbors to make predictions for pre-processed data.
- Understand how conditional probabilities relate to classifications.
- Estimate and calculate conditional probabilities.
- Use k-nearest neighbors to estimate conditional probabilities.
- Split data into train, validation, and test sets.
- Modify a tuning parameter to control the flexibility of a model.
- Avoid overfitting by selecting a model through the use of a validation set.
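For a concrete preview, here is a minimal sketch of fitting sklearn baseline models and computing two simple metrics, \(\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\) for regression and accuracy for classification. The synthetic data and variable names here are illustrative assumptions, not course materials.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

rng = np.random.default_rng(42)

# Illustrative synthetic data: features, a numeric target, and a binary label.
X = rng.normal(size=(100, 3))
y_reg = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
y_clf = (y_reg > 0).astype(int)

# DummyRegressor ignores the features and predicts the mean training target.
dummy_reg = DummyRegressor(strategy="mean").fit(X, y_reg)
rmse = np.sqrt(mean_squared_error(y_reg, dummy_reg.predict(X)))

# DummyClassifier ignores the features and predicts the most frequent class.
dummy_clf = DummyClassifier(strategy="most_frequent").fit(X, y_clf)
acc = accuracy_score(y_clf, dummy_clf.predict(X))

print(f"baseline RMSE: {rmse:.3f}, baseline accuracy: {acc:.3f}")
```

Any real model you fit later should at least beat these baselines; if it doesn’t, something has gone wrong.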
Topics
- Machine Learning Tasks
- Supervised Learning
- Classification
- Regression
- Unsupervised Learning
- Density Estimation
- Clustering
- Novelty and Outlier Detection
- Dimension Reduction
- Reinforcement Learning
- Supervised Learning
- Baseline Methods: DummyClassifier and DummyRegressor
- Supervised Learning Metrics
- Regression
- Root Mean Square Error (RMSE)
- Mean Absolute Error (MAE)
- Mean Absolute Percentage Error (MAPE)
- Coefficient of Determination (\(R^2\))
- Max Error
- Classification
- Accuracy
- Misclassification
- K-Nearest Neighbors (KNN) Regression (KNeighborsRegressor)
- K-Nearest Neighbors (KNN) Classification (KNeighborsClassifier)
- Overfitting
- Train, Test, and Validation Datasets (train_test_split; see the sketch after this list)
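The data splitting and KNN tuning workflow referenced above might look like the following minimal sketch. The synthetic data, split proportions, and candidate values of k are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(307)

# Illustrative synthetic regression data (not from the course materials).
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)

# Split off a test set, then split the remainder into train and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=1)

# Tune k, the flexibility parameter, by comparing validation RMSE.
best_k, best_rmse = None, np.inf
for k in [1, 5, 25, 100]:
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_val, knn.predict(X_val)))
    if rmse < best_rmse:
        best_k, best_rmse = k, rmse

print(f"selected k = {best_k}, validation RMSE = {best_rmse:.3f}")
```

Notice that the test set plays no role in selecting k; it is held back to estimate the performance of the final chosen model.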
Activities
- Assessment: Homework 01
- Assessment: Lab 01
- Video: Machine Learning Tasks
- Slides: Simple Methods and Metrics
- Slides: KNN Regression and Data Splitting
- Slides: KNN Classification
- We’ll officially discuss these slides in a video next week, but you’ll need the main idea for Homework 01. The main idea is actually quite intuitive, and the math and probability are mostly review.
Week 03
- Start: Monday, September 9
- End: Friday, September 13
Summary
In Week 03 we will look at the bigger picture and focus on selecting tuning parameters and preprocessing data, especially heterogeneous data stored in Pandas DataFrame objects. We’ll also discuss overfitting, generalization, and a concept related to both: the bias-variance tradeoff.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand how model flexibility relates to the bias-variance tradeoff and thus model performance.
- Tune models by manipulating their flexibility through the use of a tuning parameter to find a model that generalizes well.
- Avoid overfitting by selecting a model of appropriate flexibility through the use of a validation set or cross-validation.
- Use sklearn features such as Pipeline, ColumnTransformer, SimpleImputer, StandardScaler, OneHotEncoder, and others to perform reproducible preprocessing.
- Use GridSearchCV to tune models (select appropriate values of tuning parameters) with cross-validation (see the sketch after this list).
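To see how these pieces fit together, here is a minimal sketch, assuming a small synthetic DataFrame with one numeric column (containing missing values) and one categorical column; the column names, data, and parameter grid are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)

# Illustrative heterogeneous data stored in a DataFrame.
df = pd.DataFrame({
    "size": np.where(rng.uniform(size=200) < 0.1, np.nan, rng.normal(50, 10, 200)),
    "color": rng.choice(["red", "green", "blue"], size=200),
})
y = df["size"].fillna(50) * 2 + rng.normal(size=200)

# Numeric columns: impute missing values, then standardize.
numeric = Pipeline([("impute", SimpleImputer(strategy="mean")),
                    ("scale", StandardScaler())])

# Apply different preprocessing to different columns.
preprocess = ColumnTransformer([
    ("num", numeric, ["size"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

# Chain preprocessing and model so the whole procedure is reproducible.
model = Pipeline([("preprocess", preprocess),
                  ("knn", KNeighborsRegressor())])

# Cross-validated grid search over the KNN tuning parameter.
grid = GridSearchCV(model, param_grid={"knn__n_neighbors": [1, 5, 25]}, cv=5)
grid.fit(df, y)
print(grid.best_params_)
```

Because the preprocessing lives inside the pipeline, each cross-validation fold is preprocessed using only its own training data, which avoids leaking information from the validation folds.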
Topics
- Bias-Variance Tradeoff
- Generalization
- Cross-Validation
- Preprocessing
- sklearn API and Pipelines
Activities
More activities coming soon! Notes will be released on Tuesday. Videos will be released on Wednesday.
Week 04
- Start: Monday, September 16
- End: Friday, September 20
Summary
In Week 04 we will introduce another nonparametric method for supervised learning: decision trees.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand how decision trees differ from KNN when determining similarity of data.
- Find and evaluate decision tree splits for regression.
- Find and evaluate decision tree splits for classification.
- Use decision trees to make predictions for regression tasks using sklearn.
- Use decision trees to make predictions for classification tasks using sklearn.
- Use decision trees to estimate conditional probabilities for classification tasks using sklearn (see the sketch after this list).
- Tune the parameters of decision trees to avoid overfitting.
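A minimal sketch of these objectives in sklearn follows; the synthetic data and the choice of max_depth are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(7)

# Illustrative synthetic data for a regression and a classification task.
X = rng.normal(size=(200, 2))
y_reg = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)
y_clf = (X[:, 1] > 0).astype(int)

# A regression tree; max_depth limits flexibility to help avoid overfitting.
reg_tree = DecisionTreeRegressor(max_depth=3).fit(X, y_reg)
print(reg_tree.predict(X[:2]))

# A classification tree; predict_proba returns estimated conditional
# probabilities, one column per class.
clf_tree = DecisionTreeClassifier(max_depth=3).fit(X, y_clf)
print(clf_tree.predict(X[:2]))
print(clf_tree.predict_proba(X[:2]))
```

In practice, tuning parameters like max_depth would be selected with a validation set or cross-validation, as discussed in Week 03.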
Topics
- Regression Trees
- Classification Trees
Activities
Week 05
- Start: Monday, September 23
- End: Friday, September 27
Summary
In Week 05 we will prepare for Quiz 01!
Learning Objectives
After completing this week, you are expected to be able to:
- Understand the policies and procedures for Quiz 01.
Topics
- No new topics this week!
Activities
Week 06
- Start: Monday, September 30
- End: Friday, October 4
Summary
In Week 06 you will take Quiz 01!
Topics
- No new topics this week!
Activities
- Assessment: Homework 04
- Assessment: Lab 04
- lab-04-notebook-live.ipynb
- This notebook contains code seen during discussion, as well as some additional code that will more or less complete the lab.
Week 07
- Start: Monday, October 7
- End: Friday, October 11
Summary
In Week 07 we will introduce linear models for classification and regression, both parametric methods. We will begin to compare and contrast parametric and nonparametric methods.
Learning Objectives
After completing this week, you are expected to be able to:
- Differentiate between parametric and nonparametric regression.
- Use sklearn to fit linear regression models and make predictions for unseen data.
- Estimate conditional probabilities with logistic regression.
- Use sklearn to fit logistic regression models and make predictions for unseen data.
- Preprocess data to add polynomial and interaction terms for use in linear models (see the sketch after this list).
- Understand what makes linear models linear and how both linear regression and logistic regression are linear models.
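Here is a minimal sketch of these ideas in sklearn; the synthetic data and the degree-2 polynomial expansion are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(11)

# Illustrative synthetic data for a regression and a classification task.
X = rng.normal(size=(200, 2))
y_reg = 3 * X[:, 0] - X[:, 1] + rng.normal(size=200)
y_clf = (X[:, 0] + X[:, 1] > 0).astype(int)

# Linear regression fit to polynomial and interaction terms. The model is
# still linear: linear in the (expanded) features and their coefficients.
poly_reg = Pipeline([("poly", PolynomialFeatures(degree=2, include_bias=False)),
                     ("lm", LinearRegression())])
poly_reg.fit(X, y_reg)
print(poly_reg.predict(X[:2]))

# Logistic regression; predict_proba estimates conditional probabilities.
logit = LogisticRegression().fit(X, y_clf)
print(logit.predict(X[:2]), logit.predict_proba(X[:2]))
```

Unlike KNN, these parametric methods summarize the training data in a fixed set of learned coefficients rather than keeping the data around for prediction.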
Topics
- Linear Models
- Linear Regression
- Logistic Regression
- Parametric versus Nonparametric Models
Activities
Week 08
- Start: Monday, October 14
- End: Friday, October 18
Summary
In Week 08 we will modify previously seen methods to potentially improve their performance. We’ll look at ensembles of trees and add regularization to regression.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand how the ridge and lasso constraints lead to shrunken and sparse estimates.
- Use ridge regression to perform regression and classification.
- Use lasso to perform regression and classification.
- Understand how averaging the predictions from many trees (for example using a random forest) can improve model performance.
- Use a random forest to perform regression and classification.
- Use boosting to perform regression and classification (see the sketch after this list).
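A minimal sketch of the regression variants of these methods follows; the synthetic data and the regularization strengths (alpha values) are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(13)

# Illustrative synthetic data: only the first two features matter.
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=200)

# Ridge shrinks coefficients toward zero; lasso can set some of them
# exactly to zero, producing sparse estimates.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("nonzero lasso coefficients:", np.sum(lasso.coef_ != 0))

# Ensembles of trees: a random forest averages many trees fit to
# bootstrapped samples; boosting fits trees sequentially to the residuals.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
gb = GradientBoostingRegressor(random_state=0).fit(X, y)
print(rf.predict(X[:2]), gb.predict(X[:2]))
```

In each case the amount of regularization (alpha) or the ensemble settings would be tuned with cross-validation, just like any other tuning parameter.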
Topics
- Ensemble Methods
- Random Forests
- Boosted Models
- Regularization
- Lasso
- Ridge
Activities
Week 09
- Start: Monday, October 21
- End: Friday, October 25
Summary
In Week 09 we will discuss binary classification in depth, in particular, metrics for evaluating binary classification models.
Learning Objectives
After completing this week, you are expected to be able to:
- Understand the definitions of false positives, false negatives, and related metrics.
- Calculate metrics specific to binary classification.
- Evaluate models for binary classification (see the sketch after this list).
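As a small preview, here is a minimal sketch of computing binary classification metrics from a confusion matrix; the labels and predictions are made-up values for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Illustrative true labels and predicted labels (1 = positive class).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels, sklearn arranges the confusion matrix as:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")

# Precision = TP / (TP + FP); recall (sensitivity) = TP / (TP + FN).
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
```

Which of these metrics matters most depends on the relative costs of false positives and false negatives for the problem at hand.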
Topics
- Binary Classification
- Binary Classification Metrics
- Model Evaluation
Activities
Week 10
- Start: Monday, October 28
- End: Friday, November 1
Summary
In Week 10 we will prepare for Quiz 02!
Learning Objectives
After completing this week, you are expected to be able to:
- Understand the policies and procedures for Quiz 02.
Topics
- No new topics this week!