# Course Content

CS 307 content is largely structured around three quizzes, and each quiz corresponds to three modules of content. Additionally, we will have an introductory module and a concluding module.

## Introduction

The first group of modules (containing only one module) will serve as an introduction to CS 307.

### Module 00

In this module, you will become familiar with the course and we will get your machine set up to complete homework and labs. We will then give an overview of the fundamental machine learning **tasks**, introduce two very basic **methods**, and define basic **metrics** for assessing supervised learning methods.

#### Topics

- Machine Learning Tasks
  - Supervised Learning
    - Classification
    - Regression
  - Unsupervised Learning
    - Density Estimation
    - Clustering
    - Novelty and Outlier Detection
    - Dimension Reduction
  - Reinforcement Learning
- Baseline Methods
  - `DummyClassifier`
  - `DummyRegressor`
- Supervised Learning Metrics
  - Regression
    - Root Mean Square Error (RMSE)
    - Mean Absolute Error (MAE)
    - Mean Absolute Percentage Error (MAPE)
    - Coefficient of Determination (\(R^2\))
    - Max Error
  - Classification
    - Accuracy
    - Misclassification

#### Learning Objectives

After completing this module, you are expected to be able to:

- *Understand* the syllabus of the course.
- *Understand* the objectives of the course.
- *Communicate* with the course staff.
- *Use* Python, Jupyter, and VSCode to produce code for labs, quizzes, and MPs.
- *Use* PrairieLearn to complete homework, lab models, and MPs.
- *Use* Canvas to complete lab reports.
- *Differentiate* between supervised, unsupervised, and reinforcement learning.
- *Identify* regression and classification tasks.
- *Use* `sklearn` baseline models `DummyClassifier` and `DummyRegressor`.
- *Calculate* metrics to evaluate predictions from regression and classification methods.
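As a rough illustration of the baseline methods and metrics listed above, the sketch below fits `DummyClassifier` and `DummyRegressor` and computes a few metrics. The data values are made up for the example and are not course material.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

# Tiny made-up dataset (illustrative values only).
X = np.arange(8).reshape(-1, 1)
y_class = np.array([0, 0, 0, 0, 0, 1, 1, 1])  # majority class is 0
y_reg = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# Baseline classifier: always predicts the most frequent training class.
clf = DummyClassifier(strategy="most_frequent").fit(X, y_class)
accuracy = accuracy_score(y_class, clf.predict(X))
misclassification = 1 - accuracy

# Baseline regressor: always predicts the mean of the training targets.
reg = DummyRegressor(strategy="mean").fit(X, y_reg)
pred = reg.predict(X)
rmse = np.sqrt(mean_squared_error(y_reg, pred))
mae = mean_absolute_error(y_reg, pred)

print(f"accuracy={accuracy:.3f}, misclassification={misclassification:.3f}")
print(f"RMSE={rmse:.3f}, MAE={mae:.3f}")
```

Any method worth using should beat these numbers; if it does not, the features are probably not helping.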

#### Slides, Scribbles, and Readings

## Quiz 01

**Quiz 01** will focus on two foundational nonparametric supervised learning methods: k-nearest neighbors (KNN) and decision trees. Both methods can be used for both classification and regression tasks. While discussing these methods, we will introduce the notion of generalization, and tools for model selection such as cross-validation.

### Module 01

In **Module 01** we will begin discussing supervised learning, both the regression and classification tasks. We will look at one of the foundational methods of machine learning: k-nearest neighbors. We will also introduce data splitting and overfitting.

#### Topics

- K-Nearest Neighbors (KNN) Regression
  - `KNeighborsRegressor`
- K-Nearest Neighbors (KNN) Classification
  - `KNeighborsClassifier`
- Overfitting
- Train, Test, and Validation Datasets
  - `train_test_split`
- Object-Oriented Programming (OOP) in Python

#### Learning Objectives

- *Differentiate* between regression and classification tasks.
- *Use* k-nearest neighbors to make predictions for pre-processed data.
- *Understand* how conditional probabilities relate to classifications.
- *Estimate* and *calculate* conditional probabilities.
- *Use* k-nearest neighbors to estimate conditional probabilities.
- *Split* data into train, validation, and test sets.
- *Modify* a tuning parameter to control the flexibility of a model.
- *Avoid* overfitting by selecting a model through the use of a validation set.
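The workflow described in these objectives might look roughly like the sketch below: split the data, pick `k` with a validation set, then estimate conditional probabilities. The synthetic data from `make_classification`, the split sizes, and the candidate values of `k` are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, already numeric data (an assumption; real data needs preprocessing).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# Hold out a test set first, then split the rest into train and validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

# Smaller k means a more flexible model; compare several on validation data.
val_scores = {}
for k in [1, 5, 15, 45]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_scores[k] = knn.score(X_val, y_val)
best_k = max(val_scores, key=val_scores.get)

# Refit on train + validation with the chosen k, then use the test set once.
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_rest, y_rest)
test_accuracy = final.score(X_test, y_test)

# Estimated conditional probabilities: the share of neighbors in each class.
probs = final.predict_proba(X_test[:3])
print(best_k, test_accuracy)
print(probs)
```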

#### Slides, Scribbles, and Readings

### Module 02

In **Module 02** we will look at the bigger picture and focus on selecting tuning parameters and preprocessing data, especially heterogeneous data stored in pandas `DataFrame` objects. We’ll also discuss overfitting, generalization, and a concept related to both: the bias-variance tradeoff.

#### Topics

- Bias-Variance Tradeoff
- Generalization
- Cross-Validation
- Preprocessing
- `sklearn` API and Pipelines

#### Learning Objectives

- *Understand* how model flexibility relates to the bias-variance tradeoff and thus model performance.
- *Tune* models by manipulating their flexibility through the use of a tuning parameter to find a model that generalizes well.
- *Avoid* overfitting by selecting a model of appropriate flexibility through the use of a validation set or cross-validation.
- *Use* `sklearn` features such as `Pipeline`, `ColumnTransformer`, `SimpleImputer`, `StandardScaler`, `OneHotEncoder`, and others to perform reproducible preprocessing.
- *Use* `GridSearchCV` to tune models (select appropriate values of tuning parameters) with cross-validation.
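One way these pieces can fit together is sketched below: a `ColumnTransformer` preprocesses numeric and categorical columns, and `GridSearchCV` tunes `n_neighbors` with cross-validation. The column names (`height`, `weight`, `color`), the made-up data, and the parameter grid are hypothetical choices for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up heterogeneous data: two numeric columns and one categorical column.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "height": rng.normal(170, 10, n),
    "weight": rng.normal(70, 8, n),
    "color": rng.choice(["red", "green", "blue"], n),
})
df.loc[df.index % 10 == 0, "height"] = np.nan  # inject missing values
y = (df["weight"] > 70).astype(int)

# Numeric columns: impute missing values, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Route each group of columns to its own preprocessing.
preprocess = ColumnTransformer([
    ("num", numeric, ["height", "weight"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])
model = Pipeline([("preprocess", preprocess), ("knn", KNeighborsClassifier())])

# Preprocessing is refit inside every fold, so no information leaks
# from the held-out fold into the training folds.
grid = GridSearchCV(model, {"knn__n_neighbors": [1, 5, 15]}, cv=5)
grid.fit(df, y)
best_k = grid.best_params_["knn__n_neighbors"]
print(best_k, grid.best_score_)
```

Putting the preprocessing inside the pipeline, rather than transforming the data once up front, is what makes the cross-validated score an honest estimate.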

#### Slides, Scribbles, and Readings

### Module 03

In **Module 03** we will introduce another nonparametric method for supervised learning: decision trees.

#### Topics

- Regression Trees
- Classification Trees

#### Learning Objectives

- *Understand* how decision trees differ from KNN when determining closeness of data.
- *Find* and *evaluate* decision tree splits for regression.
- *Find* and *evaluate* decision tree splits for classification.
- *Use* decision trees to make predictions for regression tasks using `sklearn`.
- *Use* decision trees to make predictions for classification tasks using `sklearn`.
- *Use* decision trees to estimate conditional probabilities for classification tasks using `sklearn`.
- *Tune* the parameters of decision trees to avoid overfitting.
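To make "finding and evaluating a split" concrete, here is a minimal sketch for a single-feature regression problem. It scores each candidate threshold by the total squared error and compares the winner with a depth-one `DecisionTreeRegressor`. The six data points and the helper `split_sse` are invented for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data with an obvious jump between x = 3 and x = 10.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.0])

def split_sse(threshold):
    """Total squared error when each side of the split predicts its own mean."""
    left, right = y[x <= threshold], y[x > threshold]
    return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()

# Candidate thresholds: midpoints between consecutive (sorted) x values.
candidates = (x[:-1] + x[1:]) / 2
best_threshold = min(candidates, key=split_sse)

# A depth-one tree (a "stump") makes exactly one split.
stump = DecisionTreeRegressor(max_depth=1).fit(x.reshape(-1, 1), y)
sklearn_threshold = stump.tree_.threshold[0]
left_pred, right_pred = stump.predict([[2.0], [11.0]])

print(best_threshold, sklearn_threshold)
print(left_pred, right_pred)  # the mean of y on each side of the split
```

Parameters such as `max_depth` limit how many times this search is repeated, which is one way to keep a tree from overfitting.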

#### Slides, Scribbles, and Readings

## Quiz 02

**Quiz 02** will introduce linear methods for classification and regression, which will also present an opportunity to differentiate parametric and nonparametric methods. Then, we’ll modify existing methods that we have seen through the use of regularization and ensembles. Lastly, we’ll spend some time thinking about the specifics of evaluating binary classification, and some other miscellaneous practical concerns.

### Module 04

In **Module 04** we will introduce **linear models** for classification and regression, both parametric methods. We will begin to compare and contrast parametric and nonparametric methods.

#### Topics

- Linear Models
  - Linear Regression
  - Logistic Regression
- Parametric versus Nonparametric Models

#### Learning Objectives

- *Differentiate* between parametric and nonparametric regression.
- *Use* `sklearn` to fit linear regression models and make predictions for unseen data.
- *Estimate* conditional probabilities with logistic regression.
- *Use* `sklearn` to fit logistic regression models and make predictions for unseen data.
- *Preprocess* data to add polynomial and interaction terms for use in linear models.
- *Understand* what makes linear models linear and how both linear regression and logistic regression are linear models.
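The sketch below illustrates two of these ideas under simple assumptions: a model with polynomial terms is still linear in its parameters, and logistic regression returns estimated conditional probabilities. The noise-free quadratic and the threshold-at-zero labels are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 50).reshape(-1, 1)

# Noise-free quadratic target: y = 1 + 2x + 3x^2 (made up for illustration).
y = 1 + 2 * x[:, 0] + 3 * x[:, 0] ** 2

# Adding x^2 as a feature keeps the model linear in its coefficients.
quad = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
quad.fit(x, y)
linreg = quad.named_steps["linearregression"]
coefs, intercept = linreg.coef_, linreg.intercept_  # close to [2, 3] and 1

# Binary labels: class 1 whenever x is positive.
labels = (x[:, 0] > 0).astype(int)
logit = LogisticRegression().fit(x, labels)

# Estimated P(Y = 1 | X = x) at two new points.
p_neg, p_pos = logit.predict_proba([[-2.0], [2.0]])[:, 1]
print(coefs, intercept)
print(p_neg, p_pos)
```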

#### Slides, Scribbles, and Readings

### Module 05

In **Module 05** we will modify previously seen methods to potentially improve their performance. We’ll look at **ensembles** of trees and add **regularization** to regression.

#### Topics

- Ensemble Methods
  - Random Forests
  - Boosted Models
- Regularization
  - Lasso
  - Ridge

#### Learning Objectives

- *Understand* how the ridge and lasso constraints lead to shrunken and sparse estimates.
- *Use* ridge regression to perform regression and classification.
- *Use* lasso to perform regression and classification.
- *Understand* how averaging the predictions from many trees (for example a random forest) can improve model performance.
- *Use* a random forest to perform regression and classification.
- *Use* boosting to perform regression and classification.
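As a small sketch of these ideas, the code below compares lasso and ridge coefficients on data where only two of ten features matter, then checks that a random forest regressor's prediction is the average of its trees' predictions. The simulated data and the `alpha` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge

# Simulated data: only the first two of ten features affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 4 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Lasso can set coefficients exactly to zero (sparse estimates);
# ridge shrinks coefficients toward zero without zeroing them out.
lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

# A random forest regressor averages the predictions of its trees.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
tree_average = np.mean([tree.predict(X[:5]) for tree in forest.estimators_], axis=0)
forest_pred = forest.predict(X[:5])

print(n_zero_lasso, n_zero_ridge)
print(np.allclose(tree_average, forest_pred))
```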

#### Slides, Scribbles, and Readings

### Module 06

In **Module 06** we will discuss binary classification in depth, in particular, metrics for evaluating binary classification models.

#### Topics

- Binary Classification
- Model Evaluation
- Practical Considerations

#### Learning Objectives

- *Understand* the definitions of false positives, false negatives, and related metrics.
- *Calculate* metrics specific to binary classification.
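For a concrete example of these definitions, the sketch below counts true and false positives and negatives on ten made-up labels and derives a few common metrics from them. The choice of metrics shown here is an assumption; the module may cover others.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels: 1 is the "positive" class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_true, y_pred).ravel())

precision = tp / (tp + fp)    # of predicted positives, how many are correct
recall = tp / (tp + fn)       # of actual positives, how many were found
specificity = tn / (tn + fp)  # of actual negatives, how many were found

# The same quantities are available directly from sklearn.
assert precision == precision_score(y_true, y_pred)
assert recall == recall_score(y_true, y_pred)

print(tn, fp, fn, tp)
print(precision, recall, specificity)
```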

#### Slides, Scribbles, and Readings

## Quiz 03

**Quiz 03** will wrap up our discussion of supervised learning with an introduction to generative models. Then, we will take a brief detour to talk about unsupervised learning before diving into neural networks.

### Module 07

In **Module 07** we will introduce **generative models** for classification, with an emphasis on *Naive Bayes*.

#### Topics

- Generative Models
  - Naive Bayes
  - Linear Discriminant Analysis (LDA)
  - Quadratic Discriminant Analysis (QDA)

#### Learning Objectives

- *Understand* the difference between discriminative and generative models.
- *Use* Naive Bayes models for the classification task.
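A minimal sketch of a generative classifier, assuming Gaussian features: `GaussianNB` estimates a prior for each class and a per-class mean and variance for each feature, then classifies new points from those estimates. The two simulated clusters are made up for the example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two simulated classes centered at (-2, -2) and (2, 2).
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

nb = GaussianNB().fit(X, y)

# The fitted model describes how each class generates its features.
priors = nb.class_prior_  # estimated P(Y = k)
means = nb.theta_         # estimated feature means, one row per class

# Classify new points from the estimated priors and class densities.
new_points = np.array([[-2.0, -2.0], [2.0, 2.0]])
predictions = nb.predict(new_points)
posteriors = nb.predict_proba(new_points)
print(priors, means)
print(predictions, posteriors)
```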

#### Slides, Scribbles, and Readings

### Module 08

In **Module 08** we will introduce **unsupervised learning**. We will look at a variety of methods for the various subtasks: *dimension reduction*, *clustering*, *density estimation*, and *outlier detection*.

#### Topics

- Unsupervised Learning
  - Dimension Reduction
    - PCA
  - Clustering
    - k-Means
    - Agglomerative Clustering
    - DBSCAN
  - Density Estimation
    - Kernel Density Estimation
    - Gaussian Mixture Models
  - Outlier Detection
    - One-Class SVM
    - Isolation Forest

#### Learning Objectives

- *Understand* the difference between supervised and unsupervised machine learning tasks.
- *Identify* supervised and unsupervised machine learning tasks.
- *Understand* and *identify* unsupervised learning subtasks: dimension reduction, clustering, density estimation, and outlier detection.
- *Use* principal components analysis (PCA) for dimension reduction.
- *Use* k-means and other methods for clustering.
- *Use* kernel density estimation and mixture models for density estimation.
- *Use* one-class SVM and isolation forest for outlier detection.
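Three of these subtasks can be sketched on one simulated dataset, as below: PCA for dimension reduction, k-means for clustering, and an isolation forest for outlier detection. Note that no response variable is used anywhere. The two simulated blobs and all parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Two well-separated simulated blobs in five dimensions (no labels).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 5))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 5))
X = np.vstack([blob_a, blob_b])

# Dimension reduction: project five features onto two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Clustering: group the projected points into two clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)

# Outlier detection: fit_predict returns 1 for inliers and -1 for outliers.
flags = IsolationForest(random_state=0).fit_predict(X)
n_flagged = int((flags == -1).sum())

print(pca.explained_variance_ratio_)
print(labels)
print(n_flagged)
```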

#### Slides, Scribbles, and Readings

### Module 09

Coming soon!

#### Topics

- Neural Networks

#### Learning Objectives

Coming soon!

#### Slides, Scribbles, and Readings

Coming soon!

## Conclusion

Coming soon!

### Module 10

Coming soon!

#### Topics

- Deep Learning

#### Learning Objectives

Coming soon!

#### Slides, Scribbles, and Readings

Coming soon!