- Start: Monday, September 8
- End: Saturday, September 13
Summary
This week, we will continue our discussion of supervised learning, switching our focus to the classification task. We will again use \(k\)-nearest neighbors (KNN), but now for classification rather than regression.
Learning Objectives
After completing this week, you are expected to be able to:
- Use `sklearn`'s `DummyClassifier` as a baseline for comparison (a sketch follows this list).
- Calculate simple metrics to evaluate predictions from learned classifiers.
- Differentiate between regression and classification tasks.
- Use \(k\)-nearest neighbors to make predictions for preprocessed data.
- Understand how conditional probabilities relate to classifications.
- Estimate and calculate conditional probabilities.
- Use \(k\)-nearest neighbors to estimate conditional probabilities.
- Use `sklearn` features such as `Pipeline`, `ColumnTransformer`, `SimpleImputer`, `StandardScaler`, and `OneHotEncoder` to perform preprocessing while avoiding data leakage (see the pipeline sketch after the topic list below).
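
To make these objectives concrete, here is a minimal sketch using a synthetic dataset; the dataset, the choice \(k = 5\), and the random seeds are illustrative assumptions rather than part of the course materials. A KNN classifier predicts the majority class among the \(k\) nearest training points, and its `predict_proba` output estimates the conditional probability \(P(Y = c \mid X = x)\) as the fraction of those neighbors belonging to class \(c\). Accuracy is the proportion of correct predictions; the misclassification rate is its complement, \(1 - \text{accuracy}\).

```python
# A minimal sketch with a synthetic dataset; all values here are
# illustrative assumptions, not part of the course materials.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# baseline: always predict the most frequent class in the training data
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_test, dummy.predict(X_test)))

# KNN classification: majority vote among the k nearest training points
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("knn accuracy:", accuracy_score(y_test, knn.predict(X_test)))

# predict_proba estimates the conditional probability P(Y = c | X = x)
# as the proportion of each class among the k nearest neighbors
print(knn.predict_proba(X_test[:5]))
```

A learned model is only as useful as its margin over the baseline: if KNN cannot beat `DummyClassifier`, the features are not carrying a useful signal.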
Topics
- Probability
    - Conditional Probability
    - Bayes’ Theorem
- Baseline Methods
    - `DummyClassifier`
- K-Nearest Neighbors (KNN) Classification
    - `KNeighborsClassifier`
- Supervised Learning Metrics
    - Classification
        - Accuracy
        - Misclassification
- Preprocessing
    - Numeric Scaling
    - One-Hot Encoding
    - Dummy Variables and Encoding
    - Missing Data and Imputation
    - Data Leakage
- `sklearn` API and Pipelines
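
The preprocessing topics above combine into a single `sklearn` pipeline. Below is a minimal sketch, assuming a toy DataFrame whose column names (`age`, `color`) and values are hypothetical: numeric columns are imputed and scaled, categorical columns are imputed and one-hot encoded, and because every step lives inside the `Pipeline`, all transformations are fit on the training split only, which is what avoids data leakage.

```python
# A minimal sketch; the DataFrame, column names, and hyperparameters
# are hypothetical examples, not from the course materials.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame(
    {
        "age": [23, 35, np.nan, 41, 29, 52, 37, np.nan],
        "color": ["red", "blue", "blue", np.nan, "red", "green", "red", "blue"],
    }
)
y = pd.Series([0, 1, 1, 0, 0, 1, 1, 0])

# numeric columns: fill missing values, then scale
numeric = Pipeline(
    [("impute", SimpleImputer(strategy="mean")), ("scale", StandardScaler())]
)
# categorical columns: fill missing values, then one-hot encode;
# handle_unknown="ignore" tolerates categories unseen during training
categorical = Pipeline(
    [
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)
preprocess = ColumnTransformer(
    [("num", numeric, ["age"]), ("cat", categorical, ["color"])]
)

pipe = Pipeline(
    [("preprocess", preprocess), ("knn", KNeighborsClassifier(n_neighbors=3))]
)

X_train, X_test, y_train, y_test = train_test_split(df, y, random_state=1)
# fit learns imputation values, scaling parameters, and category levels
# from the training rows only, so no test information leaks in
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))
```

Calling `pipe.fit` learns every preprocessing statistic from the training rows alone; `pipe.score` then applies those same learned transformations to the test rows before computing accuracy.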