The Whole Game

Using sklearn for Supervised Learning

This document will serve as a high-level sklearn tutorial and guide for the supervised learning task.

# basics
import pandas as pd
import numpy as np

# data
from sklearn.datasets import load_breast_cancer

# machine learning
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import ParameterGrid
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score

Loading Data

To perform supervised learning in sklearn, data must first be loaded into Python. There are many, many ways to do so.

One of the most common tools to load data into Python for data science is the input-output functionality of pandas.

Two specific functions that we have seen and used are read_csv and read_parquet.
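
For illustration, a minimal sketch of both, assuming hypothetical file paths; the names and paths here are placeholders, not files provided with these notes.

# hypothetical file paths, for illustration only
csv_data = pd.read_csv("path/to/data.csv")              # load a comma-separated file
parquet_data = pd.read_parquet("path/to/data.parquet")  # load a Parquet file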

Data Splitting

After data has been loaded, and before any inspection or analysis, it should be train-test split to avoid data leakage. In sklearn, the train_test_split function is available to perform this task.

# train-test split
train_data, test_data = train_test_split(
    full_data,
    test_size=0.20,
    random_state=42,
    stratify=full_data["Target"],
)

The above example assumes that full_data contains both the features and target, in this case with a target named Target.

  • The test_size parameter controls how much of the data is withheld for the test set.
  • The random_state parameter fixes the randomization for reproducibility, as the splitting is done at random by default.
  • The stratify parameter is useful for splitting in classification tasks, especially those that may have severe class imbalance.
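
As a quick sanity check of stratification, one possible sketch (assuming the full_data split from above) compares class proportions before and after the split; with stratify set, they should match closely.

# class proportions should be (nearly) identical across the full, train, and test data
full_data["Target"].value_counts(normalize=True)
train_data["Target"].value_counts(normalize=True)
test_data["Target"].value_counts(normalize=True)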

If the initial data contains columns for both the features and the target, it is necessary to further separate the data into an object that contains only the features (often called X) and an object that contains only the target variable (often called y). This should be done to both the train and test data.

# create X and y for train
X_train = train_data.drop(columns="Target")
y_train = train_data["Target"]

# create X and y for test
X_test = test_data.drop(columns="Target")
y_test = test_data["Target"]

Sometimes, data can be loaded pre-separated into features (X) and target (y) objects. In that case, the splitting code looks slightly different, and it becomes important to be aware of the order of the objects returned.

# train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.25,
    random_state=42,
)

Available Models

There are many potential models that could be fit to the train data. Let’s create a rough categorization of the models that we have seen so far.

  • Baseline Models
  • Basic Models
  • Regularized Models
  • Ensembles

These are simply what we have seen, but there are many, many more potential models we could fit! However, this set of models will serve you well, as they are practically useful. More importantly, if you understand how to work with these models, you can easily work with any model available in sklearn.

Each of the above can be used for either regression (often containing Regressor in its name) or classification (often containing Classifier in its name) by using the specific class within sklearn for the desired task.1
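
To make the categorization concrete, here is one possible set of imports, under the assumption that the models seen so far are the usual sklearn dummy, linear, k-nearest neighbors, tree, and forest estimators; your exact list may differ.

# one possible set of models per category, for illustration only
from sklearn.dummy import DummyClassifier, DummyRegressor                   # baseline
from sklearn.linear_model import LogisticRegression, LinearRegression       # basic
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor     # basic
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor      # basic
from sklearn.linear_model import Ridge, Lasso                               # regularized (regression)
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor  # ensemble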

When comparing and contrasting these models, there are several questions you should ask:

  • What are the available tuning parameters?
    • What is the relationship of these parameters to the model’s flexibility?
    • Which parameters are most useful to tune?
  • Is the model a strong or a weak learner?
    • Can a parameter be used to make the model strong or weak?
  • Can the model learn nonlinear relationships and interaction?
  • Is the model sensitive to the scaling of input features?
  • Is the model fast to train?
  • Is the model fast to predict?
  • What is learned and thus required to be stored?
    • What is the size of the model when stored?

Model Fitting

The beauty of sklearn is its consistent API design, and thus the similarity in how these models are used. Machine learning models in sklearn are implemented as classes. Thus, generally, before fitting a model, it must first be instantiated (created).

To demonstrate, let’s first load the breast cancer dataset included with sklearn, returned as a pandas data frame.

cancer = load_breast_cancer(as_frame=True).frame
cancer
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension ... worst texture worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension target
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.30010 0.14710 0.2419 0.07871 ... 17.33 184.60 2019.0 0.16220 0.66560 0.7119 0.2654 0.4601 0.11890 0
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.08690 0.07017 0.1812 0.05667 ... 23.41 158.80 1956.0 0.12380 0.18660 0.2416 0.1860 0.2750 0.08902 0
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.19740 0.12790 0.2069 0.05999 ... 25.53 152.50 1709.0 0.14440 0.42450 0.4504 0.2430 0.3613 0.08758 0
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.24140 0.10520 0.2597 0.09744 ... 26.50 98.87 567.7 0.20980 0.86630 0.6869 0.2575 0.6638 0.17300 0
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.19800 0.10430 0.1809 0.05883 ... 16.67 152.20 1575.0 0.13740 0.20500 0.4000 0.1625 0.2364 0.07678 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
564 21.56 22.39 142.00 1479.0 0.11100 0.11590 0.24390 0.13890 0.1726 0.05623 ... 26.40 166.10 2027.0 0.14100 0.21130 0.4107 0.2216 0.2060 0.07115 0
565 20.13 28.25 131.20 1261.0 0.09780 0.10340 0.14400 0.09791 0.1752 0.05533 ... 38.25 155.00 1731.0 0.11660 0.19220 0.3215 0.1628 0.2572 0.06637 0
566 16.60 28.08 108.30 858.1 0.08455 0.10230 0.09251 0.05302 0.1590 0.05648 ... 34.12 126.70 1124.0 0.11390 0.30940 0.3403 0.1418 0.2218 0.07820 0
567 20.60 29.33 140.10 1265.0 0.11780 0.27700 0.35140 0.15200 0.2397 0.07016 ... 39.42 184.60 1821.0 0.16500 0.86810 0.9387 0.2650 0.4087 0.12400 0
568 7.76 24.54 47.92 181.0 0.05263 0.04362 0.00000 0.00000 0.1587 0.05884 ... 30.37 59.16 268.6 0.08996 0.06444 0.0000 0.0000 0.2871 0.07039 1

569 rows × 31 columns

# train-test split
cancer_train, cancer_test = train_test_split(
    cancer,
    test_size=0.20,
    random_state=42,
    stratify=cancer["target"],
)
# create X and y for train
X_train = cancer_train.drop(columns="target")
y_train = cancer_train["target"]

# create X and y for test
X_test = cancer_test.drop(columns="target")
y_test = cancer_test["target"]

Let’s first use a random forest as an example. To quickly obtain a list of the available parameters, and their default values, we can use the get_params method.

RandomForestClassifier().get_params()
{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'sqrt',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'monotonic_cst': None,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': None,
 'verbose': 0,
 'warm_start': False}

Note that what we’re really doing here is first instantiating an instance of the random forest class with RandomForestClassifier(), then obtaining the parameters and the values used to initialize the random forest. It just so happens that we used all the default parameter values.

Instead of using the default values, let’s modify a couple, and give this instance a name, rf.

rf = RandomForestClassifier(
    n_estimators=25,
    max_depth=10,
    random_state=42,
)

We can again use get_params to check the parameter values, this time, verifying that n_estimators, max_depth, and random_state were set correctly.

rf.get_params()
{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': 10,
 'max_features': 'sqrt',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'monotonic_cst': None,
 'n_estimators': 25,
 'n_jobs': None,
 'oob_score': False,
 'random_state': 42,
 'verbose': 0,
 'warm_start': False}

In addition to get_params, there are several other common and important methods for sklearn model classes.

  • get_params
  • fit
  • predict
  • predict_proba2
  • score

Importantly, the fit method must be called before any of predict, predict_proba, or score. What happens if a model is not fit before calling these other methods?

rf.predict(X_test)
---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 rf.predict(X_test)

File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/sklearn/ensemble/_forest.py:904, in ForestClassifier.predict(self, X)
    883 def predict(self, X):
    884     """
    885     Predict class for X.
    886 
   (...)
    902         The predicted classes.
    903     """
--> 904     proba = self.predict_proba(X)
    906     if self.n_outputs_ == 1:
    907         return self.classes_.take(np.argmax(proba, axis=1), axis=0)

File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/sklearn/ensemble/_forest.py:944, in ForestClassifier.predict_proba(self, X)
    922 def predict_proba(self, X):
    923     """
    924     Predict class probabilities for X.
    925 
   (...)
    942         classes corresponds to that in the attribute :term:`classes_`.
    943     """
--> 944     check_is_fitted(self)
    945     # Check data
    946     X = self._validate_X_predict(X)

File /opt/hostedtoolcache/Python/3.12.8/x64/lib/python3.12/site-packages/sklearn/utils/validation.py:1757, in check_is_fitted(estimator, attributes, msg, all_or_any)
   1754     return
   1756 if not _is_fitted(estimator, attributes, all_or_any):
-> 1757     raise NotFittedError(msg % {"name": type(estimator).__name__})

NotFittedError: This RandomForestClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Oh no! An error. While the entire error message is lengthy, as always, you should read error messages from the bottom to the top. Read the last line. This error message is excellent, and in rather plain terms, it tells us the issue and how to fix it.

Annoyingly, when working in Jupyter, error messages are often truncated by default. This is an issue given the advice to read from the bottom up! Be sure to click the scrollable element at the bottom of an error message in Jupyter so that you can scroll to the end of the message.

Let’s actually fit this model, which requires supplying an X and a y.

_ = rf.fit(X_train, y_train)

We assign the output to the name _ to suppress it, as the important information is stored in the object itself. If you’d like to see the output, simply run rf.

rf
RandomForestClassifier(max_depth=10, n_estimators=25, random_state=42)

Now that we’ve fit the model, we can use the other methods.

rf.predict(X_test)
array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0,
       1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0,
       0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1,
       1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
       1, 0, 1, 1])

The predict method takes an X as input, and outputs the predicted target for each sample of X.

rf.score(X_test, y_test)
0.956140350877193

The score method takes both an X and a y and then “scores” the result. How does it do this scoring? It depends on the model, but generally, for classification it computes accuracy and for regression it computes \(R^2\).3 We generally suggest ignoring the score method. Instead, we recommend using the metric functions discussed later.
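
As a quick illustration of why the metric functions are preferred, the following is equivalent to rf.score(X_test, y_test) for this classifier, but makes explicit exactly which metric is computed.

# for classifiers, score reports accuracy; the metric function makes that explicit
accuracy_score(y_test, rf.predict(X_test))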

rf.predict_proba(X_test[:10])
array([[1.  , 0.  ],
       [0.  , 1.  ],
       [0.88, 0.12],
       [0.68, 0.32],
       [0.96, 0.04],
       [0.04, 0.96],
       [0.04, 0.96],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [1.  , 0.  ]])

Lastly, for most classification methods, we can use predict_proba to obtain the estimated conditional probability for each category of the target, for each sample provided via X.
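
The columns of the predict_proba output follow the order of the classes_ attribute, so a sketch of extracting the estimated probability of class 1 (the positive class here) looks like this.

# columns of predict_proba are ordered according to rf.classes_, here [0, 1]
rf.classes_
rf.predict_proba(X_test[:10])[:, 1]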

While we used a random forest as an example, except for setting n_estimators and max_depth, nothing we’ve done here is specific to random forests; the same pattern applies to all machine learning methods available in sklearn.

Metrics and Evaluation

Once models have been fit, they need to be evaluated. While model fitting is often more interesting to study, model evaluation is at least as important, if not more important. Choosing the appropriate evaluation strategy can make or break a model’s effectiveness in practice.

Within sklearn, many potential evaluation metrics are implemented. The User Guide provides an overview of the available metrics, while highlighting two ways the metrics can be used.

The table in section 3.4.1.1 groups the metrics by their associated task: classification, clustering, or regression. The Scoring and Function columns express how to utilize each metric for scoring or as a function.

The function variant computes the metric given input data, which usually includes the truth (y_true) and the predicted values (y_pred). The scoring variant, which is simply a string, is used to define which metric will be used to evaluate models during cross-validation procedures such as GridSearchCV.
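
As a brief preview of the scoring variant (we will see it again with GridSearchCV), the string name of a metric can be passed to cross-validation helpers; this sketch assumes cross_val_score and the training data from above.

from sklearn.model_selection import cross_val_score

# "f1" is the scoring string that corresponds to the f1_score function
cross_val_score(rf, X_train, y_train, cv=5, scoring="f1")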

Let’s focus on the function version for now, and we’ll return to the scoring variant when we discuss tuning.

To demonstrate, we’ll first need some predictions from our learned model.

y_pred = rf.predict(X_test)

The parameters of the metric functions follow a pattern: the true values come first, the predicted values second. We reinforce this by first demonstrating the use of f1_score while naming the parameters.

f1_score(y_true=y_test, y_pred=y_pred)
0.9655172413793104

However, in practice, the parameter names are usually omitted.

f1_score(y_test, y_pred)
0.9655172413793104

We note this because some metrics happen to return the same value regardless of the order in which the true and predicted values are supplied, but this is not true in general. The order matters.
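
A small toy illustration of this, using precision and recall with made-up labels, shows the values swap when the arguments are reversed.

from sklearn.metrics import precision_score, recall_score

# toy labels, for illustration only
y_true_toy = [0, 0, 1, 1]
y_pred_toy = [0, 1, 1, 1]

precision_score(y_true_toy, y_pred_toy)  # 2/3
precision_score(y_pred_toy, y_true_toy)  # 1.0, arguments reversed!
recall_score(y_true_toy, y_pred_toy)     # 1.0
recall_score(y_pred_toy, y_true_toy)     # 2/3, arguments reversed!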

Tuning and Searching

No pipelines or preprocessing will be needed for Quiz 03. As such, we skip those topics, but note that they are important in practice.

The most common approach to tuning a model in sklearn is a combination of cross-validation and a grid search.

According to the sklearn User Guide, a search consists of:

  • an estimator (regressor or classifier such as sklearn.svm.SVC());
  • a parameter space;
  • a method for searching or sampling candidates;
  • a cross-validation scheme; and
  • a score function.

These components are neatly combined via the GridSearchCV class.

Let’s look at an example.

First, we’ll pick an estimator. Here, we’ve chosen a decision tree classifier.

dtc = DecisionTreeClassifier(random_state=42)

Note that we are setting random_state=42. While you certainly could tune this parameter, it would be a rather silly thing to do! Instead, we set this parameter, which will be fixed and used in the remainder of this example, to control the random elements of decision trees.

Next, we can specify the parameter space that we will search in. This effectively amounts to specifying the values of each parameter that will be considered.

dtc_grid = {
    "max_depth": [1, 3, 5, 15, 25, None],
    "splitter": ["best", "random"]
}

Within GridSearchCV, a “grid” of these values will be considered. In this case, that would be trying both the "best" and "random" splitter for each value of max_depth. To see the fully expanded grid, we can use ParameterGrid, which GridSearchCV uses internally.

pd.DataFrame(ParameterGrid(dtc_grid))
max_depth splitter
0 1.0 best
1 1.0 random
2 3.0 best
3 3.0 random
4 5.0 best
5 5.0 random
6 15.0 best
7 15.0 random
8 25.0 best
9 25.0 random
10 NaN best
11 NaN random

How should you decide which parameters to consider, and what values of those parameters to include in the grid? That’s a difficult question to answer. For any specific model, usually a few key parameters are useful, like \(k\) for KNN and max_depth for tree-based models. The usefulness of many parameters across many models is really an intuition that must be developed through practice and experimentation. Remember that this intuition is at best a heuristic; there are no hard “rules” about which parameters and values should be used. The magnitudes of the parameter values are also an important consideration. (Think \(k\) in KNN versus \(\lambda\) for regularized linear models.)

If you’re unsure where to start, start small. Try a “small” grid and iterate. The above example is reasonably small, but effective for its purpose. Only move to a “large” grid if necessary and you have the time. The bigger the grid, the more compute time needed!
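
To get a rough sense of cost before searching, you can count the fits a grid implies: the number of candidates times the number of folds, plus one final refit. A sketch using the grid above:

# 12 candidates x 5 folds = 60 fits during the search, plus 1 final refit
len(ParameterGrid(dtc_grid)) * 5 + 1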

Our method for searching will be an exhaustive grid search.4 That is, we will simply try each parameter combination included in the grid. This is exactly what GridSearchCV does.

dtc_tuned = GridSearchCV(
    dtc,
    dtc_grid,
    cv=5,
    scoring=[
        "accuracy",
        "precision",
        "recall",
        "f1",
    ],
    refit="f1",
)

The first parameter (estimator), to which we pass dtc, specifies the estimator to be used.

The second parameter (param_grid), to which we pass dtc_grid, specifies the parameter space that will be searched.

The search method is implicit in the use of GridSearchCV.

The third parameter, cv, to which we pass 5, specifies the cross-validation scheme. In this case, we are using the (default) value of 5, which specifies 5-fold cross-validation. In general, we highly recommend using 5 and not changing this value.

The fourth parameter, scoring, specifies the scoring to be used within GridSearchCV, that is, the metrics that will be cross-validated. The potential values are those listed in the previously referenced Metrics and Scoring section of the sklearn User Guide, specifically in the “Scoring” column of Table 3.4.1.1. Here, we provide a list of scoring methods; alternatively, you can provide a single scoring method.

Given that we provided a list of scoring methods, we have also supplied a value for the refit parameter, in this case f1. Doing so tells GridSearchCV which of the scoring methods to use when choosing the “best” model. In this case, the model with the best \(F_1\) score will be selected. If a single scoring method is supplied to scoring, that method is used for refit without needing to specify it.

The refit parameter is a nice feature of GridSearchCV. It “refits” the best model found to the provided training data. It then allows the use of methods like predict and predict_proba on the object returned from GridSearchCV, as if the model had been directly fit to the training data as we did above.

Before we can use predict, our GridSearchCV object, like the models themselves, first needs to be “fit” with its fit method. Like before, we specify the X and y components of the training data.

dtc_tuned.fit(X_train, y_train)
GridSearchCV(cv=5, estimator=DecisionTreeClassifier(random_state=42),
             param_grid={'max_depth': [1, 3, 5, 15, 25, None],
                         'splitter': ['best', 'random']},
             refit='f1', scoring=['accuracy', 'precision', 'recall', 'f1'])

To “see” the model selected, we can check the best_estimator_ attribute.

dtc_tuned.best_estimator_
DecisionTreeClassifier(max_depth=3, random_state=42, splitter='random')

To see just the best parameter values that were selected, we can access the best_params_ attribute.

dtc_tuned.best_params_
{'max_depth': 3, 'splitter': 'random'}

The best_estimator_ attribute contains a fitted version of the selected model.

dtc_tuned.best_estimator_.predict(X_test)[:10]
array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0])

However, it is generally not necessary to access best_estimator_! Instead, you can simply use methods like predict on a GridSearchCV object that has been fit.

dtc_tuned.predict(X_test)[:10]
array([0, 1, 0, 0, 0, 1, 1, 0, 0, 0])

For quizzes in CS 307, we require that the GridSearchCV object itself be submitted to the autograder. Submitting only the best_estimator_ object is not sufficient.

The cv_results_ attribute collects details of the scoring, including the mean and standard deviation of each scoring method supplied.

pd.DataFrame(dtc_tuned.cv_results_)
mean_fit_time std_fit_time mean_score_time std_score_time param_max_depth param_splitter params split0_test_accuracy split1_test_accuracy split2_test_accuracy ... std_test_recall rank_test_recall split0_test_f1 split1_test_f1 split2_test_f1 split3_test_f1 split4_test_f1 mean_test_f1 std_test_f1 rank_test_f1
0 0.003214 0.000299 0.006699 0.000136 1 best {'max_depth': 1, 'splitter': 'best'} 0.912088 0.912088 0.868132 ... 0.023798 11 0.928571 0.929825 0.901639 0.902655 0.928571 0.918252 0.013162 11
1 0.001814 0.000016 0.006481 0.000064 1 random {'max_depth': 1, 'splitter': 'random'} 0.890110 0.901099 0.901099 ... 0.039072 12 0.910714 0.915888 0.918919 0.897196 0.865385 0.901620 0.019586 12
2 0.005306 0.000020 0.006503 0.000019 3 best {'max_depth': 3, 'splitter': 'best'} 0.934066 0.923077 0.901099 ... 0.021053 1 0.947368 0.938053 0.923077 0.925620 0.973913 0.941606 0.018376 7
3 0.001928 0.000010 0.006503 0.000108 3 random {'max_depth': 3, 'splitter': 'random'} 0.967033 0.978022 0.879121 ... 0.037791 3 0.974359 0.982456 0.902655 0.938053 0.956522 0.950809 0.028532 1
4 0.006913 0.000233 0.006534 0.000035 5 best {'max_depth': 5, 'splitter': 'best'} 0.945055 0.945055 0.901099 ... 0.017891 2 0.956522 0.955752 0.923077 0.929825 0.965517 0.946139 0.016576 3
5 0.002051 0.000041 0.006435 0.000039 5 random {'max_depth': 5, 'splitter': 'random'} 0.923077 0.967033 0.901099 ... 0.023798 7 0.938053 0.973451 0.918919 0.947368 0.955752 0.946709 0.018136 2
6 0.007616 0.000772 0.006528 0.000030 15 best {'max_depth': 15, 'splitter': 'best'} 0.912088 0.901099 0.901099 ... 0.024811 8 0.928571 0.918919 0.923077 0.913793 0.956522 0.928176 0.014981 8
7 0.002104 0.000055 0.006487 0.000102 15 random {'max_depth': 15, 'splitter': 'random'} 0.912088 0.956044 0.923077 ... 0.026257 4 0.928571 0.964286 0.939130 0.957265 0.928571 0.943565 0.014740 4
8 0.007677 0.000787 0.006752 0.000405 25 best {'max_depth': 25, 'splitter': 'best'} 0.912088 0.901099 0.901099 ... 0.024811 8 0.928571 0.918919 0.923077 0.913793 0.956522 0.928176 0.014981 8
9 0.002113 0.000063 0.006431 0.000017 25 random {'max_depth': 25, 'splitter': 'random'} 0.912088 0.956044 0.923077 ... 0.026257 4 0.928571 0.964286 0.939130 0.957265 0.928571 0.943565 0.014740 4
10 0.007571 0.000787 0.006592 0.000125 None best {'max_depth': None, 'splitter': 'best'} 0.912088 0.901099 0.901099 ... 0.024811 8 0.928571 0.918919 0.923077 0.913793 0.956522 0.928176 0.014981 8
11 0.002171 0.000068 0.006445 0.000019 None random {'max_depth': None, 'splitter': 'random'} 0.912088 0.956044 0.923077 ... 0.026257 4 0.928571 0.964286 0.939130 0.957265 0.928571 0.943565 0.014740 4

12 rows × 39 columns

Here, we’ve wrapped the results using pd.DataFrame() to make the results more readable. Additionally, we can inspect specific columns to further increase readability.

pd.DataFrame(dtc_tuned.cv_results_)[
    [
        "param_max_depth",
        "param_splitter",
        "mean_test_f1",
        "mean_test_accuracy",
    ]
]
param_max_depth param_splitter mean_test_f1 mean_test_accuracy
0 1 best 0.918252 0.896703
1 1 random 0.901620 0.883516
2 3 best 0.941606 0.925275
3 3 random 0.950809 0.938462
4 5 best 0.946139 0.931868
5 5 random 0.946709 0.934066
6 15 best 0.928176 0.909890
7 15 random 0.943565 0.929670
8 25 best 0.928176 0.909890
9 25 random 0.943565 0.929670
10 None best 0.928176 0.909890
11 None random 0.943565 0.929670

Inspecting these results verifies that the highest \(F_1\) score is obtained with the parameter values reported by best_params_.
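
One way to verify this directly is to sort the results by the rank column that GridSearchCV adds for each metric; a sketch using rank_test_f1:

# rank 1 in rank_test_f1 should correspond to the best_params_ values
pd.DataFrame(dtc_tuned.cv_results_).sort_values("rank_test_f1")[
    ["param_max_depth", "param_splitter", "mean_test_f1", "rank_test_f1"]
].head(3)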

To round out this example, we calculate the test accuracy and \(F_1\) score with the chosen model.

y_pred = dtc_tuned.predict(X_test)
accuracy_score(y_test, y_pred)
0.9385964912280702
f1_score(y_test, y_pred)
0.951048951048951

Footnotes

  1. A tricky exception is LogisticRegression, which is used for the classification task.↩︎

  2. The predict_proba method is only available for (some) classification methods.↩︎

  3. The documentation for each model class in sklearn will specify the scorer used.↩︎

  4. We have not explored alternatives here, but as an example, you could instead consider a random search across a grid, which sklearn can accomplish with RandomizedSearchCV.↩︎