Confusion matrix#
The confusion matrix is a very important concept for evaluating classification models.
import numpy as np
import pandas as pd
import sklearn
from sklearn.datasets import make_classification
Example task#
Consider a binary classification problem. We have two classes, Positive and Negative.
Let us denote:
\(P\) is the number of positive observations in the sample;
\(N\) is the number of negative observations in the sample.
The following code generates possible outputs of the classification task:
\(y\) - real targets, where 0 corresponds to negative and 1 to positive;
\(t_i\) - a score that indicates the probability that a particular object belongs to the positive class;
\(\hat{y}=\left[t_i>T\right]\) - final predictions, which depend on the cut-off threshold \(T\). Different choices of \(T\) fill the confusion matrix differently - this is discussed in the following sections. For the example below, \(T\) is the mean of \(t_i\), \(i=\overline{1,n}\).
scores, y = make_classification(
    n_features=1,
    n_informative=1,
    n_redundant=0,
    n_repeated=0,
    n_clusters_per_class=1,
    flip_y=0.3,
    random_state=2
)
scores = scores.ravel()
pd.DataFrame({
    "$$y$$": y,
    "$$t_i$$": scores,
    "$$\hat{y}$$": (scores > np.mean(scores)).astype("int")
}).head()
|  | $$y$$ | $$t_i$$ | $$\hat{y}$$ |
|---|---|---|---|
| 0 | 0 | -0.644251 | 0 |
| 1 | 0 | -1.162201 | 0 |
| 2 | 1 | -0.624533 | 0 |
| 3 | 1 | 2.033877 | 1 |
| 4 | 1 | -1.012203 | 0 |
Idea#
Now suppose we have fitted some classifier. We then have the following groups of observations:
True positive: observations that were positive in the sample and that we correctly predicted as positive. We will denote their number as \(TP\);
True negative: observations that were negative in the sample and that we correctly predicted as negative. We will denote their number as \(TN\);
False positive: observations that were negative in the sample but that we mistakenly predicted as positive. We will denote their number as \(FP\);
False negative: observations that were positive in the sample but that we mistakenly predicted as negative. We will denote their number as \(FN\).
So, if you put the actual value on the rows and the predicted value on the columns, you will get a confusion matrix.
|  | Predicted \(N\) | Predicted \(P\) |
|---|---|---|
| Actual \(N\) | \(TN\) | \(FP\) |
| Actual \(P\) | \(FN\) | \(TP\) |
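To make the four counts concrete, here is a minimal sketch that computes them for the generated example above with a few boolean operations (the threshold is again the mean of the scores; the variable name y_hat is introduced only for this illustration):

# predictions for the example above: positive if the score exceeds the mean
y_hat = (scores > np.mean(scores)).astype("int")
TP = np.sum((y == 1) & (y_hat == 1))  # actual positive, predicted positive
TN = np.sum((y == 0) & (y_hat == 0))  # actual negative, predicted negative
FP = np.sum((y == 0) & (y_hat == 1))  # actual negative, predicted positive
FN = np.sum((y == 1) & (y_hat == 0))  # actual positive, predicted negative
print(TP, TN, FP, FN)

The counts obtained this way should match the output of sklearn.metrics.confusion_matrix shown further below.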
Also valuable is the representation of the confusion matrix using relative values.
Let us denote:
\(P^* = TP + FP\) - the number of observations in the sample predicted as positive;
\(N^* = TN + FN\) - the number of observations in the sample predicted as negative;
\(TNR = TN/N^*\) - true negative rate, the proportion of correct predictions among observations that are predicted to be negative;
\(FNR = FN/N^*\) - false negative rate, the proportion of incorrect predictions among observations that are predicted to be negative;
\(TPR = TP/P^*\) - true positive rate, the proportion of correct predictions among observations that are predicted to be positive;
\(FPR = FP/P^*\) - false positive rate, the proportion of incorrect predictions among observations that are predicted to be positive.
So, using these notations, the confusion matrix can also be written as:

|  | Predicted \(N\) | Predicted \(P\) |
|---|---|---|
| Actual \(N\) | \(TNR\) | \(FPR\) |
| Actual \(P\) | \(FNR\) | \(TPR\) |
Here is an example of calculating the confusion matrix using sklearn.metrics.confusion_matrix:
from sklearn import metrics
sklearn.metrics.confusion_matrix(y, scores > np.mean(scores))
array([[39, 7],
[17, 37]])
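The relative (column-normalised) form of the matrix can be obtained from the same call; a short sketch, assuming a scikit-learn version that supports the normalize argument of confusion_matrix (0.22 or newer):

# normalize="pred" divides each column by the number of observations
# predicted in that class, which gives [[TNR, FPR], [FNR, TPR]]
# in the notation introduced above
metrics.confusion_matrix(y, scores > np.mean(scores), normalize="pred")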
Confusion table#
Idea#
Many classification models can return a score that indicates the probability that a particular object belongs to the positive class. You can select a threshold above which you consider an object to be positive. Different threshold values will consequently produce different confusion matrices.
The table that associates each selected threshold with the corresponding confusion matrix will be called the confusion table.
| \(T\) | \(TN\) | \(FP\) | \(FN\) | \(TP\) |
|---|---|---|---|---|
| \(t_1\) | \(TN_1\) | \(FP_1\) | \(FN_1\) | \(TP_1\) |
| \(t_2\) | \(TN_2\) | \(FP_2\) | \(FN_2\) | \(TP_2\) |
| … | … | … | … | … |
| \(t_i\) | \(TN_i\) | \(FP_i\) | \(FN_i\) | \(TP_i\) |
| … | … | … | … | … |
| \(t_n\) | \(TN_n\) | \(FP_n\) | \(FN_n\) | \(TP_n\) |
Realisation#
I didn't find a ready-made realisation of this concept, so here is my own.
The first thing that comes to mind is to use sklearn.metrics.confusion_matrix for every needed \(t_i\). But this solution is extremely slow - an estimate of the complexity of the algorithm is \(O(nT')\), where \(n\) is the number of samples and \(T'\) is the number of thresholds to check.
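For reference, the naive approach could look like the following sketch (the function name get_confusion_table_naive is just an illustrative placeholder); it calls confusion_matrix once per threshold, which is exactly what makes it \(O(nT')\):

from sklearn import metrics

# naive O(nT') sketch: one full pass over the data per threshold;
# scores greater than or equal to the threshold are treated as positive,
# matching the algorithm described below
def get_confusion_table_naive(y_true, y_score, thresholds):
    rows = []
    for t in np.sort(thresholds):
        tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_score >= t).ravel()
        rows.append([t, tn, fp, fn, tp])
    return np.array(rows)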
The following is a description of an algorithm that runs with complexity \(O(n + T')\), assuming that the observations are sorted by \(s_i\) in ascending order and that the thresholds are also sorted in ascending order:
Text description#
As input we have three arrays:
\(\left\{y_1,y_2, \ldots , y_n\right\}\) - real classes of the observations, where:

$$y_i = \begin{cases} 0, & \text{if the } i\text{-th observation is negative};\\ 1, & \text{if the } i\text{-th observation is positive}. \end{cases}$$
\(\left\{s_1,s_2, ..., s_n\right\}\) - scores of the observations;
\(\left\{t_1, t_2, ..., t_{T'}\right\}\) - the thresholds we’re interested in.
Let's introduce cursors for the values we are interested in, and set them to the values corresponding to the smallest reasonable threshold for the given problem (a short sketch follows this list):
\(TP'=P=\sum_{i=1}^n{y_i}\) - with the lowest threshold, all positive outcomes are correctly classified as positive;
\(FP'=N = n -\sum_{i=1}^n{y_i}\) - with the lowest threshold, all negative outcomes are mistakenly classified as positive;
\(TN'=0, FN'=0\) - with the lowest threshold there are no observations classified as negative;
\(i=1\) indexes \(y_i\) and \(s_i\);
\(j=1\) indexes \(t_j\).
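For instance, with the generated example from above, the initial cursor values are simply the class counts; a tiny sketch (the variable names are only for illustration):

# initial cursor values for the generated example above: with the lowest
# threshold every observation is predicted positive
TP_init = y.sum()            # all actual positives are counted as true positives
FP_init = len(y) - y.sum()   # all actual negatives are counted as false positives
TN_init, FN_init = 0, 0      # nothing is predicted negative yet
print(TP_init, FP_init)

These values reappear as the first row of the confusion table computed further below.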
Now let’s talk about the iterative procedure:
At each step we compare \(s_i\) and \(t_j\):
If \(s_i < t_j\):
If \(y_i=1\), then we add one to \(FN'\) and subtract one from \(TP'\);
If \(y_i=0\), then we add one to \(TN'\) and subtract one from \(FP'\);
Move to the next observation - add one to \(i\);
If \(s_i \geq t_j\), or all observations have already been processed (\(i > n\)):
Add the current values of \(TP', FP', TN', FN'\) to the confusion table row corresponding to \(t_j\);
Move to the next threshold - add one to \(j\).
The algorithm stops once all thresholds have been processed, i.e. when \(j>T'\).
Python realisation#
This section provides a Python function that implements the algorithm described above.
import numpy as np

def get_sorted_by_scores(y_true, y_score):
    '''
    Sort y_true and y_score, whose elements
    correspond to each other, by y_score.

    Parameters
    ----------
    y_true : numpy.array
        array of outcomes;
    y_score : numpy.array
        array of scores that defines the order
        of the outputs.

    Returns
    ----------
    y_true : numpy.array
        y_true sorted according to y_score;
    y_score : numpy.array
        sorted y_score.
    '''
    # indices that sort the scores in ascending order
    sorting = np.argsort(y_score)
    return y_true[sorting], y_score[sorting]
def get_confusion_table(
    y_true,
    y_score,
    thresholds=None
):
    '''
    Get the confusion table - a set of confusion
    matrices for a given set of thresholds.

    Parameters
    ----------
    y_true : numpy.array
        binary array that represents the real classes
        of the observations;
    y_score : numpy.array
        array whose elements correspond to y_true
        and represent the resulting scores of the classifier;
    thresholds : numpy.array
        set of thresholds. By default the same as y_score.

    Returns
    ---------
    out : numpy.array
        where each row is the confusion matrix for a particular
        threshold. Each row contains the values
        threshold, TN, FP, FN, TP.
    '''
    # it's crucial for y_true and y_score
    # to have the same size
    n = len(y_score)
    if n != len(y_true):
        raise ValueError(
            "y_score and y_true must "
            "have the same size. But got "
            f"{n} and {len(y_true)}"
        )

    y_true, y_score = get_sorted_by_scores(y_true, y_score)
    # by default use the (sorted) scores themselves as thresholds
    thresholds = y_score if thresholds is None else np.sort(thresholds)
    T_len = len(thresholds)

    # setting up initial values for the cursors:
    # with the lowest threshold everything is predicted positive
    TP = np.sum(y_true)
    FP = n - TP
    TN = 0
    FN = 0
    i = 0
    j = 0

    res = []
    while True:
        # observations with a score below the current threshold
        # are predicted negative at this threshold
        if i < n and y_score[i] < thresholds[j]:
            # move the i-th observation from the "predicted positive"
            # to the "predicted negative" counts
            if y_true[i]:
                FN += 1
                TP -= 1
            else:
                TN += 1
                FP -= 1
            i += 1
        else:
            # every remaining observation has a score >= t_j,
            # so the cursors describe the confusion matrix for t_j
            res.append([thresholds[j], TN, FP, FN, TP])
            j += 1
            if j >= T_len:
                return np.array(res)
So let's see how it works with the example we generated earlier:
result = get_confusion_table(y, scores)
pd.DataFrame(
    result[:, 1:],
    index=pd.Index(result[:, 0], name="$$T$$"),
    columns=["$$TN$$", "$$FP$$", "$$FN$$", "$$TP$$"]
).astype("int")
| $$T$$ | $$TN$$ | $$FP$$ | $$FN$$ | $$TP$$ |
|---|---|---|---|---|
| -1.528063 | 0 | 46 | 0 | 54 |
| -1.463307 | 0 | 46 | 1 | 53 |
| -1.405160 | 1 | 45 | 1 | 53 |
| -1.388968 | 2 | 44 | 1 | 53 |
| -1.308240 | 3 | 43 | 1 | 53 |
| ... | ... | ... | ... | ... |
| 1.814459 | 45 | 1 | 50 | 4 |
| 1.834142 | 45 | 1 | 51 | 3 |
| 1.892121 | 45 | 1 | 52 | 2 |
| 1.917869 | 46 | 0 | 52 | 2 |
| 2.033877 | 46 | 0 | 53 | 1 |

100 rows × 4 columns
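As a quick sanity check before the unit tests, a single row of the confusion table should agree with sklearn.metrics.confusion_matrix evaluated at the same threshold, keeping in mind that the algorithm treats \(s_i \geq T\) as positive. A minimal sketch with an arbitrarily chosen threshold of 0.5:

from sklearn import metrics

# compare one row of the confusion table against sklearn directly;
# the threshold value 0.5 is arbitrary
T = 0.5
row = get_confusion_table(y, scores, thresholds=np.array([T]))[0]
# confusion_matrix returns [[TN, FP], [FN, TP]], so ravel gives TN, FP, FN, TP
cm = metrics.confusion_matrix(y, scores >= T).ravel()
print(row[1:], cm)  # the four counts should coincide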
Tests#
For the example under consideration everything looks great, but it's good practice to build a simple example that can easily be checked by hand. Here is a set of unit tests described and implemented for the above function.
Consider case:
\(y_i\in \left\{1,0,0,1,0\right\}\) - real classes of the observations;
\(s_i\in \left\{4,1,1,2,3\right\}\) - scores of the observations;
\(t_j \in \left\{2,3\right\}\) - the thresholds we’re interested in.
Let us consider the specified cut-off thresholds and indicate, for each observation, which cell of the confusion matrix it falls into at the given threshold.
For \(t_1=2\):
| \(i\) | \(s_i\) | \(y_i\) | \(TN\) | \(FP\) | \(FN\) | \(TP\) |
|---|---|---|---|---|---|---|
| 1 | 4 | 1 | × | × | × | ✓ |
| 2 | 1 | 0 | ✓ | × | × | × |
| 3 | 1 | 0 | ✓ | × | × | × |
| 4 | 2 | 1 | × | × | × | ✓ |
| 5 | 3 | 0 | × | ✓ | × | × |
| Total |  |  | 2 | 1 | 0 | 2 |
For \(t_2=3\):
| \(i\) | \(s_i\) | \(y_i\) | \(TN\) | \(FP\) | \(FN\) | \(TP\) |
|---|---|---|---|---|---|---|
| 1 | 4 | 1 | × | × | × | ✓ |
| 2 | 1 | 0 | ✓ | × | × | × |
| 3 | 1 | 0 | ✓ | × | × | × |
| 4 | 2 | 1 | × | × | ✓ | × |
| 5 | 3 | 0 | × | ✓ | × | × |
| Total |  |  | 2 | 1 | 1 | 1 |
So let’s see if the results of our hand calculations are the same as the function’s output.
test_y = np.array([1,0,0,1,0])
test_scores = np.array([4,1,1,2,3])
test_thresholds = np.array([2,3])
result = get_confusion_table(test_y, test_scores, test_thresholds)
pd.DataFrame(
    result[:, 1:],
    index=pd.Index(result[:, 0], name="$$T$$"),
    columns=["$$TN$$", "$$FP$$", "$$FN$$", "$$TP$$"]
).astype("int")
| $$T$$ | $$TN$$ | $$FP$$ | $$FN$$ | $$TP$$ |
|---|---|---|---|---|
| 2 | 2 | 1 | 0 | 2 |
| 3 | 2 | 1 | 1 | 1 |
Here are automated test cases for the cases discussed earlier and some additional ones, so you don't have to check everything by hand if you want to change something in the implementation of the function:
import unittest

class TestCounfusionTable(unittest.TestCase):
    def test_array_sizes(self):
        '''
        If y_true and y_score have different
        sizes it should raise ValueError.
        '''
        with self.assertRaises(ValueError):
            get_confusion_table(
                y_true=[1, 2, 3, 4],
                y_score=[3, 2, 1, 4, 3]
            )

    def test_basic_output(self):
        '''
        This is a basic case for which
        we know the actual answer.
        '''
        test_y = np.array([1, 0, 0, 1, 0])
        test_scores = np.array([4, 1, 1, 2, 3])
        test_thresholds = np.array([2, 3])
        exp_ans = np.array([
            [2, 2, 1, 0, 2],
            [3, 2, 1, 1, 1]
        ])
        act_ans = get_confusion_table(
            test_y, test_scores, test_thresholds
        )
        self.assertTrue(
            (act_ans == exp_ans).all()
        )
ans = unittest.main(argv=[''], verbosity=2, exit=False)
test_array_sizes (__main__.TestCounfusionTable)
If y_true and y_score have different ... ok
test_basic_output (__main__.TestCounfusionTable)
This is a basic case for which ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.002s
OK