Jaccard index#
A popular measure of the similarity of two sets. I first encountered it as a quality measure in computer vision, specifically in the task of segmenting objects in an image. So for now, this page covers only the application of Jaccard's measure to a pixel-by-pixel comparison of images.
Definition#
Let's say we have a set of classes \(K\), a set of \(n\) true labels \(y = \{y_i\}, y_i \in K\), and a set of model predictions \(y' = \{y'_i\}, y'_i \in K\). Then the Jaccard index that measures the similarity of \(y\) and \(y'\) for a class \(k\) can be written as

\(J_k\left(y, y'\right) = \frac{\sum_{i=1}^{n} \left[y_i=k\right]\left[y'_i=k\right]}{\sum_{i=1}^{n} \max\left(\left[y_i=k\right], \left[y'_i=k\right]\right)}\)

Where:
\([a]=\begin{cases} 1, & \text{if the statement } a \text{ is true};\\ 0, & \text{if the statement } a \text{ is false}. \end{cases}\)
Let us now interpret the components of this formula:

- Numerator \(\sum_{i=1}^{n} \left[y_i=k\right]\left[y'_i=k\right]\): the number of objects that belong to class \(k\) and were also predicted to be class \(k\);
- Denominator \(\sum_{i=1}^{n} \max\left(\left[y_i=k\right],\left[y'_i=k\right]\right)\): the number of objects that were predicted as class \(k\) or actually belong to it.

So the meaning of the formula is really simple: it is the intersection divided by the union ("intersection over union").
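The per-class formula translates directly into NumPy: boolean masks play the role of the indicator brackets, `&` gives the numerator, and `|` gives the denominator. A minimal sketch (`jaccard_index` is a name made up here, not part of any library):

```python
import numpy as np

def jaccard_index(y, y_pred, k):
    """Jaccard index of true labels y and predictions y_pred for class k."""
    y = np.asarray(y)
    y_pred = np.asarray(y_pred)
    intersection = np.sum((y == k) & (y_pred == k))  # numerator: correct for class k
    union = np.sum((y == k) | (y_pred == k))         # denominator: labeled k by either
    return intersection / union

# 3 objects of class 1; the model finds two of them and adds no false positives
print(jaccard_index([1, 1, 1, 0], [1, 1, 0, 0], 1))  # 2 / 3
```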
Example#
Here is a simple example of calculating the Jaccard index for two-dimensional arrays, which in our case simulate pictures:
import numpy as np

y_pred = np.array([
    [1, 1, 2, 2, 2],
    [1, 1, 2, 1, 2],
    [1, 0, 0, 0, 0],
    [2, 2, 2, 0, 0],
    [2, 1, 1, 1, 2]
])
y_true = np.array([
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 2, 2]
])

# all classes that occur in either array (intersect1d would silently
# skip a class present in only one of them)
unique_classes = np.union1d(y_pred, y_true)

result = {}
for val in unique_classes:
    pred_class_pixels = (y_pred == val)  # mask of pixels predicted as class val
    true_class_pixels = (y_true == val)  # mask of pixels actually of class val
    intersection = (pred_class_pixels & true_class_pixels)
    union = (pred_class_pixels | true_class_pixels)
    result[val] = intersection.sum() / union.sum()

result
{0: 0.0, 1: 0.38461538461538464, 2: 0.25}
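The same numbers can be obtained from confusion counts: the intersection is exactly the number of true positives, and the union is \(TP + FP + FN\). A small sketch of this equivalent formulation (the helper name `jaccard_from_counts` is invented for illustration):

```python
import numpy as np

def jaccard_from_counts(y_true, y_pred, k):
    """Per-class Jaccard index as TP / (TP + FP + FN)."""
    tp = np.sum((y_true == k) & (y_pred == k))  # predicted k and truly k
    fp = np.sum((y_true != k) & (y_pred == k))  # predicted k but actually another class
    fn = np.sum((y_true == k) & (y_pred != k))  # truly k but the model missed it
    return tp / (tp + fp + fn)

y_pred = np.array([[1, 1, 2, 2, 2], [1, 1, 2, 1, 2], [1, 0, 0, 0, 0],
                   [2, 2, 2, 0, 0], [2, 1, 1, 1, 2]])
y_true = np.array([[1, 1, 1, 2, 2], [1, 1, 1, 2, 2], [1, 1, 1, 2, 2],
                   [0, 0, 0, 2, 2], [0, 0, 0, 2, 2]])

for k in (0, 1, 2):
    print(k, jaccard_from_counts(y_true, y_pred, k))  # matches the loop above
```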