recall@k#
\(recall_j@k\) measures how many of the relevant items appear in the top \(k\) recommendations out of all relevant items, where \(k\) is the number of recommendations generated for the \(j\)-th object. More formally:
\[recall_j@k = \frac{\sum_{i=1}^k r_{ij}}{\sum_{i=1}^n r_{ij}}\]
Where:
items are sorted in descending order of the score predicted for the \(j\)-th object by the model under consideration;
\(r_{ij} \in \{0, 1\}\) - indicates whether the \(i\)-th item is relevant for the \(j\)-th object;
\(n\) - total number of items;
\(\sum_{i=1}^k r_{ij}\) - number of relevant items among the first \(k\) items;
\(\sum_{i=1}^n r_{ij}\) - total number of relevant items for the \(j\)-th object.
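As a minimal sketch with made-up relevance values (not part of the original example), the formula can be computed directly for a single object whose items are already sorted by predicted preference:
import numpy as np

# hypothetical relevance vector r_j, already sorted by the model's scores
relevance_sorted = np.array([1, 0, 1, 0, 1])
k = 3

# relevant items in the top k divided by all relevant items: 2/3
recall_at_k = relevance_sorted[:k].sum() / relevance_sorted.sum()
print(recall_at_k)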
import numpy as np
import pandas as pd
import unittest
from IPython.display import HTML

# For each (object, item) pair the example frame contains a binary
# "relevant" flag and the scores of two models: "Random scores" and "KNN scores"
R_frame = pd.read_parquet("example.parquet")
Consider a specific object#
Let’s examine a specific object to gain a clear understanding of the situation and calculate the recall at 3 (\(recall@3\)) for it. We will compare models to discern any differences. In the following cell, we have extracted a subframe for the specific object and sorted it based on the results from the models. The example has been selected to highlight the disparity in \(recall@3\) between the models:
k = 3
obj = 4

# items of the chosen object ranked by the scores of the first model
model1_tab = R_frame.loc[
    R_frame["object"] == obj,
    ["item", "relevant", "Random scores"]
].sort_values("Random scores", ascending=False).set_index("item")

# the same items ranked by the scores of the second model
model2_tab = R_frame.loc[
    R_frame["object"] == obj,
    ["item", "relevant", "KNN scores"]
].sort_values("KNN scores", ascending=False).set_index("item")

# recall@k: relevant items in the top k divided by all relevant items
model1_recall = (
    model1_tab["relevant"].iloc[:k].sum()
    / model1_tab["relevant"].sum()
)
model2_recall = (
    model2_tab["relevant"].iloc[:k].sum()
    / model2_tab["relevant"].sum()
)

display(HTML(
    f"""
    <div style='display: flex;justify-content: space-around;'>
        <div>
            {model1_tab.to_html()}
            <p style='font-size:20px'>
                recall@{k} - {round(model1_recall*100, 2)}%
            </p>
        </div>
        <div>
            {model2_tab.to_html()}
            <p style='font-size:20px'>
                recall@{k} - {round(model2_recall*100, 2)}%
            </p>
        </div>
    </div>
    """
))
item | relevant | Random scores |
---|---|---|
3 | 1 | 2.465325 |
18 | 1 | 1.985386 |
8 | 0 | 1.656717 |
25 | 0 | 1.614408 |
19 | 1 | 1.447166 |
4 | 0 | 1.383232 |
16 | 1 | 1.339926 |
2 | 1 | 1.236205 |
29 | 0 | 1.134973 |
6 | 0 | 1.022516 |
9 | 0 | 0.667890 |
24 | 0 | 0.377753 |
5 | 0 | 0.346233 |
28 | 0 | 0.332350 |
13 | 0 | 0.313831 |
7 | 1 | 0.166810 |
17 | 0 | 0.029310 |
22 | 1 | -0.048041 |
15 | 0 | -0.221793 |
10 | 1 | -0.229947 |
20 | 1 | -0.287629 |
27 | 1 | -0.388728 |
23 | 1 | -0.480787 |
0 | 1 | -0.573113 |
12 | 0 | -0.639963 |
26 | 0 | -1.123104 |
11 | 0 | -1.129551 |
14 | 1 | -1.225836 |
1 | 0 | -1.320448 |
21 | 0 | -1.359311 |
recall@3 - 15.38%
item | relevant | KNN scores |
---|---|---|
0 | 1 | 0.913773 |
14 | 1 | 0.792041 |
3 | 1 | 0.779723 |
20 | 1 | 0.737135 |
16 | 1 | 0.735866 |
8 | 0 | 0.654573 |
10 | 1 | 0.653200 |
29 | 0 | 0.648329 |
4 | 0 | 0.646239 |
27 | 1 | 0.643070 |
9 | 0 | 0.641759 |
23 | 1 | 0.631478 |
7 | 1 | 0.561225 |
2 | 1 | 0.561225 |
19 | 1 | 0.551094 |
18 | 1 | 0.548907 |
22 | 1 | 0.534151 |
25 | 0 | 0.465849 |
15 | 0 | 0.453531 |
26 | 0 | 0.440866 |
11 | 0 | 0.410943 |
1 | 0 | 0.360683 |
17 | 0 | 0.354840 |
13 | 0 | 0.354840 |
28 | 0 | 0.354639 |
6 | 0 | 0.352548 |
24 | 0 | 0.278986 |
5 | 0 | 0.250546 |
12 | 0 | 0.190521 |
21 | 0 | 0.172158 |
recall@3 - 23.08%
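For this object there are 13 relevant items in total: the random-score ranking places only 2 of them in its top 3 (2/13 ≈ 15.38%), while the KNN ranking places 3 (3/13 ≈ 23.08%), which is exactly the gap reflected in the \(recall@3\) values.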
Python code#
The following function implements \(recall@k\) in Python.
def recall_k(relevance_array, pred_score, k):
    '''
    Calculate recall@k - the proportion of relevant items that
    appear within the top k recommendations, out of all relevant
    items. It reflects the ability to surface relevant items
    among the first recommendations.

    Parameters
    ----------
    relevance_array : numpy.array
        binary array marking observations that are relevant;
    pred_score : numpy.array
        predicted scores, expected to be higher
        the more relevant the item is;
    k : int
        number of top-ranked items to consider.

    Returns
    ----------
    out : float
        value of the metric.
    '''
    if len(relevance_array) != len(pred_score):
        raise ValueError(
            "`relevance_array` and `pred_score` must be the same size"
        )
    elif len(relevance_array) < k:
        raise ValueError(
            "k is greater than the number of observations"
        )
    # relevant items among the k items with the highest scores
    relevant_in_k = np.sum(
        relevance_array[np.argsort(pred_score)[::-1]][:k]
    )
    # total number of relevant items
    relevant_total = np.sum(relevance_array)
    return relevant_in_k / relevant_total
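As a quick, hypothetical sanity check (not part of the original notebook): ranked by score, the relevance pattern below becomes [1, 0, 0, 1], so the top 2 items contain one of the two relevant items and \(recall@2 = 0.5\).
recall_k(
    np.array([1, 0, 1, 0]),          # binary relevance
    np.array([0.9, 0.8, 0.1, 0.2]),  # predicted scores
    2
)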
Below are some unit tests for the function:
class TestRecall(unittest.TestCase):
    def test_different_sizes(self):
        '''
        We must check that if the sizes of arrays with
        relevance and prediction differ, an error must
        be raised.
        '''
        with self.assertRaises(ValueError):
            recall_k(
                np.array([1, 1, 0]),
                np.array([0.3, 0.2, 0.3, 0.2]),
                1
            )

    def test_k_more_obs(self):
        '''
        K cannot be more than the number of observations
        we are considering.
        '''
        with self.assertRaises(ValueError):
            recall_k(
                np.array([1, 1, 0, 0, 1]),
                np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
                10
            )

    def test_computations(self):
        '''
        Just basic test with known result
        '''
        real_ans = recall_k(
            np.array([1, 1, 0, 0, 1]),
            np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
            3
        )
        exp_ans = 2/3
        self.assertAlmostEqual(real_ans, exp_ans, delta=0.000001)

ans = unittest.main(argv=[''], verbosity=2, exit=False)
del TestRecall
test_computations (__main__.TestRecall)
Just basic test with known result ... ok
test_different_sizes (__main__.TestRecall)
We must check that if the sizes of arrays with ... ok
test_k_more_obs (__main__.TestRecall)
K cannot be more than the number of observations ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.003s
OK
The following cell shows the code to calculate the recall for our example: \(recall@4\) is computed for each object and then averaged across objects.
# per-object recall@4 for both models
show = R_frame.groupby("object").apply(
    lambda object: pd.Series({
        "recall for model 1": recall_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["Random scores"].to_numpy(),
            k=4
        ),
        "recall for model 2": recall_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["KNN scores"].to_numpy(),
            k=4
        )
    }),
    include_groups=False
)
display(show)
display(show.mean().rename("mean value").to_frame().T)
object | recall for model 1 | recall for model 2 |
---|---|---|
0 | 0.000000 | 0.307692 |
1 | 0.190476 | 0.095238 |
2 | 0.062500 | 0.250000 |
3 | 0.058824 | 0.235294 |
4 | 0.153846 | 0.307692 |
5 | 0.153846 | 0.230769 |
6 | 0.166667 | 0.222222 |
7 | 0.125000 | 0.250000 |
8 | 0.153846 | 0.153846 |
9 | 0.105263 | 0.210526 |
 | recall for model 1 | recall for model 2 |
---|---|---|
mean value | 0.117027 | 0.226328 |
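Averaged over all objects, the KNN scores reach roughly twice the \(recall@4\) of the random scores (about 0.23 versus 0.12).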