precision@k#
\(precision_j@k\) is the fraction of relevant elements among the first \(k\) recommendations for the \(j\)-th object. More formally:

\[
precision_j@k = \frac{\sum_{i=1}^k r_{ij}}{k}
\]

Where:

- items are sorted according to their predicted preference for the \(j\)-th object under the model being considered;
- \(\sum_{i=1}^k r_{ij}\) is the number of relevant items among the first \(k\) items for the \(j\)-th object, with \(r_{ij} = 1\) if item \(i\) is relevant for object \(j\) and \(0\) otherwise.
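For example, suppose that for some object the first and third of the top three recommended items are relevant while the second is not (the numbers here are made up purely for illustration):

\[
precision_j@3 = \frac{1 + 0 + 1}{3} = \frac{2}{3} \approx 0.67
\]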
import numpy as np
import pandas as pd
import unittest
from IPython.display import HTML
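# example data: relevance labels and model scores for each (object, item) pair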
R_frame = pd.read_parquet("example.parquet")
Consider a specific object#
Let’s examine a specific object to get a clear picture of the situation and calculate \(precision@3\) for it. We will compare model 1 and model 2 to see the difference between them. In the following cell, we extract a subframe for that object and sort it by each model's scores. The example has been selected to highlight the disparity in \(precision@3\) between the models:
k = 3
obj = 4
model1_tab = R_frame.loc[
R_frame["object"] == obj,
[
"item",
"relevant",
"Random scores"
]
].sort_values(
"Random scores",
ascending=False
).set_index("item")
model2_tab = R_frame.loc[
R_frame["object"] == obj,
[
"item",
"relevant",
"KNN scores"
]
].sort_values(
"KNN scores",
ascending=False
).set_index("item")
model1_precision = (
    model1_tab["relevant"].iloc[:k].mean()
)
model2_precision = (
    model2_tab["relevant"].iloc[:k].mean()
)
display(HTML(
    f"""
    <div style='display: flex;justify-content: space-around;'>
        <div>
            {model1_tab.to_html()}
            <p style='font-size:20px'>
                precision@{k} - {round(model1_precision*100,2)}%
            </p>
        </div>
        <div>
            {model2_tab.to_html()}
            <p style='font-size:20px'>
                precision@{k} - {round(model2_precision*100,2)}%
            </p>
        </div>
    </div>
    """
))
| item | relevant | Random scores |
|---|---|---|
| 3 | 1 | 2.465325 |
| 18 | 1 | 1.985386 |
| 8 | 0 | 1.656717 |
| 25 | 0 | 1.614408 |
| 19 | 1 | 1.447166 |
| 4 | 0 | 1.383232 |
| 16 | 1 | 1.339926 |
| 2 | 1 | 1.236205 |
| 29 | 0 | 1.134973 |
| 6 | 0 | 1.022516 |
| 9 | 0 | 0.667890 |
| 24 | 0 | 0.377753 |
| 5 | 0 | 0.346233 |
| 28 | 0 | 0.332350 |
| 13 | 0 | 0.313831 |
| 7 | 1 | 0.166810 |
| 17 | 0 | 0.029310 |
| 22 | 1 | -0.048041 |
| 15 | 0 | -0.221793 |
| 10 | 1 | -0.229947 |
| 20 | 1 | -0.287629 |
| 27 | 1 | -0.388728 |
| 23 | 1 | -0.480787 |
| 0 | 1 | -0.573113 |
| 12 | 0 | -0.639963 |
| 26 | 0 | -1.123104 |
| 11 | 0 | -1.129551 |
| 14 | 1 | -1.225836 |
| 1 | 0 | -1.320448 |
| 21 | 0 | -1.359311 |
precision@3 - 66.67%
| item | relevant | KNN scores |
|---|---|---|
| 0 | 1 | 0.913773 |
| 14 | 1 | 0.792041 |
| 3 | 1 | 0.779723 |
| 20 | 1 | 0.737135 |
| 16 | 1 | 0.735866 |
| 8 | 0 | 0.654573 |
| 10 | 1 | 0.653200 |
| 29 | 0 | 0.648329 |
| 4 | 0 | 0.646239 |
| 27 | 1 | 0.643070 |
| 9 | 0 | 0.641759 |
| 23 | 1 | 0.631478 |
| 7 | 1 | 0.561225 |
| 2 | 1 | 0.561225 |
| 19 | 1 | 0.551094 |
| 18 | 1 | 0.548907 |
| 22 | 1 | 0.534151 |
| 25 | 0 | 0.465849 |
| 15 | 0 | 0.453531 |
| 26 | 0 | 0.440866 |
| 11 | 0 | 0.410943 |
| 1 | 0 | 0.360683 |
| 17 | 0 | 0.354840 |
| 13 | 0 | 0.354840 |
| 28 | 0 | 0.354639 |
| 6 | 0 | 0.352548 |
| 24 | 0 | 0.278986 |
| 5 | 0 | 0.250546 |
| 12 | 0 | 0.190521 |
| 21 | 0 | 0.172158 |
precision@3 - 100.0%
Python code#
The following function implements \(precision@k\) in Python.
def precision_k(relevance_array, pred_score, k):
    '''
    Calculate precision@k. This metric assesses the
    accuracy of recommendations by computing the
    proportion of relevant items among the first k
    recommended items. It quantifies how effective
    the recommendation system is at placing relevant
    items within the initial set of recommendations.

    Parameters
    ----------
    relevance_array : numpy.array
        binary array marking observations that are relevant;
    pred_score : numpy.array
        predicted scores, expected to be higher
        the more relevant an item is;
    k : int
        number of top-ranked items to consider.

    Returns
    -------
    out : float
        value of the metric.
    '''
    if len(relevance_array) != len(pred_score):
        raise ValueError(
            "`relevance_array` and `pred_score` must be the same size"
        )
    elif len(relevance_array) < k:
        raise ValueError(
            "k is greater than the number of observations"
        )
    # sort relevance labels by descending predicted score
    # and take the share of relevant items among the top k
    return np.mean(
        relevance_array[np.argsort(pred_score)[::-1]][:k]
    )
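As a quick illustration (the arrays below are a made-up toy case, not part of the example dataset), the function reproduces the hand calculation from the definition section:
# two of the three highest-scored items are relevant
precision_k(
    relevance_array=np.array([1, 0, 1, 0]),
    pred_score=np.array([0.9, 0.8, 0.7, 0.1]),
    k=3
)  # 2/3 ≈ 0.67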
Here are some unit tests for the function defined above:
class TestPrecision(unittest.TestCase):
def test_different_sizes(self):
'''
We must check that if the sizes of arrays with
relevance and prediction differ, an error must
        be raised.
'''
with self.assertRaises(ValueError):
precision_k(
np.array([1, 1, 0]),
np.array([0.3, 0.2, 0.3, 0.2]),
1
)
def test_k_more_obs(self):
'''
K cannot be more than the number of observations
we are considering.
'''
with self.assertRaises(ValueError):
precision_k(
np.array([1, 1, 0, 0, 1]),
np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
10
)
    def test_computations(self):
'''
Just basic test with known result
'''
real_ans = precision_k(
np.array([1, 1, 0, 0, 1]),
np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
3
)
exp_ans = 2/3
self.assertAlmostEqual(real_ans, exp_ans, delta=0.000001)
ans = unittest.main(argv=[''], verbosity=2, exit=False)
del TestPrecision
test_computations (__main__.TestPrecision)
Just basic test with known result ... ok
test_different_sizes (__main__.TestPrecision)
We must check that if the sizes of arrays with ... ok
test_k_more_obs (__main__.TestPrecision)
K cannot be more than the number of observations ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.004s
OK
The following cell shows the code to calculate the precision for our example: we compute it for each object separately and then take the average over the objects.
show = R_frame.groupby("object").apply(
lambda object: pd.Series({
"precision for model 1" : precision_k(
relevance_array=object["relevant"].to_numpy(),
pred_score=object["Random scores"].to_numpy(),
k=4
),
"precision for model2" : precision_k(
relevance_array=object["relevant"].to_numpy(),
pred_score=object["KNN scores"].to_numpy(),
k=4
)
}),
include_groups=False
)
display(show)
display(show.mean().rename("mean value").to_frame().T)
| object | precision for model 1 | precision for model 2 |
|---|---|---|
| 0 | 0.00 | 1.00 |
| 1 | 1.00 | 0.50 |
| 2 | 0.25 | 1.00 |
| 3 | 0.25 | 1.00 |
| 4 | 0.50 | 1.00 |
| 5 | 0.50 | 0.75 |
| 6 | 0.75 | 1.00 |
| 7 | 0.50 | 1.00 |
| 8 | 0.50 | 0.50 |
| 9 | 0.50 | 1.00 |

| | precision for model 1 | precision for model 2 |
|---|---|---|
| mean value | 0.475 | 0.875 |