precision@k

\(precision_j@k\) is the fraction of relevant elements among the first \(k\) recommendations for the \(j\)-th object. More formally:

\[precision_j@k = \frac{\sum_{i=1}^k r_{ij}}{k}\]

Where:

  • items are sorted in descending order of the preference predicted for the \(j\)-th object by the model under consideration;

  • \(\sum_{i=1}^k r_{ij}\) is the number of relevant items among the first \(k\) items for the \(j\)-th object.
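For example (the numbers here are made up for illustration, not taken from the dataset used below): if the relevance flags of the top three items, sorted by the model's score, are 1, 0, 1, then \(precision@3 = 2/3\). In plain Python:

# relevance flags of the first k items, already sorted by the model's predicted score
top_k_relevance = [1, 0, 1]

# precision@k is the share of relevant items among them
precision_at_3 = sum(top_k_relevance) / len(top_k_relevance)
print(precision_at_3)  # 0.666...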

import numpy as np
import pandas as pd

import unittest
from IPython.display import HTML

R_frame = pd.read_parquet("example.parquet")
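The example.parquet file is assumed to contain one row per (object, item) pair with a binary relevant flag and the scores of two models, "Random scores" and "KNN scores"; these are the only columns used on this page. If the file is not available, a frame with the same layout can be generated synthetically (the concrete numbers in the outputs below will then differ):

# hypothetical substitute for example.parquet: random data with the same column layout
rng = np.random.default_rng(0)
n_objects, n_items = 10, 30
R_frame = pd.DataFrame({
    "object": np.repeat(np.arange(n_objects), n_items),
    "item": np.tile(np.arange(n_items), n_objects),
    "relevant": rng.integers(0, 2, n_objects * n_items),
    "Random scores": rng.normal(size=n_objects * n_items),
    "KNN scores": rng.uniform(size=n_objects * n_items),
})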

Consider a specific object

Let’s examine a specific object to get a clear picture of the situation and calculate the precision at 3 (\(precision@3\)) for it. We will compare model 1 and model 2 to discern any differences. In the following cell, we extract a subframe for the specific object and sort it by the scores of each model. The example has been selected to highlight the disparity in \(precision@3\) between the models:

k = 3
obj = 4

model1_tab = R_frame.loc[
    R_frame["object"] == obj,
    [
        "item",
        "relevant",
        "Random scores"
    ]
].sort_values(
    "Random scores", 
    ascending=False
).set_index("item")

model2_tab = R_frame.loc[
    R_frame["object"] == obj,
    [
        "item",
        "relevant",
        "KNN scores"
    ]
].sort_values(
    "KNN scores", 
    ascending=False
).set_index("item")

model1_precision = (
    model1_tab["relevant"].iloc[:k].mean()
)
model2_precision = (
    model2_tab["relevant"].iloc[:k].mean()
)

display(HTML(
    f"""
    <div style='display: flex;justify-content: space-around;'>
        <div>
            {model1_tab.to_html()}
            <p style='font-size:20px'>
                precision@{k} - {round(model1_precision*100,2)}%
            </p>
        </div>
        <div>
            {model2_tab.to_html()}
            <p style='font-size:20px'>
                precision@{k} - {round(model2_precision*100,2)}%
            </p>
        </div>
    </div>
    """
))
item  relevant  Random scores
3     1         2.465325
18    1         1.985386
8     0         1.656717
25    0         1.614408
19    1         1.447166
4     0         1.383232
16    1         1.339926
2     1         1.236205
29    0         1.134973
6     0         1.022516
9     0         0.667890
24    0         0.377753
5     0         0.346233
28    0         0.332350
13    0         0.313831
7     1         0.166810
17    0         0.029310
22    1        -0.048041
15    0        -0.221793
10    1        -0.229947
20    1        -0.287629
27    1        -0.388728
23    1        -0.480787
0     1        -0.573113
12    0        -0.639963
26    0        -1.123104
11    0        -1.129551
14    1        -1.225836
1     0        -1.320448
21    0        -1.359311

precision@3 - 66.67%

item  relevant  KNN scores
0     1         0.913773
14    1         0.792041
3     1         0.779723
20    1         0.737135
16    1         0.735866
8     0         0.654573
10    1         0.653200
29    0         0.648329
4     0         0.646239
27    1         0.643070
9     0         0.641759
23    1         0.631478
7     1         0.561225
2     1         0.561225
19    1         0.551094
18    1         0.548907
22    1         0.534151
25    0         0.465849
15    0         0.453531
26    0         0.440866
11    0         0.410943
1     0         0.360683
17    0         0.354840
13    0         0.354840
28    0         0.354639
6     0         0.352548
24    0         0.278986
5     0         0.250546
12    0         0.190521
21    0         0.172158

precision@3 - 100.0%

Python code

The following function is an implementation of \(precision@k\) in Python.

def precision_k(relevance_array, pred_score, k):
    '''
    Calculate precision@k. This metric assesses the accuracy
    of recommendations by computing the proportion of relevant
    items among the first k recommended items. It quantifies
    how effective the recommendation system is at providing
    highly relevant suggestions within the initial set of
    recommendations.

    Parameters
    ----------
    relevance_array : numpy.array
        binary array marking observations that are relevant;
    pred_score : numpy.array
        predicted scores, expected to be higher for more
        relevant items;
    k : int
        number of top-ranked items to consider.

    Returns
    ----------
    out : float
        value of the metric.
    '''
    if len(relevance_array) != len(pred_score):
        raise ValueError(
            "`relevance_array` and `pred_score` must be the same size"
        )
    elif len(relevance_array) < k:
        raise ValueError(
            "k is greater than the number of observations"
        )
    # sort relevance flags by descending predicted score
    # and take the mean of the first k of them
    return np.mean(
        relevance_array[np.argsort(pred_score)[::-1]][:k]
    )
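As a quick sanity check, here is a call on a small hand-made input (the same arrays are reused in the unit tests below):

# scores 0.5, 0.4 and 0.3 rank items 3, 0 and 4 first; their relevance
# flags are 0, 1, 1, so the expected result is 2/3
precision_k(
    relevance_array=np.array([1, 1, 0, 0, 1]),
    pred_score=np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
    k=3
)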

Here are some unit tests for this function:

class TestPrecision(unittest.TestCase):
    def test_different_sizes(self):
        '''
        We must check that if the sizes of arrays with 
        relevance and prediction differ, an error must 
        be raised.
        '''
        with self.assertRaises(ValueError):
            precision_k(
                np.array([1, 1, 0]),
                np.array([0.3, 0.2, 0.3, 0.2]),
                1
            )

    def test_k_more_obs(self):
        '''
        K cannot be more than the number of observations 
        we are considering.
        '''
        with self.assertRaises(ValueError):
            precision_k(
                np.array([1, 1, 0, 0, 1]),
                np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
                10
            )
    
    def test_computations(self):
        '''
        Just a basic test with a known result.
        '''
        real_ans = precision_k(
            np.array([1, 1, 0, 0, 1]),
            np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
            3
        )
        exp_ans = 2/3
        self.assertAlmostEqual(real_ans, exp_ans, delta=0.000001)

ans = unittest.main(argv=[''], verbosity=2, exit=False)
del TestPrecision
test_computations (__main__.TestPrecision)
Just a basic test with a known result. ... ok
test_different_sizes (__main__.TestPrecision)
We must check that if the sizes of arrays with ... ok
test_k_more_obs (__main__.TestPrecision)
K cannot be more than the number of observations ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.004s

OK

The following cell shows the code to calculate the precision for our example: it is computed for each object separately and then averaged.

show = R_frame.groupby("object").apply(
    lambda object: pd.Series({
        "precision for model 1" : precision_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["Random scores"].to_numpy(),
            k=4
        ),
        "precision for model2" : precision_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["KNN scores"].to_numpy(),
            k=4
        )
    }),
    include_groups=False
)
display(show)
display(show.mean().rename("mean value").to_frame().T)
object  precision for model 1  precision for model 2
0       0.00                   1.00
1       1.00                   0.50
2       0.25                   1.00
3       0.25                   1.00
4       0.50                   1.00
5       0.50                   0.75
6       0.75                   1.00
7       0.50                   1.00
8       0.50                   0.50
9       0.50                   1.00

            precision for model 1  precision for model 2
mean value  0.475                  0.875
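As an optional extension that is not part of the output above, the same helper makes it easy to see how the mean precision behaves for different values of \(k\); the following sketch does this for model 2 only:

# hypothetical follow-up: mean precision@k of model 2 for several values of k
for k in [1, 3, 5, 10]:
    mean_precision = R_frame.groupby("object").apply(
        lambda object: precision_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["KNN scores"].to_numpy(),
            k=k
        ),
        include_groups=False
    ).mean()
    print(f"mean precision@{k} for model 2: {mean_precision}")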