precision@k

\(precision_j@k\) is the fraction of relevant elements among the first \(k\) recommendations for the \(j\)-th object. More formally:

\[precision_j@k = \frac{\sum_{i=1}^k r_{ij}}{k}\]

Where:

  • items are sorted in descending order of the preference predicted for the \(j\)-th object by the model under consideration;

  • \(\sum_{i=1}^k r_{ij}\) is the number of relevant items among the first \(k\) items for the \(j\)-th object.
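For example (the numbers here are made up for illustration, not taken from the dataset used below): if the relevance flags of the top three items, sorted by the model's score, are 1, 0, 1, then \(precision@3 = 2/3\). In plain Python:

# relevance flags of the first k items, already sorted by the model's predicted score
top_k_relevance = [1, 0, 1]

# precision@k is the share of relevant items among them
precision_at_3 = sum(top_k_relevance) / len(top_k_relevance)
print(precision_at_3)  # 0.666...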

import numpy as np
import pandas as pd

import unittest
from IPython.display import HTML

R_frame = pd.read_parquet("example.parquet")
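The example.parquet file is assumed to contain one row per (object, item) pair with a binary relevant flag and the scores of two models, "Random scores" and "KNN scores"; these are the only columns used on this page. If the file is not available, a frame with the same layout can be generated synthetically (the concrete numbers in the outputs below will then differ):

# hypothetical substitute for example.parquet: random data with the same column layout
rng = np.random.default_rng(0)
n_objects, n_items = 10, 30
R_frame = pd.DataFrame({
    "object": np.repeat(np.arange(n_objects), n_items),
    "item": np.tile(np.arange(n_items), n_objects),
    "relevant": rng.integers(0, 2, n_objects * n_items),
    "Random scores": rng.normal(size=n_objects * n_items),
    "KNN scores": rng.uniform(size=n_objects * n_items),
})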

Consider a specific object

Let’s examine a specific object to get a clear picture of the situation and calculate the precision at 3 (\(precision@3\)) for it. We will compare model 1 and model 2 to discern any differences. In the following cell, we extract a subframe for the specific object and sort it by the scores of each model. The example has been selected to highlight the disparity in \(precision@3\) between the models:

k = 3
obj = 4

model1_tab = R_frame.loc[
    R_frame["object"] == obj,
    [
        "item",
        "relevant",
        "Random scores"
    ]
].sort_values(
    "Random scores", 
    ascending=False
).set_index("item")

model2_tab = R_frame.loc[
    R_frame["object"] == obj,
    [
        "item",
        "relevant",
        "KNN scores"
    ]
].sort_values(
    "KNN scores", 
    ascending=False
).set_index("item")

model1_precision = (
    model1_tab["relevant"].iloc[:k].mean()
)
model2_precision = (
    model2_tab["relevant"].iloc[:k].mean()
)

display(HTML(
    f"""
    <div style='display: flex;justify-content: space-around;'>
        <div>
            {model1_tab.to_html()}
            <p style='font-size:20px'>
                precision@{k} - {round(model1_precision*100,2)}%
            </p>
        </div>
        <div>
            {model2_tab.to_html()}
            <p style='font-size:20px'>
                precision@{k} - {round(model2_precision*100,2)}%
            </p>
        </div>
    </div>
    """
))
item  relevant  Random scores
3     1         2.465325
18    1         1.985386
8     0         1.656717
25    0         1.614408
19    1         1.447166
4     0         1.383232
16    1         1.339926
2     1         1.236205
29    0         1.134973
6     0         1.022516
9     0         0.667890
24    0         0.377753
5     0         0.346233
28    0         0.332350
13    0         0.313831
7     1         0.166810
17    0         0.029310
22    1        -0.048041
15    0        -0.221793
10    1        -0.229947
20    1        -0.287629
27    1        -0.388728
23    1        -0.480787
0     1        -0.573113
12    0        -0.639963
26    0        -1.123104
11    0        -1.129551
14    1        -1.225836
1     0        -1.320448
21    0        -1.359311

precision@3 - 66.67%

item  relevant  KNN scores
0     1         0.913773
14    1         0.792041
3     1         0.779723
20    1         0.737135
16    1         0.735866
8     0         0.654573
10    1         0.653200
29    0         0.648329
4     0         0.646239
27    1         0.643070
9     0         0.641759
23    1         0.631478
7     1         0.561225
2     1         0.561225
19    1         0.551094
18    1         0.548907
22    1         0.534151
25    0         0.465849
15    0         0.453531
26    0         0.440866
11    0         0.410943
1     0         0.360683
17    0         0.354840
13    0         0.354840
28    0         0.354639
6     0         0.352548
24    0         0.278986
5     0         0.250546
12    0         0.190521
21    0         0.172158

precision@3 - 100.0%

Python code

The following function is an implementation of \(precision@k\) in Python.

def precision_k(relevance_array, pred_score, k):
    '''
    Calculate precision@k. This metric assesses the accuracy
    of recommendations by computing the proportion of relevant
    items among the first k recommended items. It quantifies
    how effective the recommendation system is at providing
    highly relevant suggestions within the initial set of
    recommendations.

    Parameters
    ----------
    relevance_array : numpy.array
        binary array marking observations that are relevant;
    pred_score : numpy.array
        predicted scores, expected to be higher for more
        relevant items;
    k : int
        number of top-ranked items to consider.

    Returns
    ----------
    out : float
        value of the metric.
    '''
    if len(relevance_array) != len(pred_score):
        raise ValueError(
            "`relevance_array` and `pred_score` must be the same size"
        )
    elif len(relevance_array) < k:
        raise ValueError(
            "k is greater than the number of observations"
        )
    # sort relevance flags by descending predicted score
    # and take the mean of the first k of them
    return np.mean(
        relevance_array[np.argsort(pred_score)[::-1]][:k]
    )
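As a quick sanity check, here is a call on a small hand-made input (the same arrays are reused in the unit tests below):

# scores 0.5, 0.4 and 0.3 rank items 3, 0 and 4 first; their relevance
# flags are 0, 1, 1, so the expected result is 2/3
precision_k(
    relevance_array=np.array([1, 1, 0, 0, 1]),
    pred_score=np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
    k=3
)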

Here are some unit tests for this function:

class TestPrecision(unittest.TestCase):
    def test_different_sizes(self):
        '''
        We must check that if the sizes of arrays with 
        relevance and prediction differ, an error must 
        be raised.
        '''
        with self.assertRaises(ValueError):
            precision_k(
                np.array([1, 1, 0]),
                np.array([0.3, 0.2, 0.3, 0.2]),
                1
            )

    def test_k_more_obs(self):
        '''
        K cannot be more than the number of observations 
        we are considering.
        '''
        with self.assertRaises(ValueError):
            precision_k(
                np.array([1, 1, 0, 0, 1]),
                np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
                10
            )
    
    def test_computations(self):
        '''
        Just a basic test with a known result.
        '''
        real_ans = precision_k(
            np.array([1, 1, 0, 0, 1]),
            np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
            3
        )
        exp_ans = 2/3
        self.assertAlmostEqual(real_ans, exp_ans, delta=0.000001)

ans = unittest.main(argv=[''], verbosity=2, exit=False)
del TestPrecision
test_computations (__main__.TestPrecision)
Just a basic test with a known result. ... ok
test_different_sizes (__main__.TestPrecision)
We must check that if the sizes of arrays with ... ok
test_k_more_obs (__main__.TestPrecision)
K cannot be more than the number of observations ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.004s

OK

The following cell shows the code to calculate the precision for our example: it is computed for each object separately and then averaged.

show = R_frame.groupby("object").apply(
    lambda object: pd.Series({
        "precision for model 1" : precision_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["Random scores"].to_numpy(),
            k=4
        ),
        "precision for model2" : precision_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["KNN scores"].to_numpy(),
            k=4
        )
    }),
    include_groups=False
)
display(show)
display(show.mean().rename("mean value").to_frame().T)
object  precision for model 1  precision for model 2
0       0.00                   1.00
1       1.00                   0.50
2       0.25                   1.00
3       0.25                   1.00
4       0.50                   1.00
5       0.50                   0.75
6       0.75                   1.00
7       0.50                   1.00
8       0.50                   0.50
9       0.50                   1.00

            precision for model 1  precision for model 2
mean value  0.475                  0.875
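As an optional extension that is not part of the output above, the same helper makes it easy to see how the mean precision behaves for different values of \(k\); the following sketch does this for model 2 only:

# hypothetical follow-up: mean precision@k of model 2 for several values of k
for k in [1, 3, 5, 10]:
    mean_precision = R_frame.groupby("object").apply(
        lambda object: precision_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["KNN scores"].to_numpy(),
            k=k
        ),
        include_groups=False
    ).mean()
    print(f"mean precision@{k} for model 2: {mean_precision}")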