recall@k#

\(recall_j@k\) measures how many of the relevant items appear in the top \(k\) recommendations, out of all the relevant items, where \(k\) is the number of recommendations generated for the \(j\)-th object. More formally:

\[recall_j@k = \frac{\sum_{i=1}^k r_{ij}}{\sum_{i=1}^n r_{ij}}\]

Where:

  • items are sorted by the scores that the model under consideration predicts for the \(j\)-th object;

  • \(\sum_{i=1}^k r_{ij}\) is the number of relevant items among the first \(k\) items;

  • \(\sum_{i=1}^n r_{ij}\) is the total number of relevant items for the \(j\)-th object.
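
For instance, suppose that for some toy object the relevance flags of five items, sorted by descending model score, are \(1, 0, 1, 0, 1\) (a made-up illustration, not taken from the dataset used below). Then

\[recall@2 = \frac{1 + 0}{1 + 0 + 1 + 0 + 1} = \frac{1}{3}\]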

import numpy as np
import pandas as pd

import unittest
from IPython.display import HTML

R_frame = pd.read_parquet("example.parquet")
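
For orientation: the rest of the page assumes that R_frame contains one row per object-item pair with a binary relevance flag and the score columns of the two compared models; the column names in the comment below are the ones used by the code that follows, and the peek is just a convenience check.

# Quick look at the data. The code on this page uses the columns
# "object", "item", "relevant", "Random scores" and "KNN scores".
display(R_frame.head())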

Consider a specific object#

Let’s examine a specific object to get a clear picture of the situation and calculate recall at 3 (\(recall@3\)) for it, comparing the models to see how they differ. In the following cell we extract a subframe for the chosen object and sort it by each model’s scores. The example was selected to highlight the disparity in \(recall@3\) between the models:

k = 3
obj = 4

# items ranked for the chosen object by the first model's (random) scores
model1_tab = R_frame.loc[
    R_frame["object"] == obj,
    [
        "item",
        "relevant",
        "Random scores"
    ]
].sort_values(
    "Random scores", 
    ascending=False
).set_index("item")

# items ranked for the chosen object by the second model's (KNN) scores
model2_tab = R_frame.loc[
    R_frame["object"] == obj,
    [
        "item",
        "relevant",
        "KNN scores"
    ]
].sort_values(
    "KNN scores", 
    ascending=False
).set_index("item")

# share of relevant items in the top k out of all relevant items
model1_recall = (
    model1_tab["relevant"].iloc[:k].sum()/
    model1_tab["relevant"].sum()
)
model2_recall = (
    model2_tab["relevant"].iloc[:k].sum()/
    model2_tab["relevant"].sum()
)

display(HTML(
    f"""
    <div style='display: flex;justify-content: space-around;'>
        <div>
            {model1_tab.to_html()}
            <p style='font-size:20px'>
                recall@{k} - {round(model1_recall*100,2)}%
            </p>
        </div>
        <div>
            {model2_tab.to_html()}
            <p style='font-size:20px'>
                recall@{k} - {round(model2_recall*100,2)}%
            </p>
        </div>
    </div>
    """
))
item    relevant    Random scores
3       1            2.465325
18      1            1.985386
8       0            1.656717
25      0            1.614408
19      1            1.447166
4       0            1.383232
16      1            1.339926
2       1            1.236205
29      0            1.134973
6       0            1.022516
9       0            0.667890
24      0            0.377753
5       0            0.346233
28      0            0.332350
13      0            0.313831
7       1            0.166810
17      0            0.029310
22      1           -0.048041
15      0           -0.221793
10      1           -0.229947
20      1           -0.287629
27      1           -0.388728
23      1           -0.480787
0       1           -0.573113
12      0           -0.639963
26      0           -1.123104
11      0           -1.129551
14      1           -1.225836
1       0           -1.320448
21      0           -1.359311

recall@3 - 15.38%

item    relevant    KNN scores
0       1            0.913773
14      1            0.792041
3       1            0.779723
20      1            0.737135
16      1            0.735866
8       0            0.654573
10      1            0.653200
29      0            0.648329
4       0            0.646239
27      1            0.643070
9       0            0.641759
23      1            0.631478
7       1            0.561225
2       1            0.561225
19      1            0.551094
18      1            0.548907
22      1            0.534151
25      0            0.465849
15      0            0.453531
26      0            0.440866
11      0            0.410943
1       0            0.360683
17      0            0.354840
13      0            0.354840
28      0            0.354639
6       0            0.352548
24      0            0.278986
5       0            0.250546
12      0            0.190521
21      0            0.172158

recall@3 - 23.08%

Python code#

The following function is a Python implementation of \(recall@k\).

def recall_k(relevance_array, pred_score, k):
    '''
    Compute recall@k: the proportion of relevant items
    that appear within the top k recommendations, out of
    all relevant items. It reflects the ability to identify
    and include relevant items in the first k recommendations.
    
    Parameters
    ----------
    relevance_array : numpy.array
        binary array marking observations that are relevant;
    pred_score : numpy.array
        predicted scores, expected to be higher
        the more relevant the item is;
    k : int
        number of top-scored items to consider.

    Returns
    ----------
    out : float
        value of the metric.
    '''
    if len(relevance_array) != len(pred_score):
        raise ValueError(
            "`relevance_array` and `pred_score` must be the same size"
        )
    elif len(relevance_array) < k:
        raise ValueError(
            "k is greater than the number of observations"
        )
    
    # relevance flags sorted by descending predicted score, truncated to top k
    relevant_in_k = np.sum(
        relevance_array[np.argsort(pred_score)[::-1]][:k]
    )
    relevant_total = np.sum(relevance_array)
    return relevant_in_k/relevant_total
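
As a quick sanity check, the call below uses made-up arrays (not part of the dataset): the two highest-scored items are the first two, and only one of them is relevant, so the function should return \(0.5\):

# Toy check: scores 0.9 and 0.8 put the first two items on top;
# only one of them is relevant, so recall@2 = 1/2.
print(recall_k(
    relevance_array=np.array([0, 1, 1, 0]),
    pred_score=np.array([0.9, 0.8, 0.1, 0.4]),
    k=2
))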

Here are some unit tests for the function defined above:

class TestRecall(unittest.TestCase):
    def test_different_sizes(self):
        '''
        We must check that if the sizes of arrays with 
        relevance and prediction differ, an error must 
        be raised.
        '''
        with self.assertRaises(ValueError):
            recall_k(
                np.array([1, 1, 0]),
                np.array([0.3, 0.2, 0.3, 0.2]),
                1
            )

    def test_k_more_obs(self):
        '''
        K cannot be more than the number of observations 
        we are considering.
        '''
        with self.assertRaises(ValueError):
            recall_k(
                np.array([1, 1, 0, 0, 1]),
                np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
                10
            )
    
    def test_computations(self):
        '''
        Just basic test with known result
        '''
        real_ans = recall_k(
            np.array([1, 1, 0, 0, 1]),
            np.array([0.4, 0.1, 0.2, 0.5, 0.3]),
            3
        )
        exp_ans = 2/3
        self.assertAlmostEqual(real_ans, exp_ans, delta=0.000001)
ans = unittest.main(argv=[''], verbosity=2, exit=False)
del TestRecall
test_computations (__main__.TestRecall)
Just basic test with known result ... ok
test_different_sizes (__main__.TestRecall)
We must check that if the sizes of arrays with ... ok
test_k_more_obs (__main__.TestRecall)
K cannot be more than the number of observations ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.003s

OK

The following cell shows the code that calculates the recall for our example: it is computed for each object (with \(k=4\)) and the results are then averaged.

# per-object recall@4 for both models
show = R_frame.groupby("object").apply(
    lambda object: pd.Series({
        "recall for model 1" : recall_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["Random scores"].to_numpy(),
            k=4
        ),
        "recall for model2" : recall_k(
            relevance_array=object["relevant"].to_numpy(),
            pred_score=object["KNN scores"].to_numpy(),
            k=4
        )
    }),
    include_groups=False
)
display(show)
display(show.mean().rename("mean value").to_frame().T)
object    recall for model 1    recall for model 2
0         0.000000              0.307692
1         0.190476              0.095238
2         0.062500              0.250000
3         0.058824              0.235294
4         0.153846              0.307692
5         0.153846              0.230769
6         0.166667              0.222222
7         0.125000              0.250000
8         0.153846              0.153846
9         0.105263              0.210526

              recall for model 1    recall for model 2
mean value    0.117027              0.226328