Metrics#

This section covers common metrics that have been designed or adapted specifically for recommendation systems.

It also describes how the task used as a running example was generated.

import numpy as np
import pandas as pd

from IPython.display import HTML, Latex
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

from surprise.prediction_algorithms.slope_one import SlopeOne
from surprise.model_selection import cross_validate
from surprise.dataset import Dataset
from surprise.reader import Reader
from surprise.prediction_algorithms.knns import KNNBasic

header_template = "<p style='font-size:17px'>{}</p>"


Task#

The following cell generates the task that we will use as an example. It is generated in the format <object/item> <-> relevance.

All the sources I checked describe how to estimate model performance in the binary case, where items are either “relevant” or “non-relevant”. In practice I have also met tasks where the object/item pair is mapped to non-binary values (ratings, or even preferences expressed in money spent), but for simplicity let's begin with the classical variant. So we have the following definition:

\[\begin{split}r_{ij} = \begin{cases} 1, & \text{the } i\text{-th item is relevant for the } j\text{-th object},\\ 0, & \text{otherwise,} \end{cases} \quad i=\overline{1,n},\ j=\overline{1,m}.\end{split}\]

Where:

  • \(n\) - the number of items under consideration;

  • \(m\) - the number of objects under consideration.
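
For instance, with \(n=3\) items and \(m=2\) objects, one possible relevance matrix (illustrative values of my own, not the one generated below) is:

\[R = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix}\]

Here \(r_{12}=0\) means the first item is not relevant for the second object, while \(r_{31}=1\) means the third item is relevant for the first object.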

r_width = 10
r_height = 30
np.random.seed(10)

R, c = make_blobs(
    n_samples=r_height,
    n_features=r_width,
    centers=3,
    random_state=10,
    cluster_std=5
)
# binarize: min-max scale to [0, 1] and round
R = np.round((R - R.min())/(R.max() - R.min())).astype(int)

# generating object/item combinations that will be left empty
combination_counts = 20
nan_combinations = np.concatenate(
    [
        np.random.randint(0, R.shape[0], [combination_counts,1]),
        np.random.randint(0, R.shape[1], [combination_counts,1])
    ],
    axis=1
)

# long format: one row per object/item pair
R_frame = pd.Series(
    R.ravel(),
    index = pd.MultiIndex.from_tuples(
            [
                (j,i) 
                for j in np.arange(R.shape[1]) 
                for i in np.arange(R.shape[0])
            ],
            names = ["object", "item"]
    ),
    name = "relevant"
).reset_index()

R_frame.sample(10)
object item relevant
148 4 28 0
147 4 27 1
154 5 4 1
8 0 8 1
105 3 15 1
24 0 24 0
125 4 5 0
276 9 6 0
130 4 10 1
226 7 16 0

Solutions#

We need the results of some algorithms in order to calculate metrics for them. We'll compare the following models:

  • Random model - just random scores for each item;

  • Model provided by surprise.prediction_algorithms.knns.KNNBasic.

reader = Reader(rating_scale=(0,1))
surp_dataset = Dataset.load_from_df(
    R_frame[["object", "item", "relevant"]],
    reader
)
my_data_set = surp_dataset.build_full_trainset()

# random model: i.i.d. normal scores for every object/item pair
np.random.seed(10)
R_frame["Random scores"] = np.random.normal(size=len(R_frame))

# KNN model: fit on the full trainset, then score every pair
model = KNNBasic(k=50, verbose=False)
model = model.fit(my_data_set)
R_frame["KNN scores"] = R_frame[["object", "item"]].apply(
    lambda row: model.predict(
        row["object"], row["item"]
    ).est,
    axis=1
)

Now, for each object \(j=\overline{1,m}\), we have two arrays of scores: \(S_{1j}\) and \(S_{2j}\), generated by the first and second models, respectively. The array for the \(M\)-th model should be represented as \(S_{M,j} = \{s_{M,j,1}, s_{M,j,2}, ..., s_{M,j,n}\}\). If \(s_{M,j,t} > s_{M,j,k}\), it indicates that the \(t\)-th item is considered more relevant than the \(k\)-th item for object \(j\), according to the \(M\)-th model.
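
For instance, here is how these score arrays can be pulled out of R_frame for a single object (the variable names are mine, chosen to match the notation):

j = 0  # any object index
S_1j = R_frame.loc[R_frame["object"] == j, "Random scores"].to_numpy()
S_2j = R_frame.loc[R_frame["object"] == j, "KNN scores"].to_numpy()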

Now we can order the items by their relevance according to each model. That lets us define the item orderings:

\[I_{M,j}=\{i_1, i_2, ... , i_n\}: k<t \Leftrightarrow s_{M,j,i_k} > s_{M,j,i_t}.\]

Or, in simple words, in \(I_{M,j}\) items go in descending order of preference for the \(j\)-th object according to the \(M\)-th model.

We also introduce the sequence of the items' true relevances, ordered according to the model:

\[R'_{M,j}=\{r_{i_1,j}, r_{i_2,j}, ..., r_{i_n,j}\}\]
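
In numpy terms, reusing j and S_2j from the sketch above (and relying on the fact that, within one object, the row position in R_frame coincides with the item id), these two sequences could be computed as:

order = np.argsort(-S_2j)  # I_{2,j}: item ids, most preferred first
true_rel = R_frame.loc[R_frame["object"] == j, "relevant"].to_numpy()
rel_order = true_rel[order]  # R'_{2,j}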

The following cell displays \(I_{1,j}\), \(I_{2,j}\) and \(R'_{1,j}\), \(R'_{2,j}\) for selected \(j\).

object_ind = 8
temp_object = R_frame[R_frame["object"] == object_ind]
display(temp_object)

def get_order(scores_name):
    # item ids in descending order of the given scores
    return ",".join(
        temp_object
        .sort_values(scores_name, ascending=False)["item"]
        .astype(str)
        .to_list()
    )

def get_relevances_order(scores_name):
    # true relevances in the same descending-score order
    return ", ".join(
        temp_object
        .sort_values(scores_name, ascending=False)["relevant"]
        .astype(str)
        .to_list()
    )

display(HTML(header_template.format("Random model")))
random_order = get_order("Random scores")
random_rel_order = get_relevances_order("Random scores")
display(Latex(f"$I_{{1,{object_ind}}}=\{{ {random_order} \}}$"))
display(Latex(f"$R'_{{1,{object_ind}}}=\{{ {random_rel_order} \}}$"))

display(HTML("<hr>"))

display(HTML(header_template.format("KNN model")))
KNN_order = get_order("KNN scores")
KNN_rel_order = get_relevances_order("KNN scores")
display(Latex(f"$I_{{2,{object_ind}}}=\{{ {KNN_order} \}}$"))
display(Latex(f"$R'_{{2,{object_ind}}}=\{{ {KNN_rel_order} \}}$"))

del object_ind, temp_object
object item relevant Random scores KNN scores
240 8 0 1 -2.017719 0.911927
241 8 1 0 0.540541 0.360036
242 8 2 1 -1.442299 0.563100
243 8 3 0 -1.608850 0.763004
244 8 4 0 -1.006569 0.638666
245 8 5 0 -0.257534 0.253009
246 8 6 0 0.730507 0.348827
247 8 7 1 -1.698401 0.563100
248 8 8 1 1.674076 0.673895
249 8 9 0 1.163724 0.638666
250 8 10 1 -0.132574 0.653047
251 8 11 1 -0.290246 0.430756
252 8 12 0 -0.953532 0.186026
253 8 13 0 0.588041 0.350962
254 8 14 0 0.068801 0.771011
255 8 15 0 1.412064 0.454776
256 8 16 1 -0.686216 0.737111
257 8 17 0 0.547944 0.350962
258 8 18 1 -0.036383 0.555094
259 8 19 1 -0.847016 0.552959
260 8 20 1 1.902304 0.738984
261 8 21 0 0.279605 0.172943
262 8 22 1 0.620255 0.537218
263 8 23 1 -1.068568 0.631957
264 8 24 0 -0.722621 0.272497
265 8 25 0 0.084140 0.462782
266 8 26 0 -0.584455 0.443032
267 8 27 1 0.602022 0.642905
268 8 28 0 0.438365 0.354960
269 8 29 0 -0.782343 0.644799

Random model

\[I_{1,8}=\{ 20,8,15,9,6,22,27,13,17,1,28,21,25,14,18,10,5,11,26,16,24,29,19,12,4,23,2,3,7,0 \}\]
\[R'_{1,8}=\{ 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1 \}\]

KNN model

\[I_{2,8}=\{ 0,14,3,20,16,8,10,29,27,4,9,23,7,2,18,19,22,25,15,26,11,1,28,17,13,6,24,5,12,21 \}\]
\[R'_{2,8}=\{ 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 \}\]

As you can see, according to the KNN model, relevant items are more likely to get higher scores.

R_frame.to_parquet("metrics/example.parquet")

NDCG#

\(NDCG@k\) is a popular metric for evaluating ranking algorithms. Its formula is:

\[NDCG@k=\frac{DCG@k}{IDCG@k}\]

Where:

\[DCG@k=\sum_{i=1}^k \frac{r_i}{\log_2{(i+1)}}\]
  • \(r_i\): relevance of the item ranked \(i\)-th by the ranking under evaluation.

  • \(k\): the number of top positions considered.

\(IDCG\) is the “ideal” \(DCG\): the same as \(DCG\), but computed for the ideal ordering, so it can be written as:

\[IDCG@k=\sum_{j=1}^k \frac{r'_j}{\log_2{(j+1)}}\]

Here \(r'_j\) is a reordering of the \(r_i\) by true relevance, so that \(r'_j \geq r'_{j+1}\).

Consider the following example:

Suppose we have a ranking algorithm that produced the following relevance scores for 5 items:

\[r_i = \left\{ 10, 20, 3, 7, 10 \right\}\]

\(DCG@3\) in this case will take value:

\[DCG@3 = \frac{10}{\log_2 2} + \frac{20}{\log_2 3} + \frac{3}{\log_2 4} = 24.1186\]

The ideal ranking in this case would take the form \(r'_j = \left\{20, 10, 10, 7, 3\right\}\), so computing \(IDCG@3\) looks like:

\[IDCG@3 = \frac{20}{\log_2 2} + \frac{10}{\log_2 3} + \frac{10}{\log_2 4} = 31.3093\]

Finally, \(NDCG@3 = 24.1186/31.3093 = 0.7703\).
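
A minimal numpy sketch of these formulas (the helper names dcg_at_k and ndcg_at_k are my own, not from any library) reproduces this example:

def dcg_at_k(relevances, k):
    # DCG@k for relevances listed in ranked order
    relevances = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, relevances.size + 2))
    return float(np.sum(relevances / discounts))

def ndcg_at_k(relevances, k):
    # normalize by the DCG@k of the ideally reordered relevances
    ideal = np.sort(np.asarray(relevances))[::-1]
    return dcg_at_k(relevances, k) / dcg_at_k(ideal, k)

r = [10, 20, 3, 7, 10]
print(round(dcg_at_k(r, 3), 4))   # 24.1186
print(round(ndcg_at_k(r, 3), 4))  # 0.7703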