Metrics#
This section is for common metrics that have been designed or adapted specifically for recommendation systems.
It also describes how the task used as an example was generated.
import numpy as np
import pandas as pd
from IPython.display import HTML, Latex
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from surprise.prediction_algorithms.slope_one import SlopeOne
from surprise.model_selection import cross_validate
from surprise.dataset import Dataset
from surprise.reader import Reader
from surprise.prediction_algorithms.knns import KNNBasic
header_template = "<p style='font-size:17px'>{}</p>"
Sources

Article on TDS Evaluation Metrics for Recommendation Systems — An Overview;
Task#
The following cell generates a task that we will use as an example. It is generated in the format <object/item> <-> relevance.
All the sources I checked describe how to estimate model performance in the case of binary output, where items are either "relevant" or "non-relevant". In practice the object/item pair is often mapped to non-binary values (ratings, or even preferences expressed in money spent), but for simplicity let's start with the classical binary variant. So we have the following definition:

$$R = \left\{ r_{ji} \right\}_{j=\overline{1,m}, \; i=\overline{1,n}}, \quad r_{ji} \in \{0, 1\}$$

Where:

\(n\) - number of the items under consideration;

\(m\) - number of the objects under consideration;

\(r_{ji}\) - indicator that the \(i\)-th item is relevant to the \(j\)-th object.
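For instance, a purely illustrative toy case with \(m = 2\) objects and \(n = 3\) items (not the data generated below) could look like:

$$R = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},$$

meaning, for example, that items 1 and 3 are relevant to object 1, and item 2 is relevant to object 2.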
r_width = 10
r_height = 30
np.random.seed(10)
R, c = make_blobs(
n_samples=r_height,
n_features=r_width,
centers=3,
random_state=10,
cluster_std=5
)
R = np.round((R-R.min())/(R.max()-R.min())).astype(int)
# generating combinations of object/item to be empty
combination_counts = 20
nan_combinations = np.concatenate(
[
np.random.randint(0, R.shape[0], [combination_counts,1]),
np.random.randint(0, R.shape[1], [combination_counts,1])
],
axis=1
)
R_frame = pd.Series(
R.ravel(),
index = pd.MultiIndex.from_tuples(
[
(j,i)
for j in np.arange(R.shape[1])
for i in np.arange(R.shape[0])
],
names = ["object", "item"]
),
name = "relevant"
).reset_index()
R_frame.sample(10)
|  | object | item | relevant |
|---|---|---|---|
| 148 | 4 | 28 | 0 |
| 147 | 4 | 27 | 1 |
| 154 | 5 | 4 | 1 |
| 8 | 0 | 8 | 1 |
| 105 | 3 | 15 | 1 |
| 24 | 0 | 24 | 0 |
| 125 | 4 | 5 | 0 |
| 276 | 9 | 6 | 0 |
| 130 | 4 | 10 | 1 |
| 226 | 7 | 16 | 0 |
Solutions#
We need the results of some algorithms to calculate metrics for them. We'll compare the results of two models:

Random model - just random scores for each item;

Model provided by `surprise.prediction_algorithms.knns.KNNBasic`.
reader = Reader(rating_scale=(0,1))
surp_dataset = Dataset.load_from_df(
R_frame[["object", "item", 'relevant']],
reader
)
my_data_set = surp_dataset.build_full_trainset()
np.random.seed(10)
R_frame["Random scores"] = np.random.normal(size=len(R_frame))
model = KNNBasic(k=50, verbose=False)
model = model.fit(my_data_set)
R_frame["KNN scores"] = R_frame[["object", "item"]].apply(
lambda row: model.predict(
row["object"], row["item"]
).est,
axis = 1
)
Now, for each object \(j=\overline{1,m}\), we have two arrays of scores: \(S_{1j}\) and \(S_{2j}\), generated by the first and second models, respectively. The array for the \(M\)-th model should be represented as \(S_{M,j} = \{s_{M,j,1}, s_{M,j,2}, ..., s_{M,j,n}\}\). If \(s_{M,j,t} > s_{M,j,k}\), it indicates that the \(t\)-th item is considered more relevant than the \(k\)-th item for object \(j\), according to the \(M\)-th model.
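To make this concrete, here is a minimal sketch (reusing the R_frame built above, with the variable names S_1j and S_2j introduced purely for illustration) that extracts the score arrays for one object:

```python
# Score arrays for one object: S_{1,j} from the random model, S_{2,j} from KNN
j = 8
obj_scores = R_frame[R_frame["object"] == j].sort_values("item")
S_1j = obj_scores["Random scores"].to_numpy()
S_2j = obj_scores["KNN scores"].to_numpy()

# s_{2,j,0} > s_{2,j,1} would mean the KNN model considers item 0
# more relevant than item 1 for object j
print(S_2j[0] > S_2j[1])
```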
Now we can order the items by each model's relevance scores, which defines the item orderings:

$$I_{M,j} = \left(i_1, i_2, \ldots, i_n\right): \quad s_{M,j,i_1} \geq s_{M,j,i_2} \geq \ldots \geq s_{M,j,i_n}$$

Or, in simple words, in \(I_{M,j}\) items go in descending order of preference for the \(j\)-th object according to the \(M\)-th model.

We also introduce the sequence of real relevances of the items, taken in the order induced by the model:

$$R'_{M,j} = \left(r_{j,i_1}, r_{j,i_2}, \ldots, r_{j,i_n}\right), \quad \left(i_1, \ldots, i_n\right) = I_{M,j}$$
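Continuing the sketch above, both sequences can be obtained with numpy's argsort (a minimal illustration; the cell below builds the same thing with pandas):

```python
# Order items by descending score: argsort gives ascending order, so reverse it
order = S_2j.argsort()[::-1]
items = obj_scores["item"].to_numpy()

I_2j = items[order]                                    # I_{2,j}: items, most relevant first
R_prime_2j = obj_scores["relevant"].to_numpy()[order]  # R'_{2,j}: true relevances in that order
```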
The following cell displays \(I_{1,j}\), \(I_{2,j}\) and \(R'_{1,j}\), \(R'_{2,j}\) for selected \(j\).
object_ind = 8
temp_object = R_frame[R_frame["object"] == object_ind]
display(temp_object)
get_order = lambda scores_name: ",".join(
temp_object
.sort_values(scores_name, ascending=False)["item"]
.astype(str)
.to_list()
)
get_relevances_order = lambda scores_name: ", ".join(
temp_object
.sort_values(
scores_name,
ascending=False
)["relevant"]
.astype("str").to_list()
)
display(HTML(header_template.format("Random model")))
random_order = get_order("Random scores")
random_rel_order = get_relevances_order("Random scores")
display(Latex(f"$I_{{1,{object_ind}}}=\{{ {random_order} \}}$"))
display(Latex(f"$R'_{{1,{object_ind}}}=\{{ {random_rel_order} \}}$"))
display(HTML("<hr>"))
display(HTML(header_template.format("KNN model")))
KNN_order = get_order("KNN scores")
KNN_rel_order = get_relevances_order("KNN scores")
display(Latex(f"$I_{{2,{object_ind}}}=\{{ {KNN_order} \}}$"))
display(Latex(f"$R'_{{2,{object_ind}}}=\{{ {KNN_rel_order} \}}$"))
del object_ind, temp_object
|  | object | item | relevant | Random scores | KNN scores |
|---|---|---|---|---|---|
| 240 | 8 | 0 | 1 | -2.017719 | 0.911927 |
| 241 | 8 | 1 | 0 | 0.540541 | 0.360036 |
| 242 | 8 | 2 | 1 | -1.442299 | 0.563100 |
| 243 | 8 | 3 | 0 | -1.608850 | 0.763004 |
| 244 | 8 | 4 | 0 | -1.006569 | 0.638666 |
| 245 | 8 | 5 | 0 | -0.257534 | 0.253009 |
| 246 | 8 | 6 | 0 | 0.730507 | 0.348827 |
| 247 | 8 | 7 | 1 | -1.698401 | 0.563100 |
| 248 | 8 | 8 | 1 | 1.674076 | 0.673895 |
| 249 | 8 | 9 | 0 | 1.163724 | 0.638666 |
| 250 | 8 | 10 | 1 | -0.132574 | 0.653047 |
| 251 | 8 | 11 | 1 | -0.290246 | 0.430756 |
| 252 | 8 | 12 | 0 | -0.953532 | 0.186026 |
| 253 | 8 | 13 | 0 | 0.588041 | 0.350962 |
| 254 | 8 | 14 | 0 | 0.068801 | 0.771011 |
| 255 | 8 | 15 | 0 | 1.412064 | 0.454776 |
| 256 | 8 | 16 | 1 | -0.686216 | 0.737111 |
| 257 | 8 | 17 | 0 | 0.547944 | 0.350962 |
| 258 | 8 | 18 | 1 | -0.036383 | 0.555094 |
| 259 | 8 | 19 | 1 | -0.847016 | 0.552959 |
| 260 | 8 | 20 | 1 | 1.902304 | 0.738984 |
| 261 | 8 | 21 | 0 | 0.279605 | 0.172943 |
| 262 | 8 | 22 | 1 | 0.620255 | 0.537218 |
| 263 | 8 | 23 | 1 | -1.068568 | 0.631957 |
| 264 | 8 | 24 | 0 | -0.722621 | 0.272497 |
| 265 | 8 | 25 | 0 | 0.084140 | 0.462782 |
| 266 | 8 | 26 | 0 | -0.584455 | 0.443032 |
| 267 | 8 | 27 | 1 | 0.602022 | 0.642905 |
| 268 | 8 | 28 | 0 | 0.438365 | 0.354960 |
| 269 | 8 | 29 | 0 | -0.782343 | 0.644799 |
Random model: the output shows \(I_{1,8}\) and \(R'_{1,8}\) - the items and their true relevances ordered by the random scores.

KNN model: the output shows \(I_{2,8}\) and \(R'_{2,8}\) - the items and their true relevances ordered by the KNN scores.
As you can see, according to the KNN model, relevant items are more likely to get higher scores.
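A quick, rough way to check this claim is to compare the mean score of relevant and non-relevant items (a sketch on top of the R_frame built above):

```python
# Mean model scores grouped by true relevance: for a useful model,
# the mean score of relevant items (relevant == 1) should be higher
print(R_frame.groupby("relevant")[["Random scores", "KNN scores"]].mean())
```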
R_frame.to_parquet("metrics/example.parquet")
NDCG#
\(NDCG@k\) (normalized discounted cumulative gain) is a popular way to measure ranking algorithms. Its formula is:

$$NDCG@k = \frac{DCG@k}{IDCG@k}, \quad DCG@k = \sum_{i=1}^{k} \frac{r_i}{\log_2(i + 1)}$$
Where:
\(r_i\): relevance of the item at position \(i\) in the ranking under evaluation;

\(k\): the number of top positions considered.
\(IDCG\) is the "ideal" \(DCG\): the same formula computed on the ideal ordering, so it can be written as:

$$IDCG@k = \sum_{i=1}^{k} \frac{r'_i}{\log_2(i + 1)}$$

Here \(r'_i\) is the reordering of the \(r_i\) according to the real relevance, so that \(r'_i \geq r'_{i+1}\).
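The formulas above translate into a few lines of numpy. This is a minimal sketch using the \(r_i/\log_2(i+1)\) gain defined here (some libraries use a \(2^{r_i}-1\) gain instead); the function names dcg_at_k and ndcg_at_k are introduced for illustration:

```python
import numpy as np

def dcg_at_k(relevances, k):
    """DCG@k for relevances listed in the order produced by the model."""
    r = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, r.size + 2))  # log2(i + 1) for i = 1..k
    return float(np.sum(r / discounts))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG@k normalized by the DCG of the ideal (descending) order."""
    idcg = dcg_at_k(np.sort(relevances)[::-1], k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0
```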
Consider the following example:
Suppose we have a ranking algorithm whose ordering of 5 items yields the following true relevances, read in ranked order: \(r = \left\{10, 20, 3, 7, 10\right\}\).
\(DCG@3\) in this case takes the value:

$$DCG@3 = \frac{10}{\log_2 2} + \frac{20}{\log_2 3} + \frac{3}{\log_2 4} = 10 + 12.6186 + 1.5 = 24.1186$$
The ideal ranking in this case would take the form \(r'_i = \left\{20, 10, 10, 7, 3\right\}\), so computing \(IDCG@3\) looks like:

$$IDCG@3 = \frac{20}{\log_2 2} + \frac{10}{\log_2 3} + \frac{10}{\log_2 4} = 20 + 6.3093 + 5 = 31.3093$$

Finally, \(NDCG@3 = 24.1186 / 31.3093 = 0.7703\).
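The ndcg_at_k sketch above reproduces this arithmetic:

```python
r = [10, 20, 3, 7, 10]  # true relevances in the order ranked by the algorithm
print(round(ndcg_at_k(r, k=3), 4))  # 0.7703
```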