Surprise
Surprise is a Python library for building recommendation systems. This page covers the basics of working with it.
import pandas as pd
from surprise import KNNWithMeans, Dataset, Reader
The following cell defines the matrix used as the example on this page. Read more about the relevance matrix and the ideas behind recommender systems in the ranking task section.
data = {
"Item 1": [1, 0, None, 1, 4, 3],
"Item 2": [23, 2, 10, 8, 2, 10],
"Item 3": [None, 0, 11, 2, 15, 0],
"Item 4": [53, 4, None, 23, 3, None],
"Item 5": [50, 0, None, None, 4, 23],
}
users = pd.Index(
[
"User 1",
"User 2",
"User 3",
"User 4",
"User 5",
"User 6"
],
name = "Users"
)
R_matrix = pd.DataFrame(data, index=users)
R_matrix.columns.name = "Items"
R_matrix
Users | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 |
---|---|---|---|---|---|
User 1 | 1.0 | 23 | NaN | 53.0 | 50.0 |
User 2 | 0.0 | 2 | 0.0 | 4.0 | 0.0 |
User 3 | NaN | 10 | 11.0 | NaN | NaN |
User 4 | 1.0 | 8 | 2.0 | 23.0 | NaN |
User 5 | 4.0 | 2 | 15.0 | 3.0 | 4.0 |
User 6 | 3.0 | 10 | 0.0 | NaN | 23.0 |
To use the matrix with Surprise, convert it to a long-format table with one row per known rating, as in the following cell:
R_frame = R_matrix.stack().rename("ratings").reset_index()
R_frame
 | Users | Items | ratings |
---|---|---|---|
0 | User 1 | Item 1 | 1.0 |
1 | User 1 | Item 2 | 23.0 |
2 | User 1 | Item 4 | 53.0 |
3 | User 1 | Item 5 | 50.0 |
4 | User 2 | Item 1 | 0.0 |
5 | User 2 | Item 2 | 2.0 |
6 | User 2 | Item 3 | 0.0 |
7 | User 2 | Item 4 | 4.0 |
8 | User 2 | Item 5 | 0.0 |
9 | User 3 | Item 2 | 10.0 |
10 | User 3 | Item 3 | 11.0 |
11 | User 4 | Item 1 | 1.0 |
12 | User 4 | Item 2 | 8.0 |
13 | User 4 | Item 3 | 2.0 |
14 | User 4 | Item 4 | 23.0 |
15 | User 5 | Item 1 | 4.0 |
16 | User 5 | Item 2 | 2.0 |
17 | User 5 | Item 3 | 15.0 |
18 | User 5 | Item 4 | 3.0 |
19 | User 5 | Item 5 | 4.0 |
20 | User 6 | Item 1 | 3.0 |
21 | User 6 | Item 2 | 10.0 |
22 | User 6 | Item 3 | 0.0 |
23 | User 6 | Item 5 | 23.0 |
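The `stack` call above relies on missing ratings being dropped automatically. That behaviour depends on the pandas version (newer implementations of `stack` may keep the `NaN` rows), so a more explicit variant can be useful. The following is a minimal sketch on a smaller, hypothetical matrix `R_small`; it swaps in `melt` and `dropna` instead of the approach used on this page:

```python
import pandas as pd

# Hypothetical, smaller relevance matrix with one missing rating.
R_small = pd.DataFrame(
    {"Item 1": [1.0, 0.0, None], "Item 2": [23.0, 2.0, 10.0]},
    index=pd.Index(["User 1", "User 2", "User 3"], name="Users"),
)
R_small.columns.name = "Items"

long_frame = (
    R_small.reset_index()
    # One row per (user, item) pair, including the missing ones.
    .melt(id_vars="Users", var_name="Items", value_name="ratings")
    # Drop the pairs without a rating explicitly.
    .dropna(subset=["ratings"])
    # Restore the user-by-user ordering that stack produces.
    .sort_values(["Users", "Items"])
    .reset_index(drop=True)
)
print(long_frame)
```

The result has the same three columns (`Users`, `Items`, `ratings`) and contains only the known ratings.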
The table then has to be converted into a Surprise dataset. Surprise uses readers,
which define how the data should be interpreted.
By default, the dataset must contain three columns in the following order: user id, item id, and rating.
The following cell performs this transformation for the example:
reader = Reader(
rating_scale=(
R_matrix.min().min(),
R_matrix.max().max()
)
)
data_set = Dataset.load_from_df(
df=R_frame,
reader=reader
)
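Because the column order matters, it may help to select the three columns explicitly before loading a table whose layout differs. The following is a minimal pandas-only sketch; the `raw` table and its `timestamp` column are hypothetical and do not appear in the example above:

```python
import pandas as pd

# Hypothetical raw table: wrong column order plus an extra column.
raw = pd.DataFrame(
    {
        "ratings": [1.0, 23.0, 0.0],
        "timestamp": [10, 11, 12],
        "Items": ["Item 1", "Item 2", "Item 1"],
        "Users": ["User 1", "User 1", "User 2"],
    }
)

# Keep exactly three columns in the order user id, item id, rating.
ordered = raw[["Users", "Items", "ratings"]]

# The rating scale can be taken from the long table as well.
rating_scale = (float(ordered["ratings"].min()), float(ordered["ratings"].max()))
print(ordered.columns.tolist(), rating_scale)
```

A frame prepared this way can be passed as `df` to `Dataset.load_from_df`, with `rating_scale` given to the `Reader`.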
A dataset alone is not enough to fit a model: you also need to create a surprise.trainset.Trainset instance.
train_set = data_set.build_full_trainset()
type(train_set)
surprise.trainset.Trainset
A model can now be fitted on this train set:
model = KNNWithMeans(k=2).fit(train_set)
Computing the msd similarity matrix...
Done computing similarity matrix.
To get a prediction, pass the user and item identifiers exactly as they appear in the user and item id columns. The result is a Prediction object; the estimated rating is stored in its est field:
pred = model.predict(uid="User 6", iid="Item 1")
display(pred)
display(pred.est)
Prediction(uid='User 6', iid='Item 1', r_ui=None, est=2.75, details={'actual_k': 2, 'was_impossible': False})
2.75
Or, to get predictions for all items for User 6:
pd.Series(
[
model.predict(uid="User 6", iid=f"Item {i}").est
for i in range(1, 6)
],
index = R_matrix.columns,
name="estimations"
).to_frame()
Items | estimations |
---|---|
Item 1 | 2.750000 |
Item 2 | 9.750000 |
Item 3 | 0.416667 |
Item 4 | 23.126198 |
Item 5 | 22.900328 |
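The estimates above can be reproduced by hand, which helps to see what the model does. The sketch below is not Surprise code. It assumes user-based neighbours and the MSD similarity named in the fitting log, computed as 1 / (msd + 1) over the items two users have both rated, and it assumes that a user who has already rated the item counts among their own neighbours. The helper names `msd_similarity`, `mean_rating` and `knn_with_means` are hypothetical, and clipping to the rating scale is omitted:

```python
import heapq

# Known ratings from the example matrix (missing entries are simply absent).
ratings = {
    "User 1": {"Item 1": 1, "Item 2": 23, "Item 4": 53, "Item 5": 50},
    "User 2": {"Item 1": 0, "Item 2": 2, "Item 3": 0, "Item 4": 4, "Item 5": 0},
    "User 3": {"Item 2": 10, "Item 3": 11},
    "User 4": {"Item 1": 1, "Item 2": 8, "Item 3": 2, "Item 4": 23},
    "User 5": {"Item 1": 4, "Item 2": 2, "Item 3": 15, "Item 4": 3, "Item 5": 4},
    "User 6": {"Item 1": 3, "Item 2": 10, "Item 3": 0, "Item 5": 23},
}


def msd_similarity(u, v):
    """Similarity 1 / (msd + 1) over the items both users have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    msd = sum((ratings[u][i] - ratings[v][i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


def mean_rating(u):
    """Mean of all known ratings of user u."""
    return sum(ratings[u].values()) / len(ratings[u])


def knn_with_means(user, item, k=2):
    """User mean plus the similarity-weighted mean deviation of k neighbours."""
    # Candidates are all users who rated the item (possibly the user itself).
    candidates = [
        (msd_similarity(user, v), v) for v in ratings if item in ratings[v]
    ]
    neighbours = [
        (sim, v)
        for sim, v in heapq.nlargest(k, candidates, key=lambda pair: pair[0])
        if sim > 0
    ]
    estimate = mean_rating(user)
    total_sim = sum(sim for sim, _ in neighbours)
    if total_sim > 0:
        weighted_dev = sum(
            sim * (ratings[v][item] - mean_rating(v)) for sim, v in neighbours
        )
        estimate += weighted_dev / total_sim
    return estimate


estimates = {f"Item {i}": knn_with_means("User 6", f"Item {i}") for i in range(1, 6)}
print(estimates)
```

For User 6 and Item 1, the two nearest neighbours under these assumptions are User 6 itself (similarity 1, deviation 3 - 9 = -6) and User 4 (similarity 0.2, deviation 1 - 8.5 = -7.5), so the estimate is 9 + (-6 - 1.5) / 1.2 = 2.75, matching the model output.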