Surprise

Surprise is a Python library for building recommender systems. This page walks through the basic workflow: preparing the data, fitting a model, and getting predictions.

import pandas as pd
from surprise import KNNWithMeans, Dataset, Reader

The following cell defines the relevance matrix used as the example on this page. Rows are users, columns are items, and missing entries are ratings that are not known. For background on relevance matrices and recommender systems, see the ranking task section.

data = {
    "Item 1": [1, 0, None, 1, 4, 3],
    "Item 2": [23, 2, 10, 8, 2, 10],
    "Item 3": [None, 0, 11, 2, 15, 0],
    "Item 4": [53, 4, None, 23, 3, None],
    "Item 5": [50, 0, None, None, 4, 23], 
}
users = pd.Index(
    [
        "User 1", 
        "User 2", 
        "User 3", 
        "User 4", 
        "User 5",
        "User 6"
    ],
    name = "Users"
)

R_matrix = pd.DataFrame(data, index=users)
R_matrix.columns.name = "Items"
R_matrix
Items   Item 1  Item 2  Item 3  Item 4  Item 5
Users
User 1     1.0      23     NaN    53.0    50.0
User 2     0.0       2     0.0     4.0     0.0
User 3     NaN      10    11.0     NaN     NaN
User 4     1.0       8     2.0    23.0     NaN
User 5     4.0       2    15.0     3.0     4.0
User 6     3.0      10     0.0     NaN    23.0

Surprise expects the data in long format: one row per known (user, item, rating) triple. The following cell stacks the matrix into that shape; missing ratings do not appear in the result:

R_frame = R_matrix.stack().rename("ratings").reset_index()
R_frame
     Users   Items  ratings
0   User 1  Item 1      1.0
1   User 1  Item 2     23.0
2   User 1  Item 4     53.0
3   User 1  Item 5     50.0
4   User 2  Item 1      0.0
5   User 2  Item 2      2.0
6   User 2  Item 3      0.0
7   User 2  Item 4      4.0
8   User 2  Item 5      0.0
9   User 3  Item 2     10.0
10  User 3  Item 3     11.0
11  User 4  Item 1      1.0
12  User 4  Item 2      8.0
13  User 4  Item 3      2.0
14  User 4  Item 4     23.0
15  User 5  Item 1      4.0
16  User 5  Item 2      2.0
17  User 5  Item 3     15.0
18  User 5  Item 4      3.0
19  User 5  Item 5      4.0
20  User 6  Item 1      3.0
21  User 6  Item 2     10.0
22  User 6  Item 3      0.0
23  User 6  Item 5     23.0
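
The same long table can also be built with melt followed by an explicit dropna. This makes the removal of missing ratings visible in the code instead of relying on how stack treats NaN, which may depend on the pandas version. A minimal sketch, using a small two-user matrix invented for illustration (toy_matrix and toy_frame are not part of the example above):

```python
import pandas as pd

# Hypothetical toy matrix, smaller than the one used on this page.
toy_matrix = pd.DataFrame(
    {"Item 1": [1.0, None], "Item 2": [23.0, 2.0]},
    index=pd.Index(["User 1", "User 2"], name="Users"),
)

toy_frame = (
    toy_matrix
    .reset_index()                                   # move "Users" into a column
    .melt(id_vars="Users", var_name="Items", value_name="ratings")
    .dropna(subset=["ratings"])                      # drop unknown ratings explicitly
    .sort_values(["Users", "Items"])                 # same row order as stack()
    .reset_index(drop=True)
)
print(toy_frame)
```

The result has the columns Users, Items and ratings, in that order, with one row per known rating.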

The table must then be wrapped in a Surprise Dataset. A Reader tells Surprise how to interpret the data, for example the range of the rating scale.

Dataset.load_from_df expects a data frame with exactly three columns in this order: user id, item id, rating.

The following cell performs this transformation for the example:

reader = Reader(
    rating_scale=(
        R_matrix.min().min(),
        R_matrix.max().max()
    )
)
data_set = Dataset.load_from_df(
    df=R_frame, 
    reader=reader
)
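
Because the three columns are identified by position rather than by name, it can help to check the frame before loading it. The helper below is a hypothetical sketch (check_ratings_frame is not part of Surprise, and toy_frame is invented for illustration). It verifies the three-column layout, the absence of missing ratings, and that every rating lies inside the scale that would be given to the Reader:

```python
import pandas as pd

def check_ratings_frame(frame, rating_scale):
    """Hypothetical pre-flight check for a (user, item, rating) frame."""
    if frame.shape[1] != 3:
        raise ValueError("expected exactly 3 columns: user id, item id, rating")
    ratings = frame.iloc[:, 2]  # the rating is taken by position, not by name
    if ratings.isna().any():
        raise ValueError("ratings column contains missing values")
    low, high = rating_scale
    if not ratings.between(low, high).all():
        raise ValueError("some ratings fall outside the rating scale")
    return True

# Toy data for illustration only.
toy_frame = pd.DataFrame(
    {
        "Users": ["User 1", "User 1", "User 2"],
        "Items": ["Item 1", "Item 2", "Item 2"],
        "ratings": [1.0, 23.0, 2.0],
    }
)
scale = (toy_frame["ratings"].min(), toy_frame["ratings"].max())
print(check_ratings_frame(toy_frame, scale))
```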

A Dataset cannot be passed to a model directly: algorithms are fitted on a surprise.trainset.Trainset instance. The build_full_trainset method creates one from all available ratings:

train_set = data_set.build_full_trainset()
type(train_set)
surprise.trainset.Trainset

A model can now be fitted on the train set:

model = KNNWithMeans(k=2).fit(train_set)
Computing the msd similarity matrix...
Done computing similarity matrix.

To get a prediction, pass the user and item identifiers exactly as they appear in the identifier columns of the data frame. The result is a Prediction named tuple; the estimated rating is in its est field:

pred = model.predict(uid="User 6", iid="Item 1")
display(pred)
display(pred.est)
Prediction(uid='User 6', iid='Item 1', r_ui=None, est=2.75, details={'actual_k': 2, 'was_impossible': False})
2.75

To get predictions for every item for User 6:

pd.Series(
    [
        model.predict(uid="User 6", iid=item).est
        for item in R_matrix.columns
    ],
    index=R_matrix.columns,
    name="estimations"
).to_frame()
        estimations
Items
Item 1     2.750000
Item 2     9.750000
Item 3     0.416667
Item 4    23.126198
Item 5    22.900328
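
To see where these numbers come from, the estimates can be reproduced by hand. The sketch below is a plain-Python reading of what KNNWithMeans does with the settings used above (user-based neighbours, MSD similarity, k=2). It is an illustration built on that assumption, not Surprise's actual code, and all function names are made up for this example. The similarity between two users is 1 / (MSD + 1), where MSD is the mean squared difference over the items both have rated. The estimate is the target user's mean rating plus the similarity-weighted average of the neighbours' deviations from their own means. Clipping to the rating scale is left out because none of the estimates here leave the scale.

```python
# Known ratings from the example matrix, one dict per user.
ratings = {
    "User 1": {"Item 1": 1, "Item 2": 23, "Item 4": 53, "Item 5": 50},
    "User 2": {"Item 1": 0, "Item 2": 2, "Item 3": 0, "Item 4": 4, "Item 5": 0},
    "User 3": {"Item 2": 10, "Item 3": 11},
    "User 4": {"Item 1": 1, "Item 2": 8, "Item 3": 2, "Item 4": 23},
    "User 5": {"Item 1": 4, "Item 2": 2, "Item 3": 15, "Item 4": 3, "Item 5": 4},
    "User 6": {"Item 1": 3, "Item 2": 10, "Item 3": 0, "Item 5": 23},
}

def user_mean(user):
    """Mean of the ratings the user has actually given."""
    values = ratings[user].values()
    return sum(values) / len(values)

def msd_similarity(u, v):
    """1 / (MSD + 1) over the items rated by both users; 0 if none in common."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    msd = sum((ratings[u][i] - ratings[v][i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)

def knn_with_means(user, item, k=2):
    """Mean-centred estimate from the k most similar users who rated the item."""
    candidates = [
        (msd_similarity(user, v), v) for v in ratings if item in ratings[v]
    ]
    nearest = [(s, v) for s, v in sorted(candidates, reverse=True)[:k] if s > 0]
    total_sim = sum(s for s, _ in nearest)
    if total_sim == 0:
        return user_mean(user)  # no usable neighbour: fall back to the user's mean
    weighted = sum(s * (ratings[v][item] - user_mean(v)) for s, v in nearest)
    return user_mean(user) + weighted / total_sim

for i in range(1, 6):
    print(f"Item {i}", round(knn_with_means("User 6", f"Item {i}"), 6))
```

Under these assumptions the printed values agree with the estimations table. Note that a user who has already rated the item counts as their own nearest neighbour with similarity 1. For User 6 and Item 1 the two neighbours are User 6 (rating 3, mean 9) and User 4 (rating 1, mean 8.5, similarity 0.2), giving 9 + (1 * (3 - 9) + 0.2 * (1 - 8.5)) / 1.2 = 2.75.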