# Surprise

[`Surprise`](https://surprise.readthedocs.io/en/stable/) is a Python library for building recommendation systems. Here are some basic ideas about working with this library.

In [1]:
import pandas as pd
from surprise import KNNWithMeans, Dataset, Reader

The following cell defines the matrix that is used as an example on this page. Read more about the relevance matrix and other recommender system concepts in the [ranking task section](../../data_science/ranking_task.md).

In [2]:
data = {
    "Item 1": [1, 0, None, 1, 4, 3],
    "Item 2": [23, 2, 10, 8, 2, 10],
    "Item 3": [None, 0, 11, 2, 15, 0],
    "Item 4": [53, 4, None, 23, 3, None],
    "Item 5": [50, 0, None, None, 4, 23], 
}
users = pd.Index(
    [
        "User 1", 
        "User 2", 
        "User 3", 
        "User 4", 
        "User 5",
        "User 6"
    ],
    name = "Users"
)

R_matrix = pd.DataFrame(data, index=users)
R_matrix.columns.name = "Items"
R_matrix

| Users | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 |
|---|---|---|---|---|---|
| User 1 | 1.0 | 23 | NaN | 53.0 | 50.0 |
| User 2 | 0.0 | 2 | 0.0 | 4.0 | 0.0 |
| User 3 | NaN | 10 | 11.0 | NaN | NaN |
| User 4 | 1.0 | 8 | 2.0 | 23.0 | NaN |
| User 5 | 4.0 | 2 | 15.0 | 3.0 | 4.0 |
| User 6 | 3.0 | 10 | 0.0 | NaN | 23.0 |


To use this matrix with `surprise`, convert it to a long-format table with one row per observed rating, as in the following cell:

In [3]:
R_frame = R_matrix.stack().rename("ratings").reset_index()
R_frame

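| | Users | Items | ratings |
|---|---|---|---|
| 0 | User 1 | Item 1 | 1.0 |
| 1 | User 1 | Item 2 | 23.0 |
| 2 | User 1 | Item 4 | 53.0 |
| 3 | User 1 | Item 5 | 50.0 |
| 4 | User 2 | Item 1 | 0.0 |
| 5 | User 2 | Item 2 | 2.0 |
| 6 | User 2 | Item 3 | 0.0 |
| 7 | User 2 | Item 4 | 4.0 |
| 8 | User 2 | Item 5 | 0.0 |
| 9 | User 3 | Item 2 | 10.0 |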
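
The handling of missing values by `stack` can differ between pandas versions, so it may be worth checking that unrated cells do not end up as rows. The following is a minimal sketch of an alternative that drops them explicitly with `melt` and `dropna`. The small `R_small` matrix is a hypothetical stand-in for `R_matrix`, not part of the original example.

```python
import pandas as pd

# Hypothetical two-user, two-item stand-in for the relevance matrix.
R_small = pd.DataFrame(
    {"Item 1": [1.0, None], "Item 2": [23.0, 2.0]},
    index=pd.Index(["User 1", "User 2"], name="Users"),
)
R_small.columns.name = "Items"

long_frame = (
    R_small.reset_index()
    # One row per (user, item) cell, including the unrated ones.
    .melt(id_vars="Users", var_name="Items", value_name="ratings")
    # Explicitly drop cells without a rating.
    .dropna(subset=["ratings"])
    .sort_values(["Users", "Items"])
    .reset_index(drop=True)
)
print(long_frame)
```

The result has the same three columns (`Users`, `Items`, `ratings`) as `R_frame`, with the unrated pair (`User 2`, `Item 1`) left out.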


The table must also be converted into a `surprise` dataset. This is done with a `Reader`, which defines how the data should be interpreted, for example the rating scale.

The data frame passed to `Dataset.load_from_df` should contain 3 columns in the following order: user id, item id and ratings.

The following cell performs such a transformation for the example considered:

In [4]:
reader = Reader(
    rating_scale=(
        R_matrix.min().min(),
        R_matrix.max().max()
    )
)
data_set = Dataset.load_from_df(
    df=R_frame, 
    reader=reader
)
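
When the source table has extra columns or a different column order, it can be brought into the expected shape before it is passed to `Dataset.load_from_df`. The following is a minimal pandas-only sketch. The `raw` frame and its `timestamp` column are hypothetical, and deriving the rating scale from the long table instead of the matrix is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical rating log with an extra column and a different column order.
raw = pd.DataFrame(
    {
        "timestamp": [10, 11, 12],
        "ratings": [1.0, 23.0, 2.0],
        "Items": ["Item 1", "Item 2", "Item 2"],
        "Users": ["User 1", "User 1", "User 2"],
    }
)

# Keep only the three expected columns, in the order user id, item id, rating.
ordered = raw[["Users", "Items", "ratings"]]

# Rating scale (min, max) taken from the observed ratings themselves.
rating_scale = (ordered["ratings"].min(), ordered["ratings"].max())
print(ordered.columns.tolist(), rating_scale)
```

The `ordered` frame and `rating_scale` tuple can then play the roles of `R_frame` and the `rating_scale` argument in the cell above.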

A dataset cannot be passed to a model directly: models are fitted on a `surprise.trainset.Trainset` instance. The following cell builds one from the whole dataset:

In [5]:
train_set = data_set.build_full_trainset()
type(train_set)

surprise.trainset.Trainset

A model can now be fitted on the training set:

In [6]:
model = KNNWithMeans(k=2).fit(train_set)

Computing the msd similarity matrix...
Done computing similarity matrix.


To get a prediction, pass the user and item identifiers exactly as they appear in the identifier columns. The `predict` method returns a `Prediction` object; the estimated rating is stored in its `est` field:

In [11]:
pred = model.predict(uid="User 6", iid="Item 1")
display(pred)
display(pred.est)

Prediction(uid='User 6', iid='Item 1', r_ui=None, est=2.75, details={'actual_k': 2, 'was_impossible': False})

2.75

The same call can be repeated to get predictions for all items for `User 6`:

In [19]:
pd.Series(
    [
        model.predict(uid="User 6", iid=item).est
        for item in R_matrix.columns
    ],
    index=R_matrix.columns,
    name="estimations"
).to_frame()

| Items | estimations |
|---|---|
| Item 1 | 2.75 |
| Item 2 | 9.75 |
| Item 3 | 0.416667 |
| Item 4 | 23.126198 |
| Item 5 | 22.900328 |
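
A common next step is to recommend only the items the user has not rated yet, ordered by estimated rating. The following is a minimal sketch of that idea using plain pandas. The estimates and the observed ratings of `User 6` are copied from the outputs above, and the filtering rule itself is an assumption about how the predictions might be used, not something `surprise` does automatically.

```python
import pandas as pd

# Estimated ratings for "User 6", copied from the output above.
estimations = pd.Series(
    {
        "Item 1": 2.75,
        "Item 2": 9.75,
        "Item 3": 0.416667,
        "Item 4": 23.126198,
        "Item 5": 22.900328,
    },
    name="estimations",
)

# Observed ratings of "User 6" from the example matrix (None = not rated).
observed = pd.Series(
    {"Item 1": 3, "Item 2": 10, "Item 3": 0, "Item 4": None, "Item 5": 23},
    dtype="float64",
)

# Keep the items without an observed rating and rank them by estimate.
unseen = observed[observed.isna()].index
recommendations = estimations.loc[unseen].sort_values(ascending=False)
print(recommendations)
```

For this example only `Item 4` is unrated by `User 6`, so it is the single candidate.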
