Seaborn

seaborn is a high-level plotting library built on top of matplotlib that lets you build more complex plots with far fewer lines of code.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import sklearn.datasets

Box plot

The box plot is a kind of plot for which seaborn is truly a game changer. It wraps up the most common configuration details into a single function, seaborn.boxplot. See the seaborn.boxplot page in the official documentation for the full list of options.


The following cell generates a dataset of the kind that box plots are typically used to describe: a numeric value, a category, and a subgroup within each category.

X, y = sklearn.datasets.make_classification(
    n_samples=120, n_features=2, n_informative=2, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42
)
df = pd.DataFrame({
    "Value": X[:, 0],
    "Category": [f"C{i}" for i in y],
    "Subgroup": pd.cut(X[:, 1], bins=2, labels=["Low", "High"])
})

The following cell draws a deliberately ugly plot, but it shows how much you can configure with just one command.

sns.boxplot(
    data=df,
    x="Category",
    y="Value",
    hue="Subgroup",
    palette="Set2",
    width=0.2,
    fliersize=20,
    gap=8,
    linecolor="red"
)
plt.show()
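For contrast, a tidier variant might keep only the essential arguments and leave the styling to seaborn's defaults. The following is a minimal sketch rather than part of the original notebook: so that it runs on its own, it builds a synthetic stand-in DataFrame (`demo_df`, a hypothetical name) with the same three columns instead of reusing `df`.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in data (an assumption, not the dataset used above)
rng = np.random.default_rng(42)
demo_df = pd.DataFrame({
    "Value": rng.normal(0, 1, size=120),
    "Category": rng.choice(["C0", "C1", "C2"], size=120),
    "Subgroup": rng.choice(["Low", "High"], size=120),
})

# Only the data mapping is specified; order and hue_order fix the
# arrangement of the boxes so that it does not depend on row order.
ax = sns.boxplot(
    data=demo_df,
    x="Category",
    y="Value",
    hue="Subgroup",
    order=["C0", "C1", "C2"],
    hue_order=["Low", "High"],
)
ax.set_title("Value by category and subgroup")
plt.show()
```

seaborn.boxplot returns the matplotlib Axes it drew on, so any further adjustments (title, limits, labels) can be made with the usual matplotlib methods.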

Fliers

A very typical issue with box plots is that a small number of outliers changes the scale of the entire plot, so the bulk of the distribution is poorly represented.

To hide the fliers, pass the showfliers=False argument.


The following cell generates a sample that mostly follows one distribution but contains a small number of fliers that distort the scale of the plot.

The plots with and without the fliers are shown side by side.

sample = np.concatenate([
    np.random.normal(0, 1, size=500),
    np.random.normal(200, 10, size=10)
])
plt.subplot(121)
sns.boxplot(sample)
plt.subplot(122)
sns.boxplot(sample, showfliers=False)
plt.show()
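To see which points a box plot treats as fliers, the rule can be reproduced by hand. The sketch below assumes the default setting of seaborn.boxplot (`whis=1.5`), under which any point further than 1.5 times the interquartile range from the nearest quartile is drawn as a flier. The names `demo_sample`, `lower`, `upper` and `is_flier` are illustrative, and the sample is regenerated with a fixed seed so the block runs on its own.

```python
import numpy as np

# Same shape of data as above, regenerated with a fixed seed (an assumption)
rng = np.random.default_rng(0)
demo_sample = np.concatenate([
    rng.normal(0, 1, size=500),     # bulk of the distribution
    rng.normal(200, 10, size=10),   # a few extreme values
])

# Quartiles and interquartile range
q1, q3 = np.percentile(demo_sample, [25, 75])
iqr = q3 - q1

# Points outside these limits lie beyond the whiskers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_flier = (demo_sample < lower) | (demo_sample > upper)

print(f"limits: [{lower:.2f}, {upper:.2f}]")
print(f"fliers: {is_flier.sum()} of {demo_sample.size}")
```

Note that showfliers=False only hides these points in the drawing; the quartiles and whiskers are still computed from the full sample.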

Regression plot

The sns.regplot function builds a scatter plot that also includes the linear model fitted to the given data.


The following code builds a dataset with a noisy linear relationship and applies sns.regplot to it.

sample_size = 200
x = np.random.normal(0, 5, size=sample_size)
y = 0.3*x + np.random.normal(0, 1, size=sample_size)
sns.regplot(x=x, y=y)
plt.show()

Alongside the regular scatter plot, the figure shows the fitted linear regression line with a shaded band for its confidence interval (95% by default, controlled by the ci parameter).
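sns.regplot draws the fit but does not return its coefficients. One way to get the numbers is to fit the same kind of straight line separately; the sketch below uses np.polyfit for that, which is a swapped-in technique and not something regplot exposes. It also passes `ci=None` to skip the confidence band and `line_kws` to style the fitted line. The data is regenerated with a fixed seed, and the names `demo_x` and `demo_y` are illustrative.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Same construction as above, with a fixed seed (an assumption)
rng = np.random.default_rng(0)
demo_x = rng.normal(0, 5, size=200)
demo_y = 0.3 * demo_x + rng.normal(0, 1, size=200)

# Least-squares fit of a degree-1 polynomial: returns slope, then intercept
slope, intercept = np.polyfit(demo_x, demo_y, deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")

# ci=None disables the confidence band; line_kws is passed on to the line plot
ax = sns.regplot(x=demo_x, y=demo_y, ci=None, line_kws={"color": "black"})
plt.show()
```

The estimated slope should land close to the 0.3 used to generate the data, with an intercept near zero.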