Data visualisation

pandas integrates with matplotlib through a few plotting methods. Advanced plots are hard to build with pandas alone, but these methods are handy for quick, exploratory visualisation.

plot

pandas.DataFrame has a plot method that draws a line plot from the dataframe’s columns by default; the kind argument selects other plot types. You can specify:

  • x: the column to use for the x-axis; its name also becomes the x-axis label;

  • y: the column to use for the y-axis; its name appears in the legend;

  • figsize: the size of the figure as a (width, height) tuple in inches;

  • many of the keyword arguments accepted by matplotlib.pyplot.plot.

import numpy as np
import pandas as pd

# Noisy linear relationship: y = 3x + Gaussian noise
x = np.arange(0, 10, 0.1)
df = pd.DataFrame({
    "x": x,
    "y": x * 3 + np.random.normal(0, 1, len(x)),
})

# plot returns the matplotlib Axes it draws on
ax = df.plot(x="x", y="y", figsize=(14, 5), grid=True)
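Keyword arguments that plot does not consume itself are forwarded to matplotlib, and the call returns a matplotlib Axes, so the figure can still be refined afterwards. The following is only a minimal sketch: the fixed seed, the colour, the line style and the label texts are illustrative assumptions and not part of the example above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, assumed here only for reproducibility
x = np.arange(0, 10, 0.1)
df = pd.DataFrame({"x": x, "y": x * 3 + rng.normal(0, 1, len(x))})

# color, linestyle and linewidth are matplotlib line properties passed through by pandas
ax = df.plot(
    x="x", y="y",
    figsize=(14, 5), grid=True,
    color="tab:red", linestyle="--", linewidth=1.5,
)

# The return value is a matplotlib Axes, so the usual matplotlib methods apply
ax.set_ylabel("y")
ax.set_title("Noisy linear trend")
```

Working on the returned Axes is usually the simplest way to add what pandas does not expose directly, such as a title or a y-axis label.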

hist

pandas.Series has a hist method that draws a histogram of the values in the series.

Its arguments closely follow matplotlib.pyplot.hist, with one addition:

  • figsize: sets the figure size directly from the call.

So in the following example, I use these options to show the distribution of a normally distributed variable.

import numpy as np
import pandas as pd

# 1000 draws from a standard normal distribution
vis_ser = pd.Series(np.random.normal(0, 1, 1000), name="some variable")
ax = vis_ser.hist(bins=20, figsize=(3, 3))
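The histogram of a normal sample should look roughly symmetric, and a quick numerical check can back up the visual impression. The sketch below assumes a fixed seed and uses Series.skew() to compute the sample skewness, which should be close to zero for a symmetric distribution; density and alpha are matplotlib histogram arguments forwarded by pandas and are chosen here purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, an assumption made for reproducibility
vis_ser = pd.Series(rng.normal(0, 1, 1000), name="some variable")

# density and alpha are matplotlib.pyplot.hist arguments that pandas forwards
ax = vis_ser.hist(bins=20, figsize=(3, 3), density=True, alpha=0.7)
ax.set_xlabel(vis_ser.name)

# Sample skewness; for a symmetric distribution such as the normal it should be near 0
skewness = vis_ser.skew()
print(f"sample skewness: {skewness:.3f}")
```

The exact value depends on the random draw, so it is better read as "small in magnitude" than compared against a specific number.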