Data selecting


import pandas as pd
from random import shuffle

Loc slicing

loc in pandas supports slicing by label: every record between the start and stop labels of the slice is selected, in the order the records appear in the dataframe/series under consideration. Unlike ordinary Python slices, both bounds are included in the result.


The following cell creates a pandas series with a string (letter) index, which lets us show how slices behave in loc.

sample_size = 10
# Values 0..9, labelled with the first ten lowercase letters "a".."j".
ser = pd.Series(
    range(sample_size),
    index=[chr(i) for i in range(ord("a"), ord("a") + sample_size)],
)
ser
a    0
b    1
c    2
d    3
e    4
f    5
g    6
h    7
i    8
j    9
dtype: int64

The following cell slices the series by its letter labels.

ser.loc["c":"f"]
c    2
d    3
e    4
f    5
dtype: int64

The result is easy to predict: the slice returns every record from c through f, with both endpoints included.
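As a minimal sketch of how label bounds differ from positional ones, the snippet below rebuilds the same series and compares loc with iloc. The names by_label and by_position are illustrative and do not appear elsewhere on this page.

```python
import pandas as pd

# Same data as above: values 0..9 labelled "a".."j".
ser = pd.Series(range(10), index=list("abcdefghij"))

# Label slice: the stop label "f" is part of the result.
by_label = ser.loc["c":"f"]

# Positional slice: the stop position 5 (label "f") is excluded.
by_position = ser.iloc[2:5]

print(by_label.index.tolist())     # ['c', 'd', 'e', 'f']
print(by_position.index.tolist())  # ['c', 'd', 'e']
```

Here "f" sits at position 5, so the positional slice 2:5 stops one record earlier than the label slice "c":"f".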

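The following cell shuffles the index labels of the series; the values keep their original order.

# Shuffle a copy of the labels rather than mutating the index in place.
labels = ser.index.tolist()
shuffle(labels)
ser.index = labels
ser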
g    0
b    1
d    2
e    3
c    4
i    5
j    6
h    7
f    8
a    9
dtype: int64

The next cell applies the same slice to the shuffled series:

ser.loc["c":"f"]
c    4
i    5
j    6
h    7
f    8
dtype: int64

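The slice logic doesn’t follow the alphabetical order of the letters; it follows the order of the elements in the series. The slice starts at the position of c and ends at the position of f, so i, j and h are included even though they come after f in the alphabet.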
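The behaviour can be reproduced without randomness. The sketch below hard-codes the label order from the shuffled output shown above (an assumption made only for reproducibility) and shows that the label slice matches a positional slice between the two bounds. The names manual, backwards and alphabetical are illustrative.

```python
import pandas as pd

# Fixed label order copied from the shuffled series above.
ser = pd.Series(range(10), index=list("gbdecijhfa"))

# With unique labels, each bound resolves to a single position.
start = ser.index.get_loc("c")  # position 4
stop = ser.index.get_loc("f")   # position 8

# The label slice equals the positional slice with the stop included.
manual = ser.iloc[start : stop + 1]
print(manual.equals(ser.loc["c":"f"]))  # True

# When the start label sits after the stop label, nothing is selected.
backwards = ser.loc["f":"c"]
print(len(backwards))  # 0

# Sorting the index first gives a slice that follows alphabetical order.
alphabetical = ser.sort_index().loc["c":"f"]
print(alphabetical.index.tolist())  # ['c', 'd', 'e', 'f']
```

If alphabetical selection is what you want, sorting the index before slicing is one straightforward option; the values then come back in label order rather than in the order they were stored.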