Data selecting
import pandas as pd
from random import shuffle
Loc slicing
The loc indexer in pandas supports slicing: all records between the lower and upper bounds of the slice, both bounds included, are selected in the order they appear in the DataFrame/Series under consideration.
The following cell creates a pandas Series with a string-label index, which lets us show how slices behave in loc.
sample_size = 10
ser = pd.Series(
    [i for i in range(sample_size)],
    index=[chr(i) for i in range(ord("a"), ord("a") + sample_size)]
)
ser
a 0
b 1
c 2
d 3
e 4
f 5
g 6
h 7
i 8
j 9
dtype: int64
The following cell slices the series by its string labels.
ser.loc["c":"f"]
c 2
d 3
e 4
f 5
dtype: int64
The result is easy to predict: every label from "c" to "f" is returned, with both bounds included.
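As a minimal sketch of the inclusive bounds, the cell below compares a label slice with loc to a positional slice with iloc over the same three positions. The names demo, by_label and by_position are hypothetical and used only for this illustration.

```python
import pandas as pd

# Hypothetical demo series: labels "a".."j" mapped to 0..9
demo = pd.Series(range(10), index=list("abcdefghij"))

# Label-based slice: both "c" and "f" are included
by_label = demo.loc["c":"f"]

# Positional slice: the stop position 5 (label "f") is excluded
by_position = demo.iloc[2:5]

print(list(by_label.index))     # ['c', 'd', 'e', 'f']
print(list(by_position.index))  # ['c', 'd', 'e']
```

The label slice keeps the end label because there is no natural "next label" to use as an exclusive bound, while iloc follows the usual Python slicing convention.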
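The following cell shuffles the index labels of the series, so they are no longer in alphabetical order.
labels = list(ser.index)  # copy the labels instead of shuffling ser.index.values in place
shuffle(labels)
ser.index = labels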
ser
g 0
b 1
d 2
e 3
c 4
i 5
j 6
h 7
f 8
a 9
dtype: int64
The next cell applies the same slice as before:
ser.loc["c":"f"]
c 4
i 5
j 6
h 7
f 8
dtype: int64
The slice does not follow the alphabetical order of the labels; it follows the order of the elements in the series. Selection starts at the position of "c" and ends at the position of "f", inclusive.
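If an alphabetical range is what you actually want, one option is to sort the index before slicing. The sketch below hard-codes the shuffled label order shown above so that it is reproducible; the names shuffled, positional and alphabetical are hypothetical.

```python
import pandas as pd

# Same label order as the shuffled series above, fixed for reproducibility
shuffled = pd.Series(range(10), index=list("gbdecijhfa"))

# Slice on the unsorted index: follows the position of the labels
positional = shuffled.loc["c":"f"]

# Sort the index first: the slice now covers the alphabetical range
alphabetical = shuffled.sort_index().loc["c":"f"]

print(list(positional.index))    # ['c', 'i', 'j', 'h', 'f']
print(list(alphabetical.index))  # ['c', 'd', 'e', 'f']
```

Note that sort_index returns a new series by default, so shuffled itself keeps its original order.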