Array splitting

This page covers the NumPy functions for splitting arrays.

import numpy as np

split#

Read more in the official documentation.

The following example splits 20 observations (rows) into 2 equal parts.

arr = np.random.normal(0, 1, [20, 3])
res = np.split(arr, 2)
for i, r in enumerate(res):
    print(f"{i+1} with {r.shape[0]} rows")
1 with 10 rows
2 with 10 rows

The same example, but with 10 parts.

arr = np.random.normal(0, 1, [20, 3])
res = np.split(arr, 10)
for i, r in enumerate(res):
    print(f"{i+1} with {r.shape[0]} rows")
1 with 2 rows
2 with 2 rows
3 with 2 rows
4 with 2 rows
5 with 2 rows
6 with 2 rows
7 with 2 rows
8 with 2 rows
9 with 2 rows
10 with 2 rows

Exceptions#

There are some cases where this won’t work:

  • Dividing an array into a number of parts that does not evenly divide its length along the split axis.

  • Dividing an array into more parts than it has observations. Technically this is a special case of the previous point, but it is worth mentioning separately.

Here is an example of dividing 20 rows into 3 subarrays.

try:
    arr = np.random.normal(0, 1, [20, 3])
    # 20 rows cannot be cut into 3 equal parts, so np.split raises ValueError
    res = np.split(arr, 3)
except ValueError as e:
    print("We got the error:", e)
We got the error: array split does not result in an equal division

We get the same error when trying to divide 3 rows into 50 subarrays.

try:
    arr = np.random.normal(0, 1, [3, 3])
    # 50 does not divide 3 evenly either, so the same ValueError is raised
    res = np.split(arr, 50)
except ValueError as e:
    print("We got the error:", e)
We got the error: array split does not result in an equal division
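Both failures come down to one divisibility condition, which can be checked before calling `np.split`. The following is a minimal sketch; `splits_evenly` is a hypothetical helper introduced only for illustration and is not part of NumPy.

```python
import numpy as np


def splits_evenly(arr, n, axis=0):
    """Return True if `arr` can be cut into `n` equal parts along `axis`.

    Hypothetical helper: it only mirrors the condition under which
    np.split(arr, n, axis=axis) succeeds for an integer `n`.
    """
    return n > 0 and arr.shape[axis] % n == 0


arr = np.zeros((20, 3))
for n in (2, 3, 10, 50):
    if splits_evenly(arr, n):
        parts = np.split(arr, n)
        print(f"{n} parts: ok, {parts[0].shape[0]} rows each")
    else:
        print(f"{n} parts: np.split would raise ValueError")
```

For 20 rows, only 2 and 10 pass the check here; 3 and 50 both fail it, matching the two errors shown above.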

array_split#

Read more in the official documentation. The main feature of this function is that the size of the input array does not have to be divisible by the number of parts. Let \(l\) be the number of elements in the array we want to split and \(n\) the number of resulting arrays.

The result will then be:

  • The first \((l \mod n)\) arrays will have size \(\lfloor \frac{l}{n} \rfloor + 1\).

  • The other \(n-(l \mod n)\) arrays will have size \(\lfloor \frac{l}{n} \rfloor\).
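The rule above can be sketched in a few lines and compared with what `np.array_split` actually returns. This is only an illustrative check; `expected_sizes` is a hypothetical helper, not a NumPy function.

```python
import numpy as np


def expected_sizes(l, n):
    """Sizes predicted by the rule: (l mod n) parts of floor(l / n) + 1,
    followed by n - (l mod n) parts of floor(l / n)."""
    q, r = divmod(l, n)
    return [q + 1] * r + [q] * (n - r)


for l, n in [(20, 3), (40, 3), (7, 4)]:
    parts = np.array_split(np.zeros((l, 3)), n)
    actual = [part.shape[0] for part in parts]
    # The sizes produced by NumPy should match the predicted ones.
    assert actual == expected_sizes(l, n)
    print(f"l={l}, n={n}: {actual}")
```

Whatever \(l\) and \(n\) are, the predicted sizes always sum to \(l\), so no rows are lost or duplicated.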

Suppose we have \(l=20\) and \(n=3\). Then we get \(20 \mod 3=2\) subarrays of size \(\lfloor \frac{l}{n} \rfloor + 1= \lfloor \frac{20}{3} \rfloor + 1=7\) and the remaining \(3-2=1\) subarray of size \(\lfloor \frac{20}{3} \rfloor = 6\). The code gives the same result:

arr = np.random.normal(0, 1, [20, 3])
res = np.array_split(arr, 3)
for i, r in enumerate(res):
    print(f"{i+1} with {r.shape[0]} rows")
1 with 7 rows
2 with 7 rows
3 with 6 rows

Suppose we have \(l=40\) and \(n=3\). Then we get \(40 \mod 3=1\) subarray of size \(\lfloor \frac{l}{n} \rfloor + 1= \lfloor \frac{40}{3} \rfloor + 1=14\) and the remaining \(3-1=2\) subarrays of size \(\lfloor \frac{40}{3} \rfloor = 13\). The code gives the same result:

arr = np.random.normal(0, 1, [40, 3])
res = np.array_split(arr, 3)
for i, r in enumerate(res):
    print(f"{i+1} with {r.shape[0]} rows")
1 with 14 rows
2 with 13 rows
3 with 13 rows

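Cases where \(l<n\) follow the same rule: since \(\lfloor \frac{l}{n} \rfloor = 0\) and \(l \mod n = l\), the first \(l\) subarrays get one row each and the remaining \(n-l\) subarrays are empty: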

arr = np.random.normal(0, 1, [20, 3])
res = np.array_split(arr, 50)
for i, r in enumerate(res):
    print(f"{i+1} with {r.shape[0]} rows")
1 with 1 rows
2 with 1 rows
3 with 1 rows
4 with 1 rows
5 with 1 rows
6 with 1 rows
7 with 1 rows
8 with 1 rows
9 with 1 rows
10 with 1 rows
11 with 1 rows
12 with 1 rows
13 with 1 rows
14 with 1 rows
15 with 1 rows
16 with 1 rows
17 with 1 rows
18 with 1 rows
19 with 1 rows
20 with 1 rows
21 with 0 rows
22 with 0 rows
23 with 0 rows
24 with 0 rows
25 with 0 rows
26 with 0 rows
27 with 0 rows
28 with 0 rows
29 with 0 rows
30 with 0 rows
31 with 0 rows
32 with 0 rows
33 with 0 rows
34 with 0 rows
35 with 0 rows
36 with 0 rows
37 with 0 rows
38 with 0 rows
39 with 0 rows
40 with 0 rows
41 with 0 rows
42 with 0 rows
43 with 0 rows
44 with 0 rows
45 with 0 rows
46 with 0 rows
47 with 0 rows
48 with 0 rows
49 with 0 rows
50 with 0 rows
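The empty subarrays are still ordinary arrays: they keep the column dimension, and concatenating all parts gives back the original array. The following is a small sketch of that check, using a deterministic array instead of random data so the comparison is easy to follow.

```python
import numpy as np

arr = np.arange(60).reshape(20, 3)
parts = np.array_split(arr, 50)

# Count how many of the 50 parts actually contain rows.
non_empty = [p for p in parts if p.shape[0] > 0]
print(len(parts), len(non_empty))

# An empty part has zero rows but still three columns.
print(parts[-1].shape)

# Concatenating every part (empty ones included) restores the input.
restored = np.concatenate(parts)
print(np.array_equal(restored, arr))
```

If the empty parts are not wanted downstream, filtering them out as in `non_empty` is one simple option.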