Intro#
This page introduces to the algorithms and data structures section.
Prefix sum#
Prefix sum for sequence \(x_1, x_2, \ldots, x_n\) is a sequence \(y_1, y_2, \ldots, y_{n+1}\).
Where: \(y_i = \sum^{i-1}_{j=1} x_j, i = \overline{1, n+1}\)
From this the statement \(y_i = y_{i-1} + x_{i-1}\) will be correct.
The following cell shows the prefix sum idea in table format.
\(i\) |
\(1\) |
\(2\) |
\(3\) |
\(4\) |
\(\ldots\) |
\(n\) |
\(n+1\) |
---|---|---|---|---|---|---|---|
\(x_i\) |
\(x_1\) |
\(x_2\) |
\(x_3\) |
\(x_4\) |
\(\ldots\) |
\(x_n\) |
|
\(y_i\) |
\(0\) |
\(x_1\) |
\(x_1 + x_2 = y_2 + x_2\) |
\(x_1 + x_2 + x_3 = y_3 + x_3\) |
\(\ldots\) |
\(\sum^{n-1}_{j=1} x_j = y_{n-1} + x_{n-1}\) |
\(\sum^{n}_{j=1} x_j = y_n + x_n\) |
The prefix sum can be computed from the original sequence in \(O(n)\) time.
The following cell illustrates the implementation of the counting prefix sum.
def count_prefix_sum(seq: list[float | int]):
ans: list[float | int] = [0]
for x in seq:
ans.append(ans[-1] + x)
return ans
count_prefix_sum([1, 2, 3, 4])
[0, 1, 3, 6, 10]
Sum of subset#
Calculate the sum of the elements of the sequence from the \(a\)-th to the \(b\)-th position using formula:
If you have the prefix sum precomputed, you can compute the sum of a subset with complexity of \(O(1)\).
To prove this just describe the difference \(y_{b+1} - y_a\):
Transformation#
The prefix sum is a really useful data structure. To fulfill its potential, you must apply a transformation to the original set - \(f(x)\).
For example, if the task requires counting the number of elements \(x_i > k\) for \(i=\overline{a,b}\), then cae apply the following transformation:
Count and use the prefix sum for the transformed sequence.
Two pointers#
In the two-pointers technique uses two indexes (pointers) that access the elements of an array during the same iteration.
Example of task:
There is a sorted array of numbers. You need to count all pairs of the elements, \((a,b)\), for which the statemene \(a - b > D\) is true.
The following cell shows the approach to solve this using the two pointers technique.
def find_pair(lst: list[int], D: int):
l, r = 0, 1
count = 0
lst_len = len(lst)
while True:
if r >= lst_len:
break
if (lst[r] - lst[l] > D) and (l < r):
count += (lst_len - r)
l += 1
else:
r += 1
return count
The idea is to compare items under indices \(l\) and \(r\).
If \(x_r - x_l > D\), then count all pairs \((l,r), (l, r+1), \ldots (l, N)\), as garanteeed that \(x_{r+t} - x_l > D\) due to \(x_{r+t} > x_{r}\). then move \(l\) pointer.
Otherwise just move \(r\).
It ensures \(O(2N)\) compexity of the algorithm.
The following cell shows the result of the applying the algorithm.
lst = [1, 2, 3, 4, 5, 6]
find_pair(lst, 2)
3
Binary search#
Binary search is a fast way to find an item in a sorted list.
Here’s how it works:
Look at the middle element.
If it’s what you want, you’re done.
If it’s smaller than what you want, search the right half.
If it’s bigger, search the left half.
Repeat until you find the item or the list is empty.
Binary search is quick because it halves the list each step. It works only if the list is already sorted.
For a more detailed and strict description, check the binary search page.
The following code shows an implementation of binary search to find the index of a given element. At each step, it updates either the left (l
) or right (r
) boundary based on the value at (l + r) // 2
. Each step brings the search closer to the result.
def find_index(lst: list[int], num: int):
l, r = -1, len(lst)
while l + 1 < r:
p = (l + r) // 2
if lst[p] == num:
return p
elif lst[p] < num:
l = p
else:
r = p
return None
The following cell shows that it works.
find_index([1, 3, 7, 12, 88, 104], 1)
0
Event sort#
The event sort approach is used for tasks involving events that occur at different times (or any other orderable entity). It typically assumes a system with multiple states, where events are used to transition between these states.
The following picture is a typical illustration of kind of tasks that are usually solved with an even sort:
The typical input is an array of \((t_i, t'_i)\) pairs, representing the start and end of some process. The values \(t_i\) and \(t'_i\) can have a variety of physical meanings — for example, the time a customer spends in a shop or on a website, distances between objects, and so on.
The main idea of the approach is to iterate over the \((t_i, t'_i)\) pairs and create a sequence of events \((\tau_j, \text{type}_j, \overline{x}_j)\), where:
\(\tau_j\) is the timestamp of the event.
\(\text{type}_j\) indicates the type of the event — typically one value for the start of a process and another for the end.
\(\overline{x}_j\) is a vector containing additional information about the event, which varies depending on the specific task.
The array of the vectors of the events \((\tau_j, \text{type}_j, \overline{x}_j)\) is sorted. Usually it supposes lower \(\tau_j\) to go first. Ordering by \(\text{type}_j\) is also important, so you have to assign lower marks to the events that are supposed to be processed first.
The second cycle iterates though the sorted array of the events and performs some computations based on their contents.
Data structures#
A data structure is a way of organizing and storing data in memory. Different data structures are beneficial in different situations: some allow dynamic expansion of data, others provide faster access, and some offer features that are especially useful for specific tasks.
The following table represents the basic data structures:
Data Structure |
Description |
Example Use Case |
---|---|---|
List |
Mutable ordered collection |
Storing ordered values |
Tuple |
Immutable ordered collection |
Returning multiple values from a function |
Set |
Unordered collection of unique elements |
Removing duplicates, membership testing |
Dictionary |
Key-value mapping |
Associative arrays, fast key-based access |
Queue |
FIFO structure (First-In, First-Out) |
Task scheduling, event processing |
Stack |
LIFO structure (Last-In, First-Out) |
Backtracking, undo operations |
Deque |
Double-ended queue |
Fast insertion/removal from both ends |
Array |
Sequence of fixed-type values |
Efficient numeric data storage |
Linked List |
Elements linked by references to the next item |
Frequent insertions/removals in the middle |
Hash Table |
Key-based access via hash function |
Underlying structure for dictionaries |
Tree |
Hierarchical structure with parent-child relationships |
Representing hierarchical data, parsing XML |
Check more details on some of the data structures at the Data structures page.