Intro

Intro#

This page introduces to the algorithms and data structures section.

Prefix sum#

Prefix sum for sequence \(x_1, x_2, \ldots, x_n\) is a sequence \(y_1, y_2, \ldots, y_{n+1}\).

Where: \(y_i = \sum^{i-1}_{j=1} x_j, i = \overline{1, n+1}\)

From this the statement \(y_i = y_{i-1} + x_{i-1}\) will be correct.

The following cell shows the prefix sum idea in table format.

\(i\)	\(1\)	\(2\)	\(3\)	\(4\)	\(\ldots\)	\(n\)	\(n+1\)
\(x_i\)	\(x_1\)	\(x_2\)	\(x_3\)	\(x_4\)	\(\ldots\)	\(x_n\)
\(y_i\)	\(0\)	\(x_1\)	\(x_1 + x_2 = y_2 + x_2\)	\(x_1 + x_2 + x_3 = y_3 + x_3\)	\(\ldots\)	\(\sum^{n-1}_{j=1} x_j = y_{n-1} + x_{n-1}\)	\(\sum^{n}_{j=1} x_j = y_n + x_n\)

The prefix sum can be computed from the original sequence in \(O(n)\) time.

The following cell illustrates the implementation of the counting prefix sum.

def count_prefix_sum(seq: list[float | int]):
    ans: list[float | int] = [0]
    for x in seq:
        ans.append(ans[-1] + x)
    return ans

count_prefix_sum([1, 2, 3, 4])

[0, 1, 3, 6, 10]

Sum of subset#

Calculate the sum of the elements of the sequence from the \(a\)-th to the \(b\)-th position using formula:

\[\sum_{i=a}^b x_i = y_{b + 1} - y_a\]

If you have the prefix sum precomputed, you can compute the sum of a subset with complexity of \(O(1)\).

To prove this just describe the difference \(y_{b+1} - y_a\):

\[y_{b+1} - y_a = \sum_{i=1}^{b} x_i - \sum_{i=1}^{a-1} x_i = \sum_{i=a}^b x_i\]

Transformation#

The prefix sum is a really useful data structure. To fulfill its potential, you must apply a transformation to the original set - \(f(x)\).

For example, if the task requires counting the number of elements \(x_i > k\) for \(i=\overline{a,b}\), then cae apply the following transformation:

\[\begin{split} f(x_i) = \begin{cases} 1, x_i > k; \\ 0, x_i \leq k. \end{cases} \end{split}\]

Count and use the prefix sum for the transformed sequence.

Two pointers#

In the two-pointers technique uses two indexes (pointers) that access the elements of an array during the same iteration.

Example of task:

There is a sorted array of numbers. You need to count all pairs of the elements, \((a,b)\), for which the statemene \(a - b > D\) is true.

The following cell shows the approach to solve this using the two pointers technique.

def find_pair(lst: list[int], D: int):
    l, r = 0, 1
    count = 0
    lst_len = len(lst)
    while True:
        if r >= lst_len:
            break
        if (lst[r] - lst[l] > D) and (l < r):
            count += (lst_len - r)
            l += 1
        else:
            r += 1
    return count

The idea is to compare items under indices \(l\) and \(r\).

If \(x_r - x_l > D\), then count all pairs \((l,r), (l, r+1), \ldots (l, N)\), as garanteeed that \(x_{r+t} - x_l > D\) due to \(x_{r+t} > x_{r}\). then move \(l\) pointer.

Otherwise just move \(r\).

It ensures \(O(2N)\) compexity of the algorithm.

The following cell shows the result of the applying the algorithm.

lst = [1, 2, 3, 4, 5, 6]
find_pair(lst, 2)

Binary search#

Binary search is a fast way to find an item in a sorted list.

Here’s how it works:

Look at the middle element.
If it’s what you want, you’re done.
If it’s smaller than what you want, search the right half.
If it’s bigger, search the left half.
Repeat until you find the item or the list is empty.

Binary search is quick because it halves the list each step. It works only if the list is already sorted.

For a more detailed and strict description, check the binary search page.

The following code shows an implementation of binary search to find the index of a given element. At each step, it updates either the left (l) or right (r) boundary based on the value at (l + r) // 2. Each step brings the search closer to the result.

def find_index(lst: list[int], num: int):
    l, r = -1, len(lst)
    while l + 1 < r:
        p = (l + r) // 2
        if lst[p] == num:
            return p
        elif lst[p] < num:
            l = p
        else:
            r = p
    return None

The following cell shows that it works.

find_index([1, 3, 7, 12, 88, 104], 1)

Event sort#

The event sort approach is used for tasks involving events that occur at different times (or any other orderable entity). It typically assumes a system with multiple states, where events are used to transition between these states.

The following picture is a typical illustration of kind of tasks that are usually solved with an even sort:

The typical input is an array of \((t_i, t'_i)\) pairs, representing the start and end of some process. The values \(t_i\) and \(t'_i\) can have a variety of physical meanings — for example, the time a customer spends in a shop or on a website, distances between objects, and so on.

The main idea of the approach is to iterate over the \((t_i, t'_i)\) pairs and create a sequence of events \((\tau_j, \text{type}_j, \overline{x}_j)\), where:

\(\tau_j\) is the timestamp of the event.
\(\text{type}_j\) indicates the type of the event — typically one value for the start of a process and another for the end.
\(\overline{x}_j\) is a vector containing additional information about the event, which varies depending on the specific task.

The array of the vectors of the events \((\tau_j, \text{type}_j, \overline{x}_j)\) is sorted. Usually it supposes lower \(\tau_j\) to go first. Ordering by \(\text{type}_j\) is also important, so you have to assign lower marks to the events that are supposed to be processed first.

The second cycle iterates though the sorted array of the events and performs some computations based on their contents.

Data structures#

A data structure is a way of organizing and storing data in memory. Different data structures are beneficial in different situations: some allow dynamic expansion of data, others provide faster access, and some offer features that are especially useful for specific tasks.

The following table represents the basic data structures:

Data Structure	Description	Example Use Case
List	Mutable ordered collection	Storing ordered values
Tuple	Immutable ordered collection	Returning multiple values from a function
Set	Unordered collection of unique elements	Removing duplicates, membership testing
Dictionary	Key-value mapping	Associative arrays, fast key-based access
Queue	FIFO structure (First-In, First-Out)	Task scheduling, event processing
Stack	LIFO structure (Last-In, First-Out)	Backtracking, undo operations
Deque	Double-ended queue	Fast insertion/removal from both ends
Array	Sequence of fixed-type values	Efficient numeric data storage
Linked List	Elements linked by references to the next item	Frequent insertions/removals in the middle
Hash Table	Key-based access via hash function	Underlying structure for dictionaries
Tree	Hierarchical structure with parent-child relationships	Representing hierarchical data, parsing XML

Check more details on some of the data structures at the Data structures page.