Binary search#
Binary search is an algorithm used to find a specific element within an ordered sequence. At each step, it divides the sequence into two equal sub-sequences and continuest searching in only one of them. This page provides an accurate description.
from typing import Any, Callable
Formal description#
Suppose you are processing the sequence: \(x_1, x_2, \ldots, x_{k-1}, x_k, x_{k+1}, \ldots, x_{n-1}, x_n\). The element you are trying to find is \(x_k\).
In this context, the fact that the sequence is ordered means that one can define a function \(f\) such that it takes the value 0 for all elements before \(x_k\) and 1 for all elements from \(x_k\) onward. More formally,
Define variables that indicate the beginning and the end of the interval beingconsidered: \(l_0 = 0, r_0 = n\).
At each step, the middle point of the interval is counted: \(p_j = \left\lfloor \frac{l_j + r_j}{2} \right\rfloor\). Depending on the value of \(f(x_{p_j})\), we decide whether to move \(l\) or \(r\):
Procedure continues until \(l_j + 1 < r_j\). The fact \(l_j + 1 = r_j\) garantees that \(x_{r_j} = x_k\)
Note: A very common mistake in binary search is to forget to add \(1\) in the stopping condition \(l_j + 1 < r_j\). Omitting it can lead to an infinite loop. At some point, you may reach the situation where \(l_j = k - 1\) and \(r_j = k\). Then:
So,
As a result, \(r_j = r_{j - 1}\) and \(l_j = l_{j - 1}\) — the search interval does not shrink, and the loop runs forever.
The following cell defines a generalised implemention of the binary search:
def first_entry(sequence: list[Any], fun: Callable[[int], int]) -> int:
'''
Implementation of the first entry binary search.
Parameters
----------
sequence: list[Any]
Sequence of elements to be processed.
fun: Callable
Function that takes value True for appropriate side of the sequence.
Returns
-------
The first index when fun takes value True.
'''
l, r = -1, len(sequence)
while l + 1 < r:
p = (l + r) // 2
if fun(sequence[p]):
l = p
else:
r = p
return r
The following cell shows how the function implemented earlier used to find a specific number (12) in a list of integers.
first_entry(
[1, 7, 9, 10, 12, 12, 16, 21],
lambda x: x < 12
)
4
Last entry#
Binary search algorithm in different sources can be defined slightly different, but important is the one option - last occurance approach.
The previously defined algorithm, in the case where there are multiple elements with the same value, \(x_k = x_{k+1} = x_{k+2} = \ldots\), will return \(k\), which corresponds to the first occurrence of an element for which \(f(x) = 1\).
The widely accepted alternative last occurrence approach sugests that \(f\) is defined as:
Accordingly, the following alterations are expected to be implemented in the mechanisms employed for the computation of interval borders and the midpoint of an interval:
Such changes occur in the final steps of the algorithm:
When \(\frac{l_j + r_j}{2} = k + \varepsilon\), where \(\varepsilon \in (0, 1)\), we have \(p_j = \lceil k + \varepsilon \rceil = k + 1\). Since \(f(x_{k+1}) = 0\), the right boundary is moved to the \((k+1)\)-th position: \(r_j = k + 1\).
When \(\frac{l_j + r_j}{2} = k - 1 + \varepsilon\), where \(\varepsilon \in (0, 1)\), we have \(p_j = \lceil k - 1 + \varepsilon \rceil = k\). Since \(f(x_k) = 1\), the left boundary is moved to the \(k\)-th position: \(l_j = k\).
After all the steps, we have \(l_j = k\) and \(r_j = k + 1\). Even if there are multiple identical values, \(x_{k - l} = x_{k - l + 1} = \ldots = x_{k - 1} = x_k\), the algorithm will still point, by the final left boundary, to the \(k\)-th element — the first occurrence of the desired element.
The next code defines universal implementation of the last entry approach:
def last_entry(sequence: list[Any], fun: Callable[[int], int]):
'''
Implementation of the last entry binary search.
Parameters
----------
sequence: list[Any]
Sequence of elements to be processed.
fun: Callable
Function that takes value True for appropriate side of the sequence.
Returns
-------
The last index when fun takes value True.
'''
l, r = -1, len(sequence)
while l + 1 < r:
p = (l + r + 1) // 2
if fun(sequence[p]):
l = p
else:
r = p
return l
Following cell shows, use of the function defined earlier to find index of the 12 value in the list:
last_entry(
[5, 7, 9, 12, 12, 14, 16, 18, 21, 22],
lambda x: x <= 12
)
4
It returns 4, which is the second entry of the value we are looking for.
Search by answer#
The binary search by answer is an important method for solving a set of tasks.
It is used for tasks that require finding the minimum or maximum value under certain conditions. The conditions must be able to be represented as a binary function that changes value once at a certain point.
More formally, for the binary function \(f(x)\) is required to find \(\min_{f(x)=True}{(x)}\).
The best way to demonstrate this approach is to show the tasks that can be solved using this approach.
Given a sequence of numbers ordered in ascending order: \(a_1, a_2, \ldots, a_n\). Required to find \(L\) non-overlapping \(c\)-elements subsets of numbers such that the maximum difference is minimal:
Here, \(i_j, j=\overline{1,n}\) is the first index of the \(j\)-th group.
The binary search solution involves searching for smallest \(\Delta\) that allows the formation of \(L\) non-overlapping \(c\)-element subsets.
The following cell realizes the function that checks whether it is possible to find such subsets for given \(\Delta\) (D
).
def check_groups(lst: list[int], L: int, c: int, D: int) -> bool:
"""
Check if it possible to form L groups of c elements
"""
i = 0
counter = 0
lst_len = len(lst)
while i < lst_len - c + 1:
if lst[i + c - 1] - lst[i] <= D:
counter += 1
i += c
else:
i += 1
if counter >= L:
return True
return False
Consider set of numbers \(\left\{1, 5, 15, 20, 20\right\}\) and parameters \(L=2\), \(c=2\). For \(\Delta=4\) it still possible to find subsets \(\left\{1, 5\right\}, \left\{20, 20\right\}\):
check_groups(
lst=[1, 5, 15, 20, 20], L=2, c=2, D=4
)
True
However, for \(\Delta = 1\), the subset \(\left\{1, 5\right\}\) is incorrect - it’s impossible to find the corresponding subsets.
check_groups(
lst=[1, 5, 15, 20, 20], L=2, c=2, D=1
)
False
Now, just binserach for the first \(\Delta\) that corresponds to the requirements:
def solution(lst: list[int], L: int, c: int) -> int:
left, right = 0, len(lst)
while left + 1 < right:
D = (left + right) // 2
if check_groups(lst=lst, L=L, c=c, D=D):
right = D
else:
left = D
return right
solution(lst=[1, 5, 15, 20, 20], L=2, c=2)
4