Window functions (OVER)#
Check following sources:
SQL window functions tutorial on mode.com.
Window functions article on official postgres documentation.
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
In fact, you are doing something like aggregation, but the results are not collapsed by variable aggregation.
Syntax#
If you are using a window function in the SELECT
block of your query, you can declare a new output column using the following syntax:
sql<function> OVER (<relationship>) AS <result name>.
This allows you to compute the result of the window function and assign it to a new column with the specified <result name>
.
Functions can be quite different, but actually there are only two components of <relationship>
: PARTITION
and ORDER BY
, so consider them at the beginning.
Partition#
Within the OVER
clause, the PARTITION BY
clause enables you to define variables whose values will be treated as groups. The window functions will be applied independently to each of these defined groups.
Consider the simplest possible example: for each record, calculate the sum of the some_value
by the group
of the row.
--postgreSQL
SELECT
some_value, "group",
SUM(some_value) OVER (PARTITION BY "group") AS result
FROM
(
VALUES
(3, 'C', 'X'),
(3, 'C', 'Y'),
(1, 'C', 'X'),
(4, 'B', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
SELECT 5
some_value | group | result |
---|---|---|
3 | A | 3 |
4 | B | 4 |
3 | C | 7 |
3 | C | 7 |
1 | C | 7 |
Now, let’s explore a more exotic example to better understand the flexibility of this tool. In this case, for each unique some_value
we correspond emumeration of the possible values of group
that ocur with that options of some_value
.
--postgreSQL
SELECT
some_value, "group",
string_agg("group", ', ') OVER (PARTITION BY some_value) AS result
FROM (
VALUES
(3, 'C', 'X'),
(3, 'B', 'Y'),
(1, 'C', 'X'),
(4, 'B', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
SELECT 5
some_value | group | result |
---|---|---|
1 | C | C |
3 | C | C, B, A |
3 | B | C, B, A |
3 | A | C, B, A |
4 | B | B |
Several variables#
When you specify two or more variables within the PARTITION BY
clause, you create partitions based on a composite of those columns.
This example demonstrates how to calculate the sum of some_value
for every unique combination of group
and group2
.
--postgreSQL
SELECT
some_value, "group", group2,
SUM(some_value) OVER (PARTITION BY "group", group2) AS result
FROM (
VALUES
(3, 'C', 'X'),
(3, 'B', 'Y'),
(1, 'C', 'X'),
(4, 'B', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
SELECT 5
some_value | group | group2 | result |
---|---|---|---|
3 | A | X | 3 |
3 | B | Y | 7 |
4 | B | Y | 7 |
3 | C | X | 4 |
1 | C | X | 4 |
Order by#
The ORDER BY
clause defines the order of application of the window function. Each unique value of the ordering variable in the aggregation will contain only values corresponding to the previous values according to the order.
The following examples apply to the computation of sums of some_value
in the order defined by group
.
--postgreSQL
SELECT
some_value, "group",
SUM(some_value) OVER (ORDER BY "group") AS sv
FROM (
VALUES
(3, 'C', 'X'),
(3, 'B', 'Y'),
(1, 'C', 'X'),
(4, 'B', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
SELECT 5
some_value | group | sv |
---|---|---|
3 | A | 3 |
3 | B | 10 |
4 | B | 10 |
3 | C | 14 |
1 | C | 14 |
So for A
we only have the sum of the some_values
of the rows corresponding to the A
group. But for B
we have the sum on both A
and B
groups and so on.
Partition and order#
You can combine PARTITION BY
and ORDER BY
within a single OVER
clause. This allows you to compute aggregations for each distinct combination of variables specified in PARTITION BY
, while accumulating the aggregation only based on the unique values defined in the ORDER BY
clause.
This example uses group2
for the PARTITION BY
clause and group
for the ORDER BY
clause within the OVER
clause.
--postgreSQL
SELECT
some_value, "group", group2,
SUM(some_value) OVER (PARTITION BY group2 ORDER BY "group") AS sv
FROM (
VALUES
(3, 'C', 'X'),
(3, 'B', 'Y'),
(1, 'B', 'X'),
(4, 'C', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
SELECT 5
some_value | group | group2 | sv |
---|---|---|---|
3 | A | X | 3 |
1 | B | X | 4 |
3 | C | X | 7 |
3 | B | Y | 3 |
4 | C | Y | 7 |
As a result, each unique combination of group
and group2
receives its own calculated value. Within each group2
value, the group
values are ordered, and the aggregation accumulates only within that specific group2
value. This means that different group2
values have independent sequences for the aggregation.
Functions#
While window functions can utilize aggregation functions, there are specialized functions specifically designed for use with window functions. We’ll explore these dedicated window functions in this section. For more detailed information, please refer to the specific page.
There is a table that lists some popular window functions.
Function |
Description |
---|---|
|
Assigns a unique sequential integer to rows within a partition. |
|
Assigns a rank to each row with gaps in ranking for ties. |
|
Assigns a rank to each row without gaps in ranking for ties. |
|
Divides the result set into |
|
Accesses data from the next row in the result set. |
|
Accesses data from the previous row in the result set. |
|
Returns the first value in an ordered partition. |
|
Returns the last value in an ordered partition. |
|
Returns the nth value in an ordered partition. |
The following cell shows failed attempt to apply LAG
functions without using OVER
clause.
--postgreSQL
SELECT LAG(group2)
FROM (
VALUES
(3, 'C', 'X'),
(3, 'B', 'Y'),
(1, 'B', 'X'),
(4, 'C', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
Traceback (most recent call last):
File "/home/fedor/Documents/code/knowledge/src/kernel.py", line 116, in do_execute
self._execute_sql(code=code)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^
File "/home/fedor/Documents/code/knowledge/src/kernel.py", line 84, in _execute_sql
messages, tables = runner.execute(code)
~~~~~~~~~~~~~~^^^^^^
File "/home/fedor/Documents/code/knowledge/src/runners/runners.py", line 69, in execute
cursor.execute(code.encode())
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
File "/home/fedor/.virtualenvironments/knowledge/lib/python3.13/site-packages/psycopg/cursor.py", line 97, in execute
raise ex.with_traceback(None)
psycopg.errors.WrongObjectType: window function lag requires an OVER clause
LINE 2: SELECT LAG(group2)
^
The error message indicates that you cannot use window functions without specifying an OVER
clause.
Rows between#
ClickHouse has one more option to specify the logic of the window function: ROWS BETWEEN <top border> AND <bottom border>
. This allows you to define a relative range of rows that will be aggregated in the window function.
Instead of <top border>
and <bottom border>
, you supposed to use <n> PRECEDING
to refer to the previous rows and <n> FOLLOWING
to refer to the following rows.
The following example shows how to generate an array for each row, which includes preceding, following, and surrounding values.
--ClickHouse
CREATE
TEMPORARY TABLE IF NOT EXISTS temp (some_value Int32)
ENGINE = Memory;
INSERT INTO temp VALUES
(3), (3), (1), (4), (3);
SELECT
some_value,
groupArray(some_value) OVER (ROWS BETWEEN 2 PRECEDING AND 0 PRECEDING) preceding,
groupArray(some_value) OVER (ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) following,
groupArray(some_value) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) around
FROM temp;
some_value | preceding | following | around |
---|---|---|---|
3 | [3] | [3, 1] | [3, 3] |
3 | [3, 3] | [1, 4] | [3, 3, 1] |
1 | [3, 3, 1] | [4, 3] | [3, 1, 4] |
4 | [3, 1, 4] | [3] | [1, 4, 3] |
3 | [1, 4, 3] | [] | [4, 3] |