Window functions (OVER)#
Check following sources:
SQL window functions tutorial on mode.com.
Window functions article on official postgres documentation.
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.
In fact, you are doing something like aggregation, but the results are not collapsed by variable aggregation.
Let’s consider an example for this page. We have a numeric variable called some_variable
, on which we will apply various functions. Also, we have a group
variable that represents the relationship between rows.
docker run -d --rm\
--name window_functions_pg\
-e POSTGRES_PASSWORD=postgres\
postgres:15.4 &> /dev/null
docker run -d --rm\
--name window_functions_click\
clickhouse/clickhouse-server &> /dev/null
sleep 5
Note Don’t forget to stop the container.
docker stop window_functions_pg window_functions_click
window_functions_example
Syntax#
If you are using a window function in the SELECT
block of your query, you can declare a new output column using the following syntax:
sql<function> OVER (<relationship>) AS <result name>.
This allows you to compute the result of the window function and assign it to a new column with the specified <result name>
.
Functions can be quite different, but actually there are only two components of <relationship>
: PARTITION
and ORDER BY
, so consider them at the beginning.
Partition#
Within the OVER
clause, the PARTITION BY
clause enables you to define variables whose values will be treated as groups. The window functions will be applied independently to each of these defined groups.
Consider the simplest possible example: for each record, calculate the sum of the some_value
by the group
of the row.
docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT
some_value, "group",
SUM(some_value) OVER (PARTITION BY "group") AS result
FROM
(
VALUES
(3, 'C', 'X'),
(3, 'C', 'Y'),
(1, 'C', 'X'),
(4, 'B', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
some_value | group | result
------------+-------+--------
3 | A | 3
4 | B | 4
3 | C | 7
3 | C | 7
1 | C | 7
(5 rows)
Now, let’s explore a more exotic example to better understand the flexibility of this tool. In this case, for each unique some_value
we correspond emumeration of the possible values of group
that ocur with that options of some_value
.
docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT
some_value, "group",
string_agg("group", ', ') OVER (PARTITION BY some_value) AS result
FROM (
VALUES
(3, 'C', 'X'),
(3, 'B', 'Y'),
(1, 'C', 'X'),
(4, 'B', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
some_value | group | result
------------+-------+---------
1 | C | C
3 | C | C, B, A
3 | B | C, B, A
3 | A | C, B, A
4 | B | B
(5 rows)
Several variables#
When you specify two or more variables within the PARTITION BY
clause, you create partitions based on a composite of those columns.
This example demonstrates how to calculate the sum of some_value
for every unique combination of group
and group2
.
docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT
some_value, "group", group2,
SUM(some_value) OVER (PARTITION BY "group", group2) AS result
FROM (
VALUES
(3, 'C', 'X'),
(3, 'B', 'Y'),
(1, 'C', 'X'),
(4, 'B', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
some_value | group | group2 | result
------------+-------+--------+--------
3 | A | X | 3
3 | B | Y | 7
4 | B | Y | 7
3 | C | X | 4
1 | C | X | 4
(5 rows)
Order by#
The ORDER BY
clause defines the order of application of the window function. Each unique value of the ordering variable in the aggregation will contain only values corresponding to the previous values according to the order.
The following examples apply to the computation of sums of some_value
in the order defined by group
.
docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT
some_value, "group",
SUM(some_value) OVER (ORDER BY "group") AS sv
FROM (
VALUES
(3, 'C', 'X'),
(3, 'B', 'Y'),
(1, 'C', 'X'),
(4, 'B', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
some_value | group | sv
------------+-------+----
3 | A | 3
3 | B | 10
4 | B | 10
3 | C | 14
1 | C | 14
(5 rows)
So for A
we only have the sum of the some_values
of the rows corresponding to the A
group. But for B
we have the sum on both A
and B
groups and so on.
Partition and order#
You can combine PARTITION BY
and ORDER BY
within a single OVER
clause. This allows you to compute aggregations for each distinct combination of variables specified in PARTITION BY
, while accumulating the aggregation only based on the unique values defined in the ORDER BY
clause.
This example uses group2
for the PARTITION BY
clause and group
for the ORDER BY
clause within the OVER
clause.
docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT
some_value, "group", group2,
SUM(some_value) OVER (PARTITION BY group2 ORDER BY "group") AS sv
FROM (
VALUES
(3, 'C', 'X'),
(3, 'B', 'Y'),
(1, 'B', 'X'),
(4, 'C', 'Y'),
(3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
some_value | group | group2 | sv
------------+-------+--------+----
3 | A | X | 3
1 | B | X | 4
3 | C | X | 7
3 | B | Y | 3
4 | C | Y | 7
(5 rows)
As a result, each unique combination of group
and group2
receives its own calculated value. Within each group2
value, the group
values are ordered, and the aggregation accumulates only within that specific group2
value. This means that different group2
values have independent sequences for the aggregation.
Functions#
While window functions can utilize aggregation functions, there are specialized functions specifically designed for use with window functions. We’ll explore these dedicated window functions in this section. For more detailed information, please refer to the specific page.
There is a table that lists some popular window functions.
Function |
Description |
---|---|
|
Assigns a unique sequential integer to rows within a partition. |
|
Assigns a rank to each row with gaps in ranking for ties. |
|
Assigns a rank to each row without gaps in ranking for ties. |
|
Divides the result set into |
|
Accesses data from the next row in the result set. |
|
Accesses data from the previous row in the result set. |
|
Returns the first value in an ordered partition. |
|
Returns the last value in an ordered partition. |
|
Returns the nth value in an ordered partition. |
The following cell shows failed attempt to apply LAG
functions without using OVER
clause.
docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT LAG(group2) FROM example_table;
EOF
ERROR: relation "example_table" does not exist
LINE 1: SELECT LAG(group2) FROM example_table;
^
The error message indicates that you cannot use window functions without specifying an OVER
clause.
Rows between#
ClickHouse has one more option to specify the logic of the window function: ROWS BETWEEN <top border> AND <bottom border>
. This allows you to define a relative range of rows that will be aggregated in the window function.
Instead of <top border>
and <bottom border>
, you supposed to use <n> PRECEDING
to refer to the previous rows and <n> FOLLOWING
to refer to the following rows.
The following example shows how to generate an array for each row, which includes preceding, following, and surrounding values.
docker exec window_functions_click clickhouse-client -q "
CREATE
TEMPORARY TABLE temp (some_value Int32)
ENGINE = Memory;
INSERT INTO temp VALUES
(3), (3), (1), (4), (3);
SELECT
some_value,
groupArray(some_value) OVER (ROWS BETWEEN 2 PRECEDING AND 0 PRECEDING) preceding,
groupArray(some_value) OVER (ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) following,
groupArray(some_value) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) around
FROM temp
FORMAT Pretty;
"
┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┓
┃ some_value ┃ preceding ┃ following ┃ around ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━┩
1. │ 3 │ [3] │ [3,1] │ [3,3] │
├────────────┼───────────┼───────────┼─────────┤
2. │ 3 │ [3,3] │ [1,4] │ [3,3,1] │
├────────────┼───────────┼───────────┼─────────┤
3. │ 1 │ [3,3,1] │ [4,3] │ [3,1,4] │
├────────────┼───────────┼───────────┼─────────┤
4. │ 4 │ [3,1,4] │ [3] │ [1,4,3] │
├────────────┼───────────┼───────────┼─────────┤
5. │ 3 │ [1,4,3] │ [] │ [4,3] │
└────────────┴───────────┴───────────┴─────────┘