Window functions (OVER)

Window functions (OVER)#

Check following sources:

SQL window functions tutorial on mode.com.
Window functions article on official postgres documentation.

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

In fact, you are doing something like aggregation, but the results are not collapsed by variable aggregation.

Syntax#

If you are using a window function in the SELECT block of your query, you can declare a new output column using the following syntax:

sql<function> OVER (<relationship>) AS <result name>.

This allows you to compute the result of the window function and assign it to a new column with the specified <result name>.

Functions can be quite different, but actually there are only two components of <relationship>: PARTITION and ORDER BY , so consider them at the beginning.

Partition#

Within the OVER clause, the PARTITION BY clause enables you to define variables whose values will be treated as groups. The window functions will be applied independently to each of these defined groups.

Consider the simplest possible example: for each record, calculate the sum of the some_value by the group of the row.

--postgreSQL
SELECT 
    some_value, "group",
    SUM(some_value) OVER (PARTITION BY "group") AS result
FROM 
(
    VALUES
    (3, 'C', 'X'),
    (3, 'C', 'Y'),
    (1, 'C', 'X'),
    (4, 'B', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);

SELECT 5

some_value	group	result
3	A	3
4	B	4
3	C	7
3	C	7
1	C	7

Now, let’s explore a more exotic example to better understand the flexibility of this tool. In this case, for each unique some_value we correspond emumeration of the possible values of group that ocur with that options of some_value.

--postgreSQL
SELECT
    some_value, "group",
    string_agg("group", ', ') OVER (PARTITION BY some_value) AS result
FROM (
    VALUES
    (3, 'C', 'X'),
    (3, 'B', 'Y'),
    (1, 'C', 'X'),
    (4, 'B', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);

SELECT 5

some_value	group	result
1	C	C
3	C	C, B, A
3	B	C, B, A
3	A	C, B, A
4	B	B

Several variables#

When you specify two or more variables within the PARTITION BY clause, you create partitions based on a composite of those columns.

This example demonstrates how to calculate the sum of some_value for every unique combination of group and group2.

--postgreSQL
SELECT
    some_value, "group", group2, 
    SUM(some_value) OVER (PARTITION BY "group", group2) AS result
FROM (
    VALUES
    (3, 'C', 'X'),
    (3, 'B', 'Y'),
    (1, 'C', 'X'),
    (4, 'B', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);

SELECT 5

some_value	group	group2	result
3	A	X	3
3	B	Y	7
4	B	Y	7
3	C	X	4
1	C	X	4

Order by#

The ORDER BY clause defines the order of application of the window function. Each unique value of the ordering variable in the aggregation will contain only values corresponding to the previous values according to the order.

The following examples apply to the computation of sums of some_value in the order defined by group.

--postgreSQL
SELECT 
    some_value, "group",
    SUM(some_value) OVER (ORDER BY "group") AS sv
FROM (
    VALUES
    (3, 'C', 'X'),
    (3, 'B', 'Y'),
    (1, 'C', 'X'),
    (4, 'B', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);

SELECT 5

some_value	group	sv
3	A	3
3	B	10
4	B	10
3	C	14
1	C	14

So for A we only have the sum of the some_values of the rows corresponding to the A group. But for B we have the sum on both A and B groups and so on.

Partition and order#

You can combine PARTITION BY and ORDER BY within a single OVER clause. This allows you to compute aggregations for each distinct combination of variables specified in PARTITION BY, while accumulating the aggregation only based on the unique values defined in the ORDER BY clause.

This example uses group2 for the PARTITION BY clause and group for the ORDER BY clause within the OVER clause.

--postgreSQL
SELECT 
    some_value, "group", group2,
    SUM(some_value) OVER (PARTITION BY group2 ORDER BY "group") AS sv
FROM (
    VALUES
    (3, 'C', 'X'),
    (3, 'B', 'Y'),
    (1, 'B', 'X'),
    (4, 'C', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);

SELECT 5

some_value	group	group2	sv
3	A	X	3
1	B	X	4
3	C	X	7
3	B	Y	3
4	C	Y	7

As a result, each unique combination of group and group2 receives its own calculated value. Within each group2 value, the group values are ordered, and the aggregation accumulates only within that specific group2 value. This means that different group2 values have independent sequences for the aggregation.

Functions#

While window functions can utilize aggregation functions, there are specialized functions specifically designed for use with window functions. We’ll explore these dedicated window functions in this section. For more detailed information, please refer to the specific page.

There is a table that lists some popular window functions.

Function	Description
`ROW_NUMBER()`	Assigns a unique sequential integer to rows within a partition.
`RANK()`	Assigns a rank to each row with gaps in ranking for ties.
`DENSE_RANK()`	Assigns a rank to each row without gaps in ranking for ties.
`NTILE(n)`	Divides the result set into `n` parts and assigns a bucket number.
`LEAD()`	Accesses data from the next row in the result set.
`LAG()`	Accesses data from the previous row in the result set.
`FIRST_VALUE()`	Returns the first value in an ordered partition.
`LAST_VALUE()`	Returns the last value in an ordered partition.
`NTH_VALUE()`	Returns the nth value in an ordered partition.

The following cell shows failed attempt to apply LAG functions without using OVER clause.

--postgreSQL
SELECT LAG(group2)
FROM (
    VALUES
    (3, 'C', 'X'),
    (3, 'B', 'Y'),
    (1, 'B', 'X'),
    (4, 'C', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);

Traceback (most recent call last):
  File "/home/fedor/Documents/code/knowledge/src/kernel.py", line 116, in do_execute
    self._execute_sql(code=code)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/fedor/Documents/code/knowledge/src/kernel.py", line 84, in _execute_sql
    messages, tables = runner.execute(code)
                       ~~~~~~~~~~~~~~^^^^^^
  File "/home/fedor/Documents/code/knowledge/src/runners/runners.py", line 69, in execute
    cursor.execute(code.encode())
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/home/fedor/.virtualenvironments/knowledge/lib/python3.13/site-packages/psycopg/cursor.py", line 97, in execute
    raise ex.with_traceback(None)
psycopg.errors.WrongObjectType: window function lag requires an OVER clause
LINE 2: SELECT LAG(group2)
               ^

The error message indicates that you cannot use window functions without specifying an OVER clause.

Rows between#

ClickHouse has one more option to specify the logic of the window function: ROWS BETWEEN <top border> AND <bottom border>. This allows you to define a relative range of rows that will be aggregated in the window function.

Instead of <top border> and <bottom border>, you supposed to use <n> PRECEDING to refer to the previous rows and <n> FOLLOWING to refer to the following rows.

The following example shows how to generate an array for each row, which includes preceding, following, and surrounding values.

--ClickHouse
CREATE 
TEMPORARY TABLE IF NOT EXISTS temp (some_value Int32) 
ENGINE = Memory;

INSERT INTO temp VALUES
    (3), (3), (1), (4), (3);

SELECT 
    some_value,
    groupArray(some_value) OVER (ROWS BETWEEN 2 PRECEDING AND 0 PRECEDING) preceding,
    groupArray(some_value) OVER (ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) following,
    groupArray(some_value) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) around 
FROM temp;

some_value	preceding	following	around
3	[3]	[3, 1]	[3, 3]
3	[3, 3]	[1, 4]	[3, 3, 1]
1	[3, 3, 1]	[4, 3]	[3, 1, 4]
4	[3, 1, 4]	[3]	[1, 4, 3]
3	[1, 4, 3]	[]	[4, 3]