Window functions (OVER)#

Check following sources:

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

In fact, you are doing something like aggregation, but the results are not collapsed by variable aggregation.

Let’s consider an example for this page. We have a numeric variable called some_variable, on which we will apply various functions. Also, we have a group variable that represents the relationship between rows.

docker run -d --rm\
    --name window_functions_pg\
    -e POSTGRES_PASSWORD=postgres\
    postgres:15.4 &> /dev/null

docker run -d --rm\
    --name window_functions_click\
    clickhouse/clickhouse-server &> /dev/null

sleep 5

Note Don’t forget to stop the container.

docker stop window_functions_pg window_functions_click
window_functions_example

Syntax#

If you are using a window function in the SELECT block of your query, you can declare a new output column using the following syntax:

sql<function> OVER (<relationship>) AS <result name>.

This allows you to compute the result of the window function and assign it to a new column with the specified <result name>.

Functions can be quite different, but actually there are only two components of <relationship>: PARTITION and ORDER BY , so consider them at the beginning.

Partition#

Within the OVER clause, the PARTITION BY clause enables you to define variables whose values will be treated as groups. The window functions will be applied independently to each of these defined groups.


Consider the simplest possible example: for each record, calculate the sum of the some_value by the group of the row.

docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT 
    some_value, "group",
    SUM(some_value) OVER (PARTITION BY "group") AS result
FROM 
(
    VALUES
    (3, 'C', 'X'),
    (3, 'C', 'Y'),
    (1, 'C', 'X'),
    (4, 'B', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
 some_value | group | result 
------------+-------+--------
          3 | A     |      3
          4 | B     |      4
          3 | C     |      7
          3 | C     |      7
          1 | C     |      7
(5 rows)

Now, let’s explore a more exotic example to better understand the flexibility of this tool. In this case, for each unique some_value we correspond emumeration of the possible values of group that ocur with that options of some_value.

docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT 
    some_value, "group",
    string_agg("group", ', ') OVER (PARTITION BY some_value) AS result
FROM (
    VALUES
    (3, 'C', 'X'),
    (3, 'B', 'Y'),
    (1, 'C', 'X'),
    (4, 'B', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
 some_value | group | result  
------------+-------+---------
          1 | C     | C
          3 | C     | C, B, A
          3 | B     | C, B, A
          3 | A     | C, B, A
          4 | B     | B
(5 rows)

Several variables#

When you specify two or more variables within the PARTITION BY clause, you create partitions based on a composite of those columns.


This example demonstrates how to calculate the sum of some_value for every unique combination of group and group2.

docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT 
    some_value, "group", group2, 
    SUM(some_value) OVER (PARTITION BY "group", group2) AS result
FROM (
    VALUES
    (3, 'C', 'X'),
    (3, 'B', 'Y'),
    (1, 'C', 'X'),
    (4, 'B', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
 some_value | group | group2 | result 
------------+-------+--------+--------
          3 | A     | X      |      3
          3 | B     | Y      |      7
          4 | B     | Y      |      7
          3 | C     | X      |      4
          1 | C     | X      |      4
(5 rows)

Order by#

The ORDER BY clause defines the order of application of the window function. Each unique value of the ordering variable in the aggregation will contain only values corresponding to the previous values according to the order.


The following examples apply to the computation of sums of some_value in the order defined by group.

docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT 
    some_value, "group",
    SUM(some_value) OVER (ORDER BY "group") AS sv
FROM (
    VALUES
    (3, 'C', 'X'),
    (3, 'B', 'Y'),
    (1, 'C', 'X'),
    (4, 'B', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
 some_value | group | sv 
------------+-------+----
          3 | A     |  3
          3 | B     | 10
          4 | B     | 10
          3 | C     | 14
          1 | C     | 14
(5 rows)

So for A we only have the sum of the some_values of the rows corresponding to the A group. But for B we have the sum on both A and B groups and so on.

Partition and order#

You can combine PARTITION BY and ORDER BY within a single OVER clause. This allows you to compute aggregations for each distinct combination of variables specified in PARTITION BY, while accumulating the aggregation only based on the unique values defined in the ORDER BY clause.


This example uses group2 for the PARTITION BY clause and group for the ORDER BY clause within the OVER clause.

docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT 
    some_value, "group", group2,
    SUM(some_value) OVER (PARTITION BY group2 ORDER BY "group") AS sv
FROM (
    VALUES
    (3, 'C', 'X'),
    (3, 'B', 'Y'),
    (1, 'B', 'X'),
    (4, 'C', 'Y'),
    (3, 'A', 'X')
) AS temp(some_value, "group", group2);
EOF
 some_value | group | group2 | sv 
------------+-------+--------+----
          3 | A     | X      |  3
          1 | B     | X      |  4
          3 | C     | X      |  7
          3 | B     | Y      |  3
          4 | C     | Y      |  7
(5 rows)

As a result, each unique combination of group and group2 receives its own calculated value. Within each group2 value, the group values are ordered, and the aggregation accumulates only within that specific group2 value. This means that different group2 values have independent sequences for the aggregation.

Functions#

While window functions can utilize aggregation functions, there are specialized functions specifically designed for use with window functions. We’ll explore these dedicated window functions in this section. For more detailed information, please refer to the specific page.

There is a table that lists some popular window functions.

Function

Description

ROW_NUMBER()

Assigns a unique sequential integer to rows within a partition.

RANK()

Assigns a rank to each row with gaps in ranking for ties.

DENSE_RANK()

Assigns a rank to each row without gaps in ranking for ties.

NTILE(n)

Divides the result set into n parts and assigns a bucket number.

LEAD()

Accesses data from the next row in the result set.

LAG()

Accesses data from the previous row in the result set.

FIRST_VALUE()

Returns the first value in an ordered partition.

LAST_VALUE()

Returns the last value in an ordered partition.

NTH_VALUE()

Returns the nth value in an ordered partition.


The following cell shows failed attempt to apply LAG functions without using OVER clause.

docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT LAG(group2) FROM example_table;
EOF
ERROR:  relation "example_table" does not exist
LINE 1: SELECT LAG(group2) FROM example_table;
                                ^

The error message indicates that you cannot use window functions without specifying an OVER clause.

Rows between#

ClickHouse has one more option to specify the logic of the window function: ROWS BETWEEN <top border> AND <bottom border>. This allows you to define a relative range of rows that will be aggregated in the window function.

Instead of <top border> and <bottom border>, you supposed to use <n> PRECEDING to refer to the previous rows and <n> FOLLOWING to refer to the following rows.


The following example shows how to generate an array for each row, which includes preceding, following, and surrounding values.

docker exec window_functions_click clickhouse-client -q "
CREATE 
TEMPORARY TABLE temp (some_value Int32) 
ENGINE = Memory;

INSERT INTO temp VALUES
    (3), (3), (1), (4), (3);

SELECT 
    some_value,
    groupArray(some_value) OVER (ROWS BETWEEN 2 PRECEDING AND 0 PRECEDING) preceding,
    groupArray(some_value) OVER (ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) following,
    groupArray(some_value) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) around 
FROM temp
FORMAT Pretty;
"
   ┏━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┓
   ┃ some_value ┃ preceding ┃ following ┃ around  ┃
   ┡━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━┩
1. │          3 │ [3]       │ [3,1]     │ [3,3]   │
   ├────────────┼───────────┼───────────┼─────────┤
2. │          3 │ [3,3]     │ [1,4]     │ [3,3,1] │
   ├────────────┼───────────┼───────────┼─────────┤
3. │          1 │ [3,3,1]   │ [4,3]     │ [3,1,4] │
   ├────────────┼───────────┼───────────┼─────────┤
4. │          4 │ [3,1,4]   │ [3]       │ [1,4,3] │
   ├────────────┼───────────┼───────────┼─────────┤
5. │          3 │ [1,4,3]   │ []        │ [4,3]   │
   └────────────┴───────────┴───────────┴─────────┘