Values shifting (LAG/LEAD)#
This page focuses on functions that allow shifting arrays: LAG
and LEAD
.
The result of applying LAG
and LEAD
can be represented as:
For
LAG
: \(x'_i = x_{i-s}\)For
LEAD
: \(x'_i = x_{i+s}\)
Where \(s\) is the number of cells we want to shift the values.
On this page, we dive into the options provided by different databases. In the following cell, both ClickHouse and Postgres are started.
docker run -d --rm\
--name lag_lead_pg\
-e POSTGRES_PASSWORD=postgres\
postgres:15.4 &> /dev/null
docker run --rm -d \
--name lag_lead_click \
clickhouse/clickhouse-server &> /dev/null
sleep 5
Note: Don’t forget to stop the container afterward.
docker stop lag_lead_pg lag_lead_click
lag_lead_pg
lag_lead_click
Partition by#
If you include a PARTITION BY
clause with the LAG
window function, it will define groups of values within which the shifting operation will take place.
This example demonstrates shifting the group
values within groups defined by group2
.
docker exec -i window_functions_pg psql -U postgres -d postgres << EOF
SELECT
"group", group2,
LAG("group") OVER (PARTITION BY group2)
FROM (
VALUES
(3, 'C', 'X'),
(3, 'C', 'Y'),
(1, 'C', 'X'),
(4, 'B', 'Y'),
(3, 'A', 'X'),
(1, 'D', 'Y')
) AS temp(some_value, "group", group2);
EOF
group | group2 | lag
-------+--------+-----
C | X |
C | X | C
A | X | C
C | Y |
B | Y | C
D | Y | B
(6 rows)
Clickhouse#
Clickhouse by default don’t have LEAD
/LAG
functions, but you can reproduce the effect by using syntax:
any(<column name>) OVER (ROWS BETWEEN <offset> PRECEDING AND <offset> PRECEDING)
forLAG
function.any(<column name>) OVER (ROWS BETWEEN <offset> FOLLOWING AND <offset> FOLLOWING)
forLEAD
function.
Here is an example of a LAG
shift with an offset of 1 and a LEAD
shift with an offset of 2.
docker exec lag_lead_click clickhouse-client -q "
CREATE TEMPORARY TABLE temp (original_column Int32) ENGINE = Memory;
INSERT INTO temp VALUES
(4), (3), (3), (1), (3), (3), (3), (5), (5), (1);
SELECT
original_column,
any(original_column) OVER (
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
) AS lag_value,
any(original_column) OVER (
ROWS BETWEEN 2 FOLLOWING AND 2 FOLLOWING
) AS lead_value
FROM temp
FORMAT Pretty;
"
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ original_column ┃ lag_value ┃ lead_value ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━┩
1. │ 4 │ 0 │ 3 │
├─────────────────┼───────────┼────────────┤
2. │ 3 │ 4 │ 1 │
├─────────────────┼───────────┼────────────┤
3. │ 3 │ 3 │ 3 │
├─────────────────┼───────────┼────────────┤
4. │ 1 │ 3 │ 3 │
├─────────────────┼───────────┼────────────┤
5. │ 3 │ 1 │ 3 │
├─────────────────┼───────────┼────────────┤
6. │ 3 │ 3 │ 5 │
├─────────────────┼───────────┼────────────┤
7. │ 3 │ 3 │ 5 │
├─────────────────┼───────────┼────────────┤
8. │ 5 │ 3 │ 1 │
├─────────────────┼───────────┼────────────┤
9. │ 5 │ 5 │ 0 │
├─────────────────┼───────────┼────────────┤
10. │ 1 │ 5 │ 0 │
└─────────────────┴───────────┴────────────┘
Note: Instead of empty values, records without previous or next values are replaced with zeros—this is quite typical for ClickHouse.