Expand array#
Sometimes you need to deal with arrays of arrays or, more scientifically, a column containing arrays in each record.
You can perform this operation using:
The
UNNEST(<column>)
function in postgres.The
arrayJoin(<column>)
functoin in clickhouse.
Basic example#
The following cell demonstrates array expansion in PostgreSQL using the unnest
function.
--postgreSQL
SELECT
col1,
unnest(col2) AS unnestted_value
FROM (
VALUES
(10, '{1,2,3}'::int[]),
(7, '{3,4,1}'::int[])
) AS temp(col1, col2);
SELECT 6
col1 | unnestted_value |
---|---|
10 | 1 |
10 | 2 |
10 | 3 |
7 | 3 |
7 | 4 |
7 | 1 |
The following cell performs the same operation in ClickHouse using the arrayJoin
function. The result is identical.
--ClickHouse
CREATE TEMPORARY TABLE IF NOT EXISTS temp (col1 Int32, col2 Array(Int32)) ENGINE = Memory;
INSERT INTO temp VALUES
(10, [1, 2, 3]),
(7, [3, 4, 1]);
SELECT
col1,
arrayJoin(col2)
FROM temp;
col1 | arrayJoin(col2) |
---|---|
10 | 1 |
10 | 2 |
10 | 3 |
7 | 3 |
7 | 4 |
7 | 1 |
Multicolumn unnest#
Using multiple unnest
operations in the same query leads to unexpected results. Instead of generating separate records for each combination of array elements, it produces records with pairwise matches, skipping elements when the arrays for the same record have different lengths.
The following cell shows such an example - the second record of col3
contains fewer values than the second record of col2
, but the query tries to unnest them sulmiteniously.
--postgreSQL
SELECT
col1,
unnest(col2) AS col2_unnested,
unnest(col3) AS col3_unnested
FROM (
VALUES
(10, '{1,2,3}'::int[], '{"a", "b", "c"}'::text[]),
(7, '{3,4,1}'::int[], '{"a", "b"}'::text[])
) AS temp(col1, col2, col3);
SELECT 6
col1 | col2_unnested | col3_unnested |
---|---|---|
10 | 1 | a |
10 | 2 | b |
10 | 3 | c |
7 | 3 | a |
7 | 4 | b |
7 | 1 |
If you want to get a separate record for each combination of unnested values, just use subquery.
So in the following example, there is a separate record for each combination of values in the corresponding col2
and col3
records.
--postgreSQL
WITH
example_table(col1, col2, col3) AS (
VALUES
(10, '{1,2,3}'::int[], '{"a", "b", "c"}'::text[]),
(7, '{3,4,1}'::int[], '{"a", "b"}'::text[])
)
SELECT col1, col2, UNNEST(col3)
FROM (
SELECT
col1, UNNEST(col2) AS col2, col3
FROM example_table
) AS t;
SELECT 15
col1 | col2 | unnest |
---|---|---|
10 | 1 | a |
10 | 1 | b |
10 | 1 | c |
10 | 2 | a |
10 | 2 | b |
10 | 2 | c |
10 | 3 | a |
10 | 3 | b |
10 | 3 | c |
7 | 3 | a |
7 | 3 | b |
7 | 4 | a |
7 | 4 | b |
7 | 1 | a |
7 | 1 | b |
Alternative solution, using artificial cross-joining to deploy the array into a relational format.
Note now I don’t have time, but in general it would be great to compare the performance of this and solution with subquery.
--postgreSQL
WITH
example_table(col1, col2, col3) AS (
VALUES
(10, '{1,2,3}'::int[], '{"a", "b", "c"}'::text[]),
(7, '{3,4,1}'::int[], '{"a", "b"}'::text[])
)
SELECT col1, unnested_col2, UNNEST(col3) AS col3
FROM example_table, UNNEST(col2) AS unnested_col2;
SELECT 15
col1 | unnested_col2 | col3 |
---|---|---|
10 | 1 | a |
10 | 1 | b |
10 | 1 | c |
10 | 2 | a |
10 | 2 | b |
10 | 2 | c |
10 | 3 | a |
10 | 3 | b |
10 | 3 | c |
7 | 3 | a |
7 | 3 | b |
7 | 4 | a |
7 | 4 | b |
7 | 1 | a |
7 | 1 | b |