Expand array

Expand array#

Sometimes you need to deal with arrays of arrays or, more scientifically, a column containing arrays in each record.

You can perform this operation using:

  • The UNNEST(<column>) function in postgres.

  • The arrayJoin(<column>) functoin in clickhouse.

Basic example#

The following cell demonstrates array expansion in PostgreSQL using the unnest function.

--postgreSQL
SELECT 
    col1, 
    unnest(col2) AS unnestted_value
FROM (
    VALUES
    (10, '{1,2,3}'::int[]),
    (7, '{3,4,1}'::int[])
) AS temp(col1, col2);
SELECT 6
col1 unnestted_value
10 1
10 2
10 3
7 3
7 4
7 1

The following cell performs the same operation in ClickHouse using the arrayJoin function. The result is identical.

--ClickHouse
CREATE TEMPORARY TABLE IF NOT EXISTS temp (col1 Int32, col2 Array(Int32)) ENGINE = Memory;

INSERT INTO temp VALUES
    (10, [1, 2, 3]),
    (7, [3, 4, 1]);

SELECT
    col1,
    arrayJoin(col2)
FROM temp;
col1 arrayJoin(col2)
10 1
10 2
10 3
7 3
7 4
7 1

Multicolumn unnest#

Using multiple unnest operations in the same query leads to unexpected results. Instead of generating separate records for each combination of array elements, it produces records with pairwise matches, skipping elements when the arrays for the same record have different lengths.


The following cell shows such an example - the second record of col3 contains fewer values than the second record of col2, but the query tries to unnest them sulmiteniously.

--postgreSQL
SELECT
    col1, 
    unnest(col2) AS col2_unnested,
    unnest(col3) AS col3_unnested
FROM (
    VALUES
    (10, '{1,2,3}'::int[], '{"a", "b", "c"}'::text[]),
    (7, '{3,4,1}'::int[], '{"a", "b"}'::text[])
) AS temp(col1, col2, col3);
SELECT 6
col1 col2_unnestedcol3_unnested
10 1a
10 2b
10 3c
7 3a
7 4b
7 1

If you want to get a separate record for each combination of unnested values, just use subquery.


So in the following example, there is a separate record for each combination of values in the corresponding col2 and col3 records.

--postgreSQL
WITH 
example_table(col1, col2, col3) AS (
    VALUES
    (10, '{1,2,3}'::int[], '{"a", "b", "c"}'::text[]),
    (7, '{3,4,1}'::int[], '{"a", "b"}'::text[])
)

SELECT col1, col2, UNNEST(col3)
FROM (
    SELECT 
        col1, UNNEST(col2) AS col2, col3 
    FROM example_table
) AS t;
SELECT 15
col1 col2unnest
10 1a
10 1b
10 1c
10 2a
10 2b
10 2c
10 3a
10 3b
10 3c
7 3a
7 3b
7 4a
7 4b
7 1a
7 1b

Alternative solution, using artificial cross-joining to deploy the array into a relational format.

Note now I don’t have time, but in general it would be great to compare the performance of this and solution with subquery.

--postgreSQL
WITH 
example_table(col1, col2, col3) AS (
    VALUES
    (10, '{1,2,3}'::int[], '{"a", "b", "c"}'::text[]),
    (7, '{3,4,1}'::int[], '{"a", "b"}'::text[])
)

SELECT col1, unnested_col2, UNNEST(col3) AS col3
FROM example_table, UNNEST(col2) AS unnested_col2;
SELECT 15
col1 unnested_col2col3
10 1a
10 1b
10 1c
10 2a
10 2b
10 2c
10 3a
10 3b
10 3c
7 3a
7 3b
7 4a
7 4b
7 1a
7 1b