Collect array

Collect array#

Here, we take a closer look at functions that vary across SQL dialects, allowing columns in a database to be transformed into an array that fits in one table cell.

The following cell starts the databases we’ll use in Docker. For now, they are:

  • PostgreSQL

  • ClickHouse

docker run --rm -d\
    -e POSTGRES_PASSWORD=docker_app \
    --name collect_array_postgres_examples \
    postgres:15.4 &> /dev/null
docker run --rm -d \
    --name collect_array_click_examples \
    clickhouse/clickhouse-server &> /dev/null
sleep 5

Note: Don’t forget to stop all containers once finished.

docker stop collect_array_postgres_examples collect_array_click_examples
collect_array_postgres_examples
collect_array_click_examples

Syntax#

Different databases have different function that truns column to array:

  • In clikchouse you have to use groupArray function.

  • In postgreSQL you have to use array_agg function.

You can take elements from an array using []. Note: In PostgreSQL, you have to wrap the object in additional () to apply [] to the value.


The following cell shows the application of the groupArray function to the some_value column, along with extracting specific positions from the resulting array.

docker exec collect_array_click_examples clickhouse-client -q "
CREATE TEMPORARY TABLE temp (some_value Int32) ENGINE = Memory;

INSERT INTO temp VALUES
    (4), (3), (3), (1), (3), (3), (3), (5), (5), (1);

SELECT 
    groupArray(some_value), 
    groupArray(some_value)[1], 
    groupArray(some_value)[2] 
FROM temp
FORMAT Pretty;
"
   ┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
   ┃ groupArray(a)         ┃ arrayElement(groupArray(a), 1) ┃ arrayElement(groupArray(a), 2) ┃
   ┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
1. │ [4,3,3,1,3,3,3,5,5,1] │                              4 │                              3 │
   └───────────────────────┴────────────────────────────────┴────────────────────────────────┘

The following cell represents an analogous example in PostgreSQL:

docker exec -i collect_array_postgres_examples psql -U postgres -d postgres << EOF
SELECT
    array_agg(some_value),
    (array_agg(some_value))[1] AS element1,
    (array_agg(some_value))[2] AS element2
FROM (
    VALUES
    (4), (3), (3), (1), (3), (3), (3), (5), (5), (1)
) AS temp(some_value);
EOF
       array_agg       | element1 | element2 
-----------------------+----------+----------
 {4,3,3,1,3,3,3,5,5,1} |        4 |        3
(1 row)

Index out of range#

When trying to access an out-of-range position in this command, the common behavior across SQL dialects is to return a reserved value. However, the specific behavior can vary between different SQL dialects.


The following examples demonstrate attempts to access out-of-range positions in different SQL dialects.

Here is an example from PostgreSQL, which shows that PostgreSQL returns NULL in such cases.

docker exec -i collect_array_postgres_examples psql -U postgres -d postgres << EOF
SELECT 
    (array_agg(some_value))[5] IS NULL fifth_element_is_null
FROM (
    VALUES
    (4), (3), (3)
) AS temp(some_value);
EOF
 fifth_element_is_null 
-----------------------
 t
(1 row)

In ClickHouse, it returns reserved values to replace non-existent values. The following cell shows that instead of a non-existent element of the array, ClickHouse returns:

  • 0 for numeric columns.

  • An empty string '' for string columns.

docker exec collect_array_click_examples clickhouse-client -q "
CREATE TEMPORARY TABLE temp (numeric_value Int, string_value String) ENGINE = Memory;

INSERT INTO temp VALUES
    (1, 'a'), (2, 'b'), (3, 'c');


SELECT 
    groupArray(numeric_value)[5] numeric_index_out,
    groupArray(string_value)[5] string_index_out
FROM temp
FORMAT Pretty;
"
   ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
   ┃ numeric_index_out ┃ string_index_out ┃
   ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
1. │                 0 │                  │
   └───────────────────┴──────────────────┘

Ordered array#

Sometimes you’ll need arrays in a specific order. Different SQL dialects achieve this in various ways. In this section, we’ll explore some of them.


In PostgreSQL, you can simply use the ORDER BY clause inside array_agg. The following cell shows the difference between unordered_result and ordered_result.

docker exec -i collect_array_postgres_examples psql -U postgres -d postgres << EOF
SELECT 
    array_agg(categories) unordered_result,
    array_agg(categories ORDER BY order_value) ordered_result
FROM (
    VALUES
    (2, 'a'), (3, 'b'), (1, 'c')
) AS temp(order_value, categories);
EOF
 unordered_result | ordered_result 
------------------+----------------
 {a,b,c}          | {c,a,b}
(1 row)

In ClickHouse, you can achieve this output using the arraySort function. Find out more about the array sort function in the corresponding section of the official documentation. The following cell shows the example: unserted_result is just application of the arraySort function, sorted_result uses arraySort to sort categories over order_value.

docker exec collect_array_click_examples clickhouse-client -q "
CREATE TEMPORARY TABLE temp (order_value Int, categories String) ENGINE = Memory;

INSERT INTO temp VALUES
    (3, 'b'), (1, 'c'), (2, 'a');

SELECT 
    groupArray(categories) unsorted_result,
    arraySort(
        (x, y) -> y, 
        groupArray(categories),  
        groupArray(order_value)
    ) AS sorted_result
FROM temp
FORMAT Pretty;
"
   ┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ unsorted_result ┃ sorted_result ┃
   ┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
1. │ ['b','c','a']   │ ['c','a','b'] │
   └─────────────────┴───────────────┘