Collect array#
Here, we take a closer look at functions that vary across SQL dialects, allowing columns in a database to be transformed into an array that fits in one table cell.
The following cell starts the databases we’ll use in Docker. For now, they are:
PostgreSQL
ClickHouse
docker run --rm -d\
-e POSTGRES_PASSWORD=docker_app \
--name collect_array_postgres_examples \
postgres:15.4 &> /dev/null
docker run --rm -d \
--name collect_array_click_examples \
clickhouse/clickhouse-server &> /dev/null
sleep 5
Note: Don’t forget to stop all containers once finished.
docker stop collect_array_postgres_examples collect_array_click_examples
collect_array_postgres_examples
collect_array_click_examples
Syntax#
Different databases have different function that truns column to array:
In clikchouse you have to use
groupArray
function.In postgreSQL you have to use
array_agg
function.
You can take elements from an array using []
. Note: In PostgreSQL, you have to wrap the object in additional ()
to apply []
to the value.
The following cell shows the application of the groupArray
function to the some_value
column, along with extracting specific positions from the resulting array.
docker exec collect_array_click_examples clickhouse-client -q "
CREATE TEMPORARY TABLE temp (some_value Int32) ENGINE = Memory;
INSERT INTO temp VALUES
(4), (3), (3), (1), (3), (3), (3), (5), (5), (1);
SELECT
groupArray(some_value),
groupArray(some_value)[1],
groupArray(some_value)[2]
FROM temp
FORMAT Pretty;
"
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ groupArray(a) ┃ arrayElement(groupArray(a), 1) ┃ arrayElement(groupArray(a), 2) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
1. │ [4,3,3,1,3,3,3,5,5,1] │ 4 │ 3 │
└───────────────────────┴────────────────────────────────┴────────────────────────────────┘
The following cell represents an analogous example in PostgreSQL:
docker exec -i collect_array_postgres_examples psql -U postgres -d postgres << EOF
SELECT
array_agg(some_value),
(array_agg(some_value))[1] AS element1,
(array_agg(some_value))[2] AS element2
FROM (
VALUES
(4), (3), (3), (1), (3), (3), (3), (5), (5), (1)
) AS temp(some_value);
EOF
array_agg | element1 | element2
-----------------------+----------+----------
{4,3,3,1,3,3,3,5,5,1} | 4 | 3
(1 row)
Index out of range#
When trying to access an out-of-range position in this command, the common behavior across SQL dialects is to return a reserved value. However, the specific behavior can vary between different SQL dialects.
The following examples demonstrate attempts to access out-of-range positions in different SQL dialects.
Here is an example from PostgreSQL, which shows that PostgreSQL returns NULL in such cases.
docker exec -i collect_array_postgres_examples psql -U postgres -d postgres << EOF
SELECT
(array_agg(some_value))[5] IS NULL fifth_element_is_null
FROM (
VALUES
(4), (3), (3)
) AS temp(some_value);
EOF
fifth_element_is_null
-----------------------
t
(1 row)
In ClickHouse, it returns reserved values to replace non-existent values. The following cell shows that instead of a non-existent element of the array, ClickHouse returns:
0
for numeric columns.An empty string
''
for string columns.
docker exec collect_array_click_examples clickhouse-client -q "
CREATE TEMPORARY TABLE temp (numeric_value Int, string_value String) ENGINE = Memory;
INSERT INTO temp VALUES
(1, 'a'), (2, 'b'), (3, 'c');
SELECT
groupArray(numeric_value)[5] numeric_index_out,
groupArray(string_value)[5] string_index_out
FROM temp
FORMAT Pretty;
"
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ numeric_index_out ┃ string_index_out ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
1. │ 0 │ │
└───────────────────┴──────────────────┘
Ordered array#
Sometimes you’ll need arrays in a specific order. Different SQL dialects achieve this in various ways. In this section, we’ll explore some of them.
In PostgreSQL, you can simply use the ORDER BY
clause inside array_agg
. The following cell shows the difference between unordered_result
and ordered_result
.
docker exec -i collect_array_postgres_examples psql -U postgres -d postgres << EOF
SELECT
array_agg(categories) unordered_result,
array_agg(categories ORDER BY order_value) ordered_result
FROM (
VALUES
(2, 'a'), (3, 'b'), (1, 'c')
) AS temp(order_value, categories);
EOF
unordered_result | ordered_result
------------------+----------------
{a,b,c} | {c,a,b}
(1 row)
In ClickHouse, you can achieve this output using the arraySort
function. Find out more about the array sort function in the corresponding section of the official documentation. The following cell shows the example: unserted_result
is just application of the arraySort
function, sorted_result
uses arraySort
to sort categories
over order_value
.
docker exec collect_array_click_examples clickhouse-client -q "
CREATE TEMPORARY TABLE temp (order_value Int, categories String) ENGINE = Memory;
INSERT INTO temp VALUES
(3, 'b'), (1, 'c'), (2, 'a');
SELECT
groupArray(categories) unsorted_result,
arraySort(
(x, y) -> y,
groupArray(categories),
groupArray(order_value)
) AS sorted_result
FROM temp
FORMAT Pretty;
"
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ unsorted_result ┃ sorted_result ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
1. │ ['b','c','a'] │ ['c','a','b'] │
└─────────────────┴───────────────┘