Conditions on aggregats (HAVING)#
In the next cell, I create everything I need for the examples in this page.
docker run --rm -d\
-e POSTGRES_PASSWORD=postgres \
--name cond_on_agregats_example \
postgres:15.4 &> /dev/null
sleep 5
docker exec -i cond_on_agregats_example psql -U postgres -d postgres << EOF
CREATE TABLE aggregation_table(
col1 TEXT,
col2 INT
);
INSERT INTO aggregation_table(col1, col2) VALUES
('A', 5),
('A', 1),
('B', 2),
('B', 1),
('C', 3),
('C', 4);
EOF
CREATE TABLE
INSERT 0 6
Note don’t forget to stop the container when you finish playing with examples.
docker stop cond_on_agregats_example
cond_on_agregats_example
Let’s begin with the view of the example table.
docker exec -i cond_on_agregats_example psql -U postgres << EOF
SELECT * FROM aggregation_table;
EOF
col1 | col2
------+------
A | 5
A | 1
B | 2
B | 1
C | 3
C | 4
(6 rows)
Now a problem: I need to aggregate SUM(col2)
by the values of col1
and I only get the results where the sums are greater than 5. So I need to set a condition on a result of the aggregation.
Won’t work#
The first thing that comes to mind is to use the arger functions inside the WHERE
block. This will cause an error because the WHERE
block in sql is executed before all aggregations.
docker exec -i cond_on_agregats_example psql -U postgres << EOF
SELECT col1, SUM(col2)
FROM aggregation_table
WHERE SUM(col2) > 5
GROUP BY col1;
EOF
ERROR: aggregate functions are not allowed in WHERE
LINE 3: WHERE SUM(col2) > 5
^
Subquery#
A possible but not optimal solution is to use aggregation in the subquery and then describe the condition on the aggregation in the external query.
In the following example, I just solve the problem mentioned at the beginning of the page using this path.
docker exec -i cond_on_agregats_example psql -U postgres << EOF
SELECT * FROM (
SELECT col1, SUM(col2)
FROM aggregation_table
GROUP BY col1
) AS tab1
WHERE sum > 5;
EOF
col1 | sum
------+-----
C | 7
A | 6
(2 rows)
HAVING
#
There’s a special keyword for describing these cases: HAVING
, which is the same as WHERE
, but for aggregates. It’s the optimal way to solve such a task.
You have to use HAVING
after GROUP BY
satement.
In the following example, I just solve the problem mentioned at the beginning of the page using this path.
docker exec -i cond_on_agregats_example psql -U postgres << EOF
SELECT col1, SUM(col2)
FROM aggregation_table
GROUP BY col1
HAVING SUM(col2) > 5;
EOF
col1 | sum
------+-----
C | 7
A | 6
(2 rows)
Note You can also use conditions on aggregates not mentioned in the SELECT
statement. So in the next cell I got sums of col2
by col1
, but only for cases where the average of col2
by col1
is greater than 3.
docker exec -i cond_on_agregats_example psql -U postgres << EOF
SELECT col1, SUM(col2)
FROM aggregation_table
GROUP BY col1
HAVING AVG(col2) > 3;
EOF
col1 | sum
------+-----
C | 7
(1 row)
Note You can use aggregation variables in conditions. So in the following example I got sums of col2
only for certain values of col1
.
docker exec -i cond_on_agregats_example psql -U postgres << EOF
SELECT col1, SUM(col2)
FROM aggregation_table
GROUP BY col1
HAVING col1 IN ('A', 'B');
EOF
col1 | sum
------+-----
B | 3
A | 6
(2 rows)