Conditions on aggregats (HAVING)

Conditions on aggregats (HAVING)#

In the next cell, I create everything I need for the examples in this page.

docker run --rm -d\
    -e POSTGRES_PASSWORD=postgres \
    --name cond_on_agregats_example \
    postgres:15.4 &> /dev/null
sleep 5
docker exec -i cond_on_agregats_example psql -U postgres -d postgres << EOF
CREATE TABLE aggregation_table(
    col1 TEXT,
    col2 INT
);
INSERT INTO aggregation_table(col1, col2) VALUES
('A', 5),
('A', 1),
('B', 2),
('B', 1),
('C', 3),
('C', 4);
EOF
CREATE TABLE
INSERT 0 6

Note don’t forget to stop the container when you finish playing with examples.

docker stop cond_on_agregats_example
cond_on_agregats_example

Let’s begin with the view of the example table.

docker exec -i cond_on_agregats_example psql -U postgres << EOF
SELECT * FROM aggregation_table;
EOF
 col1 | col2 
------+------
 A    |    5
 A    |    1
 B    |    2
 B    |    1
 C    |    3
 C    |    4
(6 rows)

Now a problem: I need to aggregate SUM(col2) by the values of col1 and I only get the results where the sums are greater than 5. So I need to set a condition on a result of the aggregation.

Won’t work#

The first thing that comes to mind is to use the arger functions inside the WHERE block. This will cause an error because the WHERE block in sql is executed before all aggregations.

docker exec -i cond_on_agregats_example psql -U postgres << EOF

SELECT col1, SUM(col2)
FROM aggregation_table 
WHERE SUM(col2) > 5
GROUP BY col1;

EOF
ERROR:  aggregate functions are not allowed in WHERE
LINE 3: WHERE SUM(col2) > 5
              ^

Subquery#

A possible but not optimal solution is to use aggregation in the subquery and then describe the condition on the aggregation in the external query.

In the following example, I just solve the problem mentioned at the beginning of the page using this path.

docker exec -i cond_on_agregats_example psql -U postgres << EOF
SELECT * FROM (
    SELECT col1, SUM(col2)
    FROM aggregation_table
    GROUP BY col1
) AS tab1
WHERE sum > 5;
EOF
 col1 | sum 
------+-----
 C    |   7
 A    |   6
(2 rows)

HAVING#

There’s a special keyword for describing these cases: HAVING, which is the same as WHERE, but for aggregates. It’s the optimal way to solve such a task.

You have to use HAVING after GROUP BY satement.

In the following example, I just solve the problem mentioned at the beginning of the page using this path.

docker exec -i cond_on_agregats_example psql -U postgres << EOF

SELECT col1, SUM(col2)
FROM aggregation_table
GROUP BY col1
HAVING SUM(col2) > 5;

EOF
 col1 | sum 
------+-----
 C    |   7
 A    |   6
(2 rows)

Note You can also use conditions on aggregates not mentioned in the SELECT statement. So in the next cell I got sums of col2 by col1, but only for cases where the average of col2 by col1 is greater than 3.

docker exec -i cond_on_agregats_example psql -U postgres << EOF

SELECT col1, SUM(col2)
FROM aggregation_table
GROUP BY col1
HAVING AVG(col2) > 3;

EOF
 col1 | sum 
------+-----
 C    |   7
(1 row)

Note You can use aggregation variables in conditions. So in the following example I got sums of col2 only for certain values of col1.

docker exec -i cond_on_agregats_example psql -U postgres << EOF

SELECT col1, SUM(col2)
FROM aggregation_table
GROUP BY col1
HAVING col1 IN ('A', 'B');

EOF
 col1 | sum 
------+-----
 B    |   3
 A    |   6
(2 rows)