Avoid COUNT DISTINCT on Nullable Multi-Columns
Copyright © 2023 PawSQL
Problem Definition
When using COUNT(DISTINCT) on multiple columns, the result may differ from what you expect. COUNT(DISTINCT col) counts the number of non-NULL distinct values in that column, while COUNT(DISTINCT col, col2) will exclude rows where either column is NULL.
For example, in the following query counting distinct values in column a and column combination (a,b):
select count(distinct t.a) a_cnt, count(distinct t.a,t.b) a_b_cnt from (values row(1,2),row(3,null)) as t(a,b);
The result is 2 distinct values for column a, but only 1 distinct value for column combination (a,b). This may not match your expectation.