Avoid COUNT DISTINCT on Nullable Multi-Columns
Copyright © 2023 PawSQL
Problem Definition
When using COUNT(DISTINCT) on multiple columns, the result may differ from what you expect. COUNT(DISTINCT col) counts the number of non-NULL distinct values in that column, while COUNT(DISTINCT col, col2) will exclude rows where either column is NULL.
For example, in the following query counting distinct values in column a and column combination (a,b):
select count(distinct t.a) a_cnt, count(distinct t.a,t.b) a_b_cnt from (values row(1,2),row(3,null)) as t(a,b);
The result is 2 distinct values for column a, but only 1 distinct value for column combination (a,b). This may not match your expectation.
PawSQL Auditing
PawSQL provides a corresponding SQL audit warning for this case.
Default Warning Level
- Note
There are 3 warning levels in ascending order: Note < Warning < Error
Conditions
- COUNT DISTINCT function present in SQL
- COUNT has 2 or more expressions as parameters
- At least one parameter can be NULL
Database Types and Versions
- MySQL
- openGauss
- Oracle
- PostgreSQL
- KingbaseES
- MariaDB
About PawSQL
PawSQL is dedicated to automatic and intelligent database performance optimization. The database types supported include MySQL, PostgreSQL, Oracle, and openGauss; and the products provided by PawSQL include:
- PawSQL Cloud, an online automated SQL optimization tool that supports SQL auditing, intelligent query rewriting, cost-based index recommendations, suitable for database administrators and data application developers.
- PawSQL Advisor, an IntelliJ plugin that is suitable for data application developers and can be installed via the IDEA/DataGrip marketplace by searching for "PawSQL Advisor" by name.
- PawSQL Engine, which is the backend optimization engine of the PawSQL series of products, can be installed and deployed independently, and provides SQL optimization services through http/json interfaces. PawSQL Engine is provided for deployment and installation as a docker image.
Contact Us
Website: https://docs.pawsql.com
Email: service@pawsql.com
Twitter: https://twitter.com/pawsql