Skip to main content

Avoid COUNT DISTINCT on Nullable Multi-Columns

· 2 min read
PawSQL Team
Optimize your SQL Queries by Clicks!

Copyright © 2023 PawSQL

Problem Definition

When using COUNT(DISTINCT) on multiple columns, the result may differ from what you expect. COUNT(DISTINCT col) counts the number of non-NULL distinct values in that column, while COUNT(DISTINCT col, col2) will exclude rows where either column is NULL.

For example, in the following query counting distinct values in column a and column combination (a,b):

select count(distinct t.a) a_cnt, count(distinct t.a,t.b) a_b_cnt from (values row(1,2),row(3,null)) as t(a,b);

The result is 2 distinct values for column a, but only 1 distinct value for column combination (a,b). This may not match your expectation.

PawSQL Auditing

PawSQL provides a corresponding SQL audit warning for this case.

Default Warning Level

  • Note

There are 3 warning levels in ascending order: Note < Warning < Error

Conditions

  • COUNT DISTINCT function present in SQL
  • COUNT has 2 or more expressions as parameters
  • At least one parameter can be NULL

Database Types and Versions

  • MySQL
  • openGauss
  • Oracle
  • PostgreSQL
  • KingbaseES
  • MariaDB

About PawSQL

PawSQL is dedicated to automatic and intelligent database performance optimization. The database types supported include MySQL, PostgreSQL, Oracle, and openGauss; and the products provided by PawSQL include:

  • PawSQL Cloud, an online automated SQL optimization tool that supports SQL auditing, intelligent query rewriting, cost-based index recommendations, suitable for database administrators and data application developers.
  • PawSQL Advisor, an IntelliJ plugin that is suitable for data application developers and can be installed via the IDEA/DataGrip marketplace by searching for "PawSQL Advisor" by name.
  • PawSQL Engine, which is the backend optimization engine of the PawSQL series of products, can be installed and deployed independently, and provides SQL optimization services through http/json interfaces. PawSQL Engine is provided for deployment and installation as a docker image.

Contact Us

Website: https://docs.pawsql.com

Email: service@pawsql.com

Twitter: https://twitter.com/pawsql