Skip to main content

Avoid COUNT DISTINCT on Nullable Multi-Columns

· 2 min read
PawSQL Team
Optimize your SQL Queries by Clicks!

Copyright © 2023 PawSQL

Problem Definition

When using COUNT(DISTINCT) on multiple columns, the result may differ from what you expect. COUNT(DISTINCT col) counts the number of non-NULL distinct values in that column, while COUNT(DISTINCT col, col2) will exclude rows where either column is NULL.

For example, in the following query counting distinct values in column a and column combination (a,b):

select count(distinct t.a) a_cnt, count(distinct t.a,t.b) a_b_cnt from (values row(1,2),row(3,null)) as t(a,b);

The result is 2 distinct values for column a, but only 1 distinct value for column combination (a,b). This may not match your expectation.

Avoid Using CROSS JOIN

· 2 min read
PawSQL Team
Optimize your SQL Queries by Clicks!

Copyright © 2023 PawSQL

Problem Definition

CROSS JOIN performs a Cartesian product between every row of the first table and every row of the second table. It generates table1 rows x table2 rows records. Theoretically it is equivalent to an inner join with condition 1=1. CROSS JOIN can quickly join multiple tables but also generates a large number of records causing low efficiency. Also with no join condition specified, the result may not have practical meaning.

PawSQL provides a risk warning for statements using CROSS JOIN to avoid potential performance issues caused by it.

Example SQL:

SELECT *
FROM lineitem
CROSS JOIN orders;

This combines every record in the lineitem table with all records in the orders table.