31 posts tagged with "pawsql"
View All TagsFour Pitfalls of SQL Processing with Null Values
Channel of advanced SQL tuning
Overview
NULL value processing is the most error-prone for database application developers, mainly because we are accustomed to using binary Boolean logic, while the database's processing logic for NULL values is three-valued logic. In fact, the most flawed component in the database optimizers is the logic related to NULL value processing. Even mature database software, such as DB2/Teradata, still has more than 20% of the bugs related NULL processing.
In this article, we analyze the root causes of the NULL value pitfalls, and conclude with a simple and effective examination logic to infer the final result. At the same time, we explain the applicable conditions and solutions for the four common scenarios in daily development work. After reading this article, you will be able to cope with all the scenarios regarding NULL value handling in your daily SQL processing work.
Avoid Using Natural Join
Copyright © 2023 PawSQL
Definition
Natural join is a special type of equijoin that can be used with inner, outer and full joins. It automatically searches for all columns in both tables with the same name and type and performs an equijoin on those columns. Natural join can simplify the statement but implicit join conditions reduce code readability and understanding of relationships between tables, and are also prone to mistaken joins. PawSQL provides a risk warning for statements using natural join to avoid potential correctness issues caused by it.
Mixed Sort Directions Disable Index Usage
Copyright © 2023 PawSQL
Problem Definition
For ORDER BY clauses, all expressions must be sorted in the same ASC or DESC direction to utilize indexes. If an ORDER BY statement uses different sort directions on multiple conditions, indexes cannot be used.
For example, creating an index on the lineitem
table in TPCH:
create index l_partkey_suppkey_idx on lineitem(l_partkey, l_suppkey);
Avoid COUNT DISTINCT on Nullable Multi-Columns
Copyright © 2023 PawSQL
Problem Definition
When using COUNT(DISTINCT) on multiple columns, the result may differ from what you expect. COUNT(DISTINCT col) counts the number of non-NULL distinct values in that column, while COUNT(DISTINCT col, col2) will exclude rows where either column is NULL.
For example, in the following query counting distinct values in column a and column combination (a,b):
select count(distinct t.a) a_cnt, count(distinct t.a,t.b) a_b_cnt from (values row(1,2),row(3,null)) as t(a,b);
The result is 2 distinct values for column a, but only 1 distinct value for column combination (a,b). This may not match your expectation.