Skip to main content

30 posts tagged with "blog"

View All Tags

Four Pitfalls of SQL Processing with Null Values

· 10 min read
PawSQL Team
Optimize your SQL Queries by Clicks!

Channel of advanced SQL tuning

Overview

NULL value processing is the most error-prone for database application developers, mainly because we are accustomed to using binary Boolean logic, while the database's processing logic for NULL values is three-valued logic. In fact, the most flawed component in the database optimizers is the logic related to NULL value processing. Even mature database software, such as DB2/Teradata, still has more than 20% of the bugs related NULL processing.

In this article, we analyze the root causes of the NULL value pitfalls, and conclude with a simple and effective examination logic to infer the final result. At the same time, we explain the applicable conditions and solutions for the four common scenarios in daily development work. After reading this article, you will be able to cope with all the scenarios regarding NULL value handling in your daily SQL processing work.

Avoid Using Natural Join

· 2 min read
PawSQL Team
Optimize your SQL Queries by Clicks!

Copyright © 2023 PawSQL

Definition

Natural join is a special type of equijoin that can be used with inner, outer and full joins. It automatically searches for all columns in both tables with the same name and type and performs an equijoin on those columns. Natural join can simplify the statement but implicit join conditions reduce code readability and understanding of relationships between tables, and are also prone to mistaken joins. PawSQL provides a risk warning for statements using natural join to avoid potential correctness issues caused by it.

Mixed Sort Directions Disable Index Usage

· 3 min read
PawSQL Team
Optimize your SQL Queries by Clicks!

Copyright © 2023 PawSQL

Problem Definition

For ORDER BY clauses, all expressions must be sorted in the same ASC or DESC direction to utilize indexes. If an ORDER BY statement uses different sort directions on multiple conditions, indexes cannot be used.

For example, creating an index on the lineitem table in TPCH:

create index l_partkey_suppkey_idx on lineitem(l_partkey, l_suppkey);

Avoid COUNT DISTINCT on Nullable Multi-Columns

· 2 min read
PawSQL Team
Optimize your SQL Queries by Clicks!

Copyright © 2023 PawSQL

Problem Definition

When using COUNT(DISTINCT) on multiple columns, the result may differ from what you expect. COUNT(DISTINCT col) counts the number of non-NULL distinct values in that column, while COUNT(DISTINCT col, col2) will exclude rows where either column is NULL.

For example, in the following query counting distinct values in column a and column combination (a,b):

select count(distinct t.a) a_cnt, count(distinct t.a,t.b) a_b_cnt from (values row(1,2),row(3,null)) as t(a,b);

The result is 2 distinct values for column a, but only 1 distinct value for column combination (a,b). This may not match your expectation.

Avoid Using CROSS JOIN

· 2 min read
PawSQL Team
Optimize your SQL Queries by Clicks!

Copyright © 2023 PawSQL

Problem Definition

CROSS JOIN performs a Cartesian product between every row of the first table and every row of the second table. It generates table1 rows x table2 rows records. Theoretically it is equivalent to an inner join with condition 1=1. CROSS JOIN can quickly join multiple tables but also generates a large number of records causing low efficiency. Also with no join condition specified, the result may not have practical meaning.

PawSQL provides a risk warning for statements using CROSS JOIN to avoid potential performance issues caused by it.

Example SQL:

SELECT *
FROM lineitem
CROSS JOIN orders;

This combines every record in the lineitem table with all records in the orders table.