Skip to main content

Join Elimination

· 5 min read
PawSQL Team
Optimize your SQL Queries by Clicks!

Copyright © 2024 PawSQL

Definition

Join Elimination is a rewriting optimization in SQL that simplifies queries and improves query performance by removing joins from the query without affecting the final result. Typically, this optimization is used when a query contains a primary-foreign key join and only references the primary key columns of the main table.

Consider the following example:

select o.* from orders o inner join customer c on c.c_custkey=o.o_custkey

The orders table is joined with the customer table, and c_custkey is the primary key of the customer table. In this case, the customer table can be eliminated, and the rewritten SQL would be:

select * from orders where o_custkey

Types of Join Elimination

  • Inner Join Elimination
  • Outer Join Elimination

Inner Join Elimination

Conditions for Inner join elimination:

  • Fact-based primary-foreign key equality join (parent table's join column is non-null and unique)
  • The parent table's primary key is the only column of the parent table referenced in the query.

Inner join elimination works in the following way:

  • The parent table and the primary-foreign key join condition are eliminated.
  • Other references to the parent table's primary key are replaced with the foreign key of the external table.
  • If the foreign key can be null and there are no other NFC[^1] conditions, a new non-null foreign key condition needs to be added.

Example:

  • Original SQL:
select c_custkey from customer,orders where c_custkey=o_custkey
  • Rewritten SQL:
  select orders.o_custkey from orders where orders.o_custkey is not null

Outer Join Elimination

Conditions for outer join elimination:

  • The outer join to be eliminated must be a left or right outer join.
  • The join condition must have a primary-foreign key equality join connected by AND.
  • The primary key of the inner table (non-null and unique) is the only column of the inner table referenced in the query.

Outer join elimination works in the following way:

  • The inner table and all join conditions are eliminated.
  • Other references to the inner table's primary key are replaced with the outer table's foreign key.

Example 1: PK only appears in the join condition

  • Original SQL:
select o_custkey from orders left join customer on c_custkey=o_custkey
  • Rewritten SQL:
select orders.o_custkey from orders

Example 2: PK appears elsewhere

  • Original SQL:
select orders.* from customer right join orders on c_custkey=o_custkey and c_custkey>20 where o_orderstatus='T'
  • Rewritten SQL:
select orders.* from orders where orders.o_orderstatus = 'T'

Join Elimination in DBMS

Join Elimination is advanced optimization technique and provided by some advanced commercial database management systems, such as Oracle, DB2. But for MySQL and PostgreSQL, Join Elimination is not supported yet.

For the SQL in the first section:

select o.* from orders o inner join customer c on c.c_custkey=o.o_custkey

Execution plan in MySQL is:

-> Inner hash join (o.O_CUSTKEY = c.C_CUSTKEY)  (cost=20541.07 rows=20013)
-> Table scan on o (cost=2529.20 rows=200128)
-> Hash
-> Index scan on c using key_idx (cost=0.35 rows=1)

Execution plan in PostgreSQL is:

Hash Join  (cost=401.29..711.56 rows=10001 width=115)
Hash Cond: (o.o_custkey = c.c_custkey)
-> Seq Scan on orders o (cost=0.00..284.01 rows=10001 width=115)
-> Hash (cost=276.29..276.29 rows=10000 width=4)
-> Index Only Scan using customer_pkey on customer c (cost=0.29..276.29 rows=10000 width=4)

As can be seen, neither MySQL nor PostgreSQL support join elimination rewrite optimization.

Join Elimination in PawSQL

PawSQL provides a comprehensive join elimination optimization through the [JoinEliminationRewrite].

  • Input SQL statement:
 select orders.* from customer right join orders on c_custkey=o_custkey and c_custkey>20
  • Rewritten SQL statement after Join Elimination
select orders.* from orders
  • Execution plan before optimization(MySQL)
  -> Nested loop left join  (cost=90585.51 rows=200128)
-> Table scan on orders (cost=20540.71 rows=200128)
-> Filter: (orders.O_CUSTKEY > 20) (cost=0.25 rows=1)
-> Single-row covering index lookup on customer using key_idx (C_CUSTKEY=orders.O_CUSTKEY) (cost=0.25 rows=1)
  • Execution plan after optimization:
-> Table scan on orders  (cost=20540.71 rows=200128)

As shown, PawSQL provides excellent support for join elimination rewrite optimization, resulting in a performance improvement of 441.01% through the elimination of meaningless table joins.

Conclusion

Since MySQL and PostgreSQL do not support join elimination, join elimination optimization provided by PawSQL is a meaningful supplement for them. By eliminating meaningless table joins before SQL is deployed in a production environment, PawSQL can prevent the database from wasting resources on unnecessary table join operations.

About PawSQL

PawSQL is dedicated to automatic and intelligent database performance optimization. The products provided by PawSQL include:

  • PawSQL Cloud, an online automated SQL optimization tool that supports SQL auditing, intelligent query rewriting, cost-based index recommendations, suitable for database administrators and data application developers.
  • PawSQL Advisor, an IntelliJ plugin that is suitable for data application developers and can be installed via the IDEA/DataGrip marketplace by searching for "PawSQL Advisor" by name.
  • PawSQL Engine, which is the backend optimization engine of the PawSQL series of products, can be installed and deployed independently, and provides SQL optimization services through http/json interfaces. PawSQL Engine is provided for deployment and installation as a docker image.

Contact Us

Email: service@pawsql.com

Twitter: https://twitter.com/pawsql