Query Rewrite
PawSQL's rewrite optimization engine offers a variety of SQL rewrite optimizations, recommending semantically equivalent but more efficient SQL statements for execution.
Correctness Rules
1. Rewrite for Subqueries with ALL
Qualifier
Rule Description
Assuming the following SQL to get users registered after the order system is closed:
select * from customer where c_regdate > all(select o_orderdate from orders)
If there are NULLs in the result of the subquery, this SQL will always return empty. The correct approach is to add a non-null restriction in the subquery or use the max/min syntax.
select * from customer where c_regdate > (select max(o_custkey) from orders)
PawSQL recommends the second approach, which can further optimize SQL through max/min rewrites. Get more detailed information.
Default Alert Level
- Prohibited
Conditions
- Subquery conditions qualified by
ALL
2. IN
Nullable Subquery May Lead to Unexpected Results
Rule Description
For the following SQL that wants to query users without orders,
select * from customer where c_custkey not in (select o_custkey from orders)
If the result set of the subquery contains null values, this SQL will always return empty. The correct approach is to add a non-null restriction in the subquery, i.e.,
select * from customer where c_custkey not in (select o_custkey from orders where o_custkey is not null)
Default Alert Level
- Prohibited
Conditions
- Existence of
IN
subquery conditions - The selected columns in the
IN
subquery can be NULL
3. NPE Rewrite
Rule Description
The NPE (Null Pointer Exception) issue in SQL refers to the situation where aggregate functions like SUM and AVG return NULL when all columns in the SQL query are NULL, which may lead to a null pointer exception in subsequent programs. For example, for the following SQL:
select sum(t.b) from (values row(1,null)) as t(a,b);
The NPE issue can be avoided by using the following method:
SELECT IFNULL(SUM(t.b), 0) from (values row(1,null)) as t(a,b);
This will return 0 instead of NULL, avoiding a null pointer exception.
Oracle: NVL(); SQL Server and MS Access: ISNULL(); MySQL: IFNULL() or COALESCE();
Default Alert Level
- Warning
Conditions
- SUM or AVG aggregate functions
- The arguments of the aggregate functions may all be NULL, including:
- The argument is a column, and the column definition can be NULL
- The argument is an expression, and the expression can be NULL
- The column definition is not NULL, but it is an inner table of an outer join, and the result may be NULL
4. Rewrite =NULL
to IS NULL
Rule Description
-
= null
cannot determine if an expression is null;= null
is always evaluated as false. To check if an expression is null,is null
should be used. -
case expr when null
also cannot determine if an expression is null; to check if an expression is null,case when expr is null
should be used. Incorrect syntax in where/having filter conditions is relatively easy to detect and correct, but using null value judgment in case statements is more difficult to detect.
Default Alert Level
- Prohibited
Conditions
- The statement contains
= null
orcase when expr is null
logic
Performance Optimization Rules
1. Explicitly Prohibit Sorting
Rule Description
In early versions of MySQL, even without an ORDER BY
clause, GROUP BY
would default to sorting by the grouping fields, which could lead to unnecessary file sorting and affect SQL query performance. By adding ORDER BY NULL
, sorting can be forcibly canceled, disabling the sorting of the query result set; PawSQL recognizes and rewrites it.
For example, in the following case,
SELECT l_orderkey, sum(l_quantity)
FROM lineitem
GROUP BY l_orderkey;
In MySQL 5.x versions, group by l_orderkey
would cause default sorting, which can be avoided by adding ORDER BY NULL
.
SELECT l_orderkey, sum(l_quantity)
FROM lineitem
GROUP BY l_orderkey
ORDER BY NULL;
Default Alert Level
- Warning
Conditions
-
MySQL database, version lower than 8.0
-
There are grouping fields and no sorting fields
2. COUNT Scalar Subquery Rewrite
Rule Description
For using COUNT scalar subqueries to determine existence, they can be rewritten as EXISTS subqueries to avoid an aggregation operation. For example, for the following SQL,
select * from customer where (select count(*) from orders where c_custkey=o_custkey) > 0
It can be rewritten as,
select * from customer where exists(select 1 from orders where c_custkey=o_custkey)
Rule Description
Databases can leverage the order of indexes to avoid sorting the columns in the ORDER clause, thereby enhancing SQL performance. However, if the ORDER field is an expression or function, it may not be possible to use the index for sorting.
Default Alert Level
- Notice
Conditions
- Existence of COUNT scalar subquery > 0 condition
3. Unconditional DELETE Rewritten as TRUNCATE
Rule Description
DELETE
statements without query conditions or with always-true conditions will delete all data in the table. DELETE
statements require a large number of logs for transaction rollback and master-slave synchronization. For large tables, this may cause database locking and transaction blocking, while also occupying a large amount of log space. If it is confirmed that the data in the table is no longer needed, TRUNCATE
can be used to replace DELETE
statements. TRUNCATE
is faster than DELETE
because it does not record each deleted row but directly empties the table and releases space.
delete from lineitem
Rewritten as
truncate lineitem
Default Alert Level
- Notice
Conditions
- DELETE statements without conditions or with always-true conditions
4. Implicit Type Conversion Disable Index adoption
Rule Description
When the data types of conditional expressions are different, some implicit data type conversions will occur during the query execution process. Type conversions may apply to constants in the conditions or to columns in the conditions. When type conversions are applied to columns during query execution, indexes cannot be used, which may lead to severe performance issues. For example, for the following SQL,
select count(*) from ORDERS where O_ORDERDATE = current_date();
If the data type of the O_ORDERDATE
column is CHAR(16)
, then the index on O_ORDERDATE
will not be used, resulting in a full table scan. The solution usually involves two approaches: one is to ALTER TABLE
to change the data type of O_ORDERDATE
, and the other is to cast current_date
to CHAR
type (PawSQL provides this rewrite suggestion).
select count(*) from ORDERS where ORDERS.O_ORDERDATE = cast(current_date() as CHAR(16));
Default Alert Level
- Warning
Conditions
- The conditional expression is a filtering condition and is an indexable filtering condition
- The data types on both sides of the filtering condition are different
- According to the database's type conversion priority, the database will prioritize converting columns over constants
5. Elimination of DISTINCT in Subqueries
Rule Description
For subqueries that only perform existence tests, if the subquery contains DISTINCT, it can usually be removed to avoid a deduplication operation, for example,
- IN subquery:
SELECT * FROM customer WHERE c_custkey IN (SELECT DISTINCT o_custkey FROM orders);
It can be simplified to:
SELECT * FROM customer WHERE c_custkey IN (SELECT o_custkey FROM orders);
Default Alert Level
- Notice
Conditions
- Using IN/EXISTS subqueries for existence determination
- The subquery contains DISTINCT/DISTINCT/UNIQUE keywords
6. EXISTS Subqueries to Table Joins Rewrite
Rule Description
EXISTS subqueries return a boolean value to the outer query, indicating whether there are rows that meet the conditions. EXISTS subqueries that meet certain conditions can be converted to JOINs, allowing the database optimizer to have more choices in selecting the driving table, thereby generating a better query plan.
For example, for the following query,
select * from lineitem l where exists (select * from part p where p.p_partkey=l.l_partkey and p.p_name = 'a')
If the subquery returns at most one record for each l.l_partkey
(i.e., there is a uniqueness constraint on the column with the equality condition (p_partkey, p_name)
), then this subquery can be rewritten as follows:
select l.* from lineitem as l, part as p where p.p_partkey = l.l_partkey and p.p_name = 'a'
Conditions
- EXISTS subquery conditions are associated with AND and other conditions
- EXISTS subquery has no grouping and no LIMIT
- EXISTS subquery result set returns UNIQUE rows
- EXISTS subquery and outer query are associated by equality
7. Predicate Pushdown
Rule Description
Predicate Pushdown (FPPD) pushes filtering conditions as far "down" as possible into the internal query blocks of SQL, filtering out some data in advance, reducing the size of the intermediate result set, thereby reducing the amount of data that subsequent calculations need to process, and thus improving SQL execution performance. FPPD is a rewriting optimization.
For example, in the following case, there is a condition nation = 100
in the outer query, which can be pushed down into the person
DT subquery:
select *
from (select c_nationkey nation, 'C' as type, count(1) num
from customer
group by nation
union
select s_nationkey nation, 'S', count(1) num
from supplier
group by nation) as person
where nation = 100
The rewritten SQL is as follows:
select *
from (select c_nationkey nation, 'C' as type, count(1) num
from customer
where c_nationkey = 100
group by nation
union
select s_nationkey nation, 'S', count(1) num
from supplier
where s_nationkey = 100
group by nation) as person
Conditions
-
The filtering condition is an
AND
filtering condition (not a join condition) -
The filtering condition's field comes from the
FROM
subquery (if it is a view, it should be replaced by the SQL defined by the view) -
The subquery has not been optimized and eliminated by query folding
-
The subquery itself does not have a
LIMIT
clause -
The subquery itself does not have window functions such as
rownum
orrank
8. Operations on Index Columns Disable Index
Rule Description
Operations on index columns can cause the index to become ineffective, which can easily lead to full table scans and severe performance issues. Therefore, it is necessary to try to convert operations on index columns to the constant side, such as the following SQL.
select * from tpch.orders where adddate(o_orderdate, INTERVAL 31 DAY) =date '2019-10-10'
The adddate
function will make the index on o_orderdate
unusable, which can be converted into the following equivalent SQL to use the index to improve query efficiency.
select * from tpch.orders where o_orderdate = subdate(date '2019-10-10' , INTERVAL 31 DAY);
PawSQL can help convert a large number of functions and operations related to the +、-、*、/
operators. Click to get more detailed information about this optimization.
Conditions
-
The filtering condition is an
AND
filtering condition (not a join condition) -
The filtering condition is an indexable condition
-
There are calculations or functions on the index column
9. GROUP BY Fields from Different Tables
Rule Description
If the grouping fields come from different tables, the database optimizer will not be able to utilize the order of the index to avoid a sort. If there are equality conditions in the WHERE or HAVING clauses, PawSQL can replace the sorting or grouping fields to make them come from the same table, thereby being able to use the index to avoid a sort. For example, in the following query
select o_custkey, c_name, sum(o.O_TOTALPRICE) from customer c, orders o where o_custkey = c_custkey group by o_custkey, c_name
The grouping fields o_custkey, c_name
come from two tables, and there is a filtering condition o_custkey = c_custkey
, which can be rewritten as
select c_custkey, c_name, sum(o.O_TOTALPRICE) from customer c, orders o where o_custkey = c_custkey group by c_custkey, c_name
Conditions
-
GROUP BY fields come from different tables
-
The filtering condition is an indexable condition
-
There are no calculations or functions on the index column
10. Pushing HAVING Conditions Down to WHERE
Logically, HAVING conditions are executed after grouping, while conditions on the WHERE clause can be executed during table access (index access) or after table access but before grouping. Both of these conditions are less costly than executing after grouping.
Consider the following example,
select c_custkey, count(*) from customer group by c_custkey having c_custkey < 100
The rewritten SQL is,
select c_custkey, count(*) from customer where c_custkey < 100 group by c_custkey
Conditions
- There are no aggregate functions in the HAVING clause
11. IN Subquery Rewrite Optimization
An IN subquery refers to a subquery of the following form. IN subqueries can be rewritten as equivalent related EXISTS subqueries or inner joins, thereby generating a new filtering condition. If there is a suitable index on this filtering condition, or through PawSQL's index recommendation engine recommends a suitable index, better performance can be achieved.
(expr1, expr2...) [NOT] IN (SELECT expr3, expr4, ...)
- Rewriting IN subquery as EXISTS
For example, the following IN subquery language is to obtain customer information with orders in the last year,
select * from customer where c_custkey in (select o_custkey from orders where O_ORDERDATE>=current_date - interval 1 year)
It can be rewritten as an EXISTS subquery, thereby generating a filtering condition (c_custkey = o_custkey):
select * from customer where exists (select * from orders where c_custkey = o_custkey and O_ORDERDATE>=current_date - interval 1 year)
- Rewriting IN subquery as inner join
If the result set of the subquery is not repeated, then the IN subquery can be rewritten as an association between two tables, thereby allowing the database optimizer to plan a better table join order, and also allowing PawSQL to recommend better optimization methods.
For example, in the following SQL, c_custkey is the primary key of the customer table,
select * from orders where o_custkey in (select c_custkey from customer where c_phone like '139%')
Then the above query can be rewritten as
select orders.* from orders, customer where o_custkey=c_custkey and c_phone like '139%'
Click to get more detailed information about this optimization.
Conditions
- The IN subquery is not a reverse condition
- If the result set of the subquery is not repeated, it can be rewritten as an inner join
12. Join Elimination
Rule Description
Join Elimination simplifies SQL by removing tables from the query without affecting the final result, thereby improving query performance. This optimization is usually applicable when the query contains a primary key-foreign key connection and only references the primary key column of the master table. Inner joins and outer joins can both be used for this rewrite optimization.
- Case of inner join elimination
select o.* from orders o inner join customer c on c.c_custkey=o.o_custkey
The orders table (orders) and the customer table (customer) are associated, and c_custkey is the primary key of the customer table, so the customer table can be eliminated, and the rewritten SQL is as follows:
select * from orders where o_custkey
- Case of outer join elimination
select o_custkey from orders left join customer on c_custkey=o_custkey
The customer table can be eliminated, and the rewritten SQL is as follows:
select orders.o_custkey from orders
Conditions
- The query contains a primary key-foreign key connection
- The query only references the primary key column of the master table
13. Pushing LIMIT Down to UNION Branches
Rule Description
The LIMIT clause pushdown optimization pushes the LIMIT clause as far down as possible to filter out some data in advance, reduce the size of the intermediate result set, reduce the amount of data that subsequent calculations need to process, and improve query performance.
For example, in the following case, there is a LIMIT clause in the outer query, which can be pushed down to the inner query for execution:
select *
from (select c_nationkey nation, 'C' as type, count(1) num
from customer
group by c_nationkey
union
select s_nationkey nation, 'S' as type, count(1) num
from supplier
group by nation) as nation_s
order by nation limit 20, 10
The rewritten SQL is as follows:
select *
from (
(select customer.c_nationkey as nation, 'C' as `type`, count(1) as num
from customer
group by customer.c_nationkey
order by customer.c_nationkey limit 30)
union
(select supplier.s_nationkey as nation, 'S' as `type`, count(1) as num
from supplier
group by supplier.s_nationkey
order by supplier.s_nationkey limit 30)) as nation_s
order by nation_s.nation limit 20, 10
Conditions
- The outer query has a
LIMIT
clause - The outer query does not have a
GROUP BY
clause - The outer query's
FROM
only references one table, and it is a subquery - The outer query has no other conditions
- The subquery is a single query or multiple subqueries connected by
UNION/UNION ALL
(or an outer connection of an outer table) - The value of OFFSET is less than a specified threshold
21. Rewrite of MAX/MIN Subqueries
Rule Description
For subqueries using MAX/MIN,
select * from customer where c_custkey = (select max(o_custkey) from orders)
It can be rewritten in the following form to take advantage of the order of the index to avoid an aggregation operation,
select * from customer where c_custkey = (select o_custkey from orders order by o_custkey desc null last limit 1)
Get more detailed information about this optimization.
Conditions
- There is a scalar subquery with MAX/MIN in the SQL
14. ORDER Clause Reordering
Rule Description
If a query includes both sorting fields and grouping fields from the same table but in a different order, the grouping field order can be adjusted to be consistent with the sorting field order, so that the database can avoid a sort operation.
Consider the following two SQLs, the only difference between them is the order of the grouping fields (the first SQL is c_custkey, c_name
, the second SQL is c_name,c_custkey
), since there are no advanced grouping operations such as grouping set/cube/roll up in the grouping fields, so the two SQLs are equivalent. However, their execution plans and execution efficiencies are different, so the first SQL can be rewritten as the second SQL.
select o_custkey, o_orderdate, sum(O_TOTALPRICE)
from orders
group by o_custkey,o_orderdate
order by o_orderdate;
Rewritten as:
select o_custkey, o_orderdate, sum(o_totalprice)
from orders
group by o_orderdate,o_custkey
order by o_orderdate;
Conditions
- There is an order clause with more than 1 member and a group clause in a QueryBlock
- The clauses refer to columns in the same data table without functions or calculations
- The columns in the order clause are a true subset of the group clause
- The order clause is not a prefix of the group clause
15. Rewrite of SELECT with OR Conditions
Rule Description
If a query statement uses OR
conditions, the database optimizer may not be able to use an index to complete the query. For example,
select * from lineitem where l_shipdate = date '2010-12-01' or l_partkey<100
If both of these fields have indexes, the query statement can be rewritten as a UNION
or UNION ALL
query to take advantage of the index to improve query performance.
select * from lineitem where l_shipdate = date '2010-12-01'
union select * from lineitem where l_partkey<100
If the database supports
INDEX MERGING
, the database-related parameters can also be adjusted to enable theINDEX MERGING
optimization strategy to improve database performance. Get more detailed information about this optimization.
Conditions
- The
OR
connected conditions must be able to use an index - The estimated cost of the rewritten
UNION
statement is less than the original SQL - If the conditions of the OR branch are exclusive, then rewrite as
UNION ALL
16. Rewrite of UPDATE or DELETE with OR Conditions
Rule Description
If there are UPDATE or DELETE statements using OR
conditions, the database optimizer may not be able to use an index to complete the operation.
delete from lineitem where l_shipdate = date '2010-12-01' or l_partkey<100
If both of these fields have indexes, it can be rewritten as multiple DELETE statements to take advantage of the index to improve query performance.
delete from lineitem where l_shipdate = date '2010-12-01';
delete from lineitem where l_partkey<100;
Conditions
- The SQL is an UPDATE or DELETE statement
- The UPDATE or DELETE statement has an OR condition
- Each branch of the OR condition can be indexed
17. Elimination of Sorting in Subqueries without LIMIT
Rule Description
If a subquery does not have a LIMIT clause, then the sorting operation of the subquery is meaningless and can be deleted without affecting the final result. Some cases are as follows:
- EXISTS subquery
select * from lineitem l where exists (select * from part p where p.p_partkey=l.l_partkey and p.p_name = 'a' order by p_name )
Can be rewritten as
select * from lineitem l where exists (select * from part p where p.p_partkey=l.l_partkey and p.p_name = 'a')
Conditions
- The subquery has an ORDER clause
- The subquery does not have a LIMIT clause
18. ORDER BY Fields from Different Tables
Rule Description
If the sorting fields come from different tables, the database optimizer will not be able to utilize the order of the index to avoid a sort. If there are equality conditions in the WHERE or HAVING clauses, PawSQL can replace the sorting or grouping fields to make them come from the same table, thereby being able to use the index to avoid a sort. For example, in the following query
select * from customer c, orders o where o_custkey = c_custkey order by o_custkey, c_name
The sorting fields o_custkey, c_name
come from two tables, and there is a filtering condition o_custkey = c_custkey
, which can be rewritten as
select * from customer c, orders
o where o_custkey = c_custkey order by c_custkey, c_name
Conditions
-
ORDER
fields come from different tables -
The filtering condition is an indexable condition
-
There are no calculations or functions on the index column
19. Outer Join Optimization
Rule Description
Outer join optimization refers to the transformation of outer joins into inner joins under certain conditions (the outer table has a NULL rejection condition), thereby allowing the database optimizer to choose a better execution plan and improve the performance of SQL queries.
Consider the following example,
select c_custkey from orders left join customer on c_custkey=o_custkey where C_NATIONKEY < 20
C_NATIONKEY < 20
is a NULL rejection condition on the customer table, so the above left outer join can be rewritten as an inner join,
select c_custkey from orders inner join customer on c_custkey=o_custkey where C_NATIONKEY < 20
Get more detailed information about this optimization.
Conditions
For SQL,
SELECT * T1 FROM T1 LEFT JOIN T2 ON P1(T1,T2) WHERE P(T1,T2) AND R(T2)
If R(T2) is a null rejection condition (NFC), then the above outer join can be transformed into an inner join, that is
SELECT * T1 FROM T1 JOIN T2 ON P1(T1,T2) WHERE P(T1,T2) AND R(T2)
In this way, the optimizer can first apply R(T2), obtain a very small result set, and then associate it with T1.
20. Projection Pushdown
Rule Description
Projection pushdown refers to the process of removing meaningless columns in the DT subquery (not used in the outer query) to reduce IO and network costs, while increasing the chance of the optimizer adopting optimization options that do not require a return table when planning table access.
Consider the following example,
SELECT count(1) FROM (SELECT c_custkey, avg(age) FROM customer group by c_custkey) AS derived_t1;
The rewritten SQL is,
SELECT count(1) FROM (SELECT 1 FROM customer group by c_custkey) AS derived_t1;
Get more detailed information about this optimization.
Conditions
- There are columns in the inner select list that are not used by the outer query block
21. Rewrite of Qualified Subqueries
Rule Description
Subqueries Qualified by ANY/SOME/ALL originate from the SQL-92 standard and are commonly used to check the size relationship between a value and all or any values returned by the subquery. Subqueries Qualified by ANY/SOME/ALL have low execution efficiency because they need to compare with the result set of the subquery row by row, and decrease linearly with the increase of the result set size. Their execution efficiency can be improved by rewriting the query.
For example, for the following SQL:
select * from orders where o_orderdate < all(select o_orderdate from orders where o_custkey > 100)
For MySQL, it can be rewritten as
select * from orders where o_orderdate < (select o_orderdate from orders where o_custkey > 100 order by o_orderdate asc limit 1)
For PostgreSQL or Oracle, it can be rewritten as
select * from orders where o_orderdate < (select o_orderdate from orders where o_custkey > 100 order by o_orderdate asc nulls first limit 1)
Conditions
- There are subqueries Qualified by ANY/SOME/ALL in the SQL
22. Query Folding
Rule Description
Query folding refers to the process of expanding views, CTEs, or DT subqueries and merging them with the query statements that reference them to reduce serialized intermediate result sets or trigger better optimization techniques related to table join planning.
Consider the following example,
SELECT * FROM (SELECT c_custkey, c_name FROM customer) AS derived_t1;
The rewritten SQL is,
SELECT c_custkey, c_name FROM customer
Get more detailed information about this optimization.
Conditions
PawSQL's optimization engine supports two types of query folding optimization strategies for different SQL syntax structures. The first type of query folding optimization is supported internally by the optimizer in MySQL 5.7 and PostgreSQL 14.0 and above; the second type of query folding optimization is not supported in the latest versions of MySQL and PostgreSQL.
-
Query folding type I
-
There is no distinct keyword in the view itself
-
There are no aggregate functions or window functions in the view
-
There is no LIMIT clause in the view itself
-
There is no UNION or UNION ALL in the view itself
-
In the external query block, the folded view is not part of an outer join.
-
-
Query folding type II
-
In the external query block, the
view
is the only table reference -
There are no groupings, aggregate functions, and window functions in the external query block
-
No window functions are used inside the view
-
23. SATTC Rewrite
Rule Description
SAT-TC (SATisfiability-Transitive Closure) rewriting optimization refers to the analysis of a set of related query conditions to discover whether there are contradictory conditions, simplifications, or the inference of new conditions, thereby helping the database optimizer choose a better execution plan and improve SQL performance.
Consider the following example,
select c.c_name FROM customer c where c.c_name = 'John' and c.c_name = 'Jessey'
Since the conditions are contradictory, the rewritten SQL is,
select c.c_name from customer as c where 1 = 0
Get more detailed information about this optimization.
Conditions
-
There are contradictions between predicates (e.g.,
c_custkey=1 AND c_custkey=0
), or -
New predicates can be inferred from the predicate set (e.g.,
c_custkey=1 AND c_custkey=o_custkey
implieso_custkey=1
). -
Predicates can be simplified (e.g.,
c_custkey != c_custkey or c_name = 'b'
can be simplified toc_name = 'b'
)
24. De-Correlation of Select List Scalar Subqueries
Rule Description
Scalar subqueries return a single value of a single row and column, and they can appear anywhere in SQL where a single value is required. For related scalar subqueries, PawSQL attempts to de-correlate them to improve their performance. At the same time, if multiple subqueries with similar main structures appear in the select list, PawSQL will merge them when de-correlating to reduce computation.
For example, the following scalar subquery
select c_custkey,
(select SUM(o_totalprice)
from ORDERS
where o_custkey = c_custkey and o_orderdate = '2020-04-16') as total,
(select count(*)
from ORDERS
where o_custkey = c_custkey and o_orderdate = '2020-04-16') as cnt
from CUSTOMER
Will be rewritten as:
select /*QB_1*/ c_custkey, SUM_ as total, count_ as cnt
from CUSTOMER left outer join (
select o_custkey, SUM(o_totalprice) as SUM_,count(*) as count_
from ORDERS
where o_orderdate = '2020-04-16'
group by o_custkey) as SQ on o_custkey = c_custkey
25. De-Correlation of Conditional Scalar Subqueries
Rule Description
Scalar subqueries return a single value of a single row and column, and they can appear anywhere in SQL where a single value is required. For scalar subqueries that appear in the WHERE and HAVING clauses, PawSQL attempts to de-correlate them to improve their performance.
For example, the following scalar subquery
select c_custkey
from CUSTOMER
where 1000000 < (select SUM(o_totalprice)
from ORDERS
where o_custkey = c_custkey and o_orderdate = '2020-04-16')
Will be rewritten as:
select /*QB_2*/ c_custkey
from CUSTOMER, (
select /*QB_1*/ SUM( o_totalprice) as SUM_, o_custkey
from ORDERS
where o_orderdate = cast('2020-04-16' as DATE)
group by o_custkey) as SQ
where 1000000 < SUM_ and o_custkey = CUSTOMER.c_custkey
26. View Resolving
Rule Description
View expansion refers to the process of replacing the views referenced in the query with their definitions. This optimization technique allows the query optimizer (including the PawSQL optimization engine and the database optimizer) to have more optimization opportunities, which may result in a more efficient execution plan.
For example
-- Create a view
CREATE VIEW high_salary_employees AS
SELECT employee_id, name, department_id
FROM employees
WHERE salary > 50000;
-- Query using the view
SELECT h.name, d.department_name
FROM high_salary_employees h
JOIN departments d ON h.department_id = d.department_id
WHERE h.employee_id < 1000;
Will be rewritten as:
SELECT h.name, d.department_name
FROM high_salary_employees h
JOIN (SELECT employee_id, name, department_id
FROM employees
WHERE salary > 50000) d ON h.department_id = d.department_id
WHERE h.employee_id < 1000;