Calculation Disables Index Usage
Copyright © 2024 PawSQL
Problem
Let's say we have a b-tree index c_acctbal_idx
defined on table tpch.customer
to accelarate queries with a condition on column c_acctbal
. Let's check a query as follows:
select * from tpch.customer where c_acctbal + 100 = 10000.0
Use explain
to get the query plan:
-> Filter: ((customer.C_ACCTBAL + 100) = 10000.0) (cost=102301.50 rows=900955)
-> Table scan on customer (cost=102301.50 rows=900955)
From the query plan, we can see the database optimizer uses a table scan instead of the index lookup as we expected. The query cost is very high because of the full table scan and the arithmatic compution on each of the row.
Diagnosis
Let's replace the condition with an equivalent one by moving the calculation from left to right,
select * from tpch.customer where c_acctbal = 10000.0 - 100;
Use explain
to get the query plan:
-> Index lookup on customer using c_acctbal_idx (C_ACCTBAL=(10000.0 - 100)) (cost=5.50 rows=5)
From the query plan , we can see the database optimizer chooses index lookup
using index c_acctbal_idx
and the cost is less than 1%% of the previous SQL query.
Diagnosis: From above experiment, we can conclude that if there is some calculation on indexed column, the index usage will be disabled. This conclusion is not only appliable to arthematic but also functions, and all other calculations on indexed column.
select * from tpch.orders where ADDDATE(o_orderdate, INTERVAL 31 DAY) =date '2019-10-10'
While following equivalent query can take advantage of index on o_orderdate
select * from tpch.orders where o_orderdate = subdate(date '2019-10-10' , INTERVAL 31 DAY);
Solution
There are two solutions to solve this problem,
-
one solution is to rewrite the original query to an equivalent one by moving the calculation to other side of the condition, as the examples in Diagnosis paragraph, so that the query can take advantage of existing indexes on the column.
-
the other solution is to add a functional index to match the condition.
create index phone_func_idx on customer(left(c_phone, 3))
The functional index is not applicable to other conditions if the conditions don't have the same signature of the index definition. For example, the above index will not applicable to a condition
left(c_phone, 4) = '1399'
.
PawSQL
PawSQL implements the first solution by automatically detecting the problematic predicates, automatically rewriting them into equivalent ones in rewrite optimization rule FuncWithColumnInPredicateRewrite
.
Built-in Operators
PawSQL support following arithmatic operators and the combination of them.
* | Multiplication operator |
---|---|
+ | Addition operator |
- | Minus operator |
/ | Division operator |
- | Change the sign of the argument |
Built-in Functions
MySQL(8.0)
PawSQL for MySQL built-in functions,
Name | Description | Column in function | PawSQL Rewrite |
---|---|---|---|
ADDDATE() | Add time values (intervals) to a date value | ADDDATE(column , 31) = date '2019-10-10' ADDDATE(column , INTERVAL 31 DAY) =date '2019-10-10' ; | column = subdate(date '2019-10-10' , 31) column = subdate(date '2019-10-10' , INTERVAL 31 DAY) |
DATE_ADD() | Add time values (intervals) to a date value | DATE_ADD(column , 31) = date '2019-10-10' DATE_ADD(column , INTERVAL 31 DAY); | column = subdate(date '2019-10-10' , 31) column = subdate(date '2019-10-10' , INTERVAL 31 DAY) |
DATE_SUB() | Subtract a time value (interval) from a date | DATE_SUB(column , 31) = date '2019-10-10' DATE_SUB(column , INTERVAL 31 DAY); | column = adddate(date '2019-10-10' , 31) column = adddate(date '2019-10-10' , INTERVAL 31 DAY) |
SUBDATE() | Synonym for DATE_SUB() when invoked with three arguments | SUBDATE(column , 31) = date '2019-10-10' SUBDATE(column , INTERVAL 31 DAY); | column = adddate(date '2019-10-10' , 31) column = adddate(date '2019-10-10' , INTERVAL 31 DAY) |
DATEDIFF() | Subtract two dates | DATEDIFF(currentdate(),col) > 7 | column < datesub(currentdate(),7) |
ADDTIME() | Add time | ADDTIME(column , '02:00:00.999998') > '23:00:00.999998' | column > subtime('23:00:00.999998' - '02:00:00.999998') |
SUBTIME() | Subtract times | ADDTIME(column , '02:00:00.999998') > '23:00:00.999998' | column > subtime('23:00:00.999998' - '02:00:00.999998') |
TIMEDIFF() | Subtract time | TIMEDIFF(column , '2000-01-01 00:00:00.000001') = '10:10:1.0000001' | column = addtime('2000-01-01 00:00:00.000001', '10:10:1.0000001') |
TIMESTAMPADD() | Add an interval to a datetime expression | TIMESTAMPADD(MINUTE,1,column )> '2003-01-02 00:01:00' | column > TIMESTAMPADD(MINUTE, -1,'2003-01-02 00:01:00' ) |
TIMESTAMPDIFF() | Subtract an interval from a datetime expression | TIMESTAMPDIFF(MINUTE,CURRENT_TIMESTAMP(),column )> 10 | TIMESTAMPDIFF(MINUTE,10, CURRENT_TIMESTAMP())> col |
SEC_TO_TIME() | Converts seconds to 'hh:mm:ss' format | SEC_TO_TIME(column )= '22:23:00' | column = TIME_TO_SEC('22:23:00') |
TIME_TO_SEC() | Return the argument converted to seconds | TIME_TO_SEC(column )= 32435 | column = SEC_TO_TIME(32435) |
FROM_DAYS() | Convert a day number to a date | FROM_DAYS(column )= '2022-10-10' | column = TO_DAYS('2022-10-10') |
TO_DAYS() | Return the date argument converted to days | TO_DAYS(column )= 32435 | column = FROM_DAYS(32435) |
FROM_UNIXTIME() | Format Unix timestamp as a date | FROM_UNIXTIME(column )= '2005-03-27 03:00:00' | column = UNIX_TIMESTAMP('2005-03-27 03:00:00') |
UNIX_TIMESTAMP() | Return a Unix timestamp | UNIX_TIMESTAMP(column ) = 1447431619 | column = FROM_UNIXTIME(1447431619) |
INET_ATON() | Return the numeric value of an IP address | INET_ATON(column )= 13234234 | column = INET_NTOA(13234234) |
INET_NTOA() | Return the IP address from a numeric value | INET_NTOA(column ) = '201.1.1.2' | column = INET_ATON('201.1.1.2') |
INET6_ATON() | Return the numeric value of an IPv6 address | INET_ATON(column )= 13234234 | column = INET_NTOA(13234234) |
INET6_NTOA() | Return the IPv6 address from a numeric value | INET_NTOA(column ) = '201.1.1.2' | column = INET_ATON('201.1.1.2') |
LEFT() | Return the leftmost number of characters as specified | LEFT(column , 5) = 'abcde'; LEFT(column , 5) = 'abcd' | column like 'abcde%' column = 'abcd' |
STRCMP() | Compare two strings | STRCMP(column , 'abc') = 1,-1, 0 | column > 'abc', col < 'abc', column = 'abc' |
DATE_FORMAT() | Format date as specified | DATE_FORMAT(column , '%W %M %Y') = 'Sunday October 2009' | column = STR_TO_DATE( 'Sunday October 2009', '%m/%d/%Y') |
STR_TO_DATE() | Convert a string to a date | STR_TO_DATE(column , '%m/%d/%Y') = '2022-10-10' | column = DATE_FORMAT('2022-10-10', '%m/%d/%Y') |
YEAR() | Return the year | YEAR(column )= 1987 | column between Date('1987-01-01') and Date('1987-12-31') |
EXTRACT() | Extract part of a date | EXTRACT(YEAR FROM col) ='2019' EXTRACT(YEAR_MONTH FROM col)='201910' | column between '2019-01-01' and '2019-12-31' |
CAST() | Cast a value as a certain type | cast(column as type) = date '2022-01-01' | column = cast(date '2022-01-01' as coltype) |
IFNULL() | Null if/else construct | IFNULL(column , 0) = 10 | column is null and 0=10 or col=10; |
ISNULL() | Test whether the argument is NULL | ISNULL(column )= 1 ISNULL(column )= 0 | column is not null column is null |
Summarize
By applying the rewrite optimization FuncWithColumnInPredicateRewrite
, the SQL query is not only able to
- take advantage of index to reduce the data access cost, but also able to
- avoid computing on each of row by moving calculation from on columns to on constant literals.
🌐 About PawSQL
PawSQL is dedicated to automatic and intelligent database performance optimization. The products provided by PawSQL include:
- PawSQL Cloud, an online automated SQL optimization tool that supports SQL auditing, intelligent query rewriting, cost-based index recommendations, suitable for database administrators and data application developers.
- PawSQL Advisor, an IntelliJ plugin that is suitable for data application developers and can be installed via the IDEA/DataGrip marketplace by searching for "PawSQL Advisor" by name.
Contact Us
Email: service@pawsql.com
Website: https://www.pawsql.com