跳到主要内容

Calculation Disables Index Usage

Copyright © 2023 PawSQL

Problem

Let's say we have a b-tree index c_acctbal_idx defined on table tpch.customer to accelarate queries with a condition on column c_acctbal. Let's check a query as follows:

select * from tpch.customer where c_acctbal + 100 = 10000.0

Use explain to get the query plan:

-> Filter: ((customer.C_ACCTBAL + 100) = 10000.0)  (cost=102301.50 rows=900955)
-> Table scan on customer (cost=102301.50 rows=900955)

From the query plan, we can see the database optimizer uses a table scan instead of the index lookup as we expected. The query cost is very high because of the full table scan and the arithmatic compution on each of the row.

Diagnosis

Let's replace the condition with an equivalent one by moving the calculation from left to right,

select * from tpch.customer where c_acctbal = 10000.0 - 100;

Use explain to get the query plan:

-> Index lookup on customer using c_acctbal_idx (C_ACCTBAL=(10000.0 - 100))  (cost=5.50 rows=5)

From the query plan , we can see the database optimizer chooses index lookup using index c_acctbal_idx and the cost is less than 1%% of the previous SQL query.

Diagnosis: From above experiment, we can conclude that if there is some calculation on indexed column, the index usage will be disabled. This conclusion is not only appliable to arthematic but also functions, and all other calculations on indexed column.

select * from tpch.orders where ADDDATE(o_orderdate,  INTERVAL 31 DAY) =date '2019-10-10'  

While following equivalent query can take advantage of index on o_orderdate

select * from tpch.orders where o_orderdate = subdate(date '2019-10-10' , INTERVAL 31 DAY);

Solution

There are two solutions to solve this problem,

  • one solution is to rewrite the original query to an equivalent one by moving the calculation to other side of the condition, as the examples in Diagnosis paragraph, so that the query can take advantage of existing indexes on the column.

  • the other solution is to add a functional index to match the condition.

    create index phone_func_idx on customer(left(c_phone, 3))

    The functional index is not applicable to other conditions if the conditions don't have the same signature of the index definition. For example, the above index will not applicable to a condition left(c_phone, 4) = '1399'.

PawSQL

PawSQL implements the first solution by automatically detecting the problematic predicates, automatically rewriting them into equivalent ones in rewrite optimization rule FuncWithColumnInPredicateRewrite.

Built-in Operators

PawSQL support following arithmatic operators and the combination of them.

*Multiplication operator
+Addition operator
-Minus operator
/Division operator
-Change the sign of the argument

Built-in Functions

MySQL(8.0)

PawSQL for MySQL built-in functions,

NameDescriptionColumn in functionPawSQL Rewrite
ADDDATE()Add time values (intervals) to a date valueADDDATE(column, 31) = date '2019-10-10' ADDDATE(column, INTERVAL 31 DAY) =date '2019-10-10' ;column = subdate(date '2019-10-10' , 31) column = subdate(date '2019-10-10' , INTERVAL 31 DAY)
DATE_ADD()Add time values (intervals) to a date valueDATE_ADD(column, 31) = date '2019-10-10' DATE_ADD(column, INTERVAL 31 DAY);column = subdate(date '2019-10-10' , 31) column = subdate(date '2019-10-10' , INTERVAL 31 DAY)
DATE_SUB()Subtract a time value (interval) from a dateDATE_SUB(column, 31) = date '2019-10-10' DATE_SUB(column, INTERVAL 31 DAY);column = adddate(date '2019-10-10' , 31) column = adddate(date '2019-10-10' , INTERVAL 31 DAY)
SUBDATE()Synonym for DATE_SUB() when invoked with three argumentsSUBDATE(column, 31) = date '2019-10-10' SUBDATE(column, INTERVAL 31 DAY);column = adddate(date '2019-10-10' , 31) column = adddate(date '2019-10-10' , INTERVAL 31 DAY)
DATEDIFF()Subtract two datesDATEDIFF(currentdate(),col) > 7column < datesub(currentdate(),7)
ADDTIME()Add timeADDTIME(column, '02:00:00.999998') > '23:00:00.999998'column> subtime('23:00:00.999998' - '02:00:00.999998')
SUBTIME()Subtract timesADDTIME(column, '02:00:00.999998') > '23:00:00.999998'column> subtime('23:00:00.999998' - '02:00:00.999998')
TIMEDIFF()Subtract timeTIMEDIFF(column, '2000-01-01 00:00:00.000001') = '10:10:1.0000001'column = addtime('2000-01-01 00:00:00.000001', '10:10:1.0000001')
TIMESTAMPADD()Add an interval to a datetime expressionTIMESTAMPADD(MINUTE,1,column )> '2003-01-02 00:01:00'column > TIMESTAMPADD(MINUTE, -1,'2003-01-02 00:01:00' )
TIMESTAMPDIFF()Subtract an interval from a datetime expressionTIMESTAMPDIFF(MINUTE,CURRENT_TIMESTAMP(),column )> 10TIMESTAMPDIFF(MINUTE,10, CURRENT_TIMESTAMP())> col
SEC_TO_TIME()Converts seconds to 'hh:mm:ss' formatSEC_TO_TIME(column)= '22:23:00'column = TIME_TO_SEC('22:23:00')
TIME_TO_SEC()Return the argument converted to secondsTIME_TO_SEC(column)= 32435column = SEC_TO_TIME(32435)
FROM_DAYS()Convert a day number to a dateFROM_DAYS(column)= '2022-10-10'column = TO_DAYS('2022-10-10')
TO_DAYS()Return the date argument converted to daysTO_DAYS(column)= 32435column = FROM_DAYS(32435)
FROM_UNIXTIME()Format Unix timestamp as a dateFROM_UNIXTIME(column)= '2005-03-27 03:00:00'column = UNIX_TIMESTAMP('2005-03-27 03:00:00')
UNIX_TIMESTAMP()Return a Unix timestampUNIX_TIMESTAMP(column) = 1447431619column = FROM_UNIXTIME(1447431619)
INET_ATON()Return the numeric value of an IP addressINET_ATON(column)= 13234234column = INET_NTOA(13234234)
INET_NTOA()Return the IP address from a numeric valueINET_NTOA(column) = '201.1.1.2'column = INET_ATON('201.1.1.2')
INET6_ATON()Return the numeric value of an IPv6 addressINET_ATON(column)= 13234234column = INET_NTOA(13234234)
INET6_NTOA()Return the IPv6 address from a numeric valueINET_NTOA(column) = '201.1.1.2'column = INET_ATON('201.1.1.2')
LEFT()Return the leftmost number of characters as specifiedLEFT(column, 5) = 'abcde'; LEFT(column, 5) = 'abcd'column like 'abcde%' column = 'abcd'
STRCMP()Compare two stringsSTRCMP(column, 'abc') = 1,-1, 0column > 'abc', col<'abc', column = 'abc'
DATE_FORMAT()Format date as specifiedDATE_FORMAT(column, '%W %M %Y') = 'Sunday October 2009'column = STR_TO_DATE( 'Sunday October 2009', '%m/%d/%Y')
STR_TO_DATE()Convert a string to a dateSTR_TO_DATE(column, '%m/%d/%Y') = '2022-10-10'column = DATE_FORMAT('2022-10-10', '%m/%d/%Y')
YEAR()Return the yearYEAR(column)= 1987column between Date('1987-01-01') and Date('1987-12-31')
EXTRACT()Extract part of a dateEXTRACT(YEAR FROM col) ='2019' EXTRACT(YEAR_MONTH FROM col)='201910'column between '2019-01-01' and '2019-12-31'
CAST()Cast a value as a certain typecast(column as type) = date '2022-01-01'column = cast(date '2022-01-01' as coltype)
IFNULL()Null if/else constructIFNULL(column, 0) = 10column is null and 0=10 or col=10;
ISNULL()Test whether the argument is NULLISNULL(column)= 1 ISNULL(column)= 0column is not null column is null

Summarize

By applying the rewrite optimization FuncWithColumnInPredicateRewrite, the SQL query is not only able to

  • take advantage of index to reduce the data access cost, but also able to
  • avoid computing on each of row by moving calculation from on columns to on constant literals.

About PawSQL

PawSQL is dedicated to automatic and intelligent database performance optimization. The products provided by PawSQL include:

  • PawSQL Cloud, an online automated SQL optimization tool that supports SQL auditing, intelligent query rewriting, cost-based index recommendations, suitable for database administrators and data application developers.
  • PawSQL Advisor, an IntelliJ plugin that is suitable for data application developers and can be installed via the IDEA/DataGrip marketplace by searching for "PawSQL Advisor" by name.
  • PawSQL Engine, which is the backend optimization engine of the PawSQL series of products, can be installed and deployed independently, and provides SQL optimization services through http/json interfaces. PawSQL Engine is provided for deployment and installation as a docker image.

Contact Us

Email: service@pawsql.com

Twitter: https://twitter.com/pawsql