Tuesday, October 17, 2006

MySQL and index use

And yet another post about MySQL and how it uses indexes.
I have a transaction log file that contains a million plus records. Two essential fields I almost always use are TranDate and TranTime, CHAR(6) and CHAR(8) respectively. They contain the date and time in YYMMDD and HHMMSSss.

After the log table contained a million records, my queries were starting to take longer and longer, even though I had an index on TranDate and TranTime columns called DateTime. I needed to search for transactions between Date Time (YYMMDDHHMMSSss). The query had the following syntax:

SELECT *
FROM log
WHERE Concat(TranDate, TranTime) BETWEEN '061001' AND '061010'


I initially thought that the DateTime index will be used in the above query, but an EXPLAIN command on MySQL showed a full table scan. My index was not used!

I went digging in, and surely enough, MySQL will not be able to use the DateTime index, because the TranDate and TranTime are NOT used directly in the WHERE clause. The query basically tells MySQL to take the TranDate and TranTime columns in each row, combine them, and only return those rows between the given values. MySQL is no magician to know that the combined values are indexed already.

So, the solution was very simple. Create another column, TranDateTime that is the concatenation of both fields, and then modify the DateTime index to index that single column. The query will also change to replace the Concat function with the TranDateTime column.

Another EXPLAIN command, and the DateTime index is used, cutting the execution time to less than a second instead of few minutes.

The rules of index use in MySQL (and probably other databases) are very simple:
1) The columns that in the WHERE clause will be subject to index use if they are part of and index.
2) An index on (col1, col2, col3) can be used to search for any of the following combinations of columns: (col1), (col1,col2), (col1, col2, col3). The same index cannot be used to search for col2 only. Just think of the index as a sorted array of string, and you should figure out why.
3) The index used will be the one that returns the least number of rows.

Moral of the story?


1) Ask the database engine what is it doing. MySQL may not be the most sophisticated database, especially with multiple tables, but it does the job. Imagine what a database like Oracle, Sybase or one of the big guys will do.
All you have to do is ask.

2) Understand why is the engine doing what it is doing. Why is that seemingly suitable index not used? Is the right column present in a query WHERE clause? is that column used in a function that makes the index unusable? A LIKE '%value%' clause will not use any index.

3) Try to create a better index suitable for that query, or try to fine tune the query so the best index can be used.

Re profile, and continue until you get fed up.

Remember that you cannot solve your bad query problems by getting better hardware!

No comments:

Just Google it!