If we just process urls one by one it works great:
PLAIN TEXT
SQL:
- [FONT='Courier New', Courier, monospace]mysql> EXPLAIN SELECT url FROM 124pages.124pages WHERE url_crc=484036220 AND url="http://www.dell.com/";[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]| id | select_type | TABLE | type | possible_keys | KEY | key_len | ref | rows | Extra |[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]| 1 | SIMPLE | 124pages | ref | url_crc | url_crc | 4 | const | 1 | USING WHERE |[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]1 row IN SET (0.02 sec) [/font]
Handling URLs one by one is however not efficient if you're processing millions of them so we tried to do bulk fetches:
PLAIN TEXT
SQL:
- [FONT='Courier New', Courier, monospace]mysql> EXPLAIN SELECT url FROM 106pages.106pages WHERE (url_crc, url) IN ((2752937066, 'http://members.aye.net/~gharris/blog/'), (3799762538, 'http://www.coxandforkum.com/'));[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]| id | select_type | TABLE | type | possible_keys | KEY | key_len | ref | rows | Extra |[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]| 1 | SIMPLE | 106pages | ALL | NULL | NULL | NULL | NULL | 14936 | USING WHERE |[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]1 row IN SET (0.00 sec) [/font]
As you can see just using multiple column IN makes MySQL to pick doing full table scan in this case, even though the cardinality on the first column is almost perfect. I did some more testing and it looks like a bug or missing optimizer feature.
I should not be surprised though as multi-column in is not the most used MySQL feature out there.
For given application case we could simply rewrite query using more standard single column IN clause:
PLAIN TEXT
SQL:
- [FONT='Courier New', Courier, monospace]mysql> EXPLAIN SELECT url FROM 106pages.106pages WHERE url_crc IN (2752937066,3799762538) AND url IN('http://members.aye.net/~gharris/blog/','http://www.coxandforkum.com/'); +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]| id | select_type | TABLE | type | possible_keys | KEY | key_len | ref | rows | Extra |[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]| 1 | SIMPLE | 106pages | range | url_crc | url_crc | 4 | NULL | 2 | USING WHERE |[/font]
- [FONT='Courier New', Courier, monospace]+----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+[/font]
- [FONT='Courier New', Courier, monospace]1 row IN SET (0.01 sec) [/font]
Theoretically speaking this query is not equivalent to the first one - because row having url_crc=2752937066 and url='http://www.coxandforkum.com/' would match it, while it should not. It however does not happen in our case as url_crc is functionally dependent on url so both queries are equivalent.
So we've got our work around and can forget about the issue and MySQL team gets yet another bug to deal with.
What worries me again is - this is very simple case which seems to to be generally broken which raises a question how good coverage MySQL tests have.
Entry posted by peter | No comment
Add to: | | | |
Multi-Column IN clause - Unexpected MySQL Issue - Read More...