MySQL version 4.1 was quite revolutionary. The main reason for that was support for sub-queries.1
However since then MySQL users were rather discouraged to use that functionality, basically due to the implementation’s poor performance and forced to build complicated queries based on joins rather than on subqueries.
Of course you can do some effort to optimize your subquery with sometimes very good results2. Not always it’s easy or even possible if you can’t change the code though.
You’d say it’s not a problem for typical OLTP, web based traffic at all, just don’t use subqueries! That’s true, for most of the times anyway.
In our case we are sentenced to deal with subqueries beacuse of third party MediaWiki extension called DynamicPageList, which tends to generate ones, and our users want to use.
So, let’s check with some live examples, produced by DynamicPageList extension, if poor subqueries situation has changed for MySQL over the years. And my inspiration for that comes from the fact that MariaDB 5.3.x line has became GA recently and one of it’s most important features is subquery optimization .
I’ve chosen two example nasty queries that I found in slow log and compared their execution times on different MySQL versions. First query has one sub-select and many joins, second one has two sub-selects.
Thing I need to mention is the optimizer_switch variable, which is much more complex in MariaDB than in stock MySQL or Percona. And this variable has the subquery_cache switch, which is crucial in this test.
Details on both the optimizer_switch and the tested queries to be found below the test results.
Remember! These are only two very specific queries using sub-query feature, results for other type of queries and workload may be completely different for you, so please don’t base your MySQL variation decision on these tests!
Test results
Query 1 cold | Query 1 warm | Query 2 cold | Query 2 warm | ||
MariaDB 5.3.5 | 11.78 | 10.82 | 13.03 | 9.78 | |
MariaDB 5.3.5 subquery cache | 1.66 | 0.87 | 5.53 | 2.4 | |
MariaDB 5.5.21 | 13.79 | 13.16 | 13.76 | 10.96 | |
MariaDB 5.5.21 subquery cache | 1.76 | 0.92 | 5.06 | 2.45 | |
Percona 5.1.61 | 11.04 | 10.05 | 12.52 | 9.89 | |
Percona 5.5.21 | 13.54 | 12.01 | 13.69 | 10.98 | |
MySQL 5.5.22 | 11.36 | 10.24 | 12.9 | 10.23 | |
MySQL 5.6.4-m7 | 13.14 | 12.06 | 14.26 | 11.78 |
Test details
Let’s have a look at default values of optimizer settings for tested versions (select @@optimizer_switch\G):
MariaDB 5.3.5 GA
@@optimizer_switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on
MariaDB 5.5.21
@@optimizer_switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off
Percona 5.1.61
Percona 5.5.21
@@optimizer_switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on
MySQL 5.5.22 Community Server
@@optimizer_switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on
MySQL 5.6.4-m7 Community Server
@@optimizer_switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off
Now let’s take a look at our tested queries (apologies for long listings):
Query 1:
01 SELECT /* DPLMain::dynamicPageList */ DISTINCT02 `page`.page_namespace AS page_namespace,03 `page`.page_title AS page_title,04 `page`.page_id AS page_id,05 rev_user,06 rev_user_text,07 rev_comment,08 rev_timestamp09 FROM `revision` AS rev,10 `page`11 LEFT OUTER JOIN `categorylinks` AS cl012 ON `page`.page_id = cl0.cl_from13 AND cl0.cl_to = 'Ébauche'14 LEFT OUTER JOIN `categorylinks` AS cl115 ON `page`.page_id = cl1.cl_from16 AND cl1.cl_to = 'En_travaux'17 LEFT OUTER JOIN `categorylinks` AS cl218 ON `page`.page_id = cl2.cl_from19 AND cl2.cl_to = 'IJ'20 LEFT OUTER JOIN `categorylinks` AS cl321 ON `page`.page_id = cl3.cl_from22 AND cl3.cl_to = 'Pourri'23 LEFT OUTER JOIN `categorylinks` AS cl424 ON `page`.page_id = cl4.cl_from25 AND cl4.cl_to = 'Poubelle'26 LEFT OUTER JOIN `categorylinks` AS cl527 ON `page`.page_id = cl5.cl_from28 AND cl5.cl_to = 'Pas_d\'humour'29 LEFT OUTER JOIN `categorylinks` AS cl630 ON `page`.page_id = cl6.cl_from31 AND cl6.cl_to = 'Désambiguation'32 LEFT OUTER JOIN `categorylinks` AS cl733 ON `page`.page_id = cl7.cl_from34 AND cl7.cl_to = 'Incinération_rapide'35 LEFT OUTER JOIN `categorylinks` AS cl836 ON `page`.page_id = cl8.cl_from37 AND cl8.cl_to = 'Ortho_nulle'38 LEFT OUTER JOIN `categorylinks` AS cl939 ON `page`.page_id = cl9.cl_from40 AND cl9.cl_to = 'Portnawak'41 LEFT OUTER JOIN `categorylinks` AS cl1042 ON `page`.page_id = cl10.cl_from43 AND cl10.cl_to = 'Stub'44 LEFT OUTER JOIN `categorylinks` AS cl1145 ON `page`.page_id = cl11.cl_from46 AND cl11.cl_to = 'Tiroir'47 LEFT OUTER JOIN `categorylinks` AS cl1248 ON `page`.page_id = cl12.cl_from49 AND cl12.cl_to = 'Article_?_mettre_en_forme'50 WHERE 1 = 151 AND cl0.cl_to IS NULL52 AND cl1.cl_to IS NULL53 AND cl2.cl_to IS NULL54 AND cl3.cl_to IS NULL55 AND cl4.cl_to IS NULL56 AND cl5.cl_to IS NULL57 AND cl6.cl_to IS NULL58 AND cl7.cl_to IS NULL59 AND cl8.cl_to IS NULL60 AND cl9.cl_to IS NULL61 AND cl10.cl_to IS NULL62 AND cl11.cl_to IS NULL63 AND cl12.cl_to IS NULL64 AND `page`.page_namespace IN ( '0' )65 AND `page`.page_is_redirect = 066 AND `page`.page_id = rev.rev_page67 AND rev.rev_timestamp = (SELECT Min(rev_aux.rev_timestamp)68 FROM `revision` AS rev_aux69 WHERE rev_aux.rev_page = rev.rev_page)70 AND `page`.page_namespace NOT IN ( 1100 )71 ORDER BY rev_timestamp DESC72 LIMIT 0, 10;
Explain looks like this:
01 +----+--------------------+---------+--------+---------------------------------+----------------+---------+------------------------------------+-------+---------------------------------------------------------------------+02 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |03 +----+--------------------+---------+--------+---------------------------------+----------------+---------+------------------------------------+-------+---------------------------------------------------------------------+04 | 1 | PRIMARY | page | ref | PRIMARY,name_title | name_title | 4 | const | 17960 | Using index condition; Using where; Using temporary; Using filesort |05 | 1 | PRIMARY | cl0 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |06 | 1 | PRIMARY | cl1 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |07 | 1 | PRIMARY | cl2 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |08 | 1 | PRIMARY | cl3 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |09 | 1 | PRIMARY | cl4 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |10 | 1 | PRIMARY | cl5 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |11 | 1 | PRIMARY | cl6 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |12 | 1 | PRIMARY | cl7 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |13 | 1 | PRIMARY | cl8 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |14 | 1 | PRIMARY | cl9 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |15 | 1 | PRIMARY | cl10 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |16 | 1 | PRIMARY | cl11 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |17 | 1 | PRIMARY | cl12 | eq_ref | cl_from,cl_timestamp,cl_sortkey | cl_from | 261 | desencyclopedie.page.page_id,const | 1 | Using where; Using index; Not exists |18 | 1 | PRIMARY | rev | ref | PRIMARY,page_timestamp | PRIMARY | 4 | desencyclopedie.page.page_id | 2 | Using where |19 | 2 | DEPENDENT SUBQUERY | rev_aux | ref | PRIMARY,page_timestamp | page_timestamp | 4 | desencyclopedie.rev.rev_page | 5 | Using index |20 +----+--------------------+---------+--------+---------------------------------+----------------+---------+------------------------------------+-------+---------------------------------------------------------------------+
For this one the MariaDB’s subquery cache made a huge difference indeed. It was also clearly seen from it’s profile – 39312 rows instead of 2830857 rows when the cache was off!
Query 2:
01 SELECT /* DPLMain::dynamicPageList */ DISTINCT02 `page`.page_namespace AS page_namespace,03 `page`.page_title AS page_title,04 `page`.page_id AS page_id,05 rev_user,06 rev_user_text,07 rev_comment,08 rev_timestamp09 FROM `revision` AS rev,10 `page`11 INNER JOIN `categorylinks` AS cl012 ON `page`.page_id = cl0.cl_from13 AND ( cl0.cl_to = 'Comics' )14 WHERE 1 = 115 AND `page`.page_namespace IN ( '116', '0' )16 AND `page`.page_is_redirect = 017 AND `page`.page_id = rev.rev_page18 AND rev.rev_timestamp = (SELECT Max(rev_aux.rev_timestamp)19 FROM `revision` AS rev_aux USE INDEX(20 page_timestamp)21 WHERE rev_aux.rev_page = rev.rev_page)22 AND `page`.page_namespace NOT IN ( 1000, 1010, 1020, 1100 )23 ORDER BY rev_timestamp DESC24 LIMIT 1, 10;
Explain:
1 +----+--------------------+---------+--------+---------------+------------+---------+-------------------------------+-------+-----------------------------------------------------------+2 | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |3 +----+--------------------+---------+--------+---------------+------------+---------+-------------------------------+-------+-----------------------------------------------------------+4 | 1 | PRIMARY | cl0 | ref | cl_sortkey | cl_sortkey | 257 | const | 55938 | Using where; Using index; Using temporary; Using filesort |5 | 1 | PRIMARY | page | eq_ref | PRIMARY | PRIMARY | 4 | enmarveldatabase.cl0.cl_from | 1 | Using where |6 | 1 | PRIMARY | rev | ref | PRIMARY | PRIMARY | 4 | enmarveldatabase.cl0.cl_from | 1 | Using where |7 | 2 | DEPENDENT SUBQUERY | rev_aux | ref | PRIMARY | PRIMARY | 4 | enmarveldatabase.rev.rev_page | 1 | |8 +----+--------------------+---------+--------+---------------+------------+---------+-------------------------------+-------+-----------------------------------------------------------+
In this case performance gain was not so spectacular, but still huge when subquery cache was used.
I have also tested several other DPL queries with double subqueries that I found in slow log, but in those cases there was no performance gain. I wonder if that is a problem with double subqueries in general for MariaDB.
Additional clarifications:
- by “cold cache” I mean test after:
1 echo 3 > /proc/sys/vm/drop_caches2 service mysql restart
- by “warm cache” I mean n-th query run where n>2, so innodb buffer pool is filled
- all tests were made on the same machine, with the same my.cnf, innodb buffer pool set to 9GB
- machine had 16GB RAM, Xeon L5335, SSD, Ubuntu Lucid
- from my observations both queries were purely CPU bound (100% of one core utilization for the whole query run), no I/O wait noticed
- I did all the tests several times to make sure variations are negligible
- in order to not let the MySQL to change the execution plan during tests I have forced using the same indexes when it was needed
Interesting posts on subqueries:
http://www.mysqlperformanceblog.com/2010/10/25/mysql-limitations-part-3-subqueries/
http://blog.montyprogram.com/mariadb-5-3-optimizer-benchmark/
I’ve updated this post with another second query since after forcing proper indexes in previous one that I used – results were almost exactly the same for all MySQL versions.
I’ve also added Percona 5.1.61 results.
btw http://mysqlsandbox.net/ allowed all my tests to be made much quicker
[...] What about the subqueries? - 29 March 2012 – a quick benchmark using 2 subqueries found in a MediaWiki plugin. [...]