mysql连接查询中的主表选择问题

2017年05月26日原创 MySQL

浏览 267144 评论 2

开发在线下统计数据库进行数据查询时，发现一个查询特别慢，一个查询要几百多秒，导致统计报表没有按时出来。领导不开心，开发很着急。 SQL是如下这样的，表结构部分字段 #### select * from `order_info` o left join platform p on p.id = o.platform_id where o.paid_at >'2017-05-24' and o.paid_at < '2017-05-25' and p.client_type = 2; #order_info表结构： mysql> show create table `order_info`\G *************************** 1. row *************************** Table: order_info Create Table: CREATE TABLE `order_info` ( `id` int(11) NOT NULL AUTO_INCREMENT, `created_at` datetime NOT NULL, `updated_at` datetime NOT NULL, `uuid` varchar(255) NOT NULL, `channel_id` int(11) NOT NULL, `business_id` int(11) NOT NULL, `platform_id` int(11) NOT NULL, `status` int(11) NOT NULL, `paid_at` datetime DEFAULT NULL, ....省略... PRIMARY KEY (`id`), UNIQUE KEY `uuid` (`uuid`), ....省略... KEY `order_e3d75fef` (`platform_id`), KEY `order_93025c2f` (`account_id`), KEY `order_de284cef` (`relation_order_id`), KEY `union_paid_at_updated_at_channel_id_64_index` (`paid_at`,`updated_at`,`channel_id`), ....省略... ) ENGINE=InnoDB AUTO_INCREMENT=73823559 DEFAULT CHARSET=utf8 #platform表结构： mysql> show create table platform\G *************************** 1. row *************************** Table: platform Create Table: CREATE TABLE `platform` ( `id` int(11) NOT NULL AUTO_INCREMENT, `created_at` datetime NOT NULL, `updated_at` datetime NOT NULL, `uuid` varchar(255) NOT NULL, `name` varchar(255) NOT NULL, `status` int(11) NOT NULL, `is_allcharge` tinyint(1) unsigned DEFAULT '0' COMMENT '1表全计费', `client_type` tinyint(1) unsigned DEFAULT NULL COMMENT '1yy渠道2xx渠道', PRIMARY KEY (`id`), UNIQUE KEY `uuid` (`uuid`), UNIQUE KEY `name` (`name`), KEY `platform_48fb58bb` (`status`) ) ENGINE=InnoDB AUTO_INCREMENT=306 DEFAULT CHARSET=utf8 1 row in set (0.00 sec) **数据量：** #### order_info:7000万 platform:225 两个表进行left join, 另外join的字段都是有索引的,还有前面的表，也就是“主表”，where里的字段是“主表”的，也是有索引的，直接看SQL是没有问题的，难道是数据量太大？ #### 为什么把”主表“，带上双引号呢？主要是看了explain之后，这个SQL并不是把order_info表做为主表的，mysql是把后面的 platform 表做为主表了，然后进行了全表扫描。 mysql> explain select * from `order_info` o left join platform p on p.id = o.platform_id where o.paid_at >'2017-05-24' and o.paid_at < '2017-05-25' and p.client_type = 2; +----+-------------+-------+------------+------+-------------------------------------------------------------+----------------+---------+-----------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------------------------------------------------- ---+----------------+---------+-----------+------+----------+-------------+ | 1 | SIMPLE | p | NULL | ALL | PRIMARY | NULL | NULL | NULL | 224 | 10.00 | Using where | | 1 | SIMPLE | o | NULL | ref | order_e3d75fef,union_paid_at_updated_at_channel_id_64_index | order_e3d75fef | 4 | boss.p.id | 5102 | 0.24 | Using where | +----+-------------+-------+------------+------+-------------------------------------------------------------+----------------+---------+-----------+------+----------+-------------+ 2 rows in set, 1 warning (0.00 sec) 这里mysql把platform当做主表进行连接查询，因为这个表比较小，就算全表查询代价也比较小，但是现实并不是这样的。因为这里platform所有记录都要与非常大的order_info表查询（type=ref），尽管order_info表的platform_id有索引，但数据量是非常大的，显然按这种查询速度肯定上不去。仔细观察，这个执行计划和SQL，如果走order_info表为主表更为合适。因为order_info的paid_at是有索引的，也在where条件中，是个范围。数据量就会少很多，查询出来也就几w的样子，再与platform的主键查询，肯定是要快的。尝试把两个表的位置换下，也不行，执行计划还是没有变（其实这个位置在高版本的mysql已经不重要了）。如果mysql把platform表做为主表，那么把where条件client_type加一个索引，减少下这个表的数据量，也不要全表扫描。果断加了个索引： #### mysql> ALTER TABLE `platform` ADD INDEX client_type_64 (`client_type`); Query OK, 0 rows affected (0.31 sec) Records: 0 Duplicates: 0 Warnings: 0 表很小，速度加完，再看下执行计划，发现mysql的执行计划变了，把order_info表做为主表了，与于直接查询下SQL，秒内完成。 #### mysql> explain select * from `order_info` o left join platform p on o.platform_id = p.id where o.paid_at >'2017-05-24' and o.paid_at < '2017-05-25' and p.client_type = 2; +----+-------------+-------+------------+--------+-------------------------------------------------------------+----------------------------------------------+---------+--------------------+--------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+-------------------------------------------------------------+----------------------------------------------+---------+--------------------+--------+----------+-----------------------+ | 1 | SIMPLE | o | NULL | range | order_e3d75fef,union_paid_at_updated_at_channel_id_64_index | union_paid_at_updated_at_channel_id_64_index | 6 | NULL | 161204 | 100.00 | Using index condition | | 1 | SIMPLE | p | NULL | eq_ref | PRIMARY,client_type_64 | PRIMARY | 4 | boss.o.platform_id | 1 | 88.84 | Using where | +----+-------------+-------+------------+--------+-------------------------------------------------------------+----------------------------------------------+---------+--------------------+--------+----------+-----------------------+ #直接查询 mysql> select SQL_NO_CACHE * from `order_info` o left join platform p on o.platform_id = p.id where o.paid_at >'2017-05-24' and o.paid_at < '2017-05-25' and p.client_type = 2; 4301 rows in set (0.74 sec) 看了执行计划，type变成了eq_ref，order_info表用到了ICP。不过创建个索引并没有用，只是改变了mysql的执行计划，于是让开发先试试，开发回应一个赞，不过马上又发来一个SQL，client_type=1时，执行特别慢。重新看了下这个执行计划，又把order_info表当作主表了，怎么这么不听话，一下order_info,一下platform？于是怀疑应变是client_type 值的选择性上，导致让mysql觉得走platform为主一更好。就看了下这个字段值的分布： #### mysql> select client_type,count(*) from platform group by client_type; +-------------+----------+ | client_type | count(*) | +-------------+----------+ | 1 | 26 | | 2 | 199 | +-------------+----------+ 2 rows in set (0.00 sec) 就两种值，看这两个值就大概知道原因了，当client_type为2时，是没法用到这个索引的，因为数量超过了表的50%了；就放弃了这个表做为主表。当client_type为1时，只在少数数据，就选择了这个表作为主表。不过还是有个问题没有搞明白，为什么超过50%了，走了全表扫表，就不走platform为主表了呢；没有这个索引的时候也是走全表索引，就还是走platform表为主表。不过可以确定的事，一，走order_info为主表肯定是快的，没有问题的。二，加这个索引只是有可能影响到主表选择，但不能控制。那么加这个索引是没有意义的。只能强制使用union_paid_at_updated_at_channel_id_64_index索引了。只好和开发说把两个SQL都改成如下形式： #### select * from `order_info` o force index(union_paid_at_updated_at_channel_id_64_index) left join platform p on o.platform_id = p.id where o.paid_at >'2017-05-24' and o.paid_at < '2017-05-25' and p.client_type = 1;

文章最后更新时间： 2017年05月26日 04:26:36