【mysql】pt工具之pt-table-checksum
时间:2015-09-09 02:09:57 作者:beebol 标签: mysql pt-table-checksum 分类: mysql
pt-table-checksum是percona-toolkit工具组中的其中之一,主要功能是检查mysql复制的一致性。大概的原理是在主上执行一致性检查,检查方式就是执行replace into的语句完成,然后通过复制传递到从库,再通过update更新master_src的值,然后通过决断this_src和master_src的值来判断复制是否一致。一般情况都是在线进行检查,如果在业务特别忙时,可以通过--max-load参数进行设置负载峰值,到达这个值后就将暂停运行,直到负载下来。如果发现一致性有问题,可以通过tp-table-sync来进行数据修复。
安装:
直接下载官网最新版本"https://www.percona.com/downloads/percona-toolkit/",这里使用的为2.2.12-1版本,需要安装如下东东:
perl(DBI) >= 1.13 perl(DBD::mysql) >= 1.0 perl(Time::HiRes) perl(IO::Socket::SSL) perl(Digest::MD5) rpmlib(PayloadFilesHavePrefix) <= 4.0-1rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(CompressedFileNames) <= 3.0.4-1
使用:
先要有一个主从环境,部署参考“mysql主从复制”,开始进行检查:
[root@mysql.dmc.com ~]$pt-table-checksum --max-load="Threads_connected=1025" --no-check-replication-filters --no-check-binlog-format -h 10.1.16.74 -u checksum -p checksum -P 3306 TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-07T18:24:29 0 0 3 1 0 2.265 beebol.history 09-07T18:24:30 Cannot checksum table beebol.history_2: There is no good index and the table is oversized. at /usr/bin/pt-table-checksum line 6454. 09-07T18:24:36 0 0 5 6 0 5.568 beebol.navicat 09-07T18:24:50 0 0 20 11 0 14.076 beebol.people 09-07T18:24:50 Cannot checksum table beebol.t1: There is no good index and the table is oversized. at /usr/bin/pt-table-checksum line 6454. 09-07T18:24:50 Cannot checksum table beebol.t2: There is no good index and the table is oversized. at /usr/bin/pt-table-checksum line 6454. 09-07T18:24:51 0 0 0 1 0 0.805 beebol.t5 09-07T18:24:52 0 0 0 1 0 0.904 cor.accounts_auth_user_user_permissions 09-07T18:24:53 0 0 0 1 0 0.835 cor.accounts_members 09-07T18:24:54 0 0 0 1 0 1.182 cor.accounts_permissions 09-07T18:24:57 0 0 1 1 0 1.233 cor.auth_group 09-07T18:24:59 0 0 1 1 0 1.499 cor.auth_group_permissions 09-07T18:25:04 Checksum queries for table cor.auth_permission are executing very slowly. --chunk-size has been automatically reduced to 1. Check that the server is not being overloaded, or increase --chunk-time. The last chunk, number 1 of table cor.auth_permission, selected 2 rows and took 4.695 seconds to execute. Checksumming cor.auth_permission: 75% 00:10 remain09-07T18:25:54 0 0 36 37 0 54.930 cor.auth_permission 09-07T18:25:54 0 0 36 37 0 54.930 cor.auth_permission
如上已经正常执行,默认没有指定replicate库和表,默认为percona.checksums。如上已经正常执行了。其中有些输出这里再详细说明下:
TS --> The timestamp (without the year) when the tool finished checksumming the table. ERRORS --> The number of errors and warnings that occurred while checksumming the table. Errors and warnings are printed to standard error while the table is in progress. DIFFS --> The number of chunks that differ from the master on one or more replicas. If --no-replicate-check is specified, this column will always have zeros. If --replicate-check-only is specified, then only tables with differences are printed. ROWS --> The number of rows selected and checksummed from the table. It might be different from the number of rows in the table if you use the –where option. CHUNKS --> The number of chunks into which the table was divided. SKIPPED -- > The number of chunks that were skipped due one or more of these problems: * MySQL not using the --chunk-index * MySQL not using the full chunk index (--[no]check-plan) * Chunk size is greater than --chunk-size * --chunk-size-limit * Lock wait timeout exceeded (--retries) * Checksum query killed (--retries) As of pt-table-checksum 2.2.5, skipped chunks cause a non-zero “EXIT STATUS”. TIME --> The time elapsed while checksumming the table.TABLE --> The database and table that was checksummed. TABLE --> The database and table that was checksummed.
09-07T18:24:30 Cannot checksum table beebol.history_2: There is no good index and the table is oversized. at /usr/bin/pt-table-checksum line 6454.
这个由于是beebol.history_2中没有主键和唯一索引,因为该工具是根据主键索引或者唯一索引来分块进行校验的,默认是1000行为一块。
09-07T18:25:04 Checksum queries for table cor.auth_permission are executing very slowly. --chunk-size has been automatically reduced to 1. Check that the server is not being overloaded, or increase --chunk-time. The last chunk, number 1 of table cor.auth_permission, selected 2 rows and took 4.695 seconds to execute.
这个是由于在这个表上选择2行数据的执行时间已经超过了chunk-time默认时间0.5s。这个提示自动将chunk-size降低为1,默认是1000.
如下一直在刷新如下信息,可以将--max-load进行设置一下:
Pausing because Threads_running=0. Pausing because Threads_running=0.Pausing because Threads_running=0. Pausing because Threads_running=0.
我这里设置--max-load="Threads_running=1025"还是会刷,后来将这个connected设置下就ok了,--max-load="Threads_connected=1025"。
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
这个表示找不到slave,检查可能会无效。需要指定 --recursion-method
获取slave信息的方式有:--recursion-method
参数,可以参考下官方文档和tp-table-checksum --help
不一致情况:
[root@mysql.dmc.com ~]$pt-table-checksum --max-load="Threads_connected=1025" --no-check-replication-filters --no-check-binlog-format -h 10.1.16.74 -u checksum -p checksum -P 3306 --databases=test TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 09-07T19:49:20 0 0 3 1 0 0.896 test.history 09-07T19:49:22 0 0 0 1 0 1.816 test.history_2 09-07T19:49:24 0 0 6 1 0 2.517 test.navicat 09-07T19:49:26 0 0 20 1 0 1.236 test.people 09-07T19:49:27 0 0 2 1 0 1.158 test.t1 09-07T19:49:29 0 0 7 1 0 1.484 test.t2 09-07T19:49:31 0 0 4 1 0 1.935 test.t3 09-07T19:49:32 0 0 3 1 0 1.247 test.t4 09-07T19:49:33 0 0 0 1 0 1.114 test.t509-07T19:49:35 0 1 6 1 0 1.843 test.t_order 09-07T19:49:35 0 1 6 1 0 1.843 test.t_order
检查后,发现test.t_order表有不一致情况,原因是由于我手动到从上进行了一次update操作。可以通过pt-table-sync进行修复。