从库mysqldump会导致复制中断
作者:张政俊 就职于中欧基金,知数堂粉丝,数据库爱好者,熟悉RDBMS、nosql、new sql等各类数据库。
背景
环境
- 数据库版本:5.6.24-log
- 主从架构:一主两从搭配mha
操作步骤
1.开发要求truncate一张表,在删除之前需要备份下数据;
2.在从库上使用mysqldump命令做逻辑备份,备份sql:
mysqldump -uXXX -p osdc osdc_XXX > /tmp/osdc_info.sql;
3.这台从库与主库的同步出现中断,报错为:
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1756
Last_Error: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details).
ERROR NO 是1756,而且只是 Slave_SQL_Running 停了。
处理方式
- 停止同步:
stop slave;
- 打开同步:
start slave;
重新启停同步服务后,恢复正常。
分析
之前形形色色的主从问题遇到过很多,但是 1756 这个错误还是第一次遇到,所以事后还是很有兴趣去深入分析下这个问题。
1. 详细报错
查看数据库的 error log ,这个报错的具体输出如下:
2020-08-18 22:04:19 208840 [Warning] Slave SQL: Worker 2 failed executing transaction 'fe0ed3ca-cacc-11e9-8d49-6c92bfcf73b6:1377629720' at master log mysql-bin.003536, end_log_pos 861741231
; Could not execute Update_rows event on table osdc.osdc_XXX; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's
master log mysql-bin.003536, end_log_pos 861741231, Error_code: 1205
2020-08-18 22:04:19 208840 [Note] Worker 2 is exiting: killed 0, error 1, running_status 1
2020-08-18 22:04:19 208840 [ERROR] Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency aut
omatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for detail
s). Error_code: 1756
错误日志里有个 Lock wait timeout exceeded ,这个我们先记下,稍后再说。
先找到 master 事务的 gtid,直接去主库的 binlog 里找下这个 gtid 所对应的 sql:
SET @@SESSION.GTID_NEXT= 'fe0ed3ca-cacc-11e9-8d49-6c92bfcf73b6:1377629720'/*!*/;
# at 861740635
#200818 21:58:42 server id 31 end_log_pos 861740715 CRC32 0xa9ab0dd2 Query thread_id=9403182 exec_time=0 error_code=0
SET TIMESTAMP=1597759122/*!*/;
BEGIN
/*!*/;
# at 861740715
#200818 21:58:42 server id 31 end_log_pos 861740840 CRC32 0xbd8ba613 Rows_query
# update osdc_XXX
# set modify_time=now(),status='04'
# where id=9298XXXXX and status='01'
可以看到,这时候主库对 osdc_XXX 表进行了 update 操作。
2. 问题分析
- 首先,mysqldump 不加任何参数去执行,会对备份的表加表级锁。
- 这时候从库需要执行从主库同步过来的 update 语句,因为 mysqldump 表锁的存在,该语句会处理等待状态。
- update 等待超时,这个错误我们在上面的 error log 里可以看到(Lock wait timeout exceeded)。
- 因为在5.6.26版本下,在使用 MTS 的情况下,slave_transaction_retries 是不生效的。所以当 update 语句超时后,它没有重试机制,导致整个 SQL threads 都停掉了(这也就是为什么 Slave_IO_Running 显示YES,而 Slave_SQL_Running 显示NO)。
在 rpl_slave.cc 里可以看到:
----
/* MTS technical limitation no support of trans retry */
if (mi->rli->opt_slave_parallel_workers != 0 && slave_trans_retries != 0
----
这个 bug 很早就由 Yoshinori Matsunobu 提在 https://bugs.mysql.com/bug.php?id=68465 了,官方也采纳了,并于 MySQL 5.7.5 修复掉了该问题。
Yoshinori Matsunobu 就是大名鼎鼎的 MHA 的作者!
3. 问题复现
相关参数
- slave_parallel_workers:3
[root@localhost][(none)]> show variables like '%slave_parallel_workers%';
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| slave_parallel_workers | 5 |
+------------------------+-------+
- innodb_lock_wait_timeout:50
[root@localhost][(none)]> show variables like '%innodb_lock_wait_timeout%';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50 |
+--------------------------+-------+
- slave_transaction_retries:10
[root@localhost][none]> show variables like '%slave_transaction_retries%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| slave_transaction_retries | 10 |
+---------------------------+-------+
- 测试数据:
[root@localhost][bug]> select * from repl_test;
+------+-------+
| id | info |
+------+-------+
| 1 | tom |
| 2 | jerry |
+------+-------+
从库加锁操作
从库:
[root@localhost][bug]> begin;update repl_test set info='all';
Query OK, 0 rows affected (0.00 sec)
Query OK, 2 rows affected (0.01 sec)
Rows matched: 2 Changed: 2 Warnings: 0
显性开启事务,让它持续持有 InnoDB lock
主库更新操作
主库:
[root@localhost][bug]> update repl_test set info='all' where id=1;commit;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
Query OK, 0 rows affected (0.00 sec)
手工更新 repl_test 表的一条数据,此时从库的复制线程会等待这条sql的应用,并处于 system lock 状态:
从库:
[root@localhost][(none)]> show processlist;
+----+-----------------+-----------+------+---------+------+-----------------------------------------------------------------------------+--------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-----------------+-----------+------+---------+------+-----------------------------------------------------------------------------+--------------------------------------------+
| 6 | event_scheduler | localhost | NULL | Daemon | 545 | Waiting on empty queue | NULL |
| 9 | root | localhost | bug | Sleep | 10 | | NULL |
| 15 | system user | | NULL | Connect | 44 | Waiting for master to send event | NULL |
| 16 | system user | | NULL | Connect | 5 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL |
| 17 | system user | | NULL | Connect | 44 | Waiting for an event from Coordinator | NULL |
| 18 | system user | | NULL | Connect | 44 | Waiting for an event from Coordinator | NULL |
| 19 | system user | | NULL | Connect | 5 | System lock | update repl_test set info='all' where id=1 |
| 20 | root | localhost | NULL | Query | 0 | init | show processlist |
+----+-----------------+-----------+------+---------+------+-----------------------------------------------------------------------------+--------------------------------------------+
8 rows in set (0.04 sec)
errorlog
错误日志开始打印错误信息:
2020-08-20 11:01:00 22391 [Warning] Slave SQL: Worker 2 failed executing transaction '' at master log mysql-bin.000272, end_log_pos 654587717; Could not execute Update_rows event on table bug.repl_test; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log FIRST, end_log_pos 654587717, Error_code: 1205
2020-08-20 11:01:00 22391 [ERROR] Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2020-08-20 11:01:00 22391 [Note] Error reading relay log event: slave SQL thread was killed
报的错与当时的内容一致。
4. 总结
- mysqldump 即使在从库中执行,也必须加上 --single-transcation 等参数,直接执行会上表锁,成本大大。
- 部分mysql工具,如 navicat 直接使用它自带的导出功能,也会锁住全表。所以尽量不要使用工具去处理导出工作。
mysql> show open tables where in_use >0 ;
+----------+-----------+--------+-------------+
| Database | Table | In_use | Name_locked |
+----------+-----------+--------+-------------+
| osdc | osdc_XXX | 1 | 0 |
+----------+-----------+--------+-------------+
1 row in set (0.00 sec)
- 5.6 的主从的坑已经踩了很多了,大多是由于自身的bug,而且在 5.6 下根本体现不出 MTS 的优势。把 5.6 升级至 5.7 或 8.0 是非常有必要的,之前测试的 8.0 的复制稳定性和性能的提升非常大,推荐直接升级至 8.0 版本。
全文完。
Enjoy MySQL :)
叶老师的「MySQL核心优化」大课已升级到MySQL 8.0,扫码开启MySQL 8.0修行之旅吧
- python contextlib 上下文管理器
- Django扩展自定义manage命令
- Elasticsearch分片、副本与路由(shard replica routing)
- EsRejectedExecutionException排错与线程池类型
- python重试(指数退避算法)
- 动手为王 | Oracle 数据库跨版本升级迁移实践
- 日志归档与数据挖掘
- Redis实现分布式锁
- Mybatis分页插件PageHelper的配置和使用方法
- 另类保存微信公众平台历史消息的方法 - 星标消息
- CentOS环境下Docker私有仓库搭建
- spark三种连接join
- 用firebug给firefox添加信任链接
- Elasticsearch索引别名、Filtered索引别名、Template
- MySQL 教程
- MySQL 安装
- MySQL 管理与配置
- MySQL PHP 语法
- MySQL 连接
- MySQL 创建数据库
- MySQL 删除数据库
- MySQL 选择数据库
- MySQL 数据类型
- MySQL 创建数据表
- MySQL 删除数据表
- MySQL 插入数据
- MySQL 查询数据
- MySQL where 子句
- MySQL UPDATE 查询
- MySQL DELETE 语句
- MySQL LIKE 子句
- mysql order by
- Mysql Join的使用
- MySQL NULL 值处理
- MySQL 正则表达式
- MySQL 事务
- MySQL ALTER命令
- MySQL 索引
- MySQL 临时表
- MySQL 复制表
- 查看MySQL 元数据
- MySQL 序列 AUTO_INCREMENT
- MySQL 处理重复数据
- MySQL 及 SQL 注入
- MySQL 导出数据
- MySQL 导入数据
- MYSQL 函数大全
- MySQL Group By 实例讲解
- MySQL Max()函数实例讲解
- mysql count函数实例
- MYSQL UNION和UNION ALL实例
- MySQL IN 用法
- MySQL between and 实例讲解
- Android 自定义日期段选择控件功能(开始时间-结束时间)
- Yii框架日志操作图文与实例详解
- Flutter 实现虎牙/斗鱼 弹幕功能
- Yii框架中用response保存cookie,用request读取cookie的原理解析
- Yii框架操作cookie与session的方法实例详解
- php中钩子(hook)的原理与简单应用demo示例
- flutter仿微信底部图标渐变功能的实现代码
- Flutter 插件url_launcher简介
- Laravel框架查询构造器 CURD操作示例
- androidx下的fragment的lazy懒加载问题详解
- Thinkphp5框架简单实现钩子(Hook)行为的方法示例
- Android PhoneWindowManager监听屏幕右侧向左滑动实现返回功能
- Laravel框架创建路由的方法详解
- Android 进度条 ProgressBar的实现代码(隐藏、出现、加载进度)
- 解决Laravel blade模板转义html标签的问题