专业的编程技术博客社区

网站首页 > 博客文章 正文

02-分析慢查询导致主从宕机(数据库慢查询分析)

baijin 2024-09-11 00:44:40 博客文章 6 ℃ 0 评论

标签

mysql主从中断, 慢查询,慢查询太大,my_net_writ()

作者

Matthew

时间

2022-02-09

告警

出现告警!
触发告警名: 192.168.32.79:MySQL IO thread stopped
告警开始时间:2022.02.09 09:08:20
告警状态: PROBLEM
告警级别: 严重
监控项值:Slave IO thread running (06_slave_192.168.32.79:mysql.slave.Slave_IO_Running): Connecting
原始事件ID: 170558153
当前时间:2022.02.09 09:08:23

查看错误日志

数据库异常的时间。
022-02-09T08:42:12.598396+08:00 50408604 [Note] Aborted connection 50408604 to db: 'finenter' user: 'finenter' host: '192.168.50.56' (Got an error reading communication packets)
2022-02-09T01:07:34.069934Z mysqld_safe Number of processes running now: 0
2022-02-09T01:07:34.073206Z mysqld_safe mysqld restarted
2022-02-09T09:07:34.308228+08:00 0 [Warning] 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE' and 'ERROR_FOR_DIVISION_BY_ZERO' sql modes should be used with strict mode. They will be merged with strict mode in a future release.
2022-02-09T09:07:34.308361+08:00 0 [Note] --secure-file-priv is set to NULL. Operations related to importing and exporting data are disabled
2022-02-09T09:07:34.308395+08:00 0 [Note] /usr/local/mysql/bin/mysqld (mysqld 5.7.16-log) starting as process 22048 ..

这一段是mysql开始启动的sql
2022-02-09T09:07:37.104002+08:00 0 [Note] InnoDB: Opened 3 undo tablespaces
2022-02-09T09:07:37.104025+08:00 0 [Note] InnoDB: 3 undo tablespaces made active
2022-02-09T09:07:37.104252+08:00 0 [Note] InnoDB: Highest supported file format is Barracuda.
2022-02-09T09:07:37.303950+08:00 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 9352433176052
2022-02-09T09:07:37.304003+08:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 9352433181152
2022-02-09T09:07:37.501117+08:00 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 9352433181152
2022-02-09T09:07:37.506272+08:00 0 [Note] InnoDB: Database was not shutdown normally!
2022-02-09T09:07:37.506291+08:00 0 [Note] InnoDB: Starting crash recovery.
2022-02-09T09:07:37.816635+08:00 0 [Note] InnoDB: Transaction 11296772973 was in the XA prepared state.
2022-02-09T09:07:37.824610+08:00 0 [Note] InnoDB: Transaction 11296772973 was in the XA prepared state.
2022-02-09T09:07:37.828609+08:00 0 [Note] InnoDB: Transaction 11296772975 was in the XA prepared state.
2022-02-09T09:07:37.828632+08:00 0 [Note] InnoDB: Transaction 11296772975 was in the XA prepared state.
2022-02-09T09:07:37.828647+08:00 0 [Note] InnoDB: 2 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

数据库启动完成
2022-02-09T09:07:41.579857+08:00 0 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.7.16-log'  socket: '/usr/local/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL)
2022-02-09T09:07:44.746676+08:00 47 [Note] Aborted connection 47 to db: 'unconnected' user: 'app_finenter' host: '192.168.50.116' (Got an error reading communication packets)
2022-02-09T09:07:45.907252+08:00 75 [Note] Aborted connection 75 to db: 'unconnected' user: 'app_finenter' host: '192.168.50.115' (Got an error reading communication packets)
2022-02-09T09:08:34.038236+08:00 259 [Note] Start binlog_dump to master_thread_id(259) slave_server(32110), pos(, 4)
2022-02-09T09:08:34.039017+08:00 260 [Note] Start binlog_dump to master_thread_id(260) slave_server(3280), pos(, 4)
2022-02-09T09:08:34.042458+08:00 261 [Note] Start binlog_dump to master_thread_id(261) slave_server(3279), pos(, 4)
2022-02-09T09:08:34.046842+08:00 262 [Note] Start binlog_dump to master_thread_id(262) slave_server(3309), pos(, 4)
2022-02-09T09:08:34.077449+08:00 264 [Note] Start binlog_dump to master_thread_id(264) slave_server(3217), pos(, 4)
2022-02-09T09:08:34.083366+08:00 261 [Note] Aborted connection 261 to db: 'unconnected' user: 'repl' host: '192.168.32.79' (Failed on my_net_write())
2022-02-09T09:08:34.111652+08:00 263 [Note] Start binlog_dump to master_thread_id(263) slave_server(3331), pos(, 4)
2022-02-09T09:09:12.242692+08:00 0 [Note] InnoDB: Buffer pool(s) load completed at 220209  9:09:12

重启原因

用pt工具分析一个比较大的慢查询文件,资源不够用导致mysql异常重启了

从库报错

Last_IO_Error: Fatal error: Failed to run 'after_read_event' hook

分析源码

我们知道,mysqld是一个多线程的C/S架构的网络应用,因此少不了通过网络来读写数据,所以可能会出现写数据失败的情况。如果mysql的错误日志中出现此类错误,就说明是mysqld在向客户端发送网络包时失败导致的,当然,引申到复制场景,则说明是复制过程中,master向slave推送binlog时,写网络数据包失败。

查看master_dump_thread 逻辑

需要确定master在send binlog失败的情况下退出dump thread的逻辑,根据错误日志提示,进入到相关的代码查看。
错误代码在如下位置


inline int Binlog_sender::send_packet()
{
  DBUG_ENTER("Binlog_sender::send_packet");
  DBUG_PRINT("info",
             ("Sending event of type %s", Log_event::get_type_str(
                (Log_event_type)m_packet.ptr()[1 + EVENT_TYPE_OFFSET])));
  // We should always use the same buffer to guarantee that the reallocation
  // logic is not broken.
  if (DBUG_EVALUATE_IF("simulate_send_error", true,
                       my_net_write(
                         m_thd->get_protocol_classic()->get_net(),
                         (uchar*) m_packet.ptr(), m_packet.length())))
  {
    set_unknow_error("Failed on my_net_write()");
    DBUG_RETURN(1);
  }

调用关系

(gdb) bt
#0  Binlog_sender::send_packet (this=0x7fea741655d0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1158
#1  0x000000000190f74e in Binlog_sender::send_packet_and_flush (this=0x7fea741655d0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1182
#2  0x000000000190e181 in Binlog_sender::send_heartbeat_event (this=0x7fea741655d0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:1143
#3  0x000000000190ee01 in Binlog_sender::wait_with_heartbeat (this=0x7fea741655d0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:633
#4  0x000000000190ecd7 in Binlog_sender::wait_new_events (this=0x7fea741655d0, log_pos=504) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:599
#5  0x000000000190e938 in Binlog_sender::get_binlog_end_pos (this=0x7fea741655d0, log_cache=0x7fea74165020) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:365
#6  0x000000000190c5e0 in Binlog_sender::send_binlog (this=0x7fea741655d0, log_cache=0x7fea74165020, start_pos=123) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:313
#7  0x000000000190c1b4 in Binlog_sender::run (this=0x7fea741655d0) at /data/mysql-server-explain_ddl/sql/rpl_binlog_sender.cc:225

结果层层返回到Binlog_sender::run

大致看下Binlog_sender::run的逻辑

void Binlog_sender::run()
{
    while (!has_error() && !m_thd->killed)
    {
     if (send_binlog(&log_cache, start_pos))
      break;
    }

}

解释到这里,大概就清楚了吧。

Tags:

本文暂时没有评论,来添加一个吧(●'◡'●)

欢迎 发表评论:

最近发表
标签列表