23 MySQL MHA实验

# MHA 实验

MHA（Master High Availability）是⼀套相对成熟的MySQL⾼可⽤⽅案，能做到在0~30s内⾃动完成数据库的故 障切换操作，在master服务器不宕机的情况下，基本能保证数据的⼀致性。

它由两部分组成：MHA Manager（管理节点）和MHA Node（数据节点）。其中，MHA Manager可以单独部署 在⼀台独⽴的机器上管理多个master-slave集群，也可以部署在⼀台slave上。MHA Node则运⾏在每个mysql节点 上，MHA Manager会定时探测集群中的master节点，当master出现故障时，它⾃动将最新数据的slave提升为 master，然后将其它所有的slave指向新的master。

在MHA⾃动故障切换过程中，MHA试图保存master的⼆进制⽇志，从⽽最⼤程度地保证数据不丢失，当这并不总 是可⾏的，譬如，主服务器硬件故障或⽆法通过ssh访问，MHA就没法保存⼆进制⽇志，这样就只进⾏了故障转移 但丢失了最新数据。可结合MySQL 5.5中推出的半同步复制来降低数据丢失的⻛险。

------

MHA for MySQL：掌握 MySQL 高可用性管理器和工具

MHA 的主要目标是在短时间（通常 10-30 秒）停机时间内自动执行主故障转移和从升级，而不会遇到复制一致性问题，无需花钱购买大量新服务器，不会造成性能损失，也不会产生复杂性（易于操作）安装），并且无需更改现有部署。

MHA还提供了一种预定的在线master切换的方法：在停机的几秒（0.5-2秒）内安全地将当前正在运行的master更改为新的master（仅阻止写入）。

MHA 提供以下功能，并且在需要高可用性、数据完整性、几乎不间断的主维护等要求的许多部署中非常有用。 * 自动主控监控和故障转移 MHA 具有在现有复制环境中监控 MySQL 主控、检测主控故障并自动执行主控故障转移的功能。即使某些从站尚未收到最新的中继日志事件，MHA 也会自动识别来自最新从站的差异中继日志事件，并将差异事件应用于其他从站。所以所有的slave都可以保持一致。 MHA 通常可以在几秒钟内完成故障转移（9-12 秒检测主站故障，可选地 7-10 秒关闭主站电源以避免脑裂，几秒钟将差异中继日志应用到新主站，因此总停机时间为通常为 10-30 秒）。此外，您可以在配置文件中将特定从站定义为候选主站（设置优先级）。由于 MHA 修复了从站之间的一致性，因此您可以将任何从站提升为新的主站，并且不会发生一致性问题（这可能会导致突然的复制失败）。 * 交互式（手动）主站故障转移 您还可以使用 MHA 仅进行故障转移，而不是用于监控主站。您可以交互地使用 MHA 进行主故障转移。 * 非交互式主机故障转移 还支持非交互式主机故障转移（不监视主机，但自动执行故障转移）。这个功能非常有用，特别是当你已经使用过监控MySQL master的软件时。例如，您可以使用[Pacemaker（Heartbeat）](http://www.linux-ha.org/wiki/Pacemaker)来检测主站故障和虚拟IP地址接管，并使用MHA来进行主站故障转移和从站升级。 * 在线切换master到不同的主机 在许多情况下，有必要将现有的master迁移到不同的机器（即当前的master在RAID控制器或RAM上有H/W问题，您想更换为更快的机器等） 。这不是主站崩溃，但需要定期进行主站维护才能做到这一点。计划的主维护会导致停机（至少你不能写主），所以应该尽快完成。另一方面，您应该非常小心地阻止/终止当前正在运行的会话，因为不同 master 之间可能会发生一致性问题（即“更新 master1、更新 master 2、提交 master1、提交 master 2 时出错”将导致数据不一致）。快速主切换和优雅的阻塞写入都是必需的。 MHA 提供了一种方法来做到这一点。您可以在 writer block 的 0.5-2 秒内优雅地切换 master。在许多情况下，0.5-2 秒的写入器停机时间是可以接受的，即使不分配计划的维护窗口，您也可以切换主服务器。这意味着您可以更轻松地采取行动，例如升级到更高版本、更快的机器等。

## 1 集群拓扑

| 角色             | IP           |
| ---------------- | ------------ |
| Master           | 172.31.7.110 |
| Candicate master | 172.31.7.111 |
| Slave            | 172.31.7.112 |
| Monitor节点      | 172.31.7.113 |
| VIP              | 172.31.7.166 |

## 2 准备工作

```bash
# systemctl stop firewalld
# systemctl disable firewalld
# iptables -F
# setenforce 0
# vim /etc/selinux/config
SELINUX=disabled
```

## 3 安装MHA

官网地址：https://github.com/yoshinorim

**安装相关依赖**

- centos 7 或 Rocky Linux 9

```bash
[root@db01 ~]# sudo dnf install epel-release
[root@db01 ~]# yum  -y install perl-DBD-MySQL  ncftp perl-CPAN
```

- Ubuntu 20.04

```sql
apt-get update
apt-get install libdbd-mysql-perl
apt-get install libconfig-tiny-perl
apt-get install liblog-dispatch-perl
apt-get install libparallel-forkmanager-perl

apt-get install libdbd-mysql-perl libconfig-tiny-perl liblog-dispatch-perl libparallel-forkmanager-perl -y
```

### 3.1 在所有节点上部署 MHA Node

- centos 7

```bash
[root@db01 ~]# curl -LO https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@db01 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:mha4mysql-node-0.58-0.el7.centos ################################# [100%]
   
   
# 快速命令
curl -LO https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
yum install epel-release
yum  -y install perl-DBD-MySQL  ncftp perl-CPAN
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
```

- Ubuntu 20.04

```bash
# wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node_0.58-0_all.deb
# dpkg -i mha4mysql-node_0.58-0_all.deb
(Reading database ... 111270 files and directories currently installed.)
Preparing to unpack mha4mysql-node_0.58-0_all.deb ...
Unpacking mha4mysql-node (0.58-0) over (0.58-0) ...
Setting up mha4mysql-node (0.58-0) ...
Processing triggers for man-db (2.10.2-1) ...
```

- 编译安装centos

```sql
# wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
# tar xvf mha4mysql-node-0.58.tar.gz
# cd mha4mysql-node-0.58
# yum install cpan
# yum install perl-Mail-Sender
# perl -MCPAN -e "install inc::Module::Install"
# perl -MCPAN -e "install DBI"
# perl -MCPAN -e "install DBD::mysql"
# perl Makefile.PL
# make
# make install
...
Installing /usr/local/bin/apply_diff_relay_logs
Installing /usr/local/bin/filter_mysqlbinlog
Installing /usr/local/bin/purge_relay_logs
Installing /usr/local/bin/save_binary_logs
...
```

各个⼯具的作⽤：

-  save_binary_logs：保存或复制 master 的⼆进制⽇志 
-  apply_diff_relay_logs：识别差异的 relay log 并将差异的⽇志应⽤到其它 slave 中 
-  filter_mysqlbinlog：去除不必要的ROLLBACK事件（MHA已不再使⽤这个⼯具） 
-  purge_relay_logs：删除 relay log。

### 3.2 在 Monitor 节点上部署 MHA Manager

- centos 7

```bash
[root@monitor ~]# yum install epel-release 
[root@monitor ~]# yum makecache
[root@monitor ~]# yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
[root@monitor ~]# curl -LO https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
[root@monitor ~]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
```

- Ubuntu 20.04

```bash
# wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager_0.58-0_all.deb
# dpkg -i mha4mysql-manager_0.58-0_all.deb
Selecting previously unselected package mha4mysql-manager.
(Reading database ... 111233 files and directories currently installed.)
Preparing to unpack mha4mysql-manager_0.58-0_all.deb ...
Unpacking mha4mysql-manager (0.58-0) ...
Setting up mha4mysql-manager (0.58-0) ...
Processing triggers for man-db (2.10.2-1) ...
```

- 编译安装centos

```sql
# cd /usr/src/
# wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
# tar xvf mha4mysql-manager-0.58.tar.gz
# cd mha4mysql-manager-0.58
# yum install cpan
# perl -MCPAN -e "install Config::Tiny"
# perl -MCPAN -e "install Log::Dispatch"
# perl -MCPAN -e "install Parallel::ForkManager"
# perl Makefile.PL
# make
# make install
Installing /usr/local/bin/masterha_check_repl
Installing /usr/local/bin/masterha_check_ssh
Installing /usr/local/bin/masterha_check_status
Installing /usr/local/bin/masterha_conf_host
Installing /usr/local/bin/masterha_manager
Installing /usr/local/bin/masterha_master_monitor
Installing /usr/local/bin/masterha_master_switch
Installing /usr/local/bin/masterha_secondary_check
Installing /usr/local/bin/masterha_stop
```

各个⼯具的作⽤：

- masterha_check_ssh：检查 MHA 的 SSH 配置状况 
- masterha_check_repl：检查 MySQL 的复制状况 
- masterha_manager：启动 MHA 
- masterha_check_status：检测当前 MHA 运⾏状态 
- masterha_master_monitor：检测 master 是否宕机 
- masterha_master_switch：控制故障转移（⾃动或⼿动） 
- masterha_conf_host：添加或删除配置的server信息 
- masterha_stop：关闭MHA

## 4 配置 SSH 免密码登陆

1.在 manager（172.31.7.113）上配置到所有 Node 节点的⽆密码验证

```sql
# ssh-keygen
⼀路按“Enter”
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.7.110
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.7.111
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.7.112
```

2.在 Master（172.31.7.110）上配置

```sql
# ssh-keygen
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.7.111
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.7.112
```

3.在 Candicate master（172.31.7.111）上配置

```sql
# ssh-keygen
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.7.110
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.7.112
```

4.在 Slave（172.31.7.112）上配置

```sql
# ssh-keygen
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.7.110
# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.7.111
```

## 5 搭建主从复制环境

- MYSQL配置文件

由于之后slave节点会提升为master节点，所以主从配置要做一些修改

Master配置(参考)

```bash
log_bin = master-bin  # 启动二进制日志
log_bin_index = master-bin.index
relay-log = slave-relay-bin
relay-log-purge = 0  # 禁用或启用不再需要中继日志时是否自动清空它们
```

Slave配置（参考）

```bash
log_bin = master-bin
relay-log = slave-relay-bin
relay-log-index = slave-relay-bin.index
relay-log-purge = 0
read_only = 1
```

- 配置文件

```bash
root@db01:~# cat /etc/my.cnf 
[client]
port = 3306
socket = /data/mysql3306/tmp/mysql.sock
#default_character_set = utf8mb4

[mysqld]
user= mysql
port = 3306
server_id = 1103306
#character-set-server = utf8mb4
datadir = /data/mysql3306/data
socket = /data/mysql3306/tmp/mysql.sock
log_error = /data/mysql3306/log/mysql.err
pid_file = /data/mysql3306/data/mysql.pid
slow-query-log
slow_query_log_file = /data/mysql3306/log/slowquery.log
tmpdir = /data/mysql3306/tmp/
log_bin = /data/mysql3306/log/mysql-bin
#replication
binlog_format = row
binlog_row_image = full
expire_logs_days = 10
relay-log = /data/mysql3306/log/relay-log
slave_net_timeout = 30
#skip-slave-start
slave-parallel-workers = 0
relay_log_info_repository = TABLE
master_info_repository = TABLE
sync_binlog = 1
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1

root@db02:~# cat /etc/my.cnf 
[client]
port = 3306
socket = /data/mysql3306/tmp/mysql.sock

[mysqld]
user= mysql
port = 3306
server_id = 1113306
datadir = /data/mysql3306/data
socket = /data/mysql3306/tmp/mysql.sock
log_error = /data/mysql3306/log/mysql.err
pid_file = /data/mysql3306/data/mysql.pid
slow-query-log
slow_query_log_file = /data/mysql3306/log/slowquery.log
tmpdir = /data/mysql3306/tmp/
log_bin = /data/mysql3306/log/mysql-bin
#replication
binlog_format = row
binlog_row_image = full
expire_logs_days = 10
relay-log = /data/mysql3306/log/relay-log
slave_net_timeout = 30
#skip-slave-start
slave-parallel-workers = 0
relay_log_info_repository = TABLE
master_info_repository = TABLE
sync_binlog = 1
#gtid
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1

root@db03:~# cat /etc/my.cnf 
[client]
port = 3306
socket = /data/mysql3306/tmp/mysql.sock

[mysqld]
user= mysql
port = 3306
server_id = 1123306
datadir = /data/mysql3306/data
socket = /data/mysql3306/tmp/mysql.sock
log_error = /data/mysql3306/log/mysql.err
pid_file = /data/mysql3306/data/mysql.pid
slow-query-log
slow_query_log_file = /data/mysql3306/log/slowquery.log
tmpdir = /data/mysql3306/tmp/
log_bin = /data/mysql3306/log/mysql-bin
#replication
binlog_format = row
binlog_row_image = full
expire_logs_days = 10
relay-log = /data/mysql3306/log/relay-log
slave_net_timeout = 30
#skip-slave-start
slave-parallel-workers = 0
relay_log_info_repository = TABLE
master_info_repository = TABLE
sync_binlog = 1
#gtid
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1
```

### 5.1 在Master上执⾏备份

```sql
# mysqldump --master-data=2 --single-transaction -R --triggers -A > all.sql
```

其中，-R是备份存储过程，--triggers是备份触发器 -A代表全库

### 5.2 在Master上创建复制⽤户

```sql
create user 'repl'@'%' identified with mysql_native_password by 'repl123456';
grant replication slave on *.* to 'repl'@'%';
```

### 5.3 查看备份⽂件 all.sql 中的 CHANGE MASTER 语句(位置点参考)

```sql
# head -n 30 all.sql | grep "CHANGE MASTER TO"
# -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=157;

root@db01:/data# head -n 30 all.sql | grep "CHANGE MASTER TO"
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=157;
```

### 5.4 将备份⽂件复制到 Candicate master 和 Slave 上

```sql
# scp all.sql 172.31.7.111:/root/
# scp all.sql 172.31.7.112:/root/
```

### 5.5 在 Candicate master 上搭建从库

```bash
CHANGE MASTER TO MASTER_HOST='172.31.7.110',master_port=3306, MASTER_USER='repl', MASTER_PASSWORD='repl123456', master_auto_position=1;
start slave;
show slave status\G
```

（位置点参考）位置点与binlog文件大小的关系

```bash
root@db01:/data/mysql3306/log# ll mysql-bin.000016 
-rw-r----- 1 mysql mysql 197 Apr 26 09:16 mysql-bin.000016
root@db01:/data/mysql3306/log# mysql -e 'show master status;'
+------------------+----------+--------------+------------------+--------------------------------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                          |
+------------------+----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000016 |      197 |              |                  | 32049408-02a5-11ef-8b62-000c29f23176:1-361 |
+------------------+----------+--------------+------------------+--------------------------------------------+
```

```sql
# mysql < all.sql
CHANGE MASTER TO
 MASTER_HOST='172.31.7.110',
 MASTER_USER='repl',
 MASTER_PASSWORD='repl123456',
 MASTER_LOG_FILE='mysql-bin.000001',
 MASTER_LOG_POS=157;
start slave;
show slave status\G

# 报错
root@db03:~# mysql < all.sql
ERROR 1418 (HY000) at line 2623: This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable)
临时解决：
SET GLOBAL log_bin_trust_function_creators = 1;
```

### 5.6 在 Slave 上搭建从库

```sql
CHANGE MASTER TO MASTER_HOST='172.31.7.110',master_port=3306, MASTER_USER='repl', MASTER_PASSWORD='repl123456', master_auto_position=1;
start slave;
show slave status\G
```

### 5.7 从库服务器设置为 super_read_only

```sql
set global super_read_only=1;
```

### 5.8 在 Master 中创建监控⽤户

```sql
create user 'mha_monitor'@'%' identified with mysql_native_password by 'monitor123';
grant reload, process, super, replication client on *.* to 'mha_monitor'@'%';
grant all privileges on mysql.apply_diff_relay_logs_test to 'mha_monitor'@'%';
grant select on mysql.slave_relay_log_info to 'mha_monitor'@'%';
grant select on mysql.user to 'mha_monitor'@'%';
```

## 6 配置 MHA

### 6.1 创建配置⽂件

在 Monitor 主机（172.31.7.113）上创建MHA⼯作⽬录，并且创建相关配置⽂件

```bash
# mkdir -p /etc/masterha 
# vim /etc/masterha/app1.cnf

[server default]
manager_log=/masterha/app1/manager.log
manager_workdir=/masterha/app1
master_binlog_dir=/data/mysql3306/log
master_ip_failover_script= /usr/local/bin/master_ip_failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change
user=mha_monitor
password=monitor123
ping_interval=1
remote_workdir=/tmp
repl_user=repl
repl_password=repl123456
report_script=/usr/local/bin/send_report
secondary_check_script= /usr/local/bin/masterha_secondary_check -s 172.31.7.111 -s 172.31.7.112 --user=root --master_host=172.31.7.110 --master_ip=172.31.7.110 --master_port=3306
shutdown_script=""
ssh_user=root
#client_bindir=/usr/local/mysql/bin
client_bindir=/opt/mysql-8.0.36/bin
[server1]
hostname=172.31.7.110
port=3306

[server2]
hostname=172.31.7.111
port=3306
candidate_master=1
check_repl_delay=0

[server3]
hostname=172.31.7.112
port=3306
check_repl_delay=0

# 用gtid的模式需要添加下面参数，如果传统复制可以不加，建议开启半同步
[binlog1]
no_master=1
hostname=172.31.7.110
master_binlog_dir=/data/mysql3306/log
```

各参数的具体含义可参考：https://www.cnblogs.com/ivictor/p/5686275.html

```bash
[server default]
manager_workdir=/var/log/masterha/app1 ##工作目录 目录需要自己创建
manager_log=/var/log/masterha/app1/manager.log ##日志文件 目录需要自己创建
master_ip_failover_script=/etc/masterha/master_ip_failover ##vip切换脚本
user=mha_monitor ##操作mysql的账户 三台主机都必须一样
password=mha_monitor ##操作mysql的密码 三台主机都必须一样
ssh_user=root ##ssh面密登录账户
repl_user=repl ##mysql的复制账户
repl_password=repl ##mysql的复制密码
ping_interval=1   ##每隔ping_interval秒检测主库心跳，最多四次机会，如果都没有心跳，主库宕机
[server1]
hostname=172.16.100.101   ##mysql master主机ip；
port=3306    ##端口；
master_binlog_dir=/var/lib/mysql/data/mha-server     ##二进制日志目录；
candidate_master=1    #设置为候选master，如果设置该参数以后，发生主从切换以后将会将此从库提升为主库，即使这个主库不是集群中事件最新的slave。
[server2]
hostname=172.16.100.102    ##从机ip；
port=3306   ##从机端口；
master_binlog_dir=/var/lib/mysql/data/mha-server    ##二进制日志目录；
candidate_master=1  ##是否候选备用master主机；
check_repl_delay=0   ### 默认情况下，如果一个slave落后master 100M的relay logs的话，MHA将不会选择该slave作为一个新的master，因为对于这个slave的恢复需要花费很长时间。如果设置 check_repl_delay=0,MHA在触发切换选择一个新的master的时候将会忽略复制延时，这个参数对于设置了candidate_master=1的主机非常有用，可以保证候选主在切换后一定是新master。
[server3]
hostname=192.168.88.131     ##从机ip；
port=3306；
master_binlog_dir=/var/lib/mysql/data/mha-server；
no_master=1    ##不适合做主库的从机。
```

### 6.2 设置 relay log 清除⽅式

```sql
mysql> set global relay_log_purge=0;
```

### 6.3 设置 relay log 定期清理脚本

```bash
# vim /etc/cron.d/purge_relay_logs
0 1 * * * /usr/local/bin/purge_relay_logs --host=127.0.0.1 --port=3306 --user=mha_monitor --password=monitor123 -disable_relay_log_purge --workdir=/tmp/ >>/tmp/purge_relay_logs.log 2>&1
```

## 7 创建脚本

参考：https://github.com/yoshinorim/mha4mysql-manager/tree/master/samples/scripts

- /usr/local/bin/master_ip_failover

```perl
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);

my $vip = '172.31.7.166';
my $nic_name = 'ens33';
my $key = "2";
my $ssh_start_vip = "/sbin/ifconfig $nic_name:$key $vip/24";
my $ssh_stop_vip = "/sbin/ifconfig $nic_name:$key down";
my $ssh_send_garp = "/sbin/arping -U $vip -I $nic_name -c 1";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {
      print "Disabling the VIP an old master: $orig_master_host \n";
      &stop_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {

# all arguments are passed.
    # If you manage master ip address at global catalog database,
    # activate new_master_ip here.
    # You can also grant write access (create user, set read_only=0, etc) here.
    my $exit_code = 10;
    eval {
      
      my $new_master_handler = new MHA::DBHelper();

# args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print "Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

## Creating an app user on the new master
      # print "Creating app user on the new master..\n";
      # FIXME_xxx_create_user( $new_master_handler->{dbh} );
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

print "Enabling the VIP $vip on the new master: $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn $@;

# If you want to continue failover, exit 10.
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

# do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub start_vip(){
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    `ssh $ssh_user\@$new_master_host \" $ssh_send_garp \"`;
}

sub stop_vip(){
    return 0  unless  ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
```

- /usr/local/bin/master_ip_online_change

```perl
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;

my $vip = '172.31.7.166';
my $nic_name = 'ens33';
my $key = "2";
my $ssh_start_vip = "/sbin/ifconfig $nic_name:$key $vip/24";
my $ssh_stop_vip = "/sbin/ifconfig $nic_name:$key down";
my $ssh_send_garp = "/sbin/arping -U $vip -I $nic_name -c 1";

my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user,
);
GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub start_vip(){
    `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
    `ssh $new_master_ssh_user\@$new_master_host \" $ssh_send_garp \"`;
}

sub stop_vip(){
    `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

# args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

# Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();
      # print current_time_us() . " Drpping app user on the orig master..\n";
      #drop_app_user($orig_master_handler);

## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

## Droping the VIP     
      print "Disabling the VIP an old master: $orig_master_host \n";
      &stop_vip();

## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

# args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      # create_app_user($new_master_handler);
      print "Enabling the VIP $vip on the new master: $new_master_host \n";
      &start_vip();
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

## Update master ip on the catalog database, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

# do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}
```

- /usr/local/bin/send_report

```perl
#!/usr/bin/perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.126.com';
my $mail_from='slowtech@126.com';
my $mail_user='slowtech@126.com';
my $mail_pass='xxxxx';
my $mail_to=['slowtech@126.com'];
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
        or die "Can't open the debug      file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };

$sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}

# Do whatever you want here
exit 0;
```

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import yagmail
import argparse
# 创建 ArgumentParser 对象并定义命令⾏参数
parser = argparse.ArgumentParser(description='send_report for MHA')
parser.add_argument('--orig_master_host', type=str, help='original master host')
parser.add_argument('--new_master_host', type=str, help='new master host')
parser.add_argument('--new_slave_hosts', type=str, help='new slave hosts')
parser.add_argument('--conf', type=str, help='configuration file')
parser.add_argument('--subject', type=str, help='email subject')
parser.add_argument('--body', type=str, help='email body')
# 解析命令⾏参数
args = parser.parse_args()
# 从环境变量中读取⽤户名和密码
email_user = os.environ.get('EMAIL_USER')
email_password = os.environ.get('EMAIL_PASSWORD')
# 创建 yagmail.SMTP 对象
yag = yagmail.SMTP(user=email_user, password=email_password, host='smtp.126.com')
# 发送邮件
yag.send(
    to=email_user,
    subject=args.subject,
    contents=args.body
)
```

- 采用python发送邮件

```python
root@monitor:~# cat /usr/local/bin/send_report 
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import yagmail
import argparse
# 创建 ArgumentParser 对象并定义命令⾏参数
parser = argparse.ArgumentParser(description='send_report for MHA')
parser.add_argument('--orig_master_host', type=str, help='original master host')
parser.add_argument('--new_master_host', type=str, help='new master host')
parser.add_argument('--new_slave_hosts', type=str, help='new slave hosts')
parser.add_argument('--conf', type=str, help='configuration file')
parser.add_argument('--subject', type=str, help='email subject')
parser.add_argument('--body', type=str, help='email body')
# 解析命令⾏参数
args = parser.parse_args()
# 从环境变量中读取⽤户名和密码
email_user = os.environ.get('EMAIL_USER')
email_password = os.environ.get('EMAIL_PASSWORD')
# 创建 yagmail.SMTP 对象
yag = yagmail.SMTP(user='1210353303@qq.com', password='wyfzqfjkyhfjigai', host='smtp.qq.com')
# 发送邮件
yag.send(
    to='yanzhao.li@qq.com',
    subject=args.subject,
    contents=args.body
)
```

给脚本添加可执⾏权限

```bash
# chmod +x /usr/local/bin/master_ip_failover
# chmod +x /usr/local/bin/master_ip_online_change
# chmod +x /usr/local/bin/send_report
```

## 8 检查 SSH 的配置

在 Monitor 主机上执⾏

```bash
# masterha_check_ssh --conf=/etc/masterha/app1.cnf

root@monitor:~# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Thu Apr 25 13:07:53 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Apr 25 13:07:53 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Apr 25 13:07:53 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Apr 25 13:07:53 2024 - [info] Starting SSH connection tests..
Thu Apr 25 13:07:54 2024 - [debug] 
Thu Apr 25 13:07:53 2024 - [debug]  Connecting via SSH from root@172.31.7.110(172.31.7.110:22) to root@172.31.7.111(172.31.7.111:22)..
Thu Apr 25 13:07:53 2024 - [debug]   ok.
Thu Apr 25 13:07:53 2024 - [debug]  Connecting via SSH from root@172.31.7.110(172.31.7.110:22) to root@172.31.7.112(172.31.7.112:22)..
Thu Apr 25 13:07:54 2024 - [debug]   ok.
Thu Apr 25 13:07:54 2024 - [debug] 
Thu Apr 25 13:07:53 2024 - [debug]  Connecting via SSH from root@172.31.7.111(172.31.7.111:22) to root@172.31.7.110(172.31.7.110:22)..
Thu Apr 25 13:07:54 2024 - [debug]   ok.
Thu Apr 25 13:07:54 2024 - [debug]  Connecting via SSH from root@172.31.7.111(172.31.7.111:22) to root@172.31.7.112(172.31.7.112:22)..
Thu Apr 25 13:07:54 2024 - [debug]   ok.
Thu Apr 25 13:07:55 2024 - [debug] 
Thu Apr 25 13:07:54 2024 - [debug]  Connecting via SSH from root@172.31.7.112(172.31.7.112:22) to root@172.31.7.110(172.31.7.110:22)..
Thu Apr 25 13:07:54 2024 - [debug]   ok.
Thu Apr 25 13:07:54 2024 - [debug]  Connecting via SSH from root@172.31.7.112(172.31.7.112:22) to root@172.31.7.111(172.31.7.111:22)..
Thu Apr 25 13:07:55 2024 - [debug]   ok.
Thu Apr 25 13:07:55 2024 - [info] All SSH connection tests passed successfully.
Use of uninitialized value in exit at /usr/bin/masterha_check_ssh line 44.
```

## 9 查看整个集群的状态

在 Monitor 主机上执⾏

```bash
# masterha_check_repl --conf=/etc/masterha/app1.cnf

root@monitor:~# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Apr 25 13:19:21 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Apr 25 13:19:21 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Apr 25 13:19:21 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Apr 25 13:19:21 2024 - [info] MHA::MasterMonitor version 0.58.
Thu Apr 25 13:19:22 2024 - [info] GTID failover mode = 1
Thu Apr 25 13:19:22 2024 - [info] Dead Servers:
Thu Apr 25 13:19:22 2024 - [info] Alive Servers:
Thu Apr 25 13:19:22 2024 - [info]   172.31.7.110(172.31.7.110:3306)
Thu Apr 25 13:19:22 2024 - [info]   172.31.7.111(172.31.7.111:3306)
Thu Apr 25 13:19:22 2024 - [info]   172.31.7.112(172.31.7.112:3306)
Thu Apr 25 13:19:22 2024 - [info] Alive Slaves:
Thu Apr 25 13:19:22 2024 - [info]   172.31.7.111(172.31.7.111:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 13:19:22 2024 - [info]     GTID ON
Thu Apr 25 13:19:22 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 13:19:22 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Apr 25 13:19:22 2024 - [info]   172.31.7.112(172.31.7.112:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 13:19:22 2024 - [info]     GTID ON
Thu Apr 25 13:19:22 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 13:19:22 2024 - [info] Current Alive Master: 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 13:19:22 2024 - [info] Checking slave configurations..
Thu Apr 25 13:19:22 2024 - [info] Checking replication filtering settings..
Thu Apr 25 13:19:22 2024 - [info]  binlog_do_db= , binlog_ignore_db= 
Thu Apr 25 13:19:22 2024 - [info]  Replication filtering check ok.
Thu Apr 25 13:19:22 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Thu Apr 25 13:19:22 2024 - [info] Checking SSH publickey authentication settings on the current master..
Thu Apr 25 13:19:23 2024 - [info] HealthCheck: SSH to 172.31.7.110 is reachable.
Thu Apr 25 13:19:23 2024 - [info] 
172.31.7.110(172.31.7.110:3306) (current master)
 +--172.31.7.111(172.31.7.111:3306)
 +--172.31.7.112(172.31.7.112:3306)

Thu Apr 25 13:19:23 2024 - [info] Checking replication health on 172.31.7.111..
Thu Apr 25 13:19:23 2024 - [info]  ok.
Thu Apr 25 13:19:23 2024 - [info] Checking replication health on 172.31.7.112..
Thu Apr 25 13:19:23 2024 - [info]  ok.
Thu Apr 25 13:19:23 2024 - [info] Checking master_ip_failover_script status:
Thu Apr 25 13:19:23 2024 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.31.7.110 --orig_master_ip=172.31.7.110 --orig_master_port=3306 
Thu Apr 25 13:19:23 2024 - [info]  OK.
Thu Apr 25 13:19:23 2024 - [warning] shutdown_script is not defined.
Thu Apr 25 13:19:23 2024 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.
```

### 9.1 常⻅报错

1.Redundant argument in sprintf at /usr/local/share/perl5/5.32/MHA/NodeUtil.pm line 202

```bash
[error][/usr/local/share/perl5/5.32/MHA/MasterMonitor.pm, ln427] Error happened on
checking configurations. Redundant argument in sprintf at
/usr/local/share/perl5/5.32/MHA/NodeUtil.pm line 202.
```

解决⽅法：

```bash
$str =~ s/\.[^.]+$//;
my $result = sprintf( '%03d%03d', $str =~ m/(\d+)/g );
```

2.Can't exec "mysqlbinlog": No such file or directory at /usr/local/share/perl5/5.32/MHA/BinlogManager.pm line 106.

```bash
Can't exec "mysqlbinlog": No such file or directory at
/usr/local/share/perl5/5.32/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH,
and client options
 at /usr/local/bin/apply_diff_relay_logs line 532.
```

解决⽅法：配置⽂件中指定 client_bindir。

```bash
client_bindir=/usr/local/mysql/bin
```

3.Ubuntu 20.04 检查集群状态报错

```bash
root@monitor:~# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Apr 25 13:17:47 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Apr 25 13:17:47 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Apr 25 13:17:47 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Apr 25 13:17:47 2024 - [info] MHA::MasterMonitor version 0.58.
Thu Apr 25 13:17:48 2024 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Redundant argument in sprintf at /usr/share/perl5/MHA/NodeUtil.pm line 201.
Thu Apr 25 13:17:48 2024 - [error][/usr/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Thu Apr 25 13:17:48 2024 - [info] Got exit code 1 (Not master dead).

MySQL Replication Health is NOT OK!

#解决方案：
vim /usr/share/perl5/MHA/NodeUtil.pm +195

#sub parse_mysql_version($) {
#  my $str = shift;
#  my $result = sprintf( '%03d%03d%03d', $str =~ m/(\d+)/g );
#  return $result;
#}
sub parse_mysql_version($) {
  my $str = shift;
 ($str) =  $str =~ m/^[^-]*/g;
  my $result = sprintf( '%03d%03d%03d', $str =~ m/(\d+)/g );
  return $result;
}

#sub parse_mysql_major_version($) {
#  my $str = shift;
#  my $result = sprintf( '%03d%03d', $str =~ m/(\d+)/g );
#  return $result;
#}
sub parse_mysql_major_version($) {
  my $str = shift;
  $str =~ /(\d+)\.(\d+)/;
  my $strmajor = "$1.$2";
  my $result = sprintf( '%03d%03d', $strmajor =~ m/(\d+)/g );
  return $result;
}

# 参考：
# https://www.cnblogs.com/windman0/p/16058777.html
# https://github.com/yoshinorim/mha4mysql-manager/issues/116
```

## 10 检查 MHA Manager 的状态

```bash
# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
```

如果正常，会显示 PING_OK ，否则会显示 NOT_RUNNING ，代表 MHA 监控还没有开启。

## 11 开启 MHA Manager 监控

```bash
root@monitor:~# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /masterha/app1/manager.log 2>&1 &

# 测试去除--remove_dead_master_conf 参数 方便测试
nohup masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover < /dev/null > /masterha/app1/manager.log 2>&1 &

root@monitor:~# nohup masterha_manager --conf=/etc/masterha/app1.cnf --ignore_last_failover < /dev/null > /masterha/app1/manager.log 2>&1 &
[1] 22278

# tailf /masterha/app1/manager.log

# masterha_check_status --conf=/etc/masterha/app1.cnf

root@monitor:~# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:5058) is running(0:PING_OK), master:172.31.7.110
```

关闭

```bash
root@monitor:~# masterha_stop --conf=/etc/masterha/app1.cnf
```

## 12 Master 上挂载 VIP

```bash
# /sbin/ifconfig ens160:2 172.31.7.166/24

root@db01:~# apt install arping -y
root@db01:~# apt install net-tools -y
root@db01:~# ifconfig ens33:2 172.31.7.166/24

# 验证
root@db03:~# mysql -h 172.31.7.166 -u repl -prepl123456
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 48
Server version: 8.0.36 MySQL Community Server - GPL

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select @@server_id;
+-------------+
| @@server_id |
+-------------+
|     1103306 |
+-------------+
1 row in set (0.00 sec)
```

## 13 模拟主库故障

- 关闭主库

```bash
root@db01:~# systemctl stop mysql
```

- 查看日志

```bash
root@monitor:~# tail -f /masterha/app1/manager.log 
 +--172.31.7.112(172.31.7.112:3306)

Thu Apr 25 17:43:58 2024 - [info] Checking master_ip_failover_script status:
Thu Apr 25 17:43:58 2024 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.31.7.110 --orig_master_ip=172.31.7.110 --orig_master_port=3306 
Thu Apr 25 17:43:59 2024 - [info]  OK.
Thu Apr 25 17:43:59 2024 - [warning] shutdown_script is not defined.
Thu Apr 25 17:43:59 2024 - [info] Set master ping interval 1 seconds.
Thu Apr 25 17:43:59 2024 - [info] Set secondary check script: /usr/bin/masterha_secondary_check -s 172.31.7.111 -s 172.31.7.112 --user=root --master_host=172.31.7.110 --master_ip=172.31.7.110 --master_port=3306
Thu Apr 25 17:43:59 2024 - [info] Starting ping health check on 172.31.7.110(172.31.7.110:3306)..
Thu Apr 25 17:43:59 2024 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Apr 25 17:44:17 2024 - [warning] Got error on MySQL select ping: 1053 (Server shutdown in progress)
Thu Apr 25 17:44:17 2024 - [info] Executing secondary network check script: /usr/bin/masterha_secondary_check -s 172.31.7.111 -s 172.31.7.112 --user=root --master_host=172.31.7.110 --master_ip=172.31.7.110 --master_port=3306  --user=root  --master_host=172.31.7.110  --master_ip=172.31.7.110  --master_port=3306 --master_user=mha_monitor --master_password=monitor123 --ping_type=SELECT
Thu Apr 25 17:44:17 2024 - [info] Executing SSH check script: exit 0
Thu Apr 25 17:44:17 2024 - [info] HealthCheck: SSH to 172.31.7.110 is reachable.
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_TIME = "en_DK.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Monitoring server 172.31.7.111 is reachable, Master is not reachable from 172.31.7.111. OK.
Thu Apr 25 17:44:18 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.7.110:3306' (111))
Thu Apr 25 17:44:18 2024 - [warning] Connection failed 2 time(s)..
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_TIME = "en_DK.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Monitoring server 172.31.7.112 is reachable, Master is not reachable from 172.31.7.112. OK.
Thu Apr 25 17:44:18 2024 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Thu Apr 25 17:44:19 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.7.110:3306' (111))
Thu Apr 25 17:44:19 2024 - [warning] Connection failed 3 time(s)..
Thu Apr 25 17:44:20 2024 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '172.31.7.110:3306' (111))
Thu Apr 25 17:44:20 2024 - [warning] Connection failed 4 time(s)..
Thu Apr 25 17:44:20 2024 - [warning] Master is not reachable from health checker!
Thu Apr 25 17:44:20 2024 - [warning] Master 172.31.7.110(172.31.7.110:3306) is not reachable!
Thu Apr 25 17:44:20 2024 - [warning] SSH is reachable.
Thu Apr 25 17:44:20 2024 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Thu Apr 25 17:44:20 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Apr 25 17:44:20 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Apr 25 17:44:20 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Apr 25 17:44:21 2024 - [info] GTID failover mode = 1
Thu Apr 25 17:44:21 2024 - [info] Dead Servers:
Thu Apr 25 17:44:21 2024 - [info]   172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:21 2024 - [info] Alive Servers:
Thu Apr 25 17:44:21 2024 - [info]   172.31.7.111(172.31.7.111:3306)
Thu Apr 25 17:44:21 2024 - [info]   172.31.7.112(172.31.7.112:3306)
Thu Apr 25 17:44:21 2024 - [info] Alive Slaves:
Thu Apr 25 17:44:21 2024 - [info]   172.31.7.111(172.31.7.111:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 17:44:21 2024 - [info]     GTID ON
Thu Apr 25 17:44:21 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:21 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Apr 25 17:44:21 2024 - [info]   172.31.7.112(172.31.7.112:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 17:44:21 2024 - [info]     GTID ON
Thu Apr 25 17:44:21 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:21 2024 - [info] Checking slave configurations..
Thu Apr 25 17:44:21 2024 - [info] Checking replication filtering settings..
Thu Apr 25 17:44:21 2024 - [info]  Replication filtering check ok.
Thu Apr 25 17:44:21 2024 - [info] Master is down!
Thu Apr 25 17:44:21 2024 - [info] Terminating monitoring script.
Thu Apr 25 17:44:21 2024 - [info] Got exit code 20 (Master dead).
Thu Apr 25 17:44:21 2024 - [info] MHA::MasterFailover version 0.58.
Thu Apr 25 17:44:21 2024 - [info] Starting master failover.
Thu Apr 25 17:44:21 2024 - [info] 
Thu Apr 25 17:44:21 2024 - [info] * Phase 1: Configuration Check Phase..
Thu Apr 25 17:44:21 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] GTID failover mode = 1
Thu Apr 25 17:44:22 2024 - [info] Dead Servers:
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:22 2024 - [info] Checking master reachability via MySQL(double check)...
Thu Apr 25 17:44:22 2024 - [info]  ok.
Thu Apr 25 17:44:22 2024 - [info] Alive Servers:
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.111(172.31.7.111:3306)
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.112(172.31.7.112:3306)
Thu Apr 25 17:44:22 2024 - [info] Alive Slaves:
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.111(172.31.7.111:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 17:44:22 2024 - [info]     GTID ON
Thu Apr 25 17:44:22 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:22 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.112(172.31.7.112:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 17:44:22 2024 - [info]     GTID ON
Thu Apr 25 17:44:22 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:22 2024 - [info] Starting GTID based failover.
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Apr 25 17:44:22 2024 - [info] Executing master IP deactivation script:
Thu Apr 25 17:44:22 2024 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=172.31.7.110 --orig_master_ip=172.31.7.110 --orig_master_port=3306 --command=stopssh --ssh_user=root  
Disabling the VIP an old master: 172.31.7.110 
Thu Apr 25 17:44:22 2024 - [info]  done.
Thu Apr 25 17:44:22 2024 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Apr 25 17:44:22 2024 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] * Phase 3: Master Recovery Phase..
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] The latest binary log file/position on all slaves is mysql-bin.000013:197
Thu Apr 25 17:44:22 2024 - [info] Retrieved Gtid Set: 32049408-02a5-11ef-8b62-000c29f23176:1-126
Thu Apr 25 17:44:22 2024 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.111(172.31.7.111:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 17:44:22 2024 - [info]     GTID ON
Thu Apr 25 17:44:22 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:22 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.112(172.31.7.112:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 17:44:22 2024 - [info]     GTID ON
Thu Apr 25 17:44:22 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:22 2024 - [info] The oldest binary log file/position on all slaves is mysql-bin.000013:197
Thu Apr 25 17:44:22 2024 - [info] Retrieved Gtid Set: 32049408-02a5-11ef-8b62-000c29f23176:1-126
Thu Apr 25 17:44:22 2024 - [info] Oldest slaves:
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.111(172.31.7.111:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 17:44:22 2024 - [info]     GTID ON
Thu Apr 25 17:44:22 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:22 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.112(172.31.7.112:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 17:44:22 2024 - [info]     GTID ON
Thu Apr 25 17:44:22 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] * Phase 3.3: Determining New Master Phase..
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] Searching new master from slaves..
Thu Apr 25 17:44:22 2024 - [info]  Candidate masters from the configuration file:
Thu Apr 25 17:44:22 2024 - [info]   172.31.7.111(172.31.7.111:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 17:44:22 2024 - [info]     GTID ON
Thu Apr 25 17:44:22 2024 - [info]     Replicating from 172.31.7.110(172.31.7.110:3306)
Thu Apr 25 17:44:22 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Apr 25 17:44:22 2024 - [info]  Non-candidate masters:
Thu Apr 25 17:44:22 2024 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Thu Apr 25 17:44:22 2024 - [info] New master is 172.31.7.111(172.31.7.111:3306)
Thu Apr 25 17:44:22 2024 - [info] Starting master failover..
Thu Apr 25 17:44:22 2024 - [info] 
From:
172.31.7.110(172.31.7.110:3306) (current master)
 +--172.31.7.111(172.31.7.111:3306)
 +--172.31.7.112(172.31.7.112:3306)

To:
172.31.7.111(172.31.7.111:3306) (new master)
 +--172.31.7.112(172.31.7.112:3306)
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info] * Phase 3.3: New Master Recovery Phase..
Thu Apr 25 17:44:22 2024 - [info] 
Thu Apr 25 17:44:22 2024 - [info]  Waiting all logs to be applied.. 
Thu Apr 25 17:44:22 2024 - [info]   done.
Thu Apr 25 17:44:22 2024 - [info] Getting new master's binlog name and position..
Thu Apr 25 17:44:22 2024 - [info]  mysql-bin.000001:2520343
Thu Apr 25 17:44:22 2024 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.31.7.111', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Apr 25 17:44:22 2024 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 2520343, 32049408-02a5-11ef-8b62-000c29f23176:1-126
Thu Apr 25 17:44:22 2024 - [info] Executing master IP activate script:
Thu Apr 25 17:44:22 2024 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=172.31.7.110 --orig_master_ip=172.31.7.110 --orig_master_port=3306 --new_master_host=172.31.7.111 --new_master_ip=172.31.7.111 --new_master_port=3306 --new_master_user='mha_monitor'   --new_master_password=xxx
Set read_only=0 on the new master.
Enabling the VIP 172.31.7.166 on the new master: 172.31.7.111 
Thu Apr 25 17:44:24 2024 - [info]  OK.
Thu Apr 25 17:44:24 2024 - [info] ** Finished master recovery successfully.
Thu Apr 25 17:44:24 2024 - [info] * Phase 3: Master Recovery Phase completed.
Thu Apr 25 17:44:24 2024 - [info] 
Thu Apr 25 17:44:24 2024 - [info] * Phase 4: Slaves Recovery Phase..
Thu Apr 25 17:44:24 2024 - [info] 
Thu Apr 25 17:44:24 2024 - [info] 
Thu Apr 25 17:44:24 2024 - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Apr 25 17:44:24 2024 - [info] 
Thu Apr 25 17:44:24 2024 - [info] -- Slave recovery on host 172.31.7.112(172.31.7.112:3306) started, pid: 22336. Check tmp log /masterha/app1/172.31.7.112_3306_20240425174421.log if it takes time..
Thu Apr 25 17:44:26 2024 - [info] 
Thu Apr 25 17:44:26 2024 - [info] Log messages from 172.31.7.112 ...
Thu Apr 25 17:44:26 2024 - [info] 
Thu Apr 25 17:44:24 2024 - [info]  Resetting slave 172.31.7.112(172.31.7.112:3306) and starting replication from the new master 172.31.7.111(172.31.7.111:3306)..
Thu Apr 25 17:44:24 2024 - [info]  Executed CHANGE MASTER.
Thu Apr 25 17:44:25 2024 - [info]  Slave started.
Thu Apr 25 17:44:25 2024 - [info]  gtid_wait(32049408-02a5-11ef-8b62-000c29f23176:1-126) completed on 172.31.7.112(172.31.7.112:3306). Executed 0 events.
Thu Apr 25 17:44:26 2024 - [info] End of log messages from 172.31.7.112.
Thu Apr 25 17:44:26 2024 - [info] -- Slave on host 172.31.7.112(172.31.7.112:3306) started.
Thu Apr 25 17:44:26 2024 - [info] All new slave servers recovered successfully.
Thu Apr 25 17:44:26 2024 - [info] 
Thu Apr 25 17:44:26 2024 - [info] * Phase 5: New master cleanup phase..
Thu Apr 25 17:44:26 2024 - [info] 
Thu Apr 25 17:44:26 2024 - [info] Resetting slave info on the new master..
Thu Apr 25 17:44:26 2024 - [info]  172.31.7.111: Resetting slave info succeeded.
Thu Apr 25 17:44:26 2024 - [info] Master failover to 172.31.7.111(172.31.7.111:3306) completed successfully.
Thu Apr 25 17:44:26 2024 - [info]

----- Failover Report -----

app1: MySQL Master failover 172.31.7.110(172.31.7.110:3306) to 172.31.7.111(172.31.7.111:3306) succeeded

Master 172.31.7.110(172.31.7.110:3306) is down!

Check MHA Manager logs at monitor.hosts.com:/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 172.31.7.110(172.31.7.110:3306)
Selected 172.31.7.111(172.31.7.111:3306) as a new master.
172.31.7.111(172.31.7.111:3306): OK: Applying all logs succeeded.
172.31.7.111(172.31.7.111:3306): OK: Activated master IP address.
172.31.7.112(172.31.7.112:3306): OK: Slave started, replicating from 172.31.7.111(172.31.7.111:3306)
172.31.7.111(172.31.7.111:3306): Resetting slave info succeeded.
Master failover to 172.31.7.111(172.31.7.111:3306) completed successfully.
Thu Apr 25 17:44:26 2024 - [info] Sending mail..
```

- 查看vip 切换

```bash
root@db02:~# ifconfig ens33:2
ens33:2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.31.7.166  netmask 255.255.255.0  broadcast 172.31.7.255
        ether 00:0c:29:73:08:a4  txqueuelen 1000  (Ethernet)
```

- 查看主从切换情况

```sql
# DB03查看：
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.31.7.111
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 2520343
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 420
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
```

- 查看邮件

![image-20240425180132265](https://img.sunrisenan.com/img/2024/04/25/180136052.png)

## 15 角色还原

前提条件：

- 原主库要以slave身份加入集群

```sql
# 在db01数据库上
mysql> CHANGE MASTER TO MASTER_HOST='172.31.7.111',master_port=3306, MASTER_USER='repl', MASTER_PASSWORD='repl123456', master_auto_position=1;
Query OK, 0 rows affected, 8 warnings (0.03 sec)

mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for source to send event
                  Master_Host: 172.31.7.111
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 2520343
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 420
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Executed_Gtid_Set: 32049408-02a5-11ef-8b62-000c29f23176:1-126
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
       Master_public_key_path: 
        Get_master_public_key: 0
            Network_Namespace: 
1 row in set, 1 warning (0.00 sec)
```

**下面将新备库（原master节点）还原为主库，新主库（节点db02）还原为备库**。

在MHA manager节点执行如下命令，停止MHA监控：

```bash
masterha_stop --conf=/etc/masterha/app1.cnf
```

然后执行如下命令，进行角色还原：

```bash
masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.31.7.110 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
```

执行完成后，可以看到恢复到初始的1个master（节点db01）和两个slave（节点db02和节点db03）状态。

```bash
root@monitor:~# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=172.31.7.110 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Thu Apr 25 19:02:45 2024 - [info] MHA::MasterRotate version 0.58.
Thu Apr 25 19:02:45 2024 - [info] Starting online master switch..
Thu Apr 25 19:02:45 2024 - [info] 
Thu Apr 25 19:02:45 2024 - [info] * Phase 1: Configuration Check Phase..
Thu Apr 25 19:02:45 2024 - [info] 
Thu Apr 25 19:02:45 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Apr 25 19:02:45 2024 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Apr 25 19:02:45 2024 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Apr 25 19:02:46 2024 - [info] GTID failover mode = 1
Thu Apr 25 19:02:46 2024 - [info] Current Alive Master: 172.31.7.111(172.31.7.111:3306)
Thu Apr 25 19:02:46 2024 - [info] Alive Slaves:
Thu Apr 25 19:02:46 2024 - [info]   172.31.7.110(172.31.7.110:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 19:02:46 2024 - [info]     GTID ON
Thu Apr 25 19:02:46 2024 - [info]     Replicating from 172.31.7.111(172.31.7.111:3306)
Thu Apr 25 19:02:46 2024 - [info]   172.31.7.112(172.31.7.112:3306)  Version=8.0.36 (oldest major version between slaves) log-bin:enabled
Thu Apr 25 19:02:46 2024 - [info]     GTID ON
Thu Apr 25 19:02:46 2024 - [info]     Replicating from 172.31.7.111(172.31.7.111:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 172.31.7.111(172.31.7.111:3306)? (YES/no): yes
Thu Apr 25 19:02:52 2024 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Thu Apr 25 19:02:52 2024 - [info]  ok.
Thu Apr 25 19:02:52 2024 - [info] Checking MHA is not monitoring or doing failover..
Thu Apr 25 19:02:52 2024 - [info] Checking replication health on 172.31.7.110..
Thu Apr 25 19:02:52 2024 - [info]  ok.
Thu Apr 25 19:02:52 2024 - [info] Checking replication health on 172.31.7.112..
Thu Apr 25 19:02:52 2024 - [info]  ok.
Thu Apr 25 19:02:52 2024 - [info] 172.31.7.110 can be new master.
Thu Apr 25 19:02:52 2024 - [info] 
From:
172.31.7.111(172.31.7.111:3306) (current master)
 +--172.31.7.110(172.31.7.110:3306)
 +--172.31.7.112(172.31.7.112:3306)

To:
172.31.7.110(172.31.7.110:3306) (new master)
 +--172.31.7.112(172.31.7.112:3306)
 +--172.31.7.111(172.31.7.111:3306)

Starting master switch from 172.31.7.111(172.31.7.111:3306) to 172.31.7.110(172.31.7.110:3306)? (yes/NO): yes
Thu Apr 25 19:03:11 2024 - [info] Checking whether 172.31.7.110(172.31.7.110:3306) is ok for the new master..
Thu Apr 25 19:03:11 2024 - [info]  ok.
Thu Apr 25 19:03:11 2024 - [info] 172.31.7.111(172.31.7.111:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Thu Apr 25 19:03:11 2024 - [info] 172.31.7.111(172.31.7.111:3306): Resetting slave pointing to the dummy host.
Thu Apr 25 19:03:11 2024 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Apr 25 19:03:11 2024 - [info] 
Thu Apr 25 19:03:11 2024 - [info] * Phase 2: Rejecting updates Phase..
Thu Apr 25 19:03:11 2024 - [info] 
Thu Apr 25 19:03:11 2024 - [info] Executing master ip online change script to disable write on the current master:
Thu Apr 25 19:03:11 2024 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=172.31.7.111 --orig_master_ip=172.31.7.111 --orig_master_port=3306 --orig_master_user='mha_monitor' --new_master_host=172.31.7.110 --new_master_ip=172.31.7.110 --new_master_port=3306 --new_master_user='mha_monitor' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Thu Apr 25 19:03:12 2024 210750 Set read_only on the new master.. ok.
Thu Apr 25 19:03:12 2024 219255 Waiting all running 3 threads are disconnected.. (max 1500 milliseconds)
{'db' => undef,'Host' => 'localhost','Time' => 32494,'State' => 'Waiting on empty queue','Id' => 5,'Command' => 'Daemon','User' => 'event_scheduler','Info' => undef}
{'State' => 'Source has sent all binlog to replica; waiting for more updates','db' => undef,'Host' => '172.31.7.112:35742','Time' => 4728,'User' => 'repl','Info' => undef,'Id' => 97,'Command' => 'Binlog Dump GTID'}
{'Id' => 108,'Command' => 'Binlog Dump GTID','Info' => undef,'User' => 'repl','Host' => '172.31.7.110:38192','Time' => 58,'db' => undef,'State' => 'Source has sent all binlog to replica; waiting for more updates'}
Thu Apr 25 19:03:12 2024 721301 Waiting all running 3 threads are disconnected.. (max 1000 milliseconds)
{'User' => 'event_scheduler','Info' => undef,'Id' => 5,'Command' => 'Daemon','State' => 'Waiting on empty queue','db' => undef,'Host' => 'localhost','Time' => 32494}
{'User' => 'repl','Info' => undef,'Command' => 'Binlog Dump GTID','Id' => 97,'State' => 'Source has sent all binlog to replica; waiting for more updates','db' => undef,'Time' => 4728,'Host' => '172.31.7.112:35742'}
{'Host' => '172.31.7.110:38192','Time' => 58,'db' => undef,'State' => 'Source has sent all binlog to replica; waiting for more updates','Id' => 108,'Command' => 'Binlog Dump GTID','Info' => undef,'User' => 'repl'}
Thu Apr 25 19:03:13 2024 223617 Waiting all running 3 threads are disconnected.. (max 500 milliseconds)
{'Info' => undef,'User' => 'event_scheduler','Command' => 'Daemon','Id' => 5,'State' => 'Waiting on empty queue','Time' => 32495,'Host' => 'localhost','db' => undef}
{'db' => undef,'Time' => 4729,'Host' => '172.31.7.112:35742','State' => 'Source has sent all binlog to replica; waiting for more updates','Command' => 'Binlog Dump GTID','Id' => 97,'User' => 'repl','Info' => undef}
{'User' => 'repl','Info' => undef,'Command' => 'Binlog Dump GTID','Id' => 108,'State' => 'Source has sent all binlog to replica; waiting for more updates','db' => undef,'Time' => 59,'Host' => '172.31.7.110:38192'}
Thu Apr 25 19:03:13 2024 725993 Set read_only=1 on the orig master.. ok.
Thu Apr 25 19:03:13 2024 730496 Waiting all running 3 queries are disconnected.. (max 500 milliseconds)
{'Command' => 'Daemon','Id' => 5,'User' => 'event_scheduler','Info' => undef,'db' => undef,'Time' => 32495,'Host' => 'localhost','State' => 'Waiting on empty queue'}
{'Info' => undef,'User' => 'repl','Command' => 'Binlog Dump GTID','Id' => 97,'State' => 'Source has sent all binlog to replica; waiting for more updates','Time' => 4729,'Host' => '172.31.7.112:35742','db' => undef}
{'Command' => 'Binlog Dump GTID','Id' => 108,'Info' => undef,'User' => 'repl','Time' => 59,'Host' => '172.31.7.110:38192','db' => undef,'State' => 'Source has sent all binlog to replica; waiting for more updates'}
Thu Apr 25 19:03:14 2024 226671 Killing all application threads..
Thu Apr 25 19:03:14 2024 230799 done.
Disabling the VIP an old master: 172.31.7.111 
Thu Apr 25 19:03:15 2024 - [info]  ok.
Thu Apr 25 19:03:15 2024 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Thu Apr 25 19:03:15 2024 - [info] Executing FLUSH TABLES WITH READ LOCK..
Thu Apr 25 19:03:15 2024 - [info]  ok.
Thu Apr 25 19:03:15 2024 - [info] Orig master binlog:pos is mysql-bin.000001:2520343.
Thu Apr 25 19:03:15 2024 - [info]  Waiting to execute all relay logs on 172.31.7.110(172.31.7.110:3306)..
Thu Apr 25 19:03:15 2024 - [info]  master_pos_wait(mysql-bin.000001:2520343) completed on 172.31.7.110(172.31.7.110:3306). Executed 0 events.
Thu Apr 25 19:03:15 2024 - [info]   done.
Thu Apr 25 19:03:15 2024 - [info] Getting new master's binlog name and position..
Thu Apr 25 19:03:15 2024 - [info]  mysql-bin.000014:197
Thu Apr 25 19:03:15 2024 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.31.7.110', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Apr 25 19:03:15 2024 - [info] Executing master ip online change script to allow write on the new master:
Thu Apr 25 19:03:15 2024 - [info]   /usr/local/bin/master_ip_online_change --command=start --orig_master_host=172.31.7.111 --orig_master_ip=172.31.7.111 --orig_master_port=3306 --orig_master_user='mha_monitor' --new_master_host=172.31.7.110 --new_master_ip=172.31.7.110 --new_master_port=3306 --new_master_user='mha_monitor' --orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Thu Apr 25 19:03:15 2024 392468 Set read_only=0 on the new master.
Enabling the VIP 172.31.7.166 on the new master: 172.31.7.110 
Thu Apr 25 19:03:17 2024 - [info]  ok.
Thu Apr 25 19:03:17 2024 - [info] 
Thu Apr 25 19:03:17 2024 - [info] * Switching slaves in parallel..
Thu Apr 25 19:03:17 2024 - [info] 
Thu Apr 25 19:03:17 2024 - [info] -- Slave switch on host 172.31.7.112(172.31.7.112:3306) started, pid: 22573
Thu Apr 25 19:03:17 2024 - [info] 
Thu Apr 25 19:03:19 2024 - [info] Log messages from 172.31.7.112 ...
Thu Apr 25 19:03:19 2024 - [info] 
Thu Apr 25 19:03:17 2024 - [info]  Waiting to execute all relay logs on 172.31.7.112(172.31.7.112:3306)..
Thu Apr 25 19:03:17 2024 - [info]  master_pos_wait(mysql-bin.000001:2520343) completed on 172.31.7.112(172.31.7.112:3306). Executed 0 events.
Thu Apr 25 19:03:17 2024 - [info]   done.
Thu Apr 25 19:03:17 2024 - [info]  Resetting slave 172.31.7.112(172.31.7.112:3306) and starting replication from the new master 172.31.7.110(172.31.7.110:3306)..
Thu Apr 25 19:03:17 2024 - [info]  Executed CHANGE MASTER.
Thu Apr 25 19:03:18 2024 - [info]  Slave started.
Thu Apr 25 19:03:19 2024 - [info] End of log messages from 172.31.7.112 ...
Thu Apr 25 19:03:19 2024 - [info] 
Thu Apr 25 19:03:19 2024 - [info] -- Slave switch on host 172.31.7.112(172.31.7.112:3306) succeeded.
Thu Apr 25 19:03:19 2024 - [info] Unlocking all tables on the orig master:
Thu Apr 25 19:03:19 2024 - [info] Executing UNLOCK TABLES..
Thu Apr 25 19:03:19 2024 - [info]  ok.
Thu Apr 25 19:03:19 2024 - [info] Starting orig master as a new slave..
Thu Apr 25 19:03:19 2024 - [info]  Resetting slave 172.31.7.111(172.31.7.111:3306) and starting replication from the new master 172.31.7.110(172.31.7.110:3306)..
Thu Apr 25 19:03:19 2024 - [info]  Executed CHANGE MASTER.
Thu Apr 25 19:03:20 2024 - [info]  Slave started.
Thu Apr 25 19:03:20 2024 - [info] All new slave servers switched successfully.
Thu Apr 25 19:03:20 2024 - [info] 
Thu Apr 25 19:03:20 2024 - [info] * Phase 5: New master cleanup phase..
Thu Apr 25 19:03:20 2024 - [info] 
Thu Apr 25 19:03:20 2024 - [info]  172.31.7.110: Resetting slave info succeeded.
Thu Apr 25 19:03:20 2024 - [info] Switching master to 172.31.7.110(172.31.7.110:3306) completed successfully.
```

## 16 结论性总结

MHA切换一次就自动退出了。

MHA位置点和GTID的流程不一样。

### 16.1 MHA 的切换流程（位置点）

⾸先，MHA Manager检测到master不可⽤，则会通过另外两个slave检查master的可⽤性。⼀共检测4次。 同时判断MHA Manager到master的ssh可⽤性。 ssh可⽤性的判断结果影响后后续切换中的“Phase 3.2: Saving Dead Master's Binlog Phase”

Phase 1: Configuration Check Phase.. 检查了MHA的配置信息，并再次判断了master的可⽤性。

Phase 2: Dead Master Shutdown Phase.. 宕机的master处理阶段，包括摘除VIP，执⾏shutdown_script中定义的脚本。

Phase 3: Master Recovery Phase..

Phase 3.1: Getting Latest Slaves Phase..

判断哪个slave拥有最新的relay log（通过⽐较show slave status中的Master_Log_File, Read_Master_Log_Pos位 置）， 哪个slave拥有最旧的relay log

Phase 3.2: Saving Dead Master's Binlog Phase..

根据上⾯得到的slave的最新位置信息，将差异的⼆进制⽇志保存到MHA Manager的指定⽬录下。 如果在第⼀步骤中，判断了MHA Manager到master的ssh不可⽤，则会跳过这个阶段。

Phase 3.3: Determining New Master Phase.. 选择新的master

Phase 3.4: New Master Diff Log Generation Phase.. 将差异的relay log和master差异⽇志scp到新的master上。

Phase 3.4: Master Log Apply Phase.

1. 通过apply_diff_relay_logs应⽤差异的relay log，及差异的binlog。 
2. 2. 应⽤完毕后，得到新的master binlog的⽂件和位置，其它slave可根据该⽂件和位置来建⽴主从复制关系。

3. 执⾏master_ip_failover脚本，执⾏如下操作
   1. 将新主的read_only设置为0 
   2. 启动vip

Phase 4: Slaves Recovery Phase Phase

Phase 4.1: Starting Parallel Slave Diff Log Generation Phase 为slave获取差异的relay log 因为172.31.7.112拥有最新的relay log，所以也没必要获取差异的relay log

Phase 4.2: Starting Parallel Slave Log Apply Phase

开始slave的恢复阶段

1. 应⽤差异⽇志。 
2. 2. 清除原来的复制关系，并再次执⾏change master命令建⽴新的主从同步。

如果有多个slave，则该恢复过程是并⾏的。

Phase 5: New master cleanup phase

1. 在新主上执⾏ reset slave all 
2. 从MHA的配置⽂件中剔除server1的配置信息

在默认情况下，如果 MHA 检测到两次切换的时间⼩于8⼩时，则不允许进⾏Failover，这个时候，需指定 -- ignore_last_failover 参数

MHA在线切换的原理

### 16.2 MHA在线切换的原理

1、检查当前的配置信息及主从服务器的信息

包括读取MHA的配置⽂件/etc/masterha/app1.cnf及检查当前slave的健康状态

2、阻⽌对当前master的更新 主要通过如下步骤： 
```bash
1.等待1.5s（$time_until_kill_threads*100ms），等待当前连接断开。 
2.执⾏ read_only=1，阻⽌新的DML操作 
3.等待0.5s，等待当前DML操作完成。 
4.kill掉所有连接。 
5.FLUSH NO_WRITE_TO_BINLOG TABLES 
6.FLUSH TABLES WITH READ LOCK 
```
3、等待新master执⾏完所有的relay log

4、将新master的read_only设置为off，并添加VIP

5、slave切换到新master上。 
```bash
1. 等待slave应⽤完原主从复制产⽣的relay log，然后执⾏change master操作切换到新master上。
2. 释放原master上加的锁。 
3. 因masterha_master_switch命令⾏中带有--orig_master_is_new_slave参数，故原master也切换为新 master的从。
````
 6、清理新master的相关信息。

主要是执⾏了reset slave all操作，清除之前的复制信息。

### 16.3 MHA在线切换需满⾜的条件 
```bash
# /usr/local/bin/masterha_master_switch --conf=/etc/masterha/app1.cnf --
master_state=alive --new_master_host=172.31.7.110 --new_master_port=3306 --
orig_master_is_new_slave --running_updates_limit=10000 --seconds_behind_master=100
```

MHA在执⾏在线切换之前，会判断当前的主从复制信息，只有满⾜了以下条件，才能执⾏切换动作：

1. 所有SLAVE的IO线程和SQL线程都在运⾏。 
2. 所有slave的Seconds_Behind_Master⼩于seconds_behind_master的值，该参数如果没有显示指定的话， 则默认为30s 
3. 在master上，通过show processlist输出，没有⼀个DML操作的时间⼤于running_updates_limit的值。

## 17 MHA常见问题

### 17.1MHA 如何适配 MySQL 8.0

- MySQL 5.6

- MySQL 8.0.36

- 出于上面不同的原因

```bash
root@monitor:~# vim /usr/share/perl5/MHA/DBHelper.pm +707
707           if ( $state =~ m/^Has read all relay log/
708             || $state =~ m/^Slave has read all relay log/ )
替换为
707           if ( $state =~ m/^Has read all relay log/ || $state =~ m/^Replica has readall relay log/
708             || $state =~ m/^Slave has read all relay log/ )
```

### 17.2 GTID 的注意事项

```bash
[server default]
client_bindir=/usr/local/mysql/bin
manager_log=/masterha/app2/manager.log
manager_workdir=/masterha/app2
master_binlog_dir=/data/mysql/3307/data
master_ip_failover_script=/usr/local/bin/master_ip_failover
master_ip_online_change_script=/usr/local/bin/master_ip_online_change
password=monitor123
ping_interval=1
remote_workdir=/tmp
repl_password=repl123456
repl_user=repl
report_script=/usr/local/bin/send_report
secondary_check_script=/usr/bin/masterha_secondary_check -s 172.31.7.111 --user=root --master_host=172.31.7.110 --master_ip=172.31.7.110 --master_port=3306
shutdown_script=""
ssh_user=root
user=mha_monitor

[server1]
hostname=172.31.7.110
port=3307

[server2]
candidate_master=1
check_repl_delay=0
hostname=172.31.7.111
port=3307

[binlog1]
no_master=1
hostname=172.31.7.110
master_binlog_dir=/data/mysql/3307/data
```

### 17.3 脑裂

- 半同步
- shutdown_script

### 17.4 定期检测集群状态的健康

```bash
# masterha_check_repl --conf=/etc/masterha/app1.cnf
```

### 17.5 定期检测主从数据⼀致性

pt-table-checksum

[技术分享 | 数据校验工具 pt-table-checksum - 知乎 (zhihu.com)](https://zhuanlan.zhihu.com/p/340411904)

### 17.6 MHA 的单点问题

使⽤ daemontools、systemd 之类的进程管理⼯具

### 17.7 跳过主从延迟检测

```bash
# 配置⽂件中设置 
check_repl_delay =0
```

### 17.8 跳过过滤规则检测

```bash
# 配置⽂件中设置 
check_repl_filter = 0
```

------------

测试：

```bash
mysql> create user sunrise@'%' identified by '123456';
Query OK, 0 rows affected (0.05 sec)

mysql> grant all privileges on *.* to 'sunrise'@'%' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)

mysql> create database school;
Query OK, 1 row affected (0.00 sec)
```

造数据

```python
import random
import string
import pymysql
import time

# MySQL数据库配置
# db_config = {
#     "host": "172.31.7.110",
#     "port": 3306,
#     "user": "sunrise",
#     "password": "3edfud",
#     "database": "school",
# }

db_config = {
    "host": "172.31.7.166",
    "port": 3306,
    "user": "sunrise",
    "password": "123456",
    "database": "school",
}

# 生成随机字符串
def generate_random_string(length):
    letters = string.ascii_letters
    return ''.join(random.choice(letters) for i in range(length))

# 生成随机数据
def generate_random_data(record_count):
    students = [
        {
            "学号": f"NSD{random.randint(100000, 999999)}",
            "姓名": generate_random_string(5),
            "性别": random.choice(["男", "女"]),
            "手机号": "1" + ''.join(random.choice(string.digits) for _ in range(10)),  # 生成11位数字字符串
            "通信地址": generate_random_string(10),
        }
        for _ in range(record_count)
    ]
    return students

# 将随机数据插入MySQL数据库中的新表
def create_and_insert_random_data(data):
    try:
        # 连接到MySQL数据库
        conn = pymysql.connect(**db_config)
        cursor = conn.cursor()

# 为每次迭代创建一个新的表
        table_name = f"student_{int(time.time())}"
        create_table_query = f"""
            CREATE TABLE {table_name} (
                学号 VARCHAR(20),
                姓名 VARCHAR(20),
                性别 VARCHAR(10),
                手机号 VARCHAR(20),
                通信地址 VARCHAR(50)
            )
        """
        cursor.execute(create_table_query)

# 插入数据到新表中
        insert_query = f"INSERT INTO {table_name} (学号, 姓名, 性别, 手机号, 通信地址) VALUES (%s, %s, %s, %s, %s)"
        for student in data:
            values = (
                student["学号"],
                student["姓名"],
                student["性别"],
                student["手机号"],
                student["通信地址"],
            )
            cursor.execute(insert_query, values)

# 提交更改并关闭连接
        conn.commit()
        cursor.close()
        conn.close()
        print(f"成功插入 {len(data)} 条记录到表 {table_name}。")
    except pymysql.Error as error:
        print(f"错误：{error}")
        if conn and conn.open:
            cursor.close()
            conn.close()

if __name__ == "__main__":
    try:
        while True:
            # 生成随机数据（这里插入1000条记录，可以根据需要调整）
            random_data = generate_random_data(10)

# 将随机数据插入新表中
            create_and_insert_random_data(random_data)

# 等待一段时间，例如5秒
            time.sleep(5)

except KeyboardInterrupt:
        print("手动停止数据插入。")
```

### 参考

- https://www.jianshu.com/p/41f0774226df
- https://www.cnblogs.com/ivictor/p/5686275.html
- https://mp.weixin.qq.com/s/WkTk7-nDUNuYmkf8w5gW4Q