WSREP has not yet prepared node for application use

本文涉及的产品
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
RDS MySQL Serverless 高可用系列,价值2615元额度,1个月
简介:

最近PXC 5.7出现脑裂,前端Navicate连接到MySQL时,提示WSREP has not yet prepared node for application use错误。脑裂可是个不容易忽视的问题啊,严重的导致数据错乱。你懂的,就好比酒驾,迷糊啊,都乱了。下文是本次故障的具体描述。

一、环境说明

当前的集群采用了2个数据节点+1个仲裁节点,两节点均有写入数据
重启了仲裁节点后,数据节点发生脑裂
操作系统CentOS 7
数据库版本Percona XtraDB Cluster 5.7.20-18-57-log

故障现象
Navicate客户端可以连接,但是查询数据报错:WSREP has not yet prepared node for application use
2节点状态查看都显示为non-Primary

mysql> show status like '%wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 6                                    |
| wsrep_cluster_size      | 3                                    |
| wsrep_cluster_state_uuid | cd96b06a-0a1d-11e8-99d2-837e6f3b95a9 |
| wsrep_cluster_status    | non-Primary                              |
+--------------------------+--------------------------------------+

二、故障分析

节点1日志

2018-05-29T11:36:56.595913+08:00 0 [Note] WSREP: (45125952, 'tcp://0.0.0.0:4567') connection to peer 17b576f8 with addr tcp://192.168.1.253:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-05-29T11:36:56.602911+08:00 0 [Note] WSREP: (45125952, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.253:4567 
2018-05-29T11:37:05.600073+08:00 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,029e6b40,17) ##此处已提示为非主节点
memb {
    45125952,0
    }
joined {
    }
left {
    }
partitioned {
    029e6b40,0
    17b576f8,0
    }
)
2018-05-29T11:37:05.631952+08:00 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,45125952,18)
memb {
    45125952,0
    }
joined {
    }
left {
    }
partitioned {
    029e6b40,0
    17b576f8,0
    }
)
2018-05-29T11:37:05.634073+08:00 0 [Note] WSREP: declaring 029e6b40 at tcp://192.168.1.249:4567 stable
2018-05-29T11:37:05.634674+08:00 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,029e6b40,19)
memb {
    029e6b40,0
    45125952,0
    }
joined {
    }
left {
    }
partitioned {
    17b576f8,0
    }
)
2018-05-29T11:37:05.663943+08:00 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-05-29T11:37:05.670701+08:00 151042 [Warning] WSREP: Send action {(nil), 563, TORDERED} returned -107 (Transport endpoint is not connected)
2018-05-29T11:37:05.670763+08:00 151042 [Note] WSREP: --------- CONFLICT DETECTED --------  ###检查到冲突
2018-05-29T11:37:05.670778+08:00 151042 [Note] WSREP: cluster conflict due to certification failure for threads:

节点2日志

2018-05-28T23:59:07.757523+08:00 0 [Note] WSREP: Member 2.0 (pxc-cluster-node-1) synced with group.
2018-05-29T09:47:36.679991+08:00 6095593 [Note] Aborted connection 6095593 to db: 'DB71' user: 'user71' host: '192.168.1.253' (Got an error reading communication packets)
2018-05-29T09:47:36.680062+08:00 6095607 [Note] Aborted connection 6095607 to db: 'DB71' user: 'user71' host: '192.168.1.253' (Got an error reading communication packets)
            .............................................#Author : Leshami Blog : http://blog.csdn.net/leshami
2018-05-29T11:34:11.981171+08:00 6104858 [Note] Aborted connection 6104858 to db: 'DB71' user: 'user71' host: '192.168.1.253' (Got an error reading communication packets)
2018-05-29T11:36:02.203516+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') connection to peer 17b576f8 with addr tcp://192.168.1.253:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-05-29T11:36:02.203708+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.253:4567 
2018-05-29T11:36:03.203906+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') reconnecting to 17b576f8 (tcp://192.168.1.253:4567), attempt 0
2018-05-29T11:36:03.704413+08:00 0 [Note] WSREP: declaring node with index 1 suspected, timeout PT5S (evs.suspect_timeout)
2018-05-29T11:36:03.704482+08:00 0 [Note] WSREP: evs::proto(029e6b40, OPERATIONAL, view_id(REG,029e6b40,17)) suspecting node: 17b576f8
2018-05-29T11:36:03.704507+08:00 0 [Note] WSREP: evs::proto(029e6b40, OPERATIONAL, view_id(REG,029e6b40,17)) suspected node without join message, declaring inactive
2018-05-29T11:36:04.204630+08:00 0 [Note] WSREP: declaring node with index 1 inactive (evs.inactive_timeout) 
2018-05-29T11:36:06.204943+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.1.253:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-05-29T11:36:09.206069+08:00 0 [Note] WSREP: declaring node with index 2 suspected, timeout PT5S (evs.suspect_timeout)
2018-05-29T11:36:09.206168+08:00 0 [Note] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) suspecting node: 45125952
2018-05-29T11:36:09.706352+08:00 0 [Note] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) suspecting node: 45125952
2018-05-29T11:36:10.206318+08:00 0 [Note] WSREP: (029e6b40, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.1.253:4567 timed out, no messages seen in PT3S (gmcast.peer_timeout)
2018-05-29T11:36:10.206514+08:00 0 [Note] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) suspecting node: 45125952
2018-05-29T11:36:10.706763+08:00 0 [Note] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) suspecting node: 45125952
2018-05-29T11:36:11.204659+08:00 0 [Warning] WSREP: evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)) install timer expired
evs::proto(evs::proto(029e6b40, GATHER, view_id(REG,029e6b40,17)), GATHER) {
current_view=Current view of cluster as seen by this node
view (view_id(REG,029e6b40,17)
memb {
    029e6b40,0
    17b576f8,0
    45125952,0
    }
joined {
    }
left {
    }
partitioned {
    }
),
input_map=evs::input_map: {aru_seq=5278704,safe_seq=5278704,node_index=node: {idx=0,range=[5278710,5278709],safe_seq=5278704} node: {idx=1,range=[5278705,5278704],safe_seq=5278704} node: {idx=2,range=[5278710,5278709],safe_seq=5278704} },
fifo_seq=10557624,
last_sent=5278709,
known:

三、故障解决

对于当前的脑裂,由于在测试环境,因此直接停止其中一个节点(1.248),然后将剩余的节点强制提升为主节点
首先停止仲裁节点,然后停止1.249节点
如下在1.249节点上做如下操作

mysql> SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';
Query OK, 0 rows affected (0.00 sec)

--再次查看此时wsrep_cluster_status为Primary
mysql> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 4                                    |
| wsrep_cluster_size      | 1                                    |
| wsrep_cluster_state_uuid | cd96b06a-0a1d-11e8-99d2-837e6f3b95a9 |
| wsrep_cluster_status    | Primary                              |
+--------------------------+--------------------------------------+
4 rows in set (0.01 sec)

在第一个节点248上启动
[root@node248 ~]# systemctl start mysql

--集群状态查看,此时也已正常
mysql> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 5                                    |
| wsrep_cluster_size      | 2                                    |
| wsrep_cluster_state_uuid | cd96b06a-0a1d-11e8-99d2-837e6f3b95a9 |
| wsrep_cluster_status    | Primary                              |
+--------------------------+--------------------------------------+

在第三个节点启动仲裁节点
[root@ydq4 ~]# /etc/init.d/garb start
正在启动 /usr/bin/garbd:                                  [确定]

mysql> show status like 'wsrep_cluster%';
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 6                                    |
| wsrep_cluster_size      | 3                                    |
| wsrep_cluster_state_uuid | cd96b06a-0a1d-11e8-99d2-837e6f3b95a9 |
| wsrep_cluster_status    | Primary                              |
+--------------------------+--------------------------------------+

四、后记

1、此次脑裂时应用程序端已出现节点无法连接,并且有日志记录到mysql error log
2、在数据节点已出现异常时,冒然重启仲裁节点不可取
3、建议使用3个数据节点最佳
4、多节点数据写入,最好按模块将写划分到不同节点
5、节点修复参考:http://galeracluster.com/documentation-webpages/twonode.html

上传日志:

相关实践学习
如何快速连接云数据库RDS MySQL
本场景介绍如何通过阿里云数据管理服务DMS快速连接云数据库RDS MySQL,然后进行数据表的CRUD操作。
全面了解阿里云能为你做什么
阿里云在全球各地部署高效节能的绿色数据中心,利用清洁计算为万物互联的新世界提供源源不断的能源动力,目前开服的区域包括中国(华北、华东、华南、香港)、新加坡、美国(美东、美西)、欧洲、中东、澳大利亚、日本。目前阿里云的产品涵盖弹性计算、数据库、存储与CDN、分析与搜索、云通信、网络、管理与监控、应用服务、互联网中间件、移动服务、视频服务等。通过本课程,来了解阿里云能够为你的业务带来哪些帮助     相关的阿里云产品:云服务器ECS 云服务器 ECS(Elastic Compute Service)是一种弹性可伸缩的计算服务,助您降低 IT 成本,提升运维效率,使您更专注于核心业务创新。产品详情: https://www.aliyun.com/product/ecs
目录
相关文章
|
7月前
Electron——node_modules\ffi-napi\build\Release\ffi_bindings.node is not a valid Win32 application.
Electron——node_modules\ffi-napi\build\Release\ffi_bindings.node is not a valid Win32 application.
107 0
|
数据库连接
(node:2612) DeprecationWarning: collection.ensureIndex is deprecated. Use createIndexes instead.
(node:2612) DeprecationWarning: collection.ensureIndex is deprecated. Use createIndexes instead.
When to use next() and return next() in Node.js
Some people always write return next() is to ensure that the execution stops after triggering the callback.
1031 1
|
JavaScript
use apt update nodejs on Ubuntu
Use n module from npm in order to upgrade node sudo npm cache clean -f sudo npm install -g n sudo n stable sudo ln -sf /usr/local/n/version...
979 0
|
27天前
|
弹性计算 JavaScript 前端开发
一键安装!阿里云新功能部署Nodejs环境到ECS竟然如此简单!
Node.js 是一种高效的 JavaScript 运行环境,基于 Chrome V8 引擎,支持在服务器端运行 JavaScript 代码。本文介绍如何在阿里云上一键部署 Node.js 环境,无需繁琐配置,轻松上手。前提条件包括 ECS 实例运行中且操作系统为 CentOS、Ubuntu 等。功能特点为一键安装和稳定性好,支持常用 LTS 版本。安装步骤简单:登录阿里云控制台,选择扩展程序管理页面,安装 Node.js 扩展,选择实例和版本,等待创建完成并验证安装成功。通过阿里云的公共扩展,初学者和经验丰富的开发者都能快速进入开发状态,开启高效开发之旅。
|
10天前
|
JavaScript 前端开发 数据可视化
【01】Cocos游戏开发引擎从0开发一款游戏-cocos环境搭建以及配置-Cocos Creator软件系统下载安装-node环境-优雅草卓伊凡
【01】Cocos游戏开发引擎从0开发一款游戏-cocos环境搭建以及配置-Cocos Creator软件系统下载安装-node环境-优雅草卓伊凡
22 2
【01】Cocos游戏开发引擎从0开发一款游戏-cocos环境搭建以及配置-Cocos Creator软件系统下载安装-node环境-优雅草卓伊凡
|
1月前
|
弹性计算 JavaScript 前端开发
一键安装!阿里云新功能部署Nodejs环境到ECS竟然如此简单!
一键安装!阿里云新功能部署Nodejs环境到ECS竟然如此简单!
一键安装!阿里云新功能部署Nodejs环境到ECS竟然如此简单!

热门文章

最新文章

AI助理

你好,我是AI助理

可以解答问题、推荐解决方案等