Managing MySQL/MariaDB Galera Cluster

databases
mysql
mariadb
galera

#1

Determine Number of Present Nodes

In a Galera cluster at least 2 nodes needs to be online for MySQL to be functional. To check the number of preset nodes in the cluster, on any of the server nodes, run this MySQL query:

SHOW STATUS LIKE 'wsrep_cluster_size';

You should get a result like this:

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.001 sec)

This shows that there are 3 nodes present in the cluster.

Single Node

MySQL will continue to operate as a single node if the other nodes were shutdown gracefully. If they were not due to a network issue or a crash, then the single node will not accept queries and will return the following error:

ERROR 1047 (08S01): WSREP has not yet prepared node for application use

This error typically occurs whenever a node is isolated from the others (network problems, servers down, etc…) or the other nodes were shutdown ungracefully.

Bootstrap cluster from a single node

If you determine that no other nodes are present, and want to make this a primary node, you could run this query:

SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';

Once the other nodes come back online or regain network connectivity with this node, they will sync up changes from this primary.

In an event that MySQL is stopped on all servers and you need to bring the cluster back up, to identify which node is in the most advanced state, look at the /var/lib/mysql/grastate.dat file on each server and look at the seqno. If the seqno looks like something like this, it is probably the most advanced node:

# GALERA saved state
version: 2.1
uuid:    26f7bc7c-cca7-11e8-932c-06d13cbd32e4
seqno:   7
safe_to_bootstrap: 1

Start this node as the primary in the cluster by running:

galera_new_cluster

Bringing the Cluster Back Online

If all nodes have a seqno value of -1 in the /var/lib/mysql/grastate.dat file than they all were disconnected from each other due to some networking problem or all of them crashed. In this case, find the node that is most likely to have the most recent data and use it to bootstrap the cluster from. To do this:

  1. First make sure no mysql processes are running on any of the nodes

    killall -KILL mysql mysqld_safe mysqld mysql-systemd
    
  2. On the node you decided to be the new primary edit the /var/lib/mysql/grastate.dat file and set the safe_to_bootstrap value to 1 so that it’ll look like something like this:

    # GALERA saved state
    version: 2.1
    uuid:    26f7bc7c-cca7-11e8-932c-06d13cbd32e4
    seqno:   -1
    safe_to_bootstrap: 1
    

    Then run this command:

    galera_new_cluster
    
  3. Check to make sure the previous step started MySQL by running the MySQL prompt and check the number of nodes present in the cluster (should be 1 at this step) by running this MySQL query:

    SHOW STATUS LIKE 'wsrep_cluster_size';
    

    The output should look like this:

    +--------------------+-------+
    | Variable_name      | Value |
    +--------------------+-------+
    | wsrep_cluster_size | 1     |
    +--------------------+-------+
    1 row in set (0.001 sec)
    
  4. Now add the remaining nodes to the cluster, one by one. For large databases, the time it takes to sync up the data can exceed the systemd startup timeout. So for the initial re-joining to the cluster, we will not use systemd to start MySQL to allow for enough time to sync up all the data. So on the 2nd node, run this command to join the cluster:

    mysqld --user=mysql --wsrep_cluster_address=gcomm://123.123.123.123
    

    replace 123.123.123.123 with the actual IP of the node that is now the primary and running

    Depending on the size of the databases, this may take some time to run. Once this node is sync, you’ll see a message like this:

    ...
    WSREP: Synchronized with group, ready for connections
    ...
    

    Verify that the primary node now shows the cluster size of 2 by running MySQL query:

    SHOW STATUS LIKE 'wsrep_cluster_size';
    

    After the 2nd node is fully synced, stop the mysqld process by running:

    mysqladmin shutdown
    

    And then start mysql with systemd

    systemctl start mysql
    

    Make sure MySQL is running and that the cluster size is 2.

  5. One the 2nd node is up, do the same on the 3rd node. If the 3rd node is a Galera Arbitrator, then start garb service and make sure the cluster size is showing 3 nodes.