ClickHouse and Friends (11) MySQL Real-Time Replication with GTID Mode
Last updated: 2020-09-03
MySQL Real-Time Replication Principles
A few days ago, ClickHouse officially released v20.8.1.4447-testing. This version includes the MaterializeMySQL engine, implementing ClickHouse’s capability to replicate MySQL data in real-time. Interested friends can experience it through the official installation package. Installation method reference: https://clickhouse.tech/#quick-start. Note that you need to select the testing branch.
Position-Based Synchronization
MaterializeMySQL in version v20.8.1.4447-testing uses binlog position-based synchronization mode.
After consuming a batch of binlog events, it records the event position information to the .metadata file:
1 | Version: 1 |
This way, when ClickHouse restarts, it sends the {‘mysql-bin.000002’, 328} tuple through the protocol to the MySQL Server, and MySQL sends data starting from this position:
1 | s1> ClickHouse sends {'mysql-bin.000002', 328} position info to MySQL |
Looks good, but there’s a problem:
If MySQL Server is a cluster (e.g., 1 master 2 replicas) serving through a VIP, and MaterializeMySQL’s host points to this VIP.
When cluster master-replica failover occurs, the {binlog-name, binlog-position} tuple is actually inaccurate because master-replica binlog within the cluster may not be completely consistent (binlog can be reset).
1 | s1> ClickHouse sends {'mysql-bin.000002', 328} to new master MySQL in cluster |
To solve this problem, we developed GTID synchronization mode, deprecating the unsafe position-based synchronization mode. It has been merged upstream #PR13820 and will be available in the next testing version.
If you’re in a hurry, you can compile it yourself or download and install from ClickHouse Build Check for master-20.9.1.
GTID-Based Synchronization
GTID is an enhanced version of MySQL replication, supported since MySQL 5.6, and is now the mainstream MySQL replication mode.
It assigns a globally unique ID and sequence number to each event. We don’t need to worry about MySQL cluster master-replica topology structure, just tell MySQL this GTID. The .metadata becomes:
1 | Version: 2 |
f4aee41e-e36f-11ea-8b37-0242ac110002 is the UUID of the host that generated the event, 1-5 is the synchronized event range.
The flow becomes:
1 | s1> ClickHouse sends GTID:f4aee41e-e36f-11ea-8b37-0242ac110002:1-5 to MySQL |
Enabling GTID in MySQL
So, how to enable GTID on the MySQL side? Just add these two parameters:
--gtid-mode=ON --enforce-gtid-consistency
For example, start a MySQL docker with GTID enabled:
1 | docker run -d -e MYSQL_ROOT_PASSWORD=123 mysql:5.7 mysqld --datadir=/var/lib/mysql --server-id=1 --log-bin=/var/lib/mysql/mysql-bin.log --gtid-mode=ON --enforce-gtid-consistency |
Important Notes
After enabling GTID replication mode, metadata Version will change to 2, meaning older versions will directly error on startup and databases need to be recreated.
Summary
The MaterializeMySQL engine is still continuously iterating. We have a preliminary plan for it:
Stability Assurance
This requires more testing and more user feedbackIndex Optimization
OLTP indexes are generally not designed for OLAP. Currently index conversion still depends on MySQL table structure, needs to be more intelligentObservability
Convenient viewing of current synchronization information on the ClickHouse side, similar to MySQLshow slave statusData Consistency Verification
Need to provide ways to verify MySQL and ClickHouse data consistency
MaterializeMySQL is already a community feature, but there’s still considerable work to do. We look forward to more forces joining us. Our journey goes beyond the stars and seas.