Replication copies data from master to slave(s). Master logs all changes to binary log. Slave reads and applies changes. Provides read scaling, backups, and failover.
-- Master configuration (my.cnf)
[mysqld]
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
-- Create replication user on master
CREATE USER 'repl'@'slave-ip' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave-ip';
-- Get master status
SHOW MASTER STATUS;
-- File: mysql-bin.000001, Position: 154
-- Slave configuration (my.cnf)
[mysqld]
server-id = 2
relay-log = mysql-relay-bin
-- Configure slave replication
CHANGE MASTER TO
MASTER_HOST='master-ip',
MASTER_USER='repl',
MASTER_PASSWORD='password',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=154;
-- Start slave
START SLAVE;
SHOW SLAVE STATUS;Why it matters: Horizontal scaling, data redundancy, disaster recovery.
Real applications: Load balancing reads, backup slaves, geographic redundancy.
Common mistakes: Same server-id on multiple servers, not securing replication user.
ROW: Log actual row changes, safe, larger logs. STATEMENT: Log SQL, smaller, can be unsafe. MIXED: Use statement usually, row for unsafe cases.
-- Configure binlog format
SET GLOBAL binlog_format = 'ROW'; -- Recommended for replication
-- ROW format (safest)
binlog_format = ROW -- Logs actual data changes
-- Advantages: Safe with functions, triggers, UDFs
-- Disadvantages: Larger binary log size
-- STATEMENT format (space-efficient)
binlog_format = STATEMENT -- Logs SQL statements
-- Advantages: Smaller log size
-- Disadvantages: Unsafe with RAND(), NOW(), non-deterministic functions
-- MIXED format (compromise)
binlog_format = MIXED -- Default to STATEMENT, use ROW when needed
-- Balanced between size and safety
-- Check current format
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
-- Check which statements use ROW (when MIXED)
SELECT * FROM mysql.general_log WHERE command_type = 'Query' AND argument LIKE '%UPDATE%';
-- Disadvantages of STATEMENT
-- Non-deterministic: SELECT * FROM table LIMIT 1; -- Different order on slaves
-- Functions: UPDATE t SET ts = NOW(); -- Different time on each slaveWhy it matters: Data consistency on replicas.
Real applications: High volume replication, complex queries.
Common mistakes: Using STATEMENT with unsafe functions, large ROW logs.
Replication lag is delay between master and slave writes. Caused by slow queries, large transactions, network delays. Monitor with SHOW SLAVE STATUS.
-- Check replication lag
SHOW SLAVE STATUS\G
-- Seconds_Behind_Master: Lag in seconds
-- Relay_Master_Log_File: Which master binlog being processed
-- Exec_Master_Log_Pos: Position in master log
-- Example output
-- Seconds_Behind_Master: 5 -- 5 seconds behind
-- Monitor lag with queries
SELECT TIMESTAMPDIFF(SECOND, ts, NOW()) as replication_lag_seconds
FROM replication_heartbeat LIMIT 1; -- Your heartbeat table
-- High lag causes
-- 1. Slow queries on slave
-- 2. Large transactions on master
-- 3. Single-threaded replication (MySQL < 5.7)
-- 4. Network issues
-- Enable parallel replication (MySQL 5.7+)
SET GLOBAL slave_parallel_workers = 4;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
-- Check lag in scripts
#!/bin/bash
mysql -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print $2}'Why it matters: Data freshness on slaves, failover readiness.
Real applications: Monitoring dashboards, alerting systems.
Common mistakes: Ignoring lag, not monitoring during high load.
Common issues: Duplicate key errors, replication stops, data inconsistency, lag growth. Use binlog position tracking, checksums, and monitoring.
-- Issue 1: Duplicate key error on slave
ERROR 1062: Duplicate entry
-- Solution: Skip error
STOP SLAVE;
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
-- Issue 2: Replication stopped at error
SHOW SLAVE STATUS\G
-- Last_Error: error message
-- Fix: Fix data issue then sync
STOP SLAVE;
-- Fix data inconsistency manually
[Fix the data]
START SLAVE;
-- Issue 3: Verify data consistency
-- Use pt-table-checksum (Percona)
pt-table-checksum --replicate=percona.checksums h=master-ip
-- Issue 4: Large lag after restart
-- Use backup restore + incremental binlog
mysqldump --master-data > backup.sql
mysql < backup.sql
-- Issue 5: Master-slave data divergence
-- Use Percona XtraBackup for consistent snapshot
xtrabackup --backup --target-dir=/backup
[Copy to slave]
xtrabackup --prepare --target-dir=/backupWhy it matters: Maintain data consistency across replicas.
Real applications: Production troubleshooting, failover procedures.
Common mistakes: Forcing replication through errors, not fixing root cause.
Master-master replication enables writes on both servers. Useful for geographic distribution. Challenges: Conflict resolution, infinite loops, coordination.
-- Master A configuration
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
auto_increment_increment = 2
auto_increment_offset = 1
-- Master B configuration
server-id = 2
log_bin = mysql-bin
binlog_format = ROW
auto_increment_increment = 2
auto_increment_offset = 2
-- Both are masters and slaves of each other
-- Master A slave setup
CHANGE MASTER TO MASTER_HOST='master-b-ip', MASTER_USER='repl', ...;
-- Master B slave setup
CHANGE MASTER TO MASTER_HOST='master-a-ip', MASTER_USER='repl', ...;
-- AUTO_INCREMENT handling
-- Master A uses IDs: 1, 3, 5, 7, ...
-- Master B uses IDs: 2, 4, 6, 8, ...
-- Prevents duplicate key conflicts
-- Challenges
-- 1. Write conflicts if same row modified on both
-- 2. Application logic must handle eventual consistency
-- 3. Failover is complex
-- Monitoring both directions
SHOW SLAVE STATUS\G -- Check replication to other master
SHOW MASTER STATUS; -- Check outgoing replicationWhy it matters: Active-active replication, geographic redundancy.
Real applications: Multi-region deployments, symmetric failover.
Common mistakes: Same auto_increment offsets, writing same data on both.
PXC provides synchronous replication with quorum. Group Replication (MySQL 5.7+) multi-master clustering. Both ensure data consistency at cost of write latency.
-- Percona XtraDB Cluster
-- Multi-master, synchronous replication, quorum-based
-- Configuration
pxc_strict_mode = ENFORCED
-- Join cluster
systemctl start mysql
-- Check cluster status
SHOW STATUS WHERE variable_name LIKE 'wsrep%';
-- wsrep_connected: ON (connected to cluster)
-- wsrep_cluster_size: 3 (cluster members)
-- wsrep_local_state_uuid: Cluster ID
-- MySQL Group Replication
-- Initialize group replication plugin
INSTALL PLUGIN group_replication SONAME 'group_replication.so';
-- Configure host
SET GLOBAL group_replication_group_name="aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa";
SET GLOBAL group_replication_start_on_boot=ON;
-- Bootstrap primary
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
-- Join secondary
START GROUP_REPLICATION;
-- Monitor group
SELECT * FROM performance_schema.replication_group_members;Why it matters: Synchronous consistency, easier failover.
Real applications: Critical systems requiring strong consistency.
Common mistakes: Network partitions, quorum loss, latency impact.
Backup strategies: Backup from slave (non-intrusive), use Percona tools, point-in-time recovery with binlogs, binlog server for central backup.
-- Backup from slave (non-blocking)
STOP SLAVE;
mysqldump --all-databases --single-transaction > backup.sql;
START SLAVE; -- Lag accumulates during backup
-- Better: Use slave parallel replication, backup compressed
STOP SLAVE;
mysqldump --all-databases --single-transaction --compress | gzip > backup.sql.gz;
START SLAVE;
-- Percona XtraBackup (faster, non-blocking)
xtrabackup --backup --target-dir=/backup --parallel=4
-- Slave continues running, xtrabackup reads consistent snapshot
-- Binlog server (central backup point)
-- Slave collects all binlogs from master
-- Separate server for backup operations
[Master] --> [Slave with binlog_server] --> [Backup jobs]
-- Point-in-time recovery
SHOW BINARY LOGS; -- List all binary logs
mysqlbinlog mysql-bin.000001 mysql-bin.000002 | mysql -- Restore to specific point
-- Setup binlog retention
SET GLOBAL binlog_expire_logs_seconds = 604800; -- 7 days
-- Backup strategy example
-- Day 1: Full backup
mysqldump --all-databases > /backup/full_backup_day1.sql
-- Day 2-6: Incremental (save binlogs)
-- Just copy binlog files: mysql-bin.000002, mysql-bin.000003, ...
-- Recovery
mysql < /backup/full_backup_day1.sql
mysqlbinlog /backup/mysql-bin.000002 /backup/mysql-bin.000003 | mysqlWhy it matters: Fast backups without blocking production.
Real applications: Disaster recovery, compliance, point-in-time restore.
Common mistakes: Blocking backups during peak hours, not testing recovery.
Best practices: Monitoring lag, testing failover, backup strategy, multiple slaves, replication filters carefully, dedicated slave users.
-- ✅ Good: Monitor replication with alerts
SELECT @@hostname, TIMESTAMPDIFF(SECOND, NOW(),
DATE_ADD(NOW(), INTERVAL (SELECT TIMESTAMPDIFF(SECOND, NOW(), NOW())) SECOND)) as lag_seconds
FROM INFORMATION_SCHEMA.PROCESSLIST WHERE ID = 1;
-- ✅ Good: Multiple slaves for different purposes
-- Slave 1: Read scaling
-- Slave 2: Backups
-- Slave 3: Analytics (allow 1-2 min lag)
-- ✅ Good: Test failover regularly
-- Run failover simulation in dev/staging
-- Document runbook
-- Train team
-- ✅ Good: Use dedicated replication user
CREATE USER 'repl'@'%.domain.com' IDENTIFIED BY 'strong_password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl'@'%.domain.com';
-- ❌ Bad: Replication filters (can cause data loss)
-- replicate-ignore-db = temp -- Risky!
-- Breaks slave if master has data in ignored db
-- ✉ Good: Set read-only on slaves
SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON; -- Even super users can't write
-- Monitor replication with queries
SELECT * FROM performance_schema.replication_connection_status;
SELECT * FROM performance_schema.replication_applier_status;Why it matters: Production reliability and data safety.
Real applications: High availability setup, disaster recovery.
Common mistakes: Ignoring lag, not testing failover, poor monitoring.