Fail-Safe Cluster for FirebirdSQL and something more
Alex Koviazin, IBSurgeon, www.ib-aid.com
• Replication, recovery and optimization for Firebird since 2002
• In Brazil since 2006 (Firebase)
• Platinum Sponsor of the Firebird Foundation
• Based in Moscow, Russia
www.ib-aid.com
Agenda
• In general: what is a fail-safe cluster?
• What do we need to build a fail-safe cluster?
• Overview of Firebird versions with high availability
• Trigger-based and native replication as the basis for a fail-safe cluster
• How HQbird Enterprise implements a fail-safe cluster
• Alternatives to a fail-safe cluster: Mirror/Warm Standby
• Special offer for FDD visitors
What is a fail-safe cluster?
[Diagram: User 1, User 2, User 3 … User N connected to the cluster]
What is a fail-safe cluster internally?
[Diagram: User 1 … User N; Firebird master and Firebird replica]
What is a fail-safe cluster?
[Diagram: User 1 … User N connected to the Firebird master; synchronization between the Firebird master and the Firebird replica]
When the master fails, users reconnect to the new master
[Diagram: User 1 … User N reconnect to the former replica, which becomes the new Firebird master; synchronization link between master and replica]
For a fail-safe cluster we need:
1. At the server side
• Synchronization of master and replica
• A monitoring and switching mechanism
2. At the application side
• A reconnection cycle in case of disconnects (web applications are a good example)
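The application-side reconnection cycle can be sketched in Python (a minimal sketch, not part of HQbird). It assumes a hypothetical `connect` callable that opens a connection to the cluster's host name; real Firebird drivers raise their own exception types, so such a callable would need to translate them into `ConnectionError`. The delay values are illustrative assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], attempts: int = 10,
                       first_delay: float = 0.5, max_delay: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Keep calling `connect` until it succeeds or `attempts` run out.

    `connect` (hypothetical) opens a connection to the cluster's host name
    and raises ConnectionError while no master is reachable.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay
    for _ in range(attempts - 1):
        try:
            return connect()
        except ConnectionError:
            sleep(delay)                       # give the replica time to take over
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return connect()  # last attempt: let the error reach the caller
```

Because the host name stays the same and only the DNS record moves to the new master, the loop keeps retrying one address until the takeover has finished, and the user sees a single message only if every attempt fails.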
A database fail-safe cluster is NOT:
1) An automatically deployed solution
• Complex setup (2-3 hours of work)
2) A truly scalable solution
3) Multi-master replication
• Replicas are read-only
4) A geographically distributed solution
• A fast network connection is required
How a Fail-Safe Cluster Works: Applications
From the application's point of view:
1. End-user applications connect to the Firebird database and work as usual.
2. If the Firebird database becomes unavailable, end-user applications try to reconnect to the same host (gracefully, without error screens).
3. End-user applications should preferably use caching (CachedUpdates) to avoid losing uncommitted data.
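A minimal sketch of why caching helps: edits stay on the client until a commit succeeds, so a disconnect in the middle of a failover loses nothing that cannot be resent. The `CachedUpdates` class and the `execute`/`commit` callables are hypothetical stand-ins for a dataset's cached-updates mode and a driver cursor, not a real API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

# One buffered edit: (SQL statement, parameters).
Change = Tuple[str, Tuple[Any, ...]]


@dataclass
class CachedUpdates:
    """Client-side buffer of edits that are not committed yet (hypothetical)."""
    pending: List[Change] = field(default_factory=list)

    def add(self, sql: str, *params: Any) -> None:
        self.pending.append((sql, params))

    def apply(self, execute: Callable[[str, Tuple[Any, ...]], None],
              commit: Callable[[], None]) -> None:
        """Send every buffered edit, then commit; keep the buffer on failure."""
        for sql, params in self.pending:
            execute(sql, params)
        commit()
        self.pending.clear()  # forget the edits only after a successful commit
```

If `apply` fails, the server-side transaction is lost together with the old connection, so after reconnecting the application simply applies the whole buffer again in a new transaction.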
How a Fail-Safe Cluster Works: Server
[Diagram: Firebird master and Firebird replica, each with monitoring]
1. All nodes monitor each other: the replica watches the master.
2. If the master fails, the replica:
a) makes a "confirmation shot" at the master (disables it, to be sure it is really out of service);
b) promotes itself to master (script + DNS change).
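The replica-side monitoring in step 2 can be sketched as a loop that acts only after several consecutive missed checks, so a short network blip does not trigger a takeover. All names, the 3-check threshold and the 2-second interval are assumptions; `confirmation_shot` stands for stopping Firebird on the old master (what cluster tools usually call fencing) and `promote_self` for the script + DNS change.

```python
import time
from typing import Callable


def watch_master(is_alive: Callable[[], bool],
                 confirmation_shot: Callable[[], None],
                 promote_self: Callable[[], None],
                 max_missed: int = 3,
                 interval: float = 2.0,
                 max_checks: int = 1_000_000,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Replica-side monitoring loop (hypothetical sketch).

    Returns True if this replica took over as master, False if the master
    stayed healthy for `max_checks` checks.
    """
    missed = 0
    for _ in range(max_checks):
        if is_alive():
            missed = 0  # one good answer resets the counter
        else:
            missed += 1
            if missed >= max_missed:
                confirmation_shot()  # stop Firebird on the old master
                promote_self()       # script + DNS change: become the master
                return True
        sleep(interval)
    return False
```

Requiring consecutive misses trades a few seconds of extra downtime for protection against promoting a replica while the master is still serving clients.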
After the replica fails, there is no cluster anymore!
Just a single database!
Minimal fail-safe cluster
[Diagram: Firebird master, Firebird replica 1 and Firebird replica 2, each with monitoring]
Minimal fail-safe cluster
[Diagram after failover: the original Firebird master is out; replica 1 has become Firebird master 2; Firebird replica 2 remains; each node runs monitoring]
What is a fail-safe cluster good for?
• Web applications
  • The web caches data intensively
  • Tolerant to reconnects
  • Many small queries
• Complex database systems with administrator support
HOW TO CREATE A FAIL-SAFE CLUSTER IN FIREBIRD
Two main things are needed to implement a fail-safe cluster in Firebird:
1. Synchronization between master and replica
2. Monitoring and switching to the new server
Synchronization

| | Native | Trigger-based |
|---|---|---|
| Load | Suitable for high load | < 200 users |
| Setup | Easy | Difficult |
| Mode | Strict master-slave, no conflicts | Mixed mode, conflicts are possible |
| DDL replication | Yes | No |
| Synchronization delay | Very low (seconds) | 1-2 minutes |
Comparison of Firebird versions with replication

| | HQbird | RedDatabase | Avalerion | Firebird |
|---|---|---|---|---|
| Status (July 2016) | Production | Production | Beta | Alpha |
| Supported Firebird version | 2.5, 3.0 | 2.5 | 3.0 | 4.0 |
| ODS | 11.2, 12.0 | 11.3 | ? | 13.0 |
| Compatibility with Firebird | 100% | Backup-restore | ? | 100% |
| Synchronous | Yes | Yes | No | Yes |
| Asynchronous | Yes | Yes | Yes | Yes |
| Price per master+replica | R$1990 (R$1100 for FDD only!) | ~US$3500 | ? | Free (in 2018) |

http://ib-aid.com/fdd2016
Native replication in HQbird
[Diagram: Firebird master and Firebird replica]
1. No backup/restore needed
2. Sync and async options
3. Easy setup (configuration only)
Sync delay: ~1-3 seconds
Async delay: 1 minute
Steps to set up replication
SYNCHRONOUS
1. Stop Firebird.
2. Copy the database to the replica(s).
3. Configure replication at the master.
4. Configure replication at the replica.
5. Start Firebird at the master and the replica.
ASYNC
1. Stop Firebird.
2. Copy the database on the same computer.
3. Configure replication at the master.
4. Start Firebird.
5. Copy and configure at the replica(s).
Configuration for replication: Replication.conf
Sync:
<database c:\data\test2.fdb>
    replica_database sysdba:masterkey@replicaserver:/data/testrepl2.fdb
</database>
Async:
<database C:\dbwmaster.fdb>
    log_directory d:\dblogs
    log_archive_directory D:\dbarchlogs
    log_archive_command "copy $(logpathname) $(archpathname)"
</database>
Downtime to set up replication
• Synchronous
  • Time to copy the database between servers + time to set up
• Asynchronous
  • Time to copy the database on the same server
  • Suitable for geographically distributed systems
ASYNC is much faster to set up!
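A back-of-the-envelope version of this comparison in Python. The throughput figures and setup time are purely illustrative assumptions, not measurements; as on the slide, asynchronous downtime counts only the local copy.

```python
def copy_minutes(db_size_gb: float, throughput_mb_s: float) -> float:
    """Minutes needed to copy a database file at a sustained throughput."""
    return db_size_gb * 1024 / throughput_mb_s / 60


def setup_downtime(db_size_gb: float, network_mb_s: float,
                   local_disk_mb_s: float, setup_minutes: float):
    """Return (sync, async) downtime in minutes for the two setup procedures."""
    # Sync: the master stays down while the file crosses the network
    # and while replication is configured on both sides.
    sync_down = copy_minutes(db_size_gb, network_mb_s) + setup_minutes
    # Async: the master waits only for a copy on its own disk; the transfer
    # to the replica happens after Firebird is started again.
    async_down = copy_minutes(db_size_gb, local_disk_mb_s)
    return sync_down, async_down
```

With assumed numbers for a 100 GB database (100 MB/s over the network, 400 MB/s on local disk, 30 minutes of setup), `setup_downtime(100, 100, 400, 30)` gives roughly 47 minutes of downtime for synchronous setup versus roughly 4 minutes for asynchronous setup.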
Monitoring and switching from master to replica
• HQbird (Windows, Linux)
• Pacemaker (Linux only)
• Other tools (self-developed)
Elections and Impeachment
• Not about presidential elections
• Replicas must select a new master after the old master fails
Priority list: MASTER1, REPLICA1, REPLICA2, REPLICA3, REPLICA4
Algorithm
1. We have a priority list: master, replica 1, replica 2.
2. Each node watches all other nodes.
3. If the master node fails, the first available replica takes over:
   1. Disables the original master (stops its Firebird server)
   2. Switches the master DNS name to itself
   3. Changes the configuration of the other servers
   4. Restarts the Firebird server on the new master so the changes take effect
   5. Sends notifications
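Steps 1 to 3 can be sketched in Python: every surviving node walks the same priority list, so as long as they all see the same set of live nodes they pick the same new master without any voting round. The helper names and the wording of the checklist are hypothetical; the actual switching is done by HQbird, Pacemaker or self-developed scripts.

```python
from typing import Callable, List, Optional


def elect_new_master(priority: List[str],
                     is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable node in priority order, or None."""
    for node in priority:
        if is_up(node):
            return node
    return None


def takeover_steps(old_master: str, new_master: str,
                   others: List[str]) -> List[str]:
    """The five takeover steps from the slide, as an ordered checklist."""
    steps = [f"stop Firebird on {old_master}",               # 1. disable old master
             f"switch the master DNS name to {new_master}"]  # 2. DNS change
    steps += [f"reconfigure {node} to replicate from {new_master}"
              for node in others]                            # 3. other servers
    steps += [f"restart Firebird on {new_master}",           # 4. apply new config
              "send notifications"]                          # 5. alert admins
    return steps
```

In the minimal three-node cluster, losing MASTER1 makes REPLICA1 the new master and leaves REPLICA2 as its replica, which is why two replicas are needed to still have a cluster after one failover.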
After a failure: re-initialization
• Downtime needed!
• Stop all nodes
• Copy the database to the new/reinitialized node
• Reconfigure replication
• Start Firebird at all nodes
Too complex?
A fail-safe cluster is a custom solution
• HQbird gives options; you decide how to implement them
• Requires careful planning and implementation
• Requires high-quality infrastructure
• Requires 2+ replicas
• A network outage of 10 seconds will cause all replicas to degrade!
• High-speed network and high-speed drives
Do we really need a cluster and high availability?!
Why we REALLY need high availability
• If the database is big (20-100 GB)
  • Automatic recovery with FirstAID is very slow (downtime > 8 hours) or impossible (> 100 GB)
  • Manual recovery is expensive (US$1000+) and slow (1 day+)
• Big database = big company
  • Losses are not tolerated
  • Downtime is very expensive ($$$)
Mirror (warm standby): easy to set up, reliable mirror (one or more)
[Diagram: Firebird master and Firebird replica, each with monitoring; replica segments 1 … N pass from the master to the replica]
Based on asynchronous replication
Can be geographically distributed (FTP, Amazon, cloud backup)
[Diagram: Firebird master and Firebird replica, each with monitoring; replica segments 1 … N between them]
Licenses
HQbird Enterprise licenses needed for a FAIL-SAFE CLUSTER:
(master + replica) + N additional replicas × 0.5
For the minimal cluster (1 master and 2 replicas): 1 HQbird Enterprise + 0.5 × HQbird Enterprise
MIRROR (warm standby) = master + replica: 1 HQbird Enterprise
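The license formula is simple arithmetic; here is one reading of it as a Python function. This is an interpretation of the slide, not an official price list: one license covers the master and its first replica, and each further replica adds 0.5.

```python
def hqbird_enterprise_licenses(replicas: int) -> float:
    """HQbird Enterprise licenses for one master plus `replicas` replicas.

    Interpretation of the slide: master + first replica = 1 license,
    every additional replica = 0.5 license.
    """
    if replicas < 1:
        raise ValueError("a cluster or mirror needs at least one replica")
    extra_replicas = replicas - 1
    return 1 + 0.5 * extra_replicas
```

This gives 1.0 for the mirror, 1.5 for the minimal cluster and 2.5 for the five-node priority list (MASTER1 plus REPLICA1 to REPLICA4).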
Thank you!
Synchronous replication
Solution
• Changes are buffered per transaction, transferred in batches and synchronized at commit
• Follows the master's priority of locking
• Replication errors can either interrupt operations or just detach the replica
• The replica is available for read-only queries (with caveats)
• Instant takeover (Heartbeat, Pacemaker)
Issues
• Additional CPU and I/O load on the slave side
• The replica cannot be recreated online
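The "Solution" bullets can be illustrated with a small model of the master side: changes are buffered per transaction, full batches are shipped early, and the commit is reported only after the replica has received the rest. The `send` callable is a hypothetical transport, `batch_size` is an assumed parameter, and the two error policies mirror the "interrupt or detach" choice above. Rollback handling is left out.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: ships (transaction id, changes, is_commit) to the
# replica and returns only after the replica has applied them.
SendFn = Callable[[int, List[Any], bool], None]


class SyncReplicator:
    """Master-side sketch of synchronous replication."""

    def __init__(self, send: SendFn, batch_size: int = 100,
                 detach_on_error: bool = True) -> None:
        self.send = send
        self.batch_size = batch_size
        self.detach_on_error = detach_on_error
        self.detached = False
        self.buffers: Dict[int, List[Any]] = {}  # per-transaction buffers

    def _ship(self, txn: int, changes: List[Any], is_commit: bool) -> None:
        if self.detached:
            return  # replica is detached: the master keeps working alone
        try:
            self.send(txn, changes, is_commit)
        except ConnectionError:
            if not self.detach_on_error:
                raise  # policy 1: the error interrupts the master's operation
            self.detached = True  # policy 2: detach the replica and carry on

    def record(self, txn: int, change: Any) -> None:
        buffer = self.buffers.setdefault(txn, [])
        buffer.append(change)
        if len(buffer) >= self.batch_size:
            self._ship(txn, list(buffer), False)  # transfer a full batch early
            buffer.clear()

    def commit(self, txn: int) -> None:
        # The master reports success only after the replica has received the
        # remaining changes and the commit: this is the synchronous part.
        self._ship(txn, self.buffers.pop(txn, []), True)
```

Waiting for the replica inside `commit` is what makes takeover instant, and it is also where the extra load and the sensitivity to network outages come from.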
Synchronous replication
[Diagram: transactions Txn 1 … Txn N on the master DB fill Buffer 1 … Buffer N, which are applied as Txn 1 … Txn N on the replica DB]
Asynchronous replication
Solution
• Changes are synchronously journalled on the master side
• The journal should preferably be placed on a separate disk
• The journal consists of multiple segments (files) that are rotated
• Filled segments are transferred to the slave and applied to the replica in the background
• The replica can be recreated online
Issues
• The replica always lags behind under load
• Takeover is not immediate
• Read-only queries work with historical data
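The journal and segment mechanics can be modelled in a few lines of Python. This is a toy, in-memory sketch: the `Journal` and `Replica` classes are hypothetical, and the in-memory queue stands in for the archive directory. It shows why the replica lags: changes in the segment that is still open are invisible on the replica until that segment is rotated and applied.

```python
from collections import deque
from typing import Any, Deque, List


class Journal:
    """Master side: changes go to the current segment; full ones are archived."""

    def __init__(self, segment_size: int) -> None:
        self.segment_size = segment_size
        self.current: List[Any] = []
        self.archive: Deque[List[Any]] = deque()  # stands in for the archive directory

    def write(self, change: Any) -> None:
        self.current.append(change)  # journalled synchronously on the master
        if len(self.current) >= self.segment_size:
            self.rotate()

    def rotate(self) -> None:
        """Close the current segment and hand it over for transfer."""
        if self.current:
            self.archive.append(self.current)
            self.current = []


class Replica:
    """Replica side: applies archived segments strictly in order."""

    def __init__(self) -> None:
        self.applied: List[Any] = []

    def apply_available(self, journal: Journal) -> int:
        """Apply every segment waiting in the archive; return how many."""
        segments = 0
        while journal.archive:
            self.applied.extend(journal.archive.popleft())
            segments += 1
        return segments
```

Writing five changes with a segment size of 2 leaves the fifth change in the open segment; only after `rotate()` (for example on a timer, an assumption here) does the replica catch up.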
Asynchronous replication
[Diagram: transactions Txn 1 … Txn N on the master DB pass through Buffer 1 … Buffer N into the journal, whose segments go to the archive]
Asynchronous replication
[Diagram: segments from the archive are applied as Txn 1 … Txn N to the replica DB]