High Availability - Brett Thurber - ManageIQ Design Summit 2016
![Page 1: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/1.jpg)
High Availability
ManageIQ/CloudForms
Brett Thurber - Red Hat
June 2016
![Page 2: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/2.jpg)
Agenda
Introduction & Acknowledgements
What is HA?
Traditional HA
What’s on the horizon?
pglogical
BDR
Containers & Kubernetes
Summary
Q & A
![Page 3: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/3.jpg)
Introduction & Acknowledgements
Brett Thurber - RHCT, RHCE, RHCDS, RHCA, RHCVA
20+ years of IT experience
Been with Red Hat since 2011
Team lead in Systems Engineering focused on management and integrated solutions
Worked with MIQ/CloudForms since 2013
Authored 11 Reference Architectures
Presented at RH Summit 2015 - Application portability & interoperability with Red Hat Cloud Infrastructure
Contact: [email protected]
Special thanks to:
Gregg Tanzillo, Nick Carboni, Joe Rafaniello
![Page 4: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/4.jpg)
What is HA?
“A system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to "100% operational" or "never failing."” - Source: SearchDataCenter
“A characteristic of a system, which aims to ensure an agreed level of operational performance for a higher than normal period.” - Source: Wikipedia
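To make the "relative to 100% operational" measure concrete, the sketch below converts an availability percentage into the downtime it permits per year. The function name and the 365-day year are assumptions chosen for illustration; they do not come from the talk.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 365) -> float:
    """Return the minutes of downtime permitted over `days` at a given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Compare a few common availability targets.
for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.1f} minutes/year")
```

Each additional "nine" cuts the yearly downtime budget by a factor of ten, which is why the agreed level of availability drives how much HA machinery is justified.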
![Page 5: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/5.jpg)
Traditional HA
![Page 6: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/6.jpg)
Heavy Lift
Highly complex and resource intensive
Shared storage
iSCSI, NFS, fibre channel
Multiple bare-metal or VM hosts
Minimum of 2 cluster hosts for pgsql database
2+ MIQ/CFME instances
HAProxy for load balancing
Complex and time intensive deployment
Typical deployment time measured in days
Stretch cluster risks
Expensive, dedicated high speed connection
Supportability
Data consistency
![Page 7: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/7.jpg)
Active/Passive Deployment Pattern: intra-site
[Diagram: haproxy and a VIP in front of two MIQ/CFME appliances; two pgsql nodes under pacemaker with a second VIP]
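The active/passive pattern can be sketched as a toy model: requests are spread across healthy appliances, while the database VIP stays on one node until that node fails. This is a simplified illustration, not how haproxy or pacemaker are implemented; the class names, node names, and round-robin policy are all assumptions.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class ActivePassiveCluster:
    """Toy stand-in for a cluster manager that moves a VIP between database hosts."""
    nodes: list
    active: int = 0  # index of the node currently holding the VIP

    def vip_owner(self) -> Node:
        # Fail over only when the active node is down and another node is healthy.
        if not self.nodes[self.active].healthy:
            for i, node in enumerate(self.nodes):
                if node.healthy:
                    self.active = i
                    break
            else:
                raise RuntimeError("no healthy database node")
        return self.nodes[self.active]


def route(appliances, n_requests):
    """Round-robin requests across healthy appliances, skipping failed ones."""
    healthy = [a for a in appliances if a.healthy]
    if not healthy:
        raise RuntimeError("no healthy appliance")
    picker = cycle(healthy)
    return [next(picker).name for _ in range(n_requests)]


apps = [Node("miq-1"), Node("miq-2")]
db = ActivePassiveCluster([Node("pgsql-1"), Node("pgsql-2")])
print(route(apps, 4), db.vip_owner().name)

db.nodes[0].healthy = False   # simulate loss of the active database host
print(db.vip_owner().name)    # the VIP now resolves to the standby
```

Only one database node serves traffic at a time, so the passive node's capacity sits idle until a failure occurs.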
![Page 8: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/8.jpg)
Active/Passive Deployment Pattern: inter-site
[Diagram: haproxy and a VIP in front of two MIQ/CFME appliances; two pgsql nodes linked by streaming replication; split across Site 1 and Site 2]
![Page 9: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/9.jpg)
What’s on the horizon?
![Page 10: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/10.jpg)
Interesting possibilities...
Emerging technologies present the possibility of reducing the complexity of HA and PostgreSQL.
pglogical
BDR
Containers & Kubernetes
![Page 11: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/11.jpg)
pglogical
![Page 12: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/12.jpg)
pglogical
What is pglogical?
pglogical offers logical replication as a PostgreSQL extension and is a replacement for streaming replication
Available from PostgreSQL 9.4 onward (MIQ Capablanca, CloudForms 4.1)
Less complex solution for database replication
pglogical works at the per-database level, not the whole-server level like physical streaming replication
One provider may feed multiple subscribers without incurring additional disk write overhead
One subscriber can merge changes from several origins and detect conflicts between changes, with automatic and configurable conflict resolution
Replication across major releases is supported (9.4 and later)
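The "merge changes from several origins" behaviour can be pictured with a small in-memory model. This is a sketch under assumptions, not pglogical's implementation: the `Change` and `Subscriber` classes, the integer commit timestamps, and the two policy functions are hypothetical names used only to illustrate configurable conflict resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    origin: str        # provider that produced the change
    key: int           # primary key of the affected row
    value: str
    committed_at: int  # commit timestamp (a simple logical clock in this sketch)


def last_update_wins(local: Change, remote: Change) -> Change:
    """Keep whichever version was committed most recently."""
    return remote if remote.committed_at > local.committed_at else local


def keep_local(local: Change, remote: Change) -> Change:
    """Always keep the version already stored on the subscriber."""
    return local


class Subscriber:
    def __init__(self, resolve=last_update_wins):
        self.rows = {}
        self.conflicts = 0
        self.resolve = resolve

    def apply(self, change: Change) -> None:
        current = self.rows.get(change.key)
        # A conflict here means two different origins touched the same row.
        if current is not None and current.origin != change.origin:
            self.conflicts += 1
            change = self.resolve(current, change)
        self.rows[change.key] = change


sub = Subscriber()
sub.apply(Change("region-a", 1, "vm powered on", 10))
sub.apply(Change("region-b", 1, "vm powered off", 12))
sub.apply(Change("region-a", 2, "new host", 11))
print(sub.rows[1].value, sub.conflicts)
```

Passing `keep_local` instead of the default swaps the policy without touching the apply logic, which is the sense in which resolution is configurable.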
![Page 13: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/13.jpg)
How would it work?
[Diagram: haproxy and a VIP in front of two MIQ/CFME appliances; VMDB database on four pgsql nodes, one publisher and its subscribers]
![Page 14: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/14.jpg)
What about failover?
[Diagram: haproxy and a VIP in front of two MIQ/CFME appliances; VMDB database on four pgsql nodes, one publisher and three subscribers; each subscriber marked "???"]
![Page 15: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/15.jpg)
pglogical limitations...
Not suitable for failover
Automatic DDL (data definition language) replication is not supported
Logical decoding doesn't decode catalog changes directly, so the plugin can't simply send a CREATE TABLE statement when a new table is added.
If the data being decoded is being applied to another PostgreSQL database then its table definitions must be kept in sync via some means external to the logical decoding plugin itself, such as:
Event triggers using DDL deparse to capture DDL changes as they happen and write them to a table to be replicated and applied on the other end
Doing DDL management via tools that synchronise DDL on all nodes
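The DDL limitation above can be illustrated with a toy model in which only row changes travel between databases. Everything here (`Database`, `SchemaMismatch`, `replicate_rows`) is a hypothetical stand-in written for this sketch, not a PostgreSQL or pglogical API.

```python
class SchemaMismatch(Exception):
    """Raised when a row change targets a table the subscriber does not have."""


class Database:
    def __init__(self):
        self.tables = {}  # table name -> list of row dicts

    def create_table(self, name):
        self.tables.setdefault(name, [])

    def insert(self, table, row):
        if table not in self.tables:
            raise SchemaMismatch(f"table {table!r} does not exist")
        self.tables[table].append(row)


def replicate_rows(changes, subscriber):
    """Apply decoded row changes only; DDL never appears in this stream."""
    for table, row in changes:
        subscriber.insert(table, row)


provider, subscriber = Database(), Database()
provider.create_table("vms")            # DDL runs on the provider only
provider.insert("vms", {"id": 1})

try:
    replicate_rows([("vms", {"id": 1})], subscriber)
except SchemaMismatch as err:
    print("apply failed:", err)

subscriber.create_table("vms")          # schema synchronised by external means
replicate_rows([("vms", {"id": 1})], subscriber)
print(subscriber.tables)
```

The second attempt succeeds only because the table definition was created on the subscriber separately, which mirrors the need for an external DDL synchronisation mechanism.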
![Page 16: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/16.jpg)
Bi-Directional Replication
![Page 17: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/17.jpg)
BDR
What is BDR?
Bi-Directional Replication (BDR) is an asynchronous multi-master replication system for PostgreSQL, specifically designed to allow geographically distributed clusters and supporting up to 48 nodes (possibly more in future releases). BDR is a low-overhead, low-maintenance technology for distributed databases.
BDR excels in environments where users are distributed across high-latency and/or unreliable network links where conventional tightly-coupled clustering software does not work well
Support for DDL replication and Global DDL locking
![Page 18: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/18.jpg)
Active/Active BDR Deployment Pattern: intra-site
[Diagram: haproxy and a VIP in front of two MIQ/CFME appliances; two pgsql nodes linked by BDR]
![Page 19: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/19.jpg)
Active/Active BDR Deployment Pattern: inter-site
[Diagram: haproxy and a VIP in front of two MIQ/CFME appliances; two pgsql nodes linked by BDR; split across Site 1 and Site 2]
![Page 20: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/20.jpg)
BDR limitations...
Still under development; not production ready (requires a modified version of PostgreSQL 9.4)
Asynchronous replication
Changes made on one BDR node are not replicated to other nodes before they are committed locally. As a result, the data is not exactly the same on all nodes at any given time
Non-shared storage architecture means additional storage space considerations
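The effect of asynchronous replication can be seen in a minimal two-node model: a write commits locally first and reaches the peer only when the outbound queue is flushed. The `BdrNode` class and its methods are hypothetical and written for this sketch; conflict handling is deliberately left out.

```python
from collections import deque


class BdrNode:
    """Toy multi-master node that commits locally and replicates later."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = deque()  # changes committed locally but not yet shipped
        self.peers = []

    def write(self, key, value):
        # The local commit happens before any peer sees the change.
        self.data[key] = value
        self.outbox.append((key, value))

    def flush(self):
        # Ship queued changes to every peer, oldest first.
        while self.outbox:
            key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.data[key] = value


site1, site2 = BdrNode("site1"), BdrNode("site2")
site1.peers, site2.peers = [site2], [site1]

site1.write("vm-42", "running")
print(site2.data.get("vm-42"))   # the change has not reached site2 yet

site1.flush()
print(site2.data.get("vm-42"))   # both nodes now agree
```

Between `write` and `flush` the two nodes hold different data, which is the window the slide describes. Each node also keeps its own full copy, hence the extra storage consideration.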
![Page 21: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/21.jpg)
Containers & Kubernetes
![Page 22: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/22.jpg)
Containers?
Docker image for ManageIQ under development
Currently monolithic
Allows for an MIQ container image to be deployed to Atomic Host and other container providers
Service decoupling on the horizon
Utilizing Kubernetes pods allows for:
Service distribution across multiple hosts
Persistent storage to be used for database
Highly available and scalable architecture
Easily upgradeable with quick roll-back capabilities
Self-healing
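Self-healing rests on a control loop that compares the desired state with the observed state and corrects the difference. The sketch below models that idea with plain dictionaries. It does not use the Kubernetes API, and the `reconcile` function and pod names are assumptions made for illustration.

```python
import itertools

_new_ids = itertools.count(1)  # source of names for replacement pods


def reconcile(desired: int, pods: dict) -> dict:
    """Return a pod map with failed pods dropped and the replica count restored."""
    running = [name for name, status in pods.items() if status == "Running"]
    # Keep at most `desired` healthy pods (this also covers scaling down).
    result = {name: "Running" for name in running[:desired]}
    # Start replacements until the desired replica count is met.
    while len(result) < desired:
        result[f"miq-new-{next(_new_ids)}"] = "Running"
    return result


pods = {"miq-a": "Running", "miq-b": "Failed", "miq-c": "Running"}
pods = reconcile(3, pods)
print(pods)
```

Running the same function with a larger or smaller `desired` value covers scaling as well, since both are corrections toward a declared target.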
![Page 23: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/23.jpg)
Possible Container Architecture
[Diagram: two container/pod groups, each with http, rails, pgsql, and persistent storage, linked by BDR; one node with a proxy]
![Page 24: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/24.jpg)
Possible Container Architecture (cont'd)
[Diagram: two container/pod groups, each with http, rails, pgsql, and persistent storage, linked by BDR; two nodes, each with a proxy, joined by an overlay network]
![Page 25: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/25.jpg)
What about networking?
Kubernetes imposes the following network rules:
All containers can communicate with all other containers without NAT
All nodes can communicate with all containers (and vice-versa) without NAT
The IP that a container sees itself as is the same IP that others see it as
Supported overlay networks
L2 networks and Linux bridging
Flannel
Open vSwitch
Romana
OpenShift SDN
etc...
![Page 26: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/26.jpg)
Summary
![Page 27: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/27.jpg)
In closing...
Traditional HA clustering is complex, expensive, and time consuming to implement, and poses some support limitations
pglogical is a good replacement for streaming replication but lacks some features needed to make it a viable HA solution
BDR bridges the gaps in pglogical to offer a viable HA solution but is still maturing (> PostgreSQL 9.4)
Containers, coupled with Kubernetes, offer compelling use cases, including self-healing, upgrades, scaling, and high availability
![Page 28: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/28.jpg)
Q & A
![Page 29: High Availability - Brett Thurber - ManageIQ Design Summit 2016](https://reader033.fdocuments.net/reader033/viewer/2022052405/587097e41a28ab412b8b6d75/html5/thumbnails/29.jpg)
Thank You!