Asterisk High Availability


description

Presentation on how to tackle redundancy on Asterisk Servers

Transcript of Asterisk High Availability

Page 1: Asterisk High Availability

Asterisk / Linux Contingency

What will you do when things go wrong?

Page 2: Asterisk High Availability

Options for disaster recovery

► Highly redundant server system

► Disk level backups

► High Availability Cluster

Each has its ups and downs:

► A redundant server system with RAID and redundant power = $$$$, but still presents a single point of failure (SPOF).

► Backups are mandatory for any enterprise class system, but still don't guarantee uptime.

► High Availability across 2 or more low cost servers is much more cost effective and greatly reduces single points of failure.

Page 3: Asterisk High Availability

HA Alternatives

► Redundant server systems should have at least RAID1 disk mirroring and dual redundant power supplies. Such systems start in the $2500 range.

► Disk level backups: simple file backups are OK, but require that you first install the OS and patches before restoring files. An imaging server for a complete disk image is preferred. MondoArchive is the only disk level imaging system that does not require a dedicated imaging server and can run without shutting down the PBX (note that Mondo images are hardware specific).

Page 4: Asterisk High Availability

Backup Design

Optimally, backups should be stored offsite and/or on multiple media to avoid location-related disasters.

Page 5: Asterisk High Availability

On to HA

► High availability has 2 major components:

  ▪ The heartbeat system (notifies servers of an outage)

  ▪ The data synchronization system (syncs data between 2 servers)

► This presentation will use Open Source Linux HA (http://www.linux-ha.org/) and DRBD (Distributed Replicated Block Device). DRBD is like RAID1 mirroring, except that one hard drive is in one server and the other is across the network in another server.

► HA starts and stops services in the event of a failure or a manual shutdown.

► DRBD mirrors all data across the network in real time. In this model we assume that only one copy of the mirror is live at any given time; in the event of a failure the other copy comes live.

► Simple rsync+cron could be used instead of DRBD, but is not as fast or efficient.
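For comparison, the rsync+cron alternative mentioned above might look roughly like the sketch below. The standby host name (pbx2), the directory list, the script path, and the five-minute schedule are all assumptions for illustration; none of them come from the original presentation.

```shell
#!/bin/sh
# Sketch of a cron-driven rsync mirror from the live PBX to the standby.
# STANDBY and SYNC_DIRS are hypothetical; adjust to your own hosts and paths.
STANDBY="${STANDBY:-pbx2}"
SYNC_DIRS="${SYNC_DIRS:-/etc/asterisk /var/lib/asterisk /var/spool/asterisk}"

# Print the rsync command for one directory (trailing slashes copy contents).
build_sync_cmd() {
    echo "rsync -az --delete $1/ $STANDBY:$1/"
}

# Run (or, with DRY_RUN=1, only print) the sync for every directory.
sync_all() {
    for dir in $SYNC_DIRS; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            build_sync_cmd "$dir"
        else
            rsync -az --delete "$dir/" "$STANDBY:$dir/"
        fi
    done
}

# Only sync when explicitly asked, so the file can also be sourced safely.
if [ "${1:-}" = "run" ]; then
    sync_all
fi

# Example crontab entry (every 5 minutes), assuming the file is saved as
# /usr/local/bin/pbx-sync.sh:
# */5 * * * * /usr/local/bin/pbx-sync.sh run
```

With this approach, anything written since the last cron run is missing on the standby after a failover, which is the main reason DRBD's real-time mirroring is preferred here.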

Page 6: Asterisk High Availability

DRBD HA Diagram

HA can be set up in several different ways; this document uses the following layout for its simplicity and effectiveness:

Page 7: Asterisk High Availability

Recommended HA physical layout

► The previous slide describes the physical layout quite well.

► A 2 node cluster is best for simple failover needs

► The nodes should be connected to each other on a separate subnet using gigabit NICs with a crossover cable (no switch) for heartbeat and data sync

► Each node should have its own dedicated UPS and, if possible, its own dedicated circuit breaker

► It would be even better if the nodes were in 2 separate rooms, or even separate buildings. It is best to keep them both on the same local LAN, but it could be done over a WAN if speeds permit.

Page 8: Asterisk High Availability

What Can HA Clustering Do For You

Page 9: Asterisk High Availability

HA Details

► HA sends heartbeat signals back and forth between 2 (or more) servers

► If a failure occurs you can detect it in milliseconds, and the standby machine can take over in 5 seconds or less (as long as your network can provide the communication speed)

► Each server has its own IP address plus a floating IP that is controlled by the HA service; the floating IP is active only on the current live system.

Page 10: Asterisk High Availability

HA Installation

► It is recommended that you install all the HA+DRBD packages, then use the sample config files here: http://www.astusers.org/ha

► On CentOS (Red Hat) based distros (e.g. trixbox) you can use yum to install the HA package:

    yum install -y heartbeat

Page 11: Asterisk High Availability

DRBD Config

► Unfortunately DRBD is not quite as simple to install, due to package availability* and partitioning.

► You will need to either

  ▪ a: repartition your drive with an area for the DRBD partition, as shown on pages 3-6 of the document marked ** in the appendix

  ▪ b: install a 2nd drive dedicated to the DRBD partition (easiest)

► If your Linux distro maintains up to date packages*, you can use yum to install; unfortunately this is usually not the case.

► 3 components of a DRBD install:

  ▪ DRBD binary (uses /etc/drbd.conf)

  ▪ DRBD kernel module (version specific to your kernel)

  ▪ DRBD Links (builds links from your file system to the relocated files on the DRBD disk)

The following works on CentOS 5.1 with kernel 2.6.18-53.1.4.el5:

    yum install drbd

    rpm -ihv http://ubuntu.nad.go.id/repo/apt-centos5/bleeding/drbd-kmdl-2.6.18-53.1.4.el5-8.2.5-21.el5.i686.rpm

    rpm -ihv ftp://ftp.tummy.com/pub/tummy/drbdlinks/drbdlinks-1.11-1.noarch.rpm

► Recommendation: install your OS/distro, then install a 2nd hard drive for DRBD

► Recommended: download my ha/drbd config files when you get all the components installed: http://www.astusers.org/ha

► *Find a complete package or compile from source from http://oss.linbit.com/
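To make the /etc/drbd.conf component more concrete, here is a minimal sketch of a two-node resource definition, printed from a shell function so it can be reviewed before being copied into place. The host names (pbx1, pbx2), the backing disk (/dev/sdb1, i.e. the dedicated 2nd drive) and the crossover-link addresses are hypothetical placeholders; only the resource name "share" and the device /dev/drbd0 come from this presentation. The author's own sample files at http://www.astusers.org/ha should be preferred where available.

```shell
#!/bin/sh
# Print a minimal /etc/drbd.conf sketch for a resource named "share".
# Host names, backing disk, and addresses are hypothetical placeholders.
sample_drbd_conf() {
    cat <<'EOF'
resource share {
  # protocol C = fully synchronous replication
  protocol C;
  # each "on" name must match the output of `uname -n` on that node
  on pbx1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on pbx2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
EOF
}

sample_drbd_conf
```

The same file is normally kept identical on both nodes, which is why both hosts appear in one resource block.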

Page 12: Asterisk High Availability

DRBD Config Continued

► #On both nodes (servers):

    drbdadm create-md share

► #share is the name of your resource in /etc/drbd.conf

► #now on the primary node:

    drbdadm -- --overwrite-data-of-peer primary all

► # this may take a LONG time to run (in the background)

► # check progress by typing: watch cat /proc/drbd

► #finally on the primary node: format the drbd0 partition with a file system:

    mkfs -t ext3 /dev/drbd0

► #now go to the secondary node and type the following to sync up to the primary node:

    drbdadm attach share

► # this may take a LONG time to run (in the background)

    cat /proc/drbd #should tell you "ds:UpToDate/UpToDate"

► #on the primary, mount the new file system under the new "share" folder to test:

    mkdir /share

    mount /dev/drbd0 /share
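Rather than watching /proc/drbd by eye, the "ds:UpToDate/UpToDate" check above could be scripted roughly as follows. This is a sketch that assumes the status line contains a field of the form ds:Local/Peer, as quoted in the slide; the function names are made up for illustration.

```shell
#!/bin/sh
# Extract the disk-state field (ds:Local/Peer) from /proc/drbd-style text
# read on standard input. Function names are hypothetical.
drbd_disk_state() {
    sed -n 's/.*ds:\([A-Za-z]*\/[A-Za-z]*\).*/\1/p' | head -n 1
}

# Succeed only when both sides report UpToDate.
drbd_in_sync() {
    [ "$(drbd_disk_state)" = "UpToDate/UpToDate" ]
}

# Usage on a real node:
#   drbd_in_sync < /proc/drbd && echo "mirror is in sync"
```

A check like this is handy before mounting /dev/drbd0 or before testing a failover, since failing over to a node that is still syncing would expose incomplete data.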

Page 13: Asterisk High Availability

Final Heartbeat config

► On both servers do the following:

► Stop all services that need failover and set them to manual start (so they no longer start at boot)

► Set the Heartbeat service to start automatically

► Use tar to copy all service specific config files to the DRBD partition (this only needs to be done on the current master server)

► Add said files and folders to /etc/drbdlinks.conf to automatically build links to the DRBD partition

► Remove amportal from /etc/rc.local, and build a new amportal script that is HA compliant

► Edit /etc/ha.d/ha.cf, haresources, and authkeys, as well as /etc/drbd.conf, to meet your needs

► Finally edit /etc/my.cnf:

    datadir=/share/var/lib/mysql

    socket=/share/var/lib/mysql/mysql.sock
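As a rough idea of what the three Heartbeat v1 files in /etc/ha.d might contain, the sketch below prints one possible set. Everything site-specific is a placeholder: the node names (pbx1, pbx2), the interface names, the floating IP 192.168.1.100, the shared secret, and the assumption that the HA-compliant amportal script is installed as an init script named "amportal". Only the resource name "share", the device /dev/drbd0, the /share mount point, and the ext3 file system come from the earlier slides.

```shell
#!/bin/sh
# Print sample Heartbeat v1 files for review. Node names, interfaces,
# the floating IP, and the shared secret are hypothetical placeholders.

sample_ha_cf() {
    cat <<'EOF'
logfile /var/log/ha-log
# seconds between heartbeats
keepalive 1
# seconds of silence before the peer is declared dead
deadtime 5
# extra allowance while the nodes are still booting
initdead 60
udpport 694
# send heartbeats over the dedicated crossover link
bcast eth1
auto_failback off
# node names must match `uname -n` on each server
node pbx1
node pbx2
EOF
}

sample_haresources() {
    # One line: preferred node, then resources started left to right.
    cat <<'EOF'
pbx1 drbddisk::share Filesystem::/dev/drbd0::/share::ext3 drbdlinks IPaddr::192.168.1.100/24/eth0 mysqld amportal
EOF
}

sample_authkeys() {
    # This file must be mode 600 on both nodes.
    cat <<'EOF'
auth 1
1 sha1 ReplaceWithYourOwnSecret
EOF
}

sample_ha_cf
sample_haresources
sample_authkeys
```

The resource order matters: the DRBD device is promoted and mounted first, the links and floating IP follow, and the services that depend on them start last. On a stop or failover, Heartbeat releases them in reverse order.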

Page 14: Asterisk High Availability

Caveats of HA

► You want to avoid having 2 primary nodes. If both nodes are still up but fail to see each other, they would both become primary and the data would become out of sync (known as "Split Brain")

  ▪ Use ipfail and pingd/ping_group in your ha.cf to minimize this possibility

► If you mess something up (delete a file, change a user), the mistake will instantly be synchronized to both servers, so regular backups with offsite or removable media should still be used.

► Normal RAID1 in each server is still recommended to increase uptime, but is not required. Without it, a failed disk forces a failover, and any failover disconnects all active calls.

► Supports auto failover of SIP, T1/PRI trunks (using Redfone TDMoE hardware) or analog lines wired in parallel.

► So far IAX won't use a floating IP; it must use the real IP of the primary server (haven't really investigated this yet).
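The ipfail suggestion above usually comes down to a few extra lines in /etc/ha.d/ha.cf. The sketch below prints one possible set; the ping target 192.168.1.1 (for example the LAN gateway) and the ipfail path are assumptions, and the path in particular differs between installations (64-bit systems often use /usr/lib64/heartbeat/ipfail).

```shell
#!/bin/sh
# Print ha.cf lines that enable ipfail. The ping target and the ipfail
# path are hypothetical and must be adapted to the local install.
sample_ipfail_lines() {
    cat <<'EOF'
# a host both nodes should always be able to reach, e.g. the LAN gateway
ping 192.168.1.1
# run ipfail as the hacluster user and restart it if it exits
respawn hacluster /usr/lib/heartbeat/ipfail
EOF
}

sample_ipfail_lines
```

With this in place, a node that loses sight of the ping target while its peer can still reach it gives up its resources, which helps keep a node that has dropped off the LAN from continuing to act as primary.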

Page 15: Asterisk High Availability

Resolving Failover Problems

► Resolving "unclean" failovers in which your data becomes out of sync between 2 primary nodes (aka "Split Brain"):

► #Check both nodes to see that they are both running in StandAlone status, run:

    cat /proc/drbd

► #Stop Heartbeat services on both nodes:

    service heartbeat stop

► #One of the nodes must discard its data and allow the other to overwrite it; on the node to discard, run:

    drbdadm secondary share

    drbdadm -- --discard-my-data connect share

► #(assuming "share" is your DRBD resource name)

► #On the other node (the split brain survivor, aka the data you wish to keep), if its connection state is also StandAlone, you would run:

    drbdadm connect share

► #Allow the 2 file systems to sync up, check by:

    watch cat /proc/drbd

► #Then it is recommended to reboot the 2 systems and let the HA services once again manage the cluster and file systems. If HA seems to be having further problems, restart the HA services and run:

    tail -f /var/log/ha-log
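The first step of the procedure above (checking for StandAlone) can be scripted as a small helper. This is a sketch that assumes the /proc/drbd status line carries a cs:<state> field; the function names are invented for illustration.

```shell
#!/bin/sh
# Extract the connection state (cs:...) from /proc/drbd-style text on stdin.
# Function names are hypothetical.
drbd_conn_state() {
    sed -n 's/.*cs:\([A-Za-z]*\).*/\1/p' | head -n 1
}

# Succeed when the node has dropped to StandAlone, i.e. it is no longer
# talking to its peer and may need the manual recovery steps.
drbd_is_standalone() {
    [ "$(drbd_conn_state)" = "StandAlone" ]
}

# Usage on a real node:
#   drbd_is_standalone < /proc/drbd && echo "manual split brain recovery may be needed"
```

Run it on both nodes before choosing which side discards its data; the helper only detects the state and deliberately does not pick a survivor, since that decision depends on which copy holds the data you want to keep.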

Page 16: Asterisk High Availability

HA Advanced config (HA v2)

► This document does not have the capacity to cover HA v2, and uses HAv1, which is far simpler.

► HAv1: uses 2 simple files (/etc/ha.d/ha.cf + haresources) to configure and manage the cluster

► HAv2: uses a complex XML config database (/var/lib/heartbeat/crm/cib.xml) that offers many advanced options, most importantly resource monitoring rather than simple server heartbeat monitoring

  ▪ If your service (e.g. Asterisk) provides proper "status" information, HA can monitor that status, and can do so for several different services. However, if the service cannot provide failure notifications through monitoring/status queries, you must custom build such capacity using scripting (an OCF Resource Agent).
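To give a feel for what "custom building" a status check involves, here is a minimal sketch of the monitor logic an OCF Resource Agent for Asterisk might contain. It is not a complete agent: a real one must also implement actions such as meta-data and validate-all. The probe command (asterisk -rx 'core show version') and the init script path are assumptions and may differ by Asterisk version and distro; the exit codes follow the OCF convention.

```shell
#!/bin/sh
# Sketch of the monitor/dispatch core of an OCF resource agent for Asterisk.
# The probe command and init script path are assumptions; adjust as needed.
OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

ASTERISK_PROBE="${ASTERISK_PROBE:-asterisk -rx 'core show version'}"
ASTERISK_INIT="${ASTERISK_INIT:-/etc/init.d/asterisk}"

# Report whether the service answers the probe command.
asterisk_monitor() {
    if sh -c "$ASTERISK_PROBE" >/dev/null 2>&1; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

# Dispatch on the action name the cluster manager passes to the agent.
ocf_dispatch() {
    case "$1" in
        monitor|status) asterisk_monitor ;;
        start)          "$ASTERISK_INIT" start ;;
        stop)           "$ASTERISK_INIT" stop ;;
        *)              return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# A real agent would end with:  ocf_dispatch "$1"; exit $?
```

Probing through the Asterisk CLI rather than just checking for a process means a hung daemon that still has a PID is reported as not running, which is the kind of resource-level monitoring HAv2 adds over plain server heartbeats.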

Page 17: Asterisk High Availability

Some Linux-HA Terminology

Page 18: Asterisk High Availability

Key Linux-HA Processes (r2)

Page 19: Asterisk High Availability

Linux-HA Architecture (r2)

Page 20: Asterisk High Availability

Appendix & Notes

► Good references:

► http://www.trixbox.org/forums/trixbox-forums/open-discussion/ha-cluster <<decent guide

► http://www.danielaliaman.com/blog/files/phonecube/cluster/AsteriskCluster.pdf <<notes on partitions**

► http://www.linux-ha.org/DRBD

► http://www.linux-ha.org/DRBD/HowTov2

► http://www.drbd.org/users-guide/

► http://wiki.centos.org/HowTos/Ha-Drbd

► http://www.voip-info.org/wiki/view/Asterisk+High+Availability+Solutions

► http://www.tummy.com/Community/software/drbdlinks/

► ftp://ftp.tummy.com/pub/tummy/drbdlinks/

► http://www.linux-ha.org/_cache/HeartbeatTutorials__LCA2007-tutorial.pdf <<VERY detailed

► http://www.tournament.org.il/run/index.php?/categories/10-Clusters

► http://ubuntu.nad.go.id/repo/apt-centos5/bleeding/?C=M;O=A <<CentOS 5.1 kernel modules

► http://www.siliconvp.us/modules.php?name=News&file=article&sid=9

► http://www.bisente.com/blog/2007/08/26/asterisk-cluster-fonebridge2/?lan=english

► http://www.red-fone.com/assets/documents/Trixbox_FB2_Heartbeat_Tutorial.pdf <<trixbox + Redfone

► http://support.red-fone.com/downloads/tools/HA_Whitepaper.pdf

Page 21: Asterisk High Availability

Credits

► Major credit goes to Alan Robertson for material taken from his extensive HA guide

► Asterisk, Digium and the Asterisk logo are registered trademarks of Digium Corporation

► DRBD is a registered trademark of Linbit

► Linux is a registered trademark of Linus Torvalds

► Redfone and Fonebridge are registered trademarks of Redfone Communications LLC.

► trixbox is a registered trademark of Fonality Inc.

► All other trademarks are property of their respective owners.

► Finally, me: This presentation was organized by John Hyde. This document is copyright Simple Technologies under GPLv3; if you modify it please contribute your modifications back.

► Please check back at www.astusers.org/ha for future additions to this document.