OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo...
![Page 1: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/1.jpg)
Disaster recovery with OpenNebula
Carlo Daffara
![Page 2: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/2.jpg)
First, let me get some coffee.
![Page 3: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/3.jpg)
![Page 4: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/4.jpg)
![Page 5: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/5.jpg)
![Page 6: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/6.jpg)
“Disaster recovery (DR) involves a set of policies and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Disaster recovery focuses on the IT or technology systems supporting critical business functions, as opposed to business continuity, which involves keeping all essential aspects of a business functioning despite significant disruptive events. Disaster recovery is therefore a subset of business continuity.”
![Page 7: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/7.jpg)
80% of businesses affected by a major incident either never re-open or close within 18 months (Source: Axa)
![Page 8: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/8.jpg)
From “Understanding the Cost of Data Center Downtime: An Analysis of the Financial Impact on Infrastructure Vulnerability”, Ponemon Research
![Page 9: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/9.jpg)
“Let’s begin with one very interesting fact. According to a survey completed in 2010, human error is responsible for 40% of all data loss, as compared to just 29% for hardware or system failures. An earlier IBM study determined data loss due to human error was as high as 80%” (From: “Business continuity and disaster recovery planning for IT professionals”, Elsevier press, 2014)
![Page 10: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/10.jpg)
![Page 11: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/11.jpg)
![Page 12: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/12.jpg)
![Page 13: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/13.jpg)
The recovery time objective (RTO) is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.
The recovery point objective (RPO) is the maximum tolerable period in which data might be lost from an IT service due to a major incident.
![Page 14: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/14.jpg)
“Alternative storage-based replication solutions cost a minimum of $10,000 per terabyte of data covered plus ongoing maintenance. For the composite organization’s 225 protected VMs with an average size of 100 gigabytes (GB), the three year costs for licenses and maintenance are estimated at $328,500” (Forrester research, “The Total Economic Impact of VMware vCenter Site Recovery Manager”, 2013)
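The figures in the quote can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 1000 GB) and reading the $10,000 per terabyte as a license floor. How the report splits the remainder between licenses and maintenance is not stated in the slide.

```python
# Back-of-the-envelope check of the Forrester figures quoted above.
# Assumption (not from the slide): decimal units, 1 TB = 1000 GB.
PROTECTED_VMS = 225
AVG_VM_SIZE_GB = 100
MIN_COST_PER_TB_USD = 10_000
QUOTED_3Y_TOTAL_USD = 328_500

protected_tb = PROTECTED_VMS * AVG_VM_SIZE_GB / 1000
license_floor_usd = protected_tb * MIN_COST_PER_TB_USD
remainder_usd = QUOTED_3Y_TOTAL_USD - license_floor_usd
cost_per_vm_usd = QUOTED_3Y_TOTAL_USD / PROTECTED_VMS

print(f"protected data:       {protected_tb} TB")
print(f"floor at $10k per TB: ${license_floor_usd:,.0f}")
print(f"remainder of quote:   ${remainder_usd:,.0f}")
print(f"three-year cost / VM: ${cost_per_vm_usd:,.0f}")
```

Under these assumptions the 225 VMs hold 22.5 TB, the per-terabyte floor alone comes to $225,000, and the quoted three-year total works out to $1,460 per protected VM.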
![Page 15: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/15.jpg)
3 simple rules to make a working DR:
![Page 16: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/16.jpg)
Rule 1: never put all eggs in one basket (be it hardware, software, cloud)
![Page 17: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/17.jpg)
![Page 18: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/18.jpg)
Customer buys full DR and snapshot capability from local data center; data center updates SAN firmware and loses everything. Customer discovers that snapshots and backups were kept in the same SAN with everything else.
![Page 19: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/19.jpg)
![Page 20: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/20.jpg)
In electronics, an opto-isolator, also called an optocoupler, photocoupler, or optical isolator, is a component that transfers electrical signals between two isolated circuits by using light. Opto-isolators prevent high voltages from affecting the system receiving the signal.
![Page 21: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/21.jpg)
![Page 22: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/22.jpg)
Rule 2: RTO and RPO are usually different from VM to VM
![Page 23: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/23.jpg)
![Page 24: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/24.jpg)
![Page 25: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/25.jpg)
Needs to be replicated constantly
No one cares if this dies
![Page 26: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/26.jpg)
![Page 27: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/27.jpg)
![Page 28: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/28.jpg)
Rule 3: design a reliable oracle
![Page 29: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/29.jpg)
![Page 30: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/30.jpg)
![Page 31: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/31.jpg)
Oracle of Delphi
![Page 32: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/32.jpg)
How the others do it:
![Page 33: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/33.jpg)
![Page 34: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/34.jpg)
![Page 35: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/35.jpg)
How we do it:
![Page 36: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/36.jpg)
![Page 37: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/37.jpg)
Our approach takes advantage of three individual factors:
● LizardFS’ thinly-provisioned snapshots
● online replication of chunks & tiering
● OpenNebula’s datastores
![Page 38: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/38.jpg)
![Page 39: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/39.jpg)
![Page 40: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/40.jpg)
# An example of configuration of goals. It contains the default values.
1 1 : _
2 2 : _ _
3 3 : _ _ _
4 4 : _ _ _ _
5 5 : _ _ _ _ _
# (...)
20 20 : _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
# But you don't have to specify all of them -- defaults will be assumed.
# You can define your own custom goals using labels if you use them, e.g.:
# 14 min_two_locations: _ locationA locationB # one copy in A, one in B, third anywhere
# 15 fast_access : ssd _ _ # one copy on ssd, two additional on any drives
# 16 two_manufacturers: WD HT # one on WD disk, one on HT disk
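To make the line format concrete, here is a minimal sketch of how one goal line breaks down into an id, a name and a list of chunkserver labels, where `_` means "any server" and the number of labels is the number of copies. The `Goal` class and `parse_goal_line` function are hypothetical helpers written for this illustration. They are not part of LizardFS, and the real parser may accept syntax this sketch does not.

```python
from typing import List, NamedTuple, Optional


class Goal(NamedTuple):
    goal_id: int
    name: str
    labels: List[str]  # one entry per copy; "_" means any chunkserver

    @property
    def copies(self) -> int:
        return len(self.labels)


def parse_goal_line(line: str) -> Optional[Goal]:
    """Parse 'id name : label label ...'; return None for blank or comment lines."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line:
        return None
    head, _, tail = line.partition(":")
    goal_id, name = head.split()
    return Goal(int(goal_id), name, tail.split())


goal = parse_goal_line("14 min_two_locations: _ locationA locationB  # third anywhere")
print(goal.name, goal.copies, goal.labels)
```

For a DR setup, a goal like `min_two_locations` is what ties a datastore to a second site: every chunk written under that goal must have a copy on a chunkserver carrying each named label.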
![Page 41: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/41.jpg)
● Most disasters are “local”, for example a fire in the server room or a flood
● Two different DR sites, one near (e.g. next building/other side of the building) and one far (external datacenter)
● The near DR site receives a copy of the chunks that are part of the marked datastores
![Page 42: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/42.jpg)
![Page 43: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/43.jpg)
● Remote snapshots are handled in the same way: we take a full snapshot of the datastore, and differentially replicate it
● We use the “snapshot of snapshot” approach to avoid the cost of deduplication
● This way we can prioritize sync queues, and on the receiving end we get a complete, decoupled and working OpenNebula
For example, average dedup cost for ZFS: 5 to 30 GB of dedup table data for every TB of pool data, assuming an average block size of 64K.
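The lower end of that range can be reproduced with simple arithmetic. This is a minimal sketch, assuming roughly 320 bytes of dedup-table data per unique block (a commonly cited rule of thumb, not a figure from the slides), binary units, and a pool in which every block is unique. It does not try to derive the 30 GB upper end.

```python
TIB = 2**40
GIB = 2**30


def dedup_table_bytes(pool_bytes: int, block_bytes: int, entry_bytes: int = 320) -> int:
    """Estimate dedup-table size: one entry per unique block in the pool.

    entry_bytes=320 is an assumed rule-of-thumb value, not from the slides.
    """
    blocks = pool_bytes // block_bytes
    return blocks * entry_bytes


size = dedup_table_bytes(TIB, 64 * 1024)
print(size / GIB)  # 5.0
```

Smaller average block sizes push the estimate up quickly, since the number of entries grows inversely with block size.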
![Page 44: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/44.jpg)
Local:
/var/lib/one/datastore → DRSNAP12H
/var/lib/one/snapshots → <yyyymmddhh> → DRSNAP12H
VM changes only in snapshots
inplace rsync (25x speedup)
Remote:
/var/lib/one/datastore → DRSNAP12H
/var/lib/one/snapshots → <yyyymmddhh> → DRSNAP12H
no chunk changes in snapshots
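The layout above can be sketched as a pair of small helpers: one that builds the timestamped snapshot path, and one that assembles an rsync invocation using `--inplace`, which updates destination files in place instead of writing a temporary copy. The paths and the datastore name come from the slide. The remote host name, and the choice of syncing a local snapshot into the remote datastore directory, are assumptions made for illustration only.

```python
from datetime import datetime
from typing import List

DATASTORE_ROOT = "/var/lib/one/datastore"  # from the slide
SNAPSHOT_ROOT = "/var/lib/one/snapshots"   # from the slide
DATASTORE_NAME = "DRSNAP12H"               # from the slide


def snapshot_path(taken_at: datetime) -> str:
    """Build <snapshots>/<yyyymmddhh>/<datastore> for a given snapshot time."""
    return f"{SNAPSHOT_ROOT}/{taken_at:%Y%m%d%H}/{DATASTORE_NAME}"


def rsync_command(taken_at: datetime, remote_host: str) -> List[str]:
    """Assemble (but do not run) an in-place rsync of one snapshot.

    Assumption: the local snapshot is the source and the remote
    datastore directory is the target; the slide does not say.
    """
    src = snapshot_path(taken_at) + "/"
    dst = f"{remote_host}:{DATASTORE_ROOT}/{DATASTORE_NAME}/"
    return ["rsync", "-a", "--inplace", src, dst]


# "dr.example.org" is a hypothetical host name.
print(" ".join(rsync_command(datetime(2014, 12, 3, 12), "dr.example.org")))
```

Updating files in place matters on a snapshotting filesystem: a file rewritten from scratch shares no chunks with its previous snapshot, while one patched in place only diverges where the data changed.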
![Page 45: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/45.jpg)
![Page 46: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/46.jpg)
virsh # domblkstat instance-0012 --device vda
vda rd_req 128
vda rd_bytes 2344448
vda wr_req 234
vda wr_bytes 618496
vda flush_operations 2
vda rd_total_times 106512819
vda wr_total_times 960359872
vda flush_total_times 1741727
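Output in this form is easy to turn into numbers. The sketch below parses the sample shown above into a dictionary and derives the average write size and write latency. Using these values to rank VMs by write activity is an assumption about why the slide shows them, and `parse_domblkstat` is a hypothetical helper, not part of libvirt. libvirt documents the `*_total_times` counters as nanoseconds.

```python
from typing import Dict

SAMPLE = """\
vda rd_req 128
vda rd_bytes 2344448
vda wr_req 234
vda wr_bytes 618496
vda flush_operations 2
vda rd_total_times 106512819
vda wr_total_times 960359872
vda flush_total_times 1741727
"""


def parse_domblkstat(text: str) -> Dict[str, int]:
    """Turn 'device key value' lines into {key: value}; skip anything else."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            _device, key, value = parts
            stats[key] = int(value)
    return stats


stats = parse_domblkstat(SAMPLE)
avg_write_bytes = stats["wr_bytes"] / stats["wr_req"]
avg_write_ms = stats["wr_total_times"] / stats["wr_req"] / 1e6

print(f"average write size:    {avg_write_bytes:.0f} bytes")
print(f"average write latency: {avg_write_ms:.2f} ms")
```

Sampling the counters twice and taking the difference would give a write rate over an interval rather than totals since the VM started.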
![Page 47: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/47.jpg)
Our “pilot light” approach: a running OpenNebula on two nodes, with its own LizardFS store, running only two VMs: the Oracle and the Tester.
The Oracle checks if DR is needed, and may need a human confirmation before executing the DR failover. If confirmation is given, it takes the latest valid snapshotted datastore, softlinks it and imports the VMs (through snapshots, so it’s instantaneous).
The Tester makes a snapshot of the current stable snapshot, imports the VMs and runs them in a separate, non-routed vnet, then executes a test to see if everything works (workload dependent), then deletes the intermediate snapshots.
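The Oracle's decision can be sketched as a small pure function: fail over only when the primary is judged down and any required human confirmation has been given, then pick the most recent snapshot marked valid. All names here (`Snapshot`, `latest_valid`, `decide_failover`) are hypothetical and only illustrate the logic described above, not the actual implementation. How the Oracle detects an outage is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    stamp: str   # yyyymmddhh, so text order equals chronological order
    valid: bool  # e.g. the Tester's checks passed on this snapshot


def latest_valid(snapshots: Sequence[Snapshot]) -> Optional[Snapshot]:
    """Return the newest snapshot marked valid, or None if there is none."""
    valid = [s for s in snapshots if s.valid]
    return max(valid, key=lambda s: s.stamp) if valid else None


def decide_failover(
    primary_down: bool,
    human_confirmed: bool,
    snapshots: Sequence[Snapshot],
    needs_confirmation: bool = True,
) -> Optional[Snapshot]:
    """Return the snapshot to fail over to, or None if no failover should happen."""
    if not primary_down:
        return None
    if needs_confirmation and not human_confirmed:
        return None
    return latest_valid(snapshots)


history = [
    Snapshot("2014120200", True),
    Snapshot("2014120212", True),
    Snapshot("2014120300", False),  # newest, but failed validation
]
print(decide_failover(True, True, history))
```

Keeping the decision separate from the actions (softlinking the datastore, importing VMs) makes it straightforward to test the Oracle's logic without touching a live cloud.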
![Page 48: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/48.jpg)
Only critical VMs are executed this way, if RTO < 30 mins.
For the VMs with higher RTO, buy one week of hardware on demand, auto-install a node with Puppet or Ansible, and make it join the OpenNebula cloud.
Usually deployed in 30 mins. Other vendors guarantee <15 minutes.
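The split between the two recovery paths comes down to a threshold on each VM's RTO. This is a minimal sketch of that rule. The VM names and their RTO values are made up for illustration, and the strategy labels are hypothetical.

```python
from datetime import timedelta

PILOT_LIGHT_RTO = timedelta(minutes=30)  # threshold from the slide


def recovery_strategy(rto: timedelta) -> str:
    """Pick a recovery path from a VM's recovery time objective."""
    if rto < PILOT_LIGHT_RTO:
        return "pilot-light"        # restart on the always-on DR nodes
    return "on-demand-hardware"     # rent hardware, auto-install, join the cloud


# Hypothetical inventory: VM name -> RTO
vm_rto = {
    "erp-db": timedelta(minutes=10),
    "mail": timedelta(minutes=30),
    "build-server": timedelta(hours=8),
}

for name, rto in vm_rto.items():
    print(f"{name}: {recovery_strategy(rto)}")
```

A VM whose RTO is exactly 30 minutes falls on the on-demand side here, matching the strict "RTO < 30 mins" wording.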
![Page 49: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/49.jpg)
![Page 50: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/50.jpg)
![Page 51: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/51.jpg)
Ideal for harsh indoor environments that require protection from falling dirt or liquid, dust, light splashing, oil or coolant seepage. Its NEMA Zone 4 rating also makes it perfect for facilities located in earthquake-prone seismic zones or any environment prone to extreme vibration such as factories, power stations, construction areas, shipping facilities, warehouses, processing plants, railroads, airports and military installations.
![Page 52: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/52.jpg)
![Page 53: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/53.jpg)
![Page 54: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/54.jpg)
● Have a “big red button” to stop DR if needed. Sometimes you are already fighting fire here, and you know it’s better not to move everything in flight.
● Have two people that are competent as DR firefighters, and give them a second phone with a rechargeable card. And make sure both don’t go on vacation together. (Hint: don’t choose two married people)
![Page 55: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/55.jpg)
● Use a gateway machine to provide a consistent internal IP scheme, and two different configurations for the gateway router to provide unmodified routing for the remaining VMs
● Aggregate functionality in a single VM (for example, one that manages logs) to optimize writes
![Page 56: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/56.jpg)
● I favor consistency, so I tend to avoid application-level replication, unless it’s native to the app (e.g. NoSQL). Otherwise you have different solutions for different machines (e.g. quorum group in MS replication with same UUID…)
● Try to reduce write amplification for databases, especially MySQL, e.g. with TokuDB and its fractal tree indexes
![Page 57: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/57.jpg)
![Page 58: OpenNebulaConf 2014 - OpenNebula and MooseFS for disaster recovery_real clouds in real life - Carlo Daffara](https://reader033.fdocuments.net/reader033/viewer/2022052304/55a42c641a28ab645c8b46c1/html5/thumbnails/58.jpg)
Thank you!
Carlo Daffara
@cdaffara
linkedin.com/in/cdaffara