OpenNebulaConf 2016 - VTastic: Akamai Innovations for Distributed System Testing by Jack Wadden,...

Post on 16-Apr-2017



Vtastic: Innovations In Distributed Systems Testing

Jack Wadden, Sr. Engineering Manager
Akamai Technologies, Inc.

©2015 AKAMAI | FASTER FORWARD™

AKAMAI CDN OVERVIEW

• We Make the Internet Fast, Reliable and Secure

• Globally-Distributed Network of Servers
  • Caching Content Close to End Users
  • Scalable Live Media Streaming
  • Protocol Optimizations

• DNS-Based Load Balancing System
  • Chooses the Best Server to Handle Your Requests


MASSIVE SCALE

• 15-30% of All Internet Traffic
• 3+ Trillion Hits/day (2 x 10^12)
• 30+ Tbps

• 215,000+ Servers
  • Located in 120+ Countries

• 1000+ Software Components

• 100+ Server Roles


SYSTEM TESTING AT AKAMAI


TESTNETS: AKAMAI’S SYSTEM TEST ENVIRONMENT


HOWEVER, AT AKAMAI TESTNETS ARE A SCARCE RESOURCE


THEY ARE EXPENSIVE TO BUILD


AND REQUIRE A HUGE TEAM TO MAINTAIN


SHARING LEADS TO DISRUPTIONS


SOMETIMES THE FIT IS POOR


CONFLICTING USES NEED TO BE COORDINATED


AND RESULT IN INEVITABLE DELAYS


FEATURES OF A BETTER TESTNET

• Low barrier to access
• Eliminate coordination
• No-block debugging
• Automation
• Portable, restorable configuration
• Efficient maintenance
• Permit destructive testing
• Optimal platform utilization

THE VISION:

CONTINUOUS, AUTOMATED, END-TO-END TESTING FOR ALL ENGINEERS ON EVERY COMPONENT ACROSS AKAMAI

TESTNET CLONING

[Diagram labels: Test Harness, VTASTIC Resource Tracker, OpenNebula, Master Storage, Testnet Clones]


VTASTIC MASTER TESTNET

• Supported by SME teams

• Running Production Versions

• Vtastic Team Coordinates Changes

• Custom Clones can be Saved, Shared

[Diagram labels: Master, Master Candidate, Snapshot, Clone, Old Master]


CLONES USE PRIVATE IP SPACE

100.80.0.8 (MDT)

100.80.0.15 (KDC)

100.80.0.21 (UMP)

GWSH, SOCKS

172.26.238.16 (NAT Exit)

100.80.0.1 (NAT Gateway)

IP (Anything)

VLAN #83
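
The 100.80.0.x addresses on this slide fall inside 100.64.0.0/10, the shared address space reserved by RFC 6598, and the NAT exit address falls inside the RFC 1918 range 172.16.0.0/12. The following is a minimal sketch that checks this with Python's standard ipaddress module. The labels are copied from the slide; the dictionary and function names are only illustrative and are not part of Vtastic.

```python
import ipaddress

# Ranges defined by the RFCs, not by the slide.
SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598
RFC1918_172 = ipaddress.ip_network("172.16.0.0/12")   # RFC 1918

# Addresses and labels as they appear on the slide.
slide_addresses = {
    "MDT": "100.80.0.8",
    "KDC": "100.80.0.15",
    "UMP": "100.80.0.21",
    "NAT Gateway": "100.80.0.1",
    "NAT Exit": "172.26.238.16",
}


def classify(addr: str) -> str:
    """Return which reserved range (if any) contains the address."""
    ip = ipaddress.ip_address(addr)
    if ip in SHARED_SPACE:
        return "RFC 6598 shared space"
    if ip in RFC1918_172:
        return "RFC 1918 private space"
    return "other"


for label, addr in slide_addresses.items():
    print(f"{label:12} {addr:15} {classify(addr)}")
```

Because neither range is routed on the public Internet, reusing the same addresses in each clone is only workable when each clone is isolated (the slide shows a VLAN) and reached through a NAT gateway.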


NAT TUNNELING TOOLS

• vpoint: Testnet-Attached bash Shell
  • LD_PRELOAD for Transparent SOCKS Tunneling (dante-client)
  • Proprietary SSH-proxy client

• chrome-vpoint, firefox-vpoint
  • Dedicated browser session with SOCKS configuration
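
To illustrate what a "dedicated browser session with SOCKS configuration" could look like, here is a rough sketch that builds a Chrome command line using a separate profile directory and a SOCKS proxy. This is not the chrome-vpoint tool itself: the function name, the proxy host and port, the profile path, and the choice of SOCKS5 are all assumptions made for the example. Only the two Chromium flags (--proxy-server and --user-data-dir) are real.

```python
import shlex


def chrome_socks_command(socks_host, socks_port, profile_dir,
                         binary="google-chrome"):
    """Build (but do not run) a Chrome command line for a proxied session.

    A separate --user-data-dir keeps the proxied session apart from the
    user's normal browser profile.
    """
    return [
        binary,
        f"--proxy-server=socks5://{socks_host}:{socks_port}",
        f"--user-data-dir={profile_dir}",
    ]


# Hypothetical values: a SOCKS endpoint on localhost and a throwaway profile.
cmd = chrome_socks_command("127.0.0.1", 1080, "/tmp/vpoint-profile")
print(shlex.join(cmd))
```

The command is only printed here; passing the list to subprocess.Popen would launch the browser.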


DESIGN APPROACH

• Centrally-Managed Infrastructure
  • Resources Granted to Users/Groups

• Distributed Storage & Compute Platform

• Commodity Hardware

• Open Source Technology
  • Virtualization: Qemu/KVM
  • Storage: GlusterFS
  • Orchestration: OpenNebula!!
  • Vtastic VRT: Python, Django, Apache


SPECS, SCALE

• 40 VM Hosts
  • 32 Cores
  • 128 GB RAM
  • 2 x 10 Gbps Ethernet
  • Average 35 VMs per Host

• 40-50+ Testnets
  • 30-120 Nodes per Testnet
  • 1500-2000+ Total VMs

• 40 Storage Nodes
  • 8 Cores
  • 32 GB RAM
  • 10 Gbps Ethernet
  • 6 x 384 GB SSD + RAID0 = 2.1 TB
  • Total Usable Space = 42 TB

• Master Testnet
  • 120 Nodes
  • ~1.5 TB (After virt-sparsify)
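
The capacity figures above can be cross-checked with a little arithmetic. The sketch below assumes that the "2.1 TB" per node is 6 x 384 decimal gigabytes expressed in binary terabytes (TiB); the slide does not say which unit convention it uses. It also notes that raw capacity is twice the stated usable space, which would be consistent with keeping two copies of the data, although the slide does not state a replication factor.

```python
GB = 10**9    # decimal gigabyte
TIB = 2**40   # binary terabyte (assumed meaning of "TB" on the slide)

ssds_per_node = 6
ssd_size_gb = 384
storage_nodes = 40
usable_tb = 42  # stated on the slide

# RAID0 stripes the disks, so node capacity is the sum of the SSDs.
raw_per_node_tib = ssds_per_node * ssd_size_gb * GB / TIB
per_node_rounded = round(raw_per_node_tib, 1)

raw_cluster_tb = storage_nodes * per_node_rounded
raw_to_usable = raw_cluster_tb / usable_tb

# Compute side: hosts times the stated average VMs per host.
vm_hosts = 40
avg_vms_per_host = 35
estimated_vms = vm_hosts * avg_vms_per_host

print(f"per node: {per_node_rounded} TB, cluster raw: {raw_cluster_tb:.0f} TB")
print(f"raw / usable: {raw_to_usable:.1f}")
print(f"estimated VMs from the average: {estimated_vms}")
```

The estimate of 1400 VMs from the stated average sits slightly below the 1500-2000+ total on the slide, so the per-host average is presumably approximate.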


1.0: GLUSTER & FUSE

• Backing Files and Scratch Images on Remote Storage
  • Qemu Uses POSIX Path (/glusterclient/foo)

• Problems:
  • Memory Leaks, Hangs in GlusterFS FUSE Mount
  • Occasional Loss of VMs
  • Performance Concerns
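
As an illustration of the backing-file and scratch-image arrangement, the sketch below builds a qemu-img command that creates a copy-on-write scratch image on top of a shared backing file reached through a POSIX path. The qcow2 format, the scratch directory, and the VM name are assumptions for the example; the slide names only the /glusterclient/foo path and does not show how Vtastic actually creates its images.

```python
import shlex
from pathlib import PurePosixPath


def scratch_image_command(backing_file, scratch_dir, vm_name):
    """Build (but do not run) a qemu-img command for a scratch overlay.

    The overlay records only the blocks a VM changes; unchanged blocks
    are read from the shared backing file.
    """
    scratch = PurePosixPath(scratch_dir) / f"{vm_name}.qcow2"
    return [
        "qemu-img", "create",
        "-f", "qcow2",             # format of the new scratch image
        "-b", str(backing_file),   # shared, read-only backing file
        "-F", "qcow2",             # format of the backing file (assumed)
        str(scratch),
    ]


# /glusterclient/foo is the path from the slide; the rest is hypothetical.
cmd = scratch_image_command("/glusterclient/foo",
                            "/glusterclient/scratch", "vm01")
print(shlex.join(cmd))
```

With this layout, both the backing file and the scratch image sit behind the FUSE mount, so every VM disk read and write depends on that mount staying healthy.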


1.1: GLUSTER DIRECT

• Qemu uses libgfapi (gluster://SERVER:PORT/foo)

• Backing Files and Scratch Images on Remote Storage

• FUSE Mount Used for Image Management

• Problems:
  • Frequent, Catastrophic Loss of VMs
  • Occasional FUSE Mount Problems (Image Management)
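
The gluster:// form shown above lets Qemu talk to the volume through libgfapi instead of a mounted path. A small sketch of building and parsing such a URI follows, assuming the general layout gluster://server:port/volume/image. The server name, port value, volume name, and image name are made up for the example, and the helper functions are not part of Qemu or Vtastic.

```python
from urllib.parse import urlsplit


def gluster_uri(server, port, volume, image):
    """Compose a gluster:// URI for an image on a Gluster volume."""
    return f"gluster://{server}:{port}/{volume}/{image}"


def parse_gluster_uri(uri):
    """Split a gluster:// URI into its server, port, volume and image parts."""
    parts = urlsplit(uri)
    if parts.scheme != "gluster":
        raise ValueError(f"not a gluster URI: {uri}")
    volume, _, image = parts.path.lstrip("/").partition("/")
    return {
        "server": parts.hostname,
        "port": parts.port,
        "volume": volume,
        "image": image,
    }


# Hypothetical values for illustration only.
uri = gluster_uri("storage01.example", 24007, "testnets", "foo")
print(uri)
print(parse_gluster_uri(uri))
```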


1.2: FUSE + LOCAL SCRATCH

• Qemu Uses POSIX Path (/glusterclient/foo) for Backing Image

• FUSE Mount Used for Image Management

• Scratch Images Stored on Local Disk

• Problems:
  • Increased Snapshot Time
  • No Live Migration
  • Occasional FUSE Mount Problems (Image Management)
  • Lack of Trust (VM Loss Experienced before Re-creating Gluster Volume)


IN DEVELOPMENT: CEPH

• Static and Scratch Images on Remote Storage
  • Live Migration Possible
  • Holy Grail, or New Devil?

• Challenges:
  • Learning Curve
  • Ceph Stability?
  • Need Support for Trees of RBD Clones


FUTURE POSSIBILITIES

• Incorporating Physical Hardware (Load/Performance Testing)

• Realistic Network Conditions (Latency, Loss)

• Subnetting / Internetworking

VTASTIC.AKAMAI.COM

