OpenNebulaConf 2016 - VTastic: Akamai Innovations for Distributed System Testing by Jack Wadden,...

Vtastic: Innovations In Distributed Systems Testing Jack Wadden, Sr. Engineering Manager Akamai Technologies, Inc.


Page 1: OpenNebulaConf 2016 - VTastic: Akamai Innovations for Distributed System Testing by Jack Wadden, Akamai

Vtastic: Innovations In Distributed Systems Testing

Jack Wadden, Sr. Engineering Manager, Akamai Technologies, Inc.


©2015 AKAMAI | FASTER FORWARD™

AKAMAI CDN OVERVIEW

• We Make the Internet Fast, Reliable and Secure

• Globally-Distributed Network of Servers
  • Caching Content Close to End Users
  • Scalable Live Media Streaming
  • Protocol Optimizations

• DNS-Based Load Balancing System
  • Chooses the Best Server to Handle Your Requests


MASSIVE SCALE

• 15-30% of All Internet Traffic
  • 3+ Trillion Hits/day (2 x 10^12)
  • 30+ Tbps

• 215,000+ Servers
  • Located in 120+ Countries

• 1000+ Software Components

• 100+ Server Roles


SYSTEM TESTING AT AKAMAI


TESTNETS: AKAMAI’S SYSTEM TEST ENVIRONMENT


HOWEVER, AT AKAMAI TESTNETS ARE A SCARCE RESOURCE


THEY ARE EXPENSIVE TO BUILD


AND REQUIRE A HUGE TEAM TO MAINTAIN


SHARING LEADS TO DISRUPTIONS


SOMETIMES THE FIT IS POOR


CONFLICTING USES NEED TO BE COORDINATED


AND RESULT IN INEVITABLE DELAYS


FEATURES OF A BETTER TESTNET

• Low barrier to access
• No-block debugging
• Portable, restorable configuration
• Permit destructive testing
• Eliminate coordination
• Automation
• Efficient maintenance
• Optimal platform utilization


The Vision:

CONTINUOUS, AUTOMATED, END-TO-END TESTING FOR ALL ENGINEERS ON EVERY COMPONENT ACROSS AKAMAI


TESTNET CLONING

[Diagram labels: Test Harness, VTASTIC Resource Tracker, OpenNebula, Master Storage, Testnet Clones]


VTASTIC MASTER TESTNET

• Supported by SME teams

• Running Production Versions

• Vtastic Team Coordinates Changes

• Custom Clones can be Saved, Shared

[Diagram labels: Master, Master Candidate, Snapshot, Clone, Old Master]


CLONES USE PRIVATE IP SPACE

[Network diagram labels]

• 100.80.0.8 (MDT)
• 100.80.0.15 (KDC)
• 100.80.0.21 (UMP)
• GWSH, SOCKS
• 172.26.238.16 (NAT Exit)
• 100.80.0.1 (NAT Gateway)
• IP (Anything)
• VLAN #83
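The addresses on this slide fall into two reserved ranges: the 100.80.0.x node addresses sit inside the 100.64.0.0/10 shared address space (RFC 6598), and the NAT exit address sits inside the 172.16.0.0/12 private range (RFC 1918). A small check with Python's `ipaddress` module, using only the addresses shown on the slide; the dictionary layout and the `classify` helper are illustrative and not part of Vtastic.

```python
import ipaddress

# Reserved ranges the slide's addresses are checked against.
SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space
RFC1918_172 = ipaddress.ip_network("172.16.0.0/12")   # RFC 1918 private range

# Addresses exactly as labelled on the slide.
slide_addresses = {
    "MDT": "100.80.0.8",
    "KDC": "100.80.0.15",
    "UMP": "100.80.0.21",
    "NAT Gateway": "100.80.0.1",
    "NAT Exit": "172.26.238.16",
}


def classify(addr: str) -> str:
    """Return which reserved range (if any) an address belongs to."""
    ip = ipaddress.ip_address(addr)
    if ip in SHARED_SPACE:
        return "shared address space (100.64.0.0/10)"
    if ip in RFC1918_172:
        return "private (172.16.0.0/12)"
    return "other"


for role, addr in slide_addresses.items():
    print(f"{role:12} {addr:15} {classify(addr)}")
```

Because neither range is routed on the public Internet, traffic from a clone has to leave through the NAT gateway, which matches the gateway and exit labels in the diagram.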


NAT TUNNELING TOOLS

• vpoint: Testnet-Attached bash Shell
  • LD_PRELOAD for Transparent SOCKS Tunneling (dante-client)
  • Proprietary SSH-proxy client

• chrome-vpoint, firefox-vpoint
  • Dedicated browser session with SOCKS configuration
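vpoint and the browser wrappers are Akamai-internal tools, so what follows is only a sketch of the general technique the slide names, not their implementation. It assumes the Dante client library (`libdsocks.so`) is preloaded with `LD_PRELOAD` and pointed at a proxy through the `SOCKS_SERVER` environment variable, and that Chrome is started with its `--proxy-server` and `--user-data-dir` flags. The proxy host and port, the library path, and the profile directory are all placeholders.

```python
import os


def socks_shell_env(proxy_host, proxy_port,
                    libdsocks="/usr/lib/libdsocks.so",  # assumed path; varies by distro
                    base_env=None):
    """Build an environment in which dynamically linked programs are
    transparently routed through a SOCKS proxy by the Dante client library."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = libdsocks
    env["SOCKS_SERVER"] = f"{proxy_host}:{proxy_port}"
    return env


def browser_command(proxy_host, proxy_port, profile_dir):
    """Build a Chrome command line for a dedicated, SOCKS-configured session."""
    return [
        "google-chrome",
        f"--proxy-server=socks5://{proxy_host}:{proxy_port}",
        f"--user-data-dir={profile_dir}",  # separate profile keeps the session isolated
    ]


# Hypothetical values for illustration only.
env = socks_shell_env("socks.example.internal", 1080, base_env={})
cmd = browser_command("socks.example.internal", 1080, "/tmp/testnet-profile")
print(env)
print(" ".join(cmd))
```

A testnet-attached shell could then be started with something like `subprocess.run(["bash"], env=env)`; a separate browser profile keeps the proxy settings away from the user's normal session.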


DESIGN APPROACH

• Centrally-Managed Infrastructure
  • Resources Granted to Users/Groups

• Distributed Storage & Compute Platform

• Commodity Hardware

• Open Source Technology
  • Virtualization: Qemu/KVM
  • Storage: GlusterFS
  • Orchestration: OpenNebula!!
  • Vtastic VRT: Python, Django, Apache


SPECS, SCALE

• 40 VM Hosts
  • 32 Cores
  • 128 GB RAM
  • 2 x 10 Gbps Ethernet
  • Average 35 VMs per Host

• 40-50+ Testnets
  • 30-120 Nodes per Testnet
  • 1500-2000+ Total VMs

• 40 Storage Nodes
  • 8 Cores
  • 32 GB RAM
  • 10 Gbps Ethernet
  • 6 x 384 GB SSD + RAID0 = 2.1 TB
  • Total Usable Space = 42 TB

• Master Testnet
  • 120 Nodes
  • ~1.5 TB (After virt-sparsify)
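The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers given above, plus two assumptions that the slide does not state: that "2.1 TB" is 6 x 384 decimal GB expressed in binary terabytes, and that the 42 TB usable figure reflects two-way replication across the storage nodes.

```python
# Figures taken from the slide.
VM_HOSTS = 40
CORES_PER_HOST = 32
RAM_GB_PER_HOST = 128
AVG_VMS_PER_HOST = 35
STORAGE_NODES = 40
SSDS_PER_NODE = 6
SSD_GB = 384

# Assumption (not stated on the slide): two copies of every image.
ASSUMED_REPLICAS = 2

# Compute side: VM count and average share of a host per VM.
total_vms = VM_HOSTS * AVG_VMS_PER_HOST
cores_per_vm = CORES_PER_HOST / AVG_VMS_PER_HOST
ram_gb_per_vm = RAM_GB_PER_HOST / AVG_VMS_PER_HOST

# Storage side: RAID0 adds capacities, so raw space is a plain sum.
raw_gb_per_node = SSDS_PER_NODE * SSD_GB
# Assumption: decimal gigabytes converted to binary terabytes (TiB).
raw_tib_per_node = raw_gb_per_node * 10**9 / 2**40
usable_tb = STORAGE_NODES * round(raw_tib_per_node, 1) / ASSUMED_REPLICAS

print(f"VMs at the average density: {total_vms}")
print(f"Per VM: {cores_per_vm:.2f} cores, {ram_gb_per_vm:.2f} GB RAM")
print(f"Raw per storage node: {raw_gb_per_node} GB = {raw_tib_per_node:.2f} TiB")
print(f"Usable with {ASSUMED_REPLICAS} replicas: {usable_tb:.0f} TB")
```

Under those assumptions the per-node and total storage figures line up with the slide, and the average VM density implies the hosts are slightly oversubscribed on cores.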


1.0: GLUSTER & FUSE

• Backing Files and Scratch Images on Remote Storage
  • Qemu Uses POSIX Path (/glusterclient/foo)

• Problems:
  • Memory Leaks, Hangs in GlusterFS FUSE Mount
  • Occasional Loss of VMs
  • Performance Concerns


1.1: GLUSTER DIRECT

• Qemu uses libgfapi (gluster://SERVER:PORT/foo)

• Backing Files and Scratch Images on Remote Storage

• FUSE Mount Used for Image Management

• Problems:
  • Frequent, Catastrophic Loss of VMs
  • Occasional FUSE Mount Problems (Image Management)
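With libgfapi, Qemu addresses an image by URI rather than by a path on a mounted filesystem. The sketch below builds and splits a URI of the form shown on this slide, following Qemu's `gluster://server[:port]/volume/image` layout. The server name, volume name, and image path are hypothetical, and 24007 is used only as the usual glusterd port.

```python
from urllib.parse import urlsplit


def gluster_uri(server, port, volume, image):
    """Build a gluster:// URI of the form Qemu accepts for libgfapi access."""
    return f"gluster://{server}:{port}/{volume}/{image}"


def split_gluster_uri(uri):
    """Split a gluster:// URI into (server, port, volume, image path)."""
    parts = urlsplit(uri)
    if parts.scheme != "gluster":
        raise ValueError(f"not a gluster URI: {uri!r}")
    # The first path component is the volume; the rest is the image path.
    volume, _, image = parts.path.lstrip("/").partition("/")
    return parts.hostname, parts.port, volume, image


# Hypothetical names for illustration only.
uri = gluster_uri("storage01", 24007, "testnets", "master/node01.qcow2")
print(uri)
print(split_gluster_uri(uri))
```

The contrast with version 1.0 is that the hypervisor talks to the storage servers directly, while the FUSE mount remains only for image management.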


1.2: FUSE + LOCAL SCRATCH

• Qemu Uses POSIX Path (/glusterclient/foo) for Backing Image

• FUSE Mount Used for Image Management

• Scratch Images Stored on Local Disk

• Problems:
  • Increased Snapshot Time
  • No Live Migration
  • Occasional FUSE Mount Problems (Image Management)
  • Lack of Trust (VM Loss Experienced before Re-creating Gluster Volume)
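The difference between the 1.0 and 1.2 layouts is only where the per-VM scratch image lives; the shared backing image stays on the Gluster mount in both. The sketch below builds (but does not run) a `qemu-img create` command for a copy-on-write overlay. It assumes qcow2 images, which the slides do not state, and the scratch directories and VM name are hypothetical; only `/glusterclient/foo` comes from the slides.

```python
from pathlib import PurePosixPath


def scratch_image_command(backing, scratch_dir, vm_name):
    """Return a qemu-img command that creates a copy-on-write scratch image
    in scratch_dir on top of a read-only backing image."""
    overlay = PurePosixPath(scratch_dir) / f"{vm_name}.qcow2"
    return [
        "qemu-img", "create",
        "-f", "qcow2",        # format of the new scratch image (assumed)
        "-b", str(backing),   # shared backing image
        "-F", "qcow2",        # format of the backing image (assumed)
        str(overlay),
    ]


backing = "/glusterclient/foo"  # POSIX path via the FUSE mount, as on the slide

# 1.0 layout: scratch image also on remote storage (hypothetical directory).
v1_0 = scratch_image_command(backing, "/glusterclient/scratch", "node01")
# 1.2 layout: scratch image on the VM host's local disk (hypothetical directory).
v1_2 = scratch_image_command(backing, "/var/lib/scratch", "node01")

print(" ".join(v1_0))
print(" ".join(v1_2))
```

Keeping the scratch image on local disk takes writes off the FUSE mount, but it also ties the VM to its host, which is consistent with the "No Live Migration" problem listed above.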


IN DEVELOPMENT: CEPH

• Static and Scratch Images on Remote Storage
  • Live Migration Possible
  • Holy Grail, or New Devil?

• Challenges:
  • Learning Curve
  • Ceph Stability?
  • Need Support for Trees of RBD Clones
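A "tree of clones" arises when a custom clone of the master is itself saved and cloned again, so an image may depend on a chain of ancestors. The sketch below models that relationship as a plain parent map and walks the chain back to the root. It is a data-structure illustration only and does not use the Ceph RBD API; all image names are hypothetical.

```python
# Each image maps to the image it was cloned from (None for the root).
parents = {
    "master": None,
    "clone-a": "master",
    "clone-a-custom": "clone-a",  # a saved custom clone, cloned again
    "clone-b": "master",
}


def backing_chain(image, parents):
    """Return the list of images from `image` back to the root of the tree."""
    chain = []
    seen = set()
    while image is not None:
        if image in seen:
            raise ValueError(f"cycle detected at {image!r}")
        seen.add(image)
        chain.append(image)
        image = parents[image]
    return chain


def children(image, parents):
    """Return the images cloned directly from `image`."""
    return sorted(child for child, parent in parents.items() if parent == image)


print(backing_chain("clone-a-custom", parents))
print(children("master", parents))
```

Any storage backend used here has to keep every ancestor in such a chain available for as long as a descendant still reads from it.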


FUTURE POSSIBILITIES

• Incorporating Physical Hardware (Load/Performance Testing)

• Realistic Network Conditions (Latency, Loss)

• Subnetting / Internetworking


VTASTIC.AKAMAI.COM
