Scalable load-balancing for large-scale big data applications (+Brazil, São Paulo, USP, IME)
Scalable load-balancing for large-scale big data applications
+ Brazil, São Paulo, USP, IME
Carlos Eduardo Moreira dos Santos, University of São Paulo
University of Tokyo, 2014-05-29
Brazil
● 5th largest country (8,515,767 km²)
● 26 states plus the Federal District, and over 5.5k cities
● Capital: Brasília
● Language: (Brazilian) Portuguese
● 6th most populous (202,656,788 in 2014)
● 8th largest economy (Gross Domestic Product)
● Currency: Brazilian Real
● Info relative to Japan
  ○ Size: 22.5 * Japan's
  ○ Population: 1.6 * Japan's
  ○ "Distance": 27-hour flight
  ○ Time: Japan's minus 12h
São Paulo
● Largest Japanese community outside Japan (665k in 2010)
● 7th largest metropolitan area (7,943.818 km²)
  ○ 0.59 * Tokyo's
● 8th most populous (19,956,590 in 2012)
  ○ 0.54 * Tokyo's
● "Financial capital of Brazil"
● 10th largest GDP among the world's cities
● BOVESPA stock exchange
  ○ Largest in Latin America
  ○ Second in the world in market value
● Largest number of helicopters in the world
University of São Paulo
● Latin America's largest university
● >25% of Brazilian scientific production
● QS World University Rankings
  ○ Improving rank position
    ■ 2009: 207th
    ■ 2013: 127th
  ○ Global top 50 in 7 of the 30 disciplines
University of São Paulo
| | USP (2010) | Tokyo Univ. (2013) |
|---|---|---|
| Professors | 5,732 | 2,604 |
| Undergraduates | 56,998 | 14,120 |
| Graduate students | 25,591 | 13,878 |
| Main campus size | 7.4 km² | 1.6 km² |
Institute of Mathematics and Statistics (IME) CS Department
● 42 full-time professors (+ 4 active retired)
● 250 undergrads
● 223 graduate students (124 masters + 99 PhD)
● Graduates per year
  ○ 40-50 Bachelors
  ○ 44 Masters
  ○ 10 PhDs
Institute of Mathematics and Statistics (IME) CS Department
● Research Areas
  ○ Computer Theory
  ○ Artificial Intelligence
  ○ Software Engineering
  ○ Parallel, Distributed, and Grid Computing
  ○ Continuous Optimization
  ○ Combinatorial Optimization
  ○ Databases
  ○ Software Systems
  ○ Bioinformatics
FLOSS Competence Center
● Founded in January 2009 at our department:
  ○ USP Free and Open Source Competence Centre
  ○ Funded by the European Commission, the Brazilian government, and USP
● Goal: promote the use of FLOSS and work towards improving its quality
  ○ Teaching
  ○ Research
  ○ Consulting
2014 Brazilian Soccer Team
Parallel/Distributed Systems Group
● Professors
  1. Alfredo Goldman
  2. Daniel Batista
  3. Fabio Kon
  4. Marco Aurélio Gerosa
  5. Marcos Dimas Gubitoso
Parallel/Distributed Systems Group
● Close Collaborators
  1. João Eduardo Ferreira
  2. Marcelo Finger
  3. Siang W. Song
  4. Flavio S. C. Silva
  5. Kunio Okuda
  6. Routo Terada
  7. Kelly Braghetto
  8. Renata Wassermann
Parallel/Distributed Systems Group
● Students
  ○ ~20 doctoral
  ○ ~30 masters
  ○ ~20 undergrads
Parallel/Distributed Systems Group
● Research Areas
  ○ Software Engineering
  ○ Agile Software Development methodologies
  ○ OOP and Patterns
  ○ Parallel Computing / HPC
  ○ Distributed Systems / Middleware
  ○ Grid Computing / Cloud Computing
  ○ Big Data
  ○ Databases (distributed / mobile)
  ○ Object-Orientation in Software Architectures
  ○ Mobile Computing
  ○ Energy Efficiency
Parallel/Distributed Systems Group: Education
● Undergraduate and graduate courses
  ○ Parallel, Distributed, and Cloud Computing
  ○ Advanced Object Oriented Software Development
  ○ eXtreme Programming Laboratory
  ○ Entrepreneurship in Software Startups
● Continuing Education and Community courses
  ○ Grid/Cloud Computing
  ○ Web development with advanced OO tools
  ○ Design Patterns and Agile Software Development
Parallel/Distributed Systems Group: Education
● Consulting work: OO software development
  ○ São Paulo Legislature (Assembléia Legislativa)
  ○ Ministry of Health
  ○ USP administration, CPqD, LARC, Scopus, ITM, etc.
  ○ Entrepreneurship (for startups)
Main Research Projects
● HP Baile (scalable, cloud-based systems)
● CHOReOS (Web Service Choreographies)
● InteGrade (Opportunistic Grid Computing)
● Microsoft Borboleta (Telehealth with smartphones)
● Agile Methods for Software Development
● Qualipso (Quality in Open Source)
● IBM Eclipse Innovation
CHOReOS
Scalable Web Service Choreographies for the Future Internet
● 2010-2013
● European Commission funding
● 16 partners (academia/industry) from Europe (France, Greece, Italy, Lithuania, Latvia, and UK) and Brazil (IME-USP)
CHOReOS
Enactment Engine
● Input
  ○ Web Services (implementation and/or URL)
  ○ Metadata (dependency info, etc.)
● Provision cloud resources
● Deploy Web Services
● Configure dependencies (by roles)
● Technologies: Java, SOAP, REST, Chef
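One way to picture the engine's deploy-and-configure steps is to deploy services in dependency order, so that each service can be handed the URLs of the services it depends on. The Python sketch below is a simplification under stated assumptions: the metadata is reduced to a map from each service to its dependencies, real provisioning (VMs, Chef recipes) is replaced by a made-up URL, and services are wired by name rather than by role. `enact`, the service names, and `http://cloud.example` are hypothetical; `graphlib` needs Python 3.9+.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List


def enact(deps: Dict[str, List[str]],
          base_url: str = "http://cloud.example") -> Dict[str, dict]:
    """Deploy every service after the services it depends on, then give each
    one the URLs of its dependencies. Returns name -> {"url", "deps"} in
    deployment order. (Hypothetical helper, not the CHOReOS engine.)"""
    deployed: Dict[str, dict] = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        # Placeholder for "provision a node and deploy the service".
        url = f"{base_url}/{name}"
        deployed[name] = {
            "url": url,
            # Every dependency is already deployed, so its URL is known here.
            "deps": {d: deployed[d]["url"] for d in deps.get(name, [])},
        }
    return deployed
```

`TopologicalSorter` raises `graphlib.CycleError` on circular dependencies, which is a reasonable failure mode here: such a set of services cannot be deployed in one dependency-ordered pass.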
Embraer
● World's 3rd largest aircraft manufacturer
● 20k employees
● Clients in 55 countries
  ○ Japan Airlines (E170)
  ○ Armed forces in 48 countries
● 2013 net income: US$ 342 million
HP Baile
Development and use of WS choreographies in large-scale environments
● 2010-2012
● Funded by HP Brasil
● Collaboration with HP Labs
● Some outcomes
  ○ Rehearsal: WS choreographies with TDD
  ○ Scalability Explorer
  ○ Tech transfer on change impact analysis for workflow repository management
InteGrade
● 2002-2011
● Object-oriented grid middleware
● Opportunistic
● My undergraduate final project
  ○ Grid Computing Resource Management: Node Control Center, 2009
    ■ Limit CPU usage on the client side
    ■ Web interface (C++)
InteGrade Node Control Center
● Multi-core
● CPU affinity
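The two client-side mechanisms named for the Node Control Center, limiting CPU usage and CPU affinity on multi-core machines, can be sketched in Python for a POSIX system. This is not the Node Control Center's code; it assumes the grid job runs as a separate process that may be pinned to a share of the cores and throttled by alternating `SIGSTOP`/`SIGCONT`. `pick_cores`, `duty_cycle`, and `throttle` are hypothetical names.

```python
import os
import signal
import time
from typing import Iterable, Set, Tuple


def pick_cores(allowed: Iterable[int], share: float) -> Set[int]:
    """Choose the cores a grid job may use: `share` of the allowed cores,
    never fewer than one. Takes the highest-numbered cores, leaving the
    lower ones to the machine owner's own work."""
    cores = sorted(allowed)
    count = max(1, int(len(cores) * share))
    return set(cores[-count:])


def duty_cycle(limit: float, period: float = 0.1) -> Tuple[float, float]:
    """Split `period` seconds into (run, stop) slices so that a process that is
    alternately continued and stopped runs about `limit` of the time."""
    if not 0 < limit <= 1:
        raise ValueError("limit must be in (0, 1]")
    run = period * limit
    return run, period - run


def throttle(pid: int, limit: float, duration: float, period: float = 0.1) -> None:
    """Alternate SIGCONT/SIGSTOP on `pid` for `duration` seconds (POSIX only)."""
    run, stop = duty_cycle(limit, period)
    deadline = time.monotonic() + duration
    try:
        while time.monotonic() < deadline:
            os.kill(pid, signal.SIGCONT)
            time.sleep(run)
            if stop > 0:
                os.kill(pid, signal.SIGSTOP)
                time.sleep(stop)
    finally:
        os.kill(pid, signal.SIGCONT)  # never leave the job stopped
```

On Linux, `os.sched_setaffinity(pid, pick_cores(os.sched_getaffinity(pid), 0.5))` applies the core selection to a process, and `throttle(pid, 0.25, duration=60)` would then let it run roughly a quarter of the time for a minute.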
Scalable load-balancing for large-scale big data applications
Motivations
● Increasing supercomputer power
  ○ 2008 Blue Gene/P Intrepid system
    ■ 40,960 nodes
    ■ 163,840 processor cores
  ○ 2011/12 Blue Gene/Q Sequoia system
    ■ 98,304 nodes
    ■ 1.6 million processor cores
● "Unlimited" resources in cloud computing
● Big Data
Scalable load-balancing for large-scale big data applications
Questions
● Can centralized systems handle the load?
● What about decentralized systems?
● Are distributed systems required?
Scalable load-balancing for large-scale big data applications
Many applications and solutions are available. We will start with MapReduce.
Apache Hadoop implementation (2005)
● Open source (Apache top-level project)
● Useful in a wide range of applications
● Global community of users and contributors
  ○ Commercial support for companies
  ○ Sponsors: Yahoo!, Google, HP, IBM, Facebook, ...
São Paulo - Liberdade
MapReduce Paradigm
Introduced by Google in 2004
MapReduce Paradigm
function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit (w, 1)

function reduce(String word, Iterator partialCounts):
    // word: a word
    // partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
        sum += ParseInt(pc)
    emit (word, sum)
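The word-count pseudocode can be turned into a runnable, single-process Python sketch. It adds the shuffle step that the MapReduce runtime performs between the two functions (grouping intermediate pairs by key); in Hadoop that grouping happens across the cluster, while here a dictionary stands in for it. `map_fn`, `reduce_fn`, and `run_job` are illustrative names, and splitting on whitespace is a simplifying assumption about what counts as a word.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(name: str, document: str) -> Iterator[Tuple[str, int]]:
    # name: document name (unused); document: document contents
    for word in document.split():
        yield word, 1


def reduce_fn(word: str, partial_counts: Iterable[int]) -> Tuple[str, int]:
    # Add up every partial count emitted for this word.
    return word, sum(partial_counts)


def run_job(documents: Dict[str, str]) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    # Map phase, with the shuffle folded in: group intermediate values by key.
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            grouped[key].append(value)
    # Reduce phase: one reduce_fn call per distinct key, in sorted key order.
    return dict(reduce_fn(key, values) for key, values in sorted(grouped.items()))
```

Because map emits integers directly, the `ParseInt` of the pseudocode is unnecessary here; it matters only when intermediate values travel as text, as they can between Hadoop tasks.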
Hadoop v1
Hadoop v2 (YARN)
Hadoop v1 vs v2
● Pros
  ○ The JobTracker was split into
    ■ Resource Manager (RM)
    ■ Application Master (AM)
  ○ As many AMs as jobs
  ○ From 5k nodes (2009) to 10k nodes (2012)
    ■ Same price = 2x resources
● Cons
  ○ RM and AM are still centralized components
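The v2 division of labour can be illustrated with a toy model in which containers are interchangeable slots on nodes. `ResourceManager` and `ApplicationMaster` below are hypothetical Python classes named after the YARN roles, not YARN's Java API; real YARN adds heartbeats, locality-aware resource requests, and much more. The point is what each component keeps in memory: the RM only counts containers per node and per application, while per-task state lives in one AM per job.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceManager:
    """Global component: free containers per node and containers granted per
    application, but nothing about individual tasks."""
    free: Dict[str, int]                                    # node -> free containers
    granted: Dict[str, int] = field(default_factory=dict)   # app -> containers

    def allocate(self, app_id: str, wanted: int) -> List[str]:
        nodes: List[str] = []
        for node in self.free:
            while self.free[node] > 0 and len(nodes) < wanted:
                self.free[node] -= 1
                nodes.append(node)
        self.granted[app_id] = self.granted.get(app_id, 0) + len(nodes)
        return nodes


@dataclass
class ApplicationMaster:
    """Per-job component: one instance per job, owning that job's task state."""
    app_id: str
    pending: List[str]
    running: Dict[str, str] = field(default_factory=dict)   # task -> node

    def schedule(self, rm: ResourceManager) -> None:
        # Ask the RM for one container per pending task; start what was granted.
        for node in rm.allocate(self.app_id, len(self.pending)):
            self.running[self.pending.pop(0)] = node
```

In this model the RM's state grows with the number of nodes and jobs rather than tasks, which is the intuition behind splitting the JobTracker; the RM itself is still a single component, as the last bullet notes.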
Research
● Study scalability in Hadoop v1 and v2
  ○ Experiments
  ○ Understand scalability gains in YARN
● Scalability limits
  ○ Model the overhead of centralized components
  ○ Predict scalability limits by simulation
● Conceive and simulate an alternative solution
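As an illustration of what modelling a centralized component's overhead can look like, here is a deliberately simple queueing sketch. It assumes every node sends a heartbeat to one scheduler at a fixed interval, and that heartbeats form a Poisson stream with exponentially distributed handling times (an M/M/1 queue), so utilization is ρ = λ/μ and the mean time in the system is W = 1/(μ − λ) while ρ < 1. This is not the model used in the research; `scheduler_load`, `max_nodes`, and all parameter values are hypothetical.

```python
import math
from typing import Tuple


def scheduler_load(nodes: int, heartbeat_s: float,
                   service_s: float) -> Tuple[float, float]:
    """Toy M/M/1 model of one centralized scheduler (e.g. a JobTracker or RM).

    Each node sends one heartbeat every `heartbeat_s` seconds; the scheduler
    spends `service_s` seconds on average handling each one.
    Returns (utilization rho, mean seconds a heartbeat spends in the system).
    """
    arrival = nodes / heartbeat_s      # lambda: heartbeats per second
    capacity = 1.0 / service_s         # mu: heartbeats handled per second
    rho = arrival / capacity
    if rho >= 1.0:
        return rho, math.inf           # the queue grows without bound
    return rho, 1.0 / (capacity - arrival)   # W = 1 / (mu - lambda)


def max_nodes(heartbeat_s: float, service_s: float, max_rho: float = 0.8) -> int:
    """Largest cluster whose scheduler utilization stays at or below max_rho."""
    # Small tolerance so float rounding does not drop an exact integer bound.
    return math.floor(max_rho * heartbeat_s / service_s + 1e-9)
```

With a 1 s heartbeat and 0.5 ms of work per heartbeat, 1,000 nodes keep the scheduler at ρ = 0.5 with W = 1 ms, about 1,600 nodes fit under an 80% utilization cap, and the scheduler saturates at 2,000 nodes. A simulation becomes necessary once these closed-form assumptions no longer hold.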
Related Work: Many-Task Computing (MTC)
● Falkon (2007)
  ○ Fewer features
  ○ 487 vs. 11 tasks/s in Condor (2004)
● MATRIX (2013)
  ○ Fully distributed
  ○ Work stealing leads to better efficiency (18-82% vs. 92-97%)
● CloudKon (CCGrid 2014)
  ○ Based on cloud services (IaaS and SaaS)
  ○ The only one to support 256 VMs (up to 1024)
  ○ Attributes scalability limits to too many open TCP connections
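The work-stealing idea credited to MATRIX can be illustrated with a toy discrete-time simulation in which an idle worker picks a random victim and takes half of its queue. The sketch assumes unit-time tasks, one task executed per worker per step, and a steal-half policy; it is not MATRIX's implementation, and `makespan` is a hypothetical name.

```python
import random
from collections import deque
from typing import List


def makespan(queues: List[int], steal: bool, seed: int = 0) -> int:
    """Steps needed to finish all unit-time tasks.

    queues[i] is the number of tasks initially queued on worker i. In each
    step, idle workers first try to steal (if enabled), then every worker with
    work executes exactly one task.
    """
    rng = random.Random(seed)
    q = [deque(range(k)) for k in queues]
    n, steps = len(q), 0
    while any(q):
        if steal and n > 1:
            for thief in range(n):
                if q[thief]:
                    continue
                victim = rng.choice([w for w in range(n) if w != thief])
                # Steal half (rounded down) from the tail of the victim's queue.
                for _ in range(len(q[victim]) // 2):
                    q[thief].append(q[victim].pop())
        for worker in q:
            if worker:
                worker.popleft()
        steps += 1
    return steps
```

With all 100 tasks starting on one of four workers, `makespan([100, 0, 0, 0], steal=False)` is 100 steps, while with stealing it falls well below that, toward the ideal of 25. In this model stealing can never hurt: a thief takes at most half of a queue, so no queue ever exceeds the longest initial queue minus the steps already elapsed.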
Proposal
● Fewer features
● Distributed selfish load balancing by Adolphs & Berenbrink
● No global information
  ○ Fewer open connections
● Convergence to an ε-approximate Nash equilibrium (NE) in O(ln(m/n))
  ○ Mathematically guaranteed to be fast
  ○ Scalable
● Can deal with different speeds and weights
  ○ Data-awareness
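Selfish load balancing can be illustrated with its simplest case: unit-weight tasks on identical resources. The migration rule is an assumption here, chosen as the classic rule for this setting: each round, every task samples one resource j uniformly at random and, if j is less loaded than its own resource i, migrates with probability 1 − ℓ_j/ℓ_i. A task needs only two load values to decide, which is where "no global information" comes from. This Python simulation is a simplified sketch, not the weighted, speed-aware protocol of the proposal; `selfish_rounds` is a hypothetical name.

```python
import random
from typing import List, Tuple


def selfish_rounds(loads: List[int], max_rounds: int = 1000,
                   seed: int = 0) -> Tuple[List[int], int]:
    """Simulate concurrent selfish migration of unit tasks on identical resources.

    Returns (final loads, rounds used). Stops as soon as max - min <= 1, which
    for this setting is exactly a Nash equilibrium.
    """
    rng = random.Random(seed)
    loads = list(loads)
    n = len(loads)
    for rounds in range(max_rounds):
        if max(loads) - min(loads) <= 1:
            return loads, rounds
        delta = [0] * n
        # Every task decides using only the loads seen at the start of the round.
        for i, own in enumerate(loads):
            for _ in range(own):
                j = rng.randrange(n)  # sample one resource uniformly at random
                if loads[j] < own and rng.random() < 1 - loads[j] / own:
                    delta[i] -= 1
                    delta[j] += 1
        loads = [load + d for load, d in zip(loads, delta)]
    return loads, max_rounds
```

For unit tasks on identical resources, a state is a Nash equilibrium exactly when the maximum and minimum loads differ by at most 1, because moving from load ℓ_i to ℓ_j only helps if ℓ_j + 1 < ℓ_i; that is the stopping test. Starting from 1,000 tasks on one of ten resources, a single round already spreads them to roughly 100 per resource, since every task that samples an empty resource moves with probability 1.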
Progress
Measuring Hadoop v1 and v2 latency
● HiBench suite
● Nuvem USP cloud, up to 64 VMs
● Experiment automation with Python
  ○ VM management
  ○ Hadoop management
  ○ Monitoring
  ○ Log parsing
  ○ Graphs
● Deadline: qualifying exam in August
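The log-parsing step can be sketched as computing each job's latency from two timestamped events. The line format below is hypothetical (it is not Hadoop's or HiBench's actual log layout), as are the `SUBMITTED`/`FINISHED` event names and the `job_latencies` function; only the idea of pairing start and end events per job carries over.

```python
from datetime import datetime
from typing import Dict, Iterable

# Hypothetical line format (NOT Hadoop's real log layout):
#   2014-05-20 10:00:01,250 job_0001 SUBMITTED
TIME_FMT = "%Y-%m-%d %H:%M:%S,%f"


def job_latencies(lines: Iterable[str], start: str = "SUBMITTED",
                  end: str = "FINISHED") -> Dict[str, float]:
    """Seconds between each job's `start` and `end` events."""
    started: Dict[str, datetime] = {}
    latency: Dict[str, float] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # not in the assumed four-field format
        date, clock, job, event = parts
        try:
            ts = datetime.strptime(f"{date} {clock}", TIME_FMT)
        except ValueError:
            continue  # first two fields are not a timestamp
        if event == start:
            started[job] = ts
        elif event == end and job in started:
            latency[job] = (ts - started[job]).total_seconds()
    return latency
```

Jobs that never reach the end event are simply absent from the result, which keeps unfinished runs from polluting latency graphs.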
The End
Thank you! Questions?
Links
● http://www.usp.br
● http://www.ime.usp.br
● http://ccsl.ime.usp.br
● http://www.choreos.eu
● http://ccsl.ime.usp.br/baile
● http://www.integrade.org.br
● Contact: cadu at ime.usp.br