A.R.M.S. Active Resource Management Services For Big Data Processing
![Page 1: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/1.jpg)
A.R.M.S. Active Resource
Management Services
For Big Data Processing
Revised Presentation One
3/21/2013 1
![Page 2: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/2.jpg)
Outline
• 1: Title
• 2: Outline
• 3: Members
• 4: Mentor
• 5-6: Societal Issue
• 7: History
• 8-9: Dr. Li
• 10-11: Cluster Computing
• 12-14: Case Study
• 15: Accuracy
• 16: Current Major Functional Component Diagram
• 17: Current Process Flow
• 18: Problem Statement
• 19: Proposed Major Functional Component Diagram
• 20: Proposed Process Flow
• 21-24: Dinosolve Walkthrough
• 25: Dinosolve Issues
• 26: Software
• 27: Hardware
• 28: Solution Statement
• 29: Competition Identified
• 30-32: 508 Compliance
• 33: Objectives
• 34: Benefits of Solution
• 35: Conclusion
• 36-39: References
• 40-44: Appendix
![Page 3: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/3.jpg)
Group Members and Roles
• Scott Pardue (Team Leader)
• Michael Rajs (Risk Manager)
• Adam Willis (Algorithm Specialist)
• Sybil Acotanza (Documentation Specialist)
• Jordan Heinrichs (Database Designer)
• David Crook (User Interface Designer)
![Page 4: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/4.jpg)
Dr. Yaohang Li
• Associate Professor in the Department of Computer Science at Old Dominion University
• Research interests include:
– Computational Biology: applies computational simulation techniques to solve biological problems
– Markov Chain Monte Carlo (MCMC) methods: statistical algorithms for sampling from probability distributions
– Parallel Distributed Grid Computing: uses multiple computers communicating via the Internet to solve a problem
![Page 5: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/5.jpg)
How do researchers handle the massive amounts of data they are collecting in
order to benefit their research?
![Page 6: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/6.jpg)
“Every day, [mankind] create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone.”1
http://www-01.ibm.com/software/data/bigdata/
![Page 7: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/7.jpg)
Data Management Examples
• Large Hadron Collider [2]
– 150 million sensors report 40 million times per second
• Facebook [3]
– 2.5 billion content items shared
– 2.7 billion “Likes”
– 300 million photos uploaded
• Walmart [2]
– 1 million customer transactions
– 2.5 x 10^15 bytes of data
http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/
![Page 8: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/8.jpg)
Dr. Li’s Research
• Ideally, his research can be used to develop new protein-modeling programs. Computational approaches can be more efficient and less expensive than laboratory experiments by biologists, chemists, and others
• This could lead to the manufacturing of additional drugs to fight conditions as varied as Alzheimer’s disease, cystic fibrosis, and mad cow disease
http://diverseeducation.com/article/13348/
![Page 9: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/9.jpg)
Dr. Li’s Grants
• Dinosolve, his current project, is funded by a five-year, $400,000 CAREER Award from the National Science Foundation
• Dr. Li has been the principal or co-principal investigator on research grants totaling more than $15.3 million
![Page 10: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/10.jpg)
Big Data Analysis Hardware
• Cluster Computing [4]
• A cluster consists of many nodes (computers).
• Big data can be generated and analyzed more quickly by spreading the workload among the nodes.
Head Node
• Logging data
• Job submission
3 Computation Nodes
• 2 processors each
• 4 execution slots per processor
24 total execution slots
The head node packages data from the computation nodes and presents it in a readable format so that it is usable by the research community
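The slot total on this slide follows from multiplying the three figures given. A minimal sketch of that arithmetic; the function and constant names are illustrative and not taken from the project:

```python
# Figures from the slide: 3 computation nodes, 2 processors each,
# 4 execution slots per processor. Names here are illustrative only.
COMPUTE_NODES = 3
PROCESSORS_PER_NODE = 2
SLOTS_PER_PROCESSOR = 4


def total_execution_slots(nodes, processors_per_node, slots_per_processor):
    """Return how many jobs the cluster can run at the same time."""
    return nodes * processors_per_node * slots_per_processor


total = total_execution_slots(
    COMPUTE_NODES, PROCESSORS_PER_NODE, SLOTS_PER_PROCESSOR
)
print(total)  # 24
```

Adding a fourth computation node with the same layout would raise the total to 32 slots.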
![Page 11: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/11.jpg)
Managing the Cluster
Distributed Resource Management Systems (D-RMS)
– Job management subsystem
– Physical resource management subsystem
– Scheduling and queuing subsystem
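How the three subsystems could cooperate is easier to see in a toy model. The sketch below is a hypothetical first-in, first-out scheduler written for illustration; it is not Sun Grid Engine code, and the class and method names are assumptions. It uses the 24 execution slots of the cluster and the 300 simultaneous requests cited in the problem statement.

```python
from collections import deque


class ToyScheduler:
    """Hypothetical illustration of the three D-RMS subsystems; not SGE code."""

    def __init__(self, total_slots):
        self.free_slots = total_slots  # physical resource management
        self.queue = deque()           # scheduling and queuing
        self.running = set()           # job management

    def submit(self, job_id):
        """Accept a job; it waits in FIFO order until a slot is free."""
        self.queue.append(job_id)
        self._dispatch()

    def finish(self, job_id):
        """Mark a running job as done and hand its slot to the next job."""
        self.running.remove(job_id)
        self.free_slots += 1
        self._dispatch()

    def _dispatch(self):
        # Start queued jobs while slots remain.
        while self.free_slots > 0 and self.queue:
            job_id = self.queue.popleft()
            self.running.add(job_id)
            self.free_slots -= 1


scheduler = ToyScheduler(total_slots=24)
for job_id in range(300):
    scheduler.submit(job_id)
print(len(scheduler.running), len(scheduler.queue))  # 24 276

scheduler.finish(0)
print(len(scheduler.running), len(scheduler.queue))  # 24 275
```

The point of the sketch is that excess requests wait in a queue instead of all running at once, so load on the compute nodes stays bounded by the slot count.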
![Page 12: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/12.jpg)
Dr. Yaohang Li and Dinosolve
• Dinosolve examines a protein sequence of amino acids and determines if the protein can be manipulated by an addition of a disulfide bond
• Each computational result enhances the prediction accuracies for future results
http://hpcr.cs.odu.edu/dinosolve/index.php
![Page 13: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/13.jpg)
Dinosolve Case Study
• Bioinformatics [7]
– Disulfide bond prediction program
– Disulfide bond creation is important to the research community
![Page 14: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/14.jpg)
Dinosolve Users
• Drug design
– Pharmaceutical companies
• Antibody design
– To combat viruses
• Bio-energy development
– Creation of new fuels to replace diminishing fossil fuels
• Genetic mapping [5]
– Research to cure cancer, HIV, and other diseases
![Page 15: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/15.jpg)
Accuracy of Popular Tools

| Tool | Dinosolve | DiANNA | Scratch Protein Predictor |
|---|---|---|---|
| Accuracy | 90.8% | 81% | 87% |

References 13, 14, and 15
More users choose Dinosolve because of its higher accuracy
![Page 16: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/16.jpg)
![Page 17: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/17.jpg)
![Page 18: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/18.jpg)
What is the problem?
• Processing big data sets is computationally expensive. As the volume of queries grows, performance degrades progressively until the system fails.
• 300 simultaneous requests will cause the web server to crash
![Page 19: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/19.jpg)
![Page 20: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/20.jpg)
![Page 21: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/21.jpg)
User interface will be improved to be more aesthetically pleasing
![Page 22: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/22.jpg)
Working with Dinosolve
• Input title
• Input protein sequence
• Input e-mail address
• Submit, then wait for confirmation...
Protein Sequence: a string of alphabetic characters, each of which represents a particular amino acid in the protein
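The protein sequence format described above can be checked before a request reaches the cluster. The sketch below is an assumption about what such a check might look like, not Dinosolve's actual input handling: it accepts only the one-letter codes of the 20 standard amino acids and reports where the cysteines (C) are, since disulfide bonds form between cysteine residues. The function names are hypothetical.

```python
# One-letter codes for the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Strip whitespace, upper-case, and reject non-amino-acid characters."""
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("empty protein sequence")
    invalid = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError(f"unexpected characters: {''.join(invalid)}")
    return sequence


def cysteine_positions(sequence):
    """Return 1-based positions of cysteine (C) residues."""
    return [i for i, residue in enumerate(sequence, start=1) if residue == "C"]


seq = clean_sequence("mkc lvac\n")
print(seq)                      # MKCLVAC
print(cysteine_positions(seq))  # [3, 7]
```

Rejecting malformed input at the web tier would keep bad requests from consuming execution slots.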
![Page 23: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/23.jpg)
Working with Dinosolve
• Confirmation of request
• Now wait for results
![Page 24: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/24.jpg)
Working with Dinosolve
• Check your e-mail
• Click the link provided
• The results are displayed
![Page 25: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/25.jpg)
Dinosolve Issues
As Dinosolve continues to grow in popularity, these issues are expected to occur:
• Shortages of hard resources for computation
– CPU cycles
– Memory
– Disk space
– Network bandwidth
• Server crashes
The goal is to prepare the system to keep supporting the research community as the number of requests grows
![Page 26: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/26.jpg)
Software
• Unix operating system installed on the Dinosolve cluster
• Dinosolve algorithm
• Sun Grid Engine, which will serve as the Distributed Resource Management System (D-RMS) installed on the cluster
• MySQL (database software)
• Web-based user interface (website)
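With Sun Grid Engine in place, the web tier could hand each request to the cluster with `qsub`. The sketch below only builds the command line and does not run it. The executable path `./dinosolve_predict`, the job name, and the one-file-per-request layout are hypothetical. The `qsub` options used (`-N`, `-cwd`, `-b y`, `-o`, `-e`) are standard Grid Engine options.

```python
import shlex


def build_qsub_command(job_name, sequence_file, executable="./dinosolve_predict"):
    """Build (but do not run) a qsub command line for one prediction job.

    The executable path and the one-file-per-request layout are hypothetical.
    """
    return [
        "qsub",
        "-N", job_name,            # job name shown by qstat
        "-cwd",                    # run in the current working directory
        "-b", "y",                 # treat the command as a binary, not a script
        "-o", f"{job_name}.out",   # file for standard output
        "-e", f"{job_name}.err",   # file for standard error
        executable, sequence_file,
    ]


command = build_qsub_command("dino_0001", "request_0001.fasta")
print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.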
![Page 27: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/27.jpg)
Hardware
• MySQL database server
• A computer cluster to run the Dinosolve algorithm
• Web server for web-based user interface
![Page 28: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/28.jpg)
How will we correct the problem?
Configure a distributed resource management system
![Page 29: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/29.jpg)
Competing Distributed Resource Management Systems
• Sun Grid Engine (SGE)
• Portable Batch System (PBS)
• Load Sharing Facility (LSF)
![Page 30: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/30.jpg)
| | Dinosolve | DiANNA | Scratch Protein Predictor |
|---|---|---|---|
| 508.22 compliance percentage | 67% | 85% | 67% |
![Page 31: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/31.jpg)
508 compliance
• Section 508 of the Rehabilitation Act, as amended in 1998
– Requires Federal agencies to make their electronic and information technology accessible to people with disabilities [32]
– Enacted to eliminate barriers in information technology, to make available new opportunities for people with disabilities, and to encourage development of technologies that will help achieve these goals [32]
![Page 32: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/32.jpg)
Why is it important to be compliant?
If an entity wishes to receive government funding, then any electronic form the entity uses must be 508 compliant.
![Page 33: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/33.jpg)
Objectives
• Interpret and visualize current usage statistics
• Configure, utilize, and optimize the SGE
• Design an aesthetically pleasing and professional user interface
![Page 34: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/34.jpg)
What benefits will come from attaining the goals?
• Efficient utilization of available resources
• Increased throughput of the cluster
• An intuitive and professional user interface
• Rise in popularity due to excellent accuracy, efficiency, and professional design
![Page 35: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/35.jpg)
Conclusion
With the updated user interface and a correctly configured Sun Grid Engine, Dr. Li hopes to establish a reputable, reliable, and aesthetically pleasing Disulfide Bonding Prediction Server.
![Page 36: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/36.jpg)
References for history
1. http://www-01.ibm.com/software/data/bigdata/
2. http://en.wikipedia.org/wiki/Big_data
3. http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/
4. http://en.wikipedia.org/wiki/Computer_cluster
![Page 37: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/37.jpg)
References for case study
5. Li, Y. (2010, September 1). CAREER: Novel Sampling Approaches for Protein Modeling Applications [Abstract]. National Science Foundation Award Abstract #1066471.
6. Li, Y., & Yaseen, A. (2012). Enhancing Protein Disulfide Bonding Prediction Accuracy with Context-based Features. Biotechnology and Bioinformatics Symposium
7. bioinformatics. 2011. In Merriam-Webster.com. Retrieved February 15, 2013, from http://www.merriam-webster.com/dictionary/bioinformatics
8. Cronk, J. D. (2012). Disulfide Bond. Retrieved February 15, 2013, from Biochemistry Dictionary: http://guweb2.gonzaga.edu/faculty/cronk/biochem/D-index.cfm?definition=disulfide_bond
9. Yan, Y., & Chapman, B. (2008). Comparative Study of Distributed Resource Management Systems–SGE, LSF, PBS Pro, and LoadLeveler. Technical Report-Citeseerx.
10. Li, Y., & Yaseen, A. (2012). Dinosolve. Retrieved from http://hpcr.cs.odu.edu/dinosolve/
![Page 38: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/38.jpg)
References for competition
11. Arvind Krishna, “Why Big Data? Why Now?”, IBM, 2011. URL: http://almaden.ibm.com/colloquium/resources/Why%20Big%20Data%20Krishna.PDF
12. Yonghong Yan, Barbara M. Chapman, Comparative Study of Distributed Resource Management Systems - SGE, LSF, PBS Pro, and LoadLeveler, Department of Computer Science, University of Houston, May 2005 (pdf)
13. Dr. Li’s site http://hpcr.cs.odu.edu/dinosolve/
14. Scratch Predictor http://scratch.proteomics.ics.uci.edu/
15. DiANNA server http://clavius.bc.edu/~clotelab/DiANNA/
Portable Batch System (PBS)
16. http://resources.altair.com/pbs/documentation/support/PBSProUserGuide12-2.pdf
17. http://www.pbsworks.com/SupportDocuments.aspx?AspxAutoDetectCookieSupport=1
18. http://resources.altair.com/pbs/documentation/support/PBSProRefGuide12-2.pdf
19. http://resources.altair.com/pbs/documentation/support/PBSProAdminGuide12-2.pdf
20. http://www.pbsworks.com/(S(tykrsyqbemmlf3o5zwrmjrgf))/images/solutions-en-US/PBS-Pro_Datasheet-USA_WEB.pdf
21. http://agendafisica.files.wordpress.com/2011/05/pbs.pdf
Moab HPC Suite
22. http://www.adaptivecomputing.com/publication/420/wppa_open/
IBM Platform LSF
23. http://public.dhe.ibm.com/common/ssi/ecm/en/dcd12354usen/DCD12354USEN.PDF
Apache Hadoop with Zookeeper
24. http://zookeeper.apache.org/doc/current/zookeeperOver.html
25. http://www.cloud-net.org/~swsellis/tech/solaris/performance/doc/blueprints/0102/jobsys.pdf
![Page 39: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/39.jpg)
Reference for 508 Compliance
26. http://en.wikipedia.org/wiki/Section_508_Amendment_to_the_Rehabilitation_Act_of_1973
![Page 40: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/40.jpg)
Appendix
• 40: Competition Matrix for Resource Management Systems
• 41-43: 508.22 Compliance Statistics for Dinosolve
![Page 41: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/41.jpg)
Competing Resource Management Systems

| Features of systems | PBS | LSF | SGE |
|---|---|---|---|
| Supported platforms | Unix | Unix & NT | Unix |
| Multi-cluster support | Yes | Yes | No |
| System level checkpoint restart | No | Yes | Yes |
| User level checkpoint restart | No | No | Yes |
| Large computational grid support | No | No | No |
| Massive scalability | Yes | Yes | Yes |
| Parallel job support with Sun HPC ClusterTools | Loose integration | Tight integration | Loose integration |
| Distribution format of end product | Source | Binary only | Binary and source |
| Free? | Yes | No | Yes |
| POSIX 1003.2d compliance | Yes | No | Yes |

Reference 19
![Page 42: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/42.jpg)
![Page 43: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/43.jpg)
![Page 44: A.R.M.S. Active Resource Management Services For Big Data Processing](https://reader036.fdocuments.net/reader036/viewer/2022062305/5681645d550346895dd6307e/html5/thumbnails/44.jpg)