Luigi De Simone Tutor: Prof. Domenico Cotroneo XXIX Cycle - I year presentation Dependability issues...

  • Slide 1
  • Dependability issues in cloud computing infrastructures
      • Luigi De Simone
      • Tutor: Prof. Domenico Cotroneo
      • XXIX Cycle, I year presentation
  • Slide 2
  • Background
      • I received my M.Sc. in Computer Engineering (cum laude) from the University of Naples Federico II.
      • I do research within the MOBILAB group at DIETI.
      • Type of fellowship: PhD student grant.
  • Slide 3
  • What is the problem?
      • Highly distributed
      • Heterogeneous hardware and software components
      • Expected to provide highly available services requested by millions of users in parallel
      • Very complex ecosystems!
      • [Figure: cloud users, XaaS cloud services (VoIP, video streaming, storage), and cloud providers]
  • Slide 4
  • Do cloud computing ecosystems fail?
      • [Figure, source: [1]]
      • Failures in such ecosystems are inevitable! Too many factors are outside our control, and almost everything is driven by software.
  • Slide 5
  • Research questions
      • We need to develop tools and methodologies to evaluate dependability issues in cloud computing ecosystems (CCEs).
      • RQ1: What is the nature of the faults that lead to cloud failures?
      • RQ2: What is the impact of faults on CCEs?
      • RQ3: Which component or layer within a CCE is the weakest against faults?
  • Slide 6
  • Existing cloud testing tools
      • The idea is to leverage fault injection testing to reproduce fault propagation within a CCE by deliberately introducing faults.
      • Recent studies have addressed the testing of cloud-based applications [2][4] and the use of cloud platforms to test applications [3].
      • We need approaches specifically focused on evaluating the reliability of cloud services in the presence of faults within the CCE.
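As a rough illustration of fault propagation in a layered CCE, the sketch below computes which components a fault injected into one component could reach. The layer names and the `DEPENDS_ON` map are hypothetical and not taken from the slides. Plain reachability is only an upper bound: it assumes that no layer masks or tolerates the fault, which is the kind of assumption fault injection experiments are meant to check.

```python
from collections import deque


def affected_components(depends_on, faulty):
    """Return every component that transitively depends on `faulty`."""
    # Invert the edges: for each component, record who depends on it.
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    # Breadth-first walk from the faulty component towards its dependents.
    seen, queue = {faulty}, deque([faulty])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {faulty}


# Hypothetical layering, for illustration only.
DEPENDS_ON = {
    "cloud service": ["guest OS"],
    "guest OS": ["hypervisor"],
    "hypervisor": ["physical storage", "physical network"],
}

print(sorted(affected_components(DEPENDS_ON, "hypervisor")))
```

Comparing this worst-case set with the components that actually fail in an injection experiment would show where a fault is contained.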
  • Slide 7
  • Existing fault injection tools in cloud
      • Current studies focus on:
      • Virtual machines (D-Cloud [5], DS-Bench Toolset [6], Chaos Monkey [7])
      • Hypervisors (CloudVal [8])
      • Cloud management stack (OpenStack resilience study [9], PreFail [10])
      • But they:
      • Focus mostly on the injection of hardware faults
      • Focus on a specific component within the CCE and do not provide insight to improve the reliability of the CCE as a whole
      • Focus on open-source virtualization technologies
      • Do not provide an impact analysis (risk), in particular related to cloud user perceptions
  • Slide 8
  • Research roadmap and challenges
      • Enhancing fault injection for CCEs: define realistic fault models; define the metrics to be used.
      • Supporting fault propagation analysis in CCEs: develop methods and tools for tracing CCEs, analyzing fault effects, and providing useful insights to designers.
      • Testing of cloud management software: define effective test scenarios; automate the testing process.
      • (The slide marks part of this roadmap as "First Year".)
  • Slide 9
  • Publications
      • [P1] De Simone, L., "Towards Fault Propagation Analysis in Cloud Computing Ecosystems," 2014 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 156-161, 3-6 Nov. 2014, DOI: 10.1109/ISSREW.2014.47. Best Presentation Award.
      • [P2] Cotroneo, D.; De Simone, L.; Iannillo, A.K.; Lanzaro, A.; Natella, R.; Jiang Fan; Wang Ping, "Network Function Virtualization: Challenges and Directions for Reliability Assurance," 2014 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 37-42, 3-6 Nov. 2014, DOI: 10.1109/ISSREW.2014.48.
      • [P3] Cotroneo, D.; De Simone, L.; Iannillo, A.K.; Lanzaro, A.; Natella, R., "Improving Usability of Fault Injection," 2014 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 530-532, 3-6 Nov. 2014, DOI: 10.1109/ISSREW.2014.37.
      • [P4] Cotroneo, D.; De Simone, L.; Iannillo, A.K.; Lanzaro, A.; Natella, R., "Dependability Evaluation and Benchmarking of Network Function Virtualization Infrastructures," submitted to the 1st IEEE Conference on Network Softwarization (NetSoft).
  • Slide 10
  • Next year
      • Further investigation of different virtualization technologies (e.g., LXC, KVM) and their impact on the dependability of CCEs
      • Analysis of fault propagation and its effects on CCEs
      • Credits summary: (table not captured in this transcript)
  • Slide 11
  • References
      • [1] Zheng Li et al., "The Cloud's Cloudy Moment: A Systematic Survey of Public Cloud Service Outage," arXiv:1312.6485.
      • [2] X. Bai, M. Li, B. Chen, W.-T. Tsai, and J. Gao, "Cloud testing tools," in Proc. Intl. Symp. SOSE, 2011, pp. 1-12.
      • [3] L. Ciortea, C. Zamfir, S. Bucur, V. Chipounov, and G. Candea, "Cloud9: A software testing service," SIGOPS Operating Systems Review, vol. 43, no. 4, pp. 5-10, Jan. 2010.
      • [4] S. Bouchenak, G. Chockler, H. Chockler, G. Gheorghe, N. Santos, and A. Shraer, "Verifying cloud services: Present and future," SIGOPS Operating Systems Review, vol. 47, no. 2, pp. 6-19, Jul. 2013.
      • [5] T. Banzai, H. Koizumi, R. Kanbayashi, T. Imada, T. Hanawa, and M. Sato, "D-Cloud: Design of a software testing environment for reliable distributed systems using cloud computing technology," in Proc. Intl. Conf. CCGRID, 2010, pp. 631-636.
      • [6] H. Fujita, Y. Matsuno, T. Hanawa, M. Sato, S. Kato, and Y. Ishikawa, "DS-Bench Toolset: Tools for dependability benchmarking with simulation and assurance," in Proc. Intl. Conf. DSN, 2012, pp. 1-8.
      • [7] Netflix, "The Chaos Monkey." [Online]. Available: https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey
      • [8] C. Pham, D. Chen, Z. Kalbarczyk, and R. K. Iyer, "CloudVal: A framework for validation of virtualization environment in cloud infrastructure," in Proc. Intl. Conf. DSN, 2011, pp. 189-196.
      • [9] X. Ju, L. Soares, K. G. Shin, K. D. Ryu, and D. Da Silva, "On fault resilience of OpenStack," in Proc. SOCC, 2013, pp. 1-16.
      • [10] P. Joshi, H. S. Gunawi, and K. Sen, "PreFail: A programmable tool for multiple-failure injection," in Proc. Intl. Conf. OOPSLA, 2011, pp. 171-188.
  • Slide 12
  • Thank you for this opportunity!
  • Slide 13
  • Backup slides
  • Slide 14
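  • CCE internals
      • [Figure: layers of a CCE, namely physical resources, virtualization, operating system, management tools, and application]
      • Physical resources: compute, storage, memory, networking
      • VM-based virtualization: a hypervisor (VMware ESXi, Hyper-V) runs VMs, each with its own guest OS and apps
      • Container-based virtualization: a host OS kernel with virtualization support (LXC, Docker, OpenVZ) runs containers, each hosting an app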
  • Slide 15
  • Are existing fault injection tools in cloud enough?
      • D-Cloud [5] and DS-Bench Toolset [6]. Target: server software (e.g., web applications). Faultload: network, disk, and memory faults. Injection technique: emulation of a faulty device; VM memory corruption.
      • Chaos Monkey [7]. Target: virtual instances at runtime. Faultload: CPU, disk, and network faults. Injection technique: executing scripts that simulate a fault on the target machine.
      • PreFail [10]. Target: distributed filesystems and algorithms (e.g., HDFS, ZooKeeper). Faultload: network and disk faults; process crash. Injection technique: API exception injection.
      • OpenStack Resilience Framework [9]. Target: OpenStack. Faultload: service crash and network partition. Injection technique: API exception injection.
      • CloudVal [8]. Target: hypervisors (e.g., Xen, KVM). Faultload: CPU, memory, and VM faults. Injection technique: memory corruption.
  • Slide 16
  • Fault model definition
      • A fault model describes the types of faults that the system is expected to experience at runtime.
      • [Figure: VMware ESXi storage subsystem]
      • Which components are sources of faults?
      • How can components fail?
      • Which faults can be injected, and how?
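One way to make the three questions above concrete is to record a fault model as data: each entry names a source component, a failure mode, and an injection method. This is a minimal sketch; the class names, failure modes, component names, and injection descriptions are all hypothetical and not taken from the slides or from any VMware interface.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    # Hypothetical failure modes, chosen only for illustration.
    UNAVAILABLE = "unavailable"
    DELAYED = "delayed"
    CORRUPTED = "corrupted"


@dataclass(frozen=True)
class FaultSpec:
    component: str             # which component is the source of the fault
    failure_mode: FailureMode  # how the component fails
    injection: str             # how the fault is injected


def build_fault_model(components, modes, injection_for):
    """Enumerate one FaultSpec per (component, failure mode) pair."""
    return [
        FaultSpec(component, mode, injection_for[mode])
        for component in components
        for mode in modes
    ]


# Hypothetical injection methods, one per failure mode.
INJECTION = {
    FailureMode.UNAVAILABLE: "fail the I/O request with an error code",
    FailureMode.DELAYED: "hold the I/O request before completing it",
    FailureMode.CORRUPTED: "flip bits in the returned data",
}

model = build_fault_model(
    ["storage subsystem", "network subsystem"], list(FailureMode), INJECTION
)
for spec in model:
    print(spec.component, spec.failure_mode.value, spec.injection, sep=" | ")
```

Keeping the model as plain data makes it easy to review for realism before any injection campaign is run.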
  • Slide 17
  • Fault model examples
  • Slide 18
  • Fault injection tool development
      • The tool is aimed at virtualized systems.
      • The tool allows the injection of I/O faults (e.g., dropped packets) and compute faults (e.g., CPU register corruption) in virtualization technologies.
      • The tool does not require the availability of debugging facilities, nor the source code of hypervisors, guest OSes, or application software.
      • It is applicable to proprietary off-the-shelf hardware and virtualization technologies.
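The two fault types named above can be illustrated on plain Python values. This is only a sketch of the fault semantics, not of the tool itself, which operates on the virtualization layer; the function names and the 64-bit register width are assumptions made for the example.

```python
import random


def flip_bit(value: int, bit: int, width: int = 64) -> int:
    """Return `value` with one bit inverted, emulating register corruption."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    # XOR with a one-bit mask, then truncate to the register width.
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def drop_packets(packets, drop_probability: float, rng: random.Random):
    """Return the packets that survive a random-drop I/O fault."""
    return [p for p in packets if rng.random() >= drop_probability]


rng = random.Random(42)  # fixed seed so an experiment can be repeated
corrupted = flip_bit(0b1010, bit=0)
delivered = drop_packets(list(range(10)), drop_probability=0.3, rng=rng)
print(bin(corrupted), len(delivered))
```

Seeding the random generator is a deliberate choice here: a fault injection experiment is much easier to analyze when the same fault sequence can be replayed.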
  • Slide 19
  • Dependability evaluation of NFV infrastructures (collaboration with Huawei)
      • Network Function Virtualization (NFV) is an emerging solution that turns hardware network equipment into software-based virtual entities, leveraging cloud computing and virtualization technologies.
      • It promises to reduce CAPEX (e.g., lower hardware costs) and OPEX (e.g., shorter development and test cycles), but current solutions:
      • Do not test dependability threats in NFV systems!
      • Do not provide risk analysis about NFV adoption!
  • Slide 20
  • Dependability evaluation of NFV infrastructures
      • In [P4], I present a dependability evaluation and benchmarking methodology for NFV infrastructures (NFVIs), based on fault injection testing.
      • The methodology analyzes how faults impact virtual network functions (VNFs) in terms of performance degradation and service unavailability.
      • The case study is performed on an IP Multimedia Subsystem (IMS) and shows how the methodology can point out dependability bottlenecks in the NFVI and guide design efforts.
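The two impact dimensions above, service unavailability and performance degradation, can be sketched as simple measures computed from a fault-free run and a run with an injected fault. The definitions below are assumptions made for illustration and are not necessarily the metrics defined in [P4]; the sample values are invented.

```python
from statistics import mean


def availability(outcomes):
    """Fraction of requests that completed successfully."""
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def latency_degradation(baseline_ms, faulty_ms):
    """Ratio of mean latency under fault to mean fault-free latency."""
    return mean(faulty_ms) / mean(baseline_ms)


# Invented sample data: request outcomes and latencies in milliseconds.
outcomes_under_fault = [True, True, False, True]
baseline_latency = [10, 10]
faulty_latency = [20, 40]

print(availability(outcomes_under_fault))
print(latency_degradation(baseline_latency, faulty_latency))
```

A ratio above 1 indicates that requests are slower under fault, and an availability below 1 indicates that some requests were not served at all.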
  • Slide 21
  • Dependability evaluation of NFV infrastructures
  • Slide 22
  • Dependability evaluation of NFV infrastructures
      • [Figures: availability and latency]