Post on 28-Nov-2014
An Operating System for Multicore and Clouds: Mechanisms and Implementation
D. Wentzlaff, C. Gruenwald III, N. Beckmann, K. Modzelewski, A. Belay, L. Youseff, J. Miller, A. Agarwal
CSAIL MIT
Jiannan Ouyang
ouyang@cs.pitt.edu
Ph.D. Student
Outline
• Introduction
• Multicore and Cloud Operating System Challenges
• Architecture
• Case Studies
• Implementation and Results
Jiannan Ouyang, CS Ph.D.@PITT 5/18/2011
Introduction
• Multicore and cloud computers need new operating systems
– Traditional OSes do not scale well
– Current IaaS systems require users to explicitly manage resources and machine boundaries
• fractured and non-uniform view of resources to the programmer
• user must often build or buy server load balancers for scheduling across machines
Challenges of Multicore & Cloud operating systems
Problems OS designers need to address in the next decade
• Scalability
• Variability of Demand
• Faults
• Programming Challenges
Scalability
• Current OSes were designed for single-processor systems or systems with a small number of processors.
• Manycore computer system
– Limitations of locks, locality aliasing, reliance on shared memory
• Data center with thousands of servers
Variability of Demand
• Resource demand on a manycore system: the number of cores in use
– Map processes to live cores
• Elasticity of the cloud
Faults
• Hardware faults:
– dying cores and bit flips.
• Performance interference
– Interference between apps and VMs impacts QoS
• Software faults
– Parallel programming and debugging are hard
Programming Challenges
• Resource management must be done by the cloud application
• Load balancing is hard in the cloud
• No uniform programming model
– Intra/Inter-machine communication
Factored Operating System (FOS)
• A new OS should provide scalability, elasticity, fault tolerance, and a simple programming model
– Single system image OS
– Microkernel: OS services run in user space and communicate via messages
– Each service consists of a group of servers, called a fleet, distributed among the underlying cores and machines
– Message passing is mapped transparently across cores and machines
Single System Image
• Ease of administration
• Transparent sharing
• Informed optimizations
• Consistency
• Fault tolerance
Architecture of FOS
• Microkernel
– messaging, name cache, time multiplexing of cores, API
• Messaging
– Inter-process communication and synchronization
– Each process has a number of mailboxes
• Naming
– All servers within a fleet register under a given name
• OS Services
– Fleet: spatially distributed, cooperating servers
– FS fleet, naming fleet, scheduling fleet, proxy network server fleet…
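The naming and messaging bullets above can be illustrated with a minimal sketch. It is not the fos implementation: the `NameService` class, the round-robin choice among fleet members, and the name "/sys/fs" are assumptions made for illustration. It only shows the idea that every server in a fleet registers under one name and that a client sends to the name rather than to a specific server.

```python
import itertools
from collections import defaultdict, deque


class NameService:
    """Toy name service: maps a fleet name to the mailboxes of its servers."""

    def __init__(self):
        self._fleets = defaultdict(list)   # name -> list of mailboxes
        self._cursors = {}                 # name -> round-robin iterator

    def register(self, name, mailbox):
        # Every server in a fleet registers under the same name.
        self._fleets[name].append(mailbox)
        # Rebuild the iterator so the new member is included.
        self._cursors[name] = itertools.cycle(self._fleets[name])

    def lookup(self, name):
        # Return one member's mailbox; round-robin is an assumed policy.
        if name not in self._cursors:
            raise KeyError(f"no server registered under {name!r}")
        return next(self._cursors[name])


def send(names, name, message):
    """Deliver a message to some member of the fleet registered under name."""
    names.lookup(name).append(message)


names = NameService()
fs_a, fs_b = deque(), deque()      # one mailbox per file-system server
names.register("/sys/fs", fs_a)    # "/sys/fs" is a hypothetical name
names.register("/sys/fs", fs_b)
for request in ("open", "read", "close"):
    send(names, "/sys/fs", request)
```

The client code never mentions `fs_a` or `fs_b`; adding a third server to the fleet only requires one more `register` call.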
Messaging
• Messaging on shared memory or over network
• Transparent intra- and inter-machine communication
• Forces programmers to think carefully about the amount of shared data
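As a rough sketch of what transparent intra- and inter-machine messaging means for the sender, the code below routes a message either to a local mailbox or through a proxy queue, depending on where the destination lives. The `Machine` class, the `location` table (standing in for the name cache), and `flush_proxy` are hypothetical names; the real transports are shared memory and the network.

```python
from collections import deque


class Machine:
    """Hypothetical stand-in for one VM: local mailboxes plus a proxy queue."""

    def __init__(self, name):
        self.name = name
        self.mailboxes = {}        # mailbox name -> deque of messages
        self.proxy_outbox = []     # (destination machine, mailbox, message)

    def add_mailbox(self, mailbox):
        self.mailboxes[mailbox] = deque()


def send(sender, location, mailbox, message):
    """Send by mailbox name; the caller never picks the transport.

    location maps mailbox name -> Machine, standing in for the name cache.
    """
    target = location[mailbox]
    if target is sender:
        # Same machine: deliver directly (stands in for shared memory).
        target.mailboxes[mailbox].append(message)
        return "local"
    # Different machine: hand the message to the proxy for the network hop.
    sender.proxy_outbox.append((target.name, mailbox, message))
    return "remote"


def flush_proxy(sender, machines):
    """Simulate the network: move queued messages to the remote mailboxes."""
    while sender.proxy_outbox:
        dest, mailbox, message = sender.proxy_outbox.pop(0)
        machines[dest].mailboxes[mailbox].append(message)
```

The call `send(m1, location, "fs", "read")` looks the same whether "fs" is on `m1` or on another machine; only the return value in this sketch reveals which path was taken.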
Parallel Data Structure
• Managing state associated with a particular service among the members of the fleet
• Common container interface: abstracts several implementations that provide different consistency, replication and performance properties
• Existing solutions in the P2P community
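One way to picture the common container interface is an abstract class with interchangeable implementations. The sketch below is an assumption for illustration, not the fos API: `Container`, `ReplicatedContainer`, and `PartitionedContainer` are hypothetical names, and in-process dictionaries stand in for state held by different fleet members.

```python
from abc import ABC, abstractmethod


class Container(ABC):
    """Common interface; implementations differ in replication and cost."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...


class ReplicatedContainer(Container):
    """Every write goes to all replicas, so any replica can serve a read."""

    def __init__(self, replicas):
        self.stores = [{} for _ in range(replicas)]

    def put(self, key, value):
        for store in self.stores:
            store[key] = value

    def get(self, key, replica=0):
        return self.stores[replica][key]


class PartitionedContainer(Container):
    """Each key lives on one shard chosen by hash: cheaper writes, no copies."""

    def __init__(self, shards):
        self.stores = [{} for _ in range(shards)]

    def _shard(self, key):
        return self.stores[hash(key) % len(self.stores)]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key):
        return self._shard(key)[key]
```

A service written against `Container` can switch between the two without changing its own code, trading write cost against read locality.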
Case Study – File System
Case Study – Spawning Server
Create a new server process on one of the following, as decided by the spawn server:
• Same VM
• Another Existing VM
– spawn1 -> proxy1 -> proxy2 -> spawn2
• New VM
– Create VM: send request to cloud manager
– Add VM to group, exchange name information, notify all other machines
– Forward spawn request to the new VM
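The three placement cases above can be sketched as a single decision function. The slides only say that the spawn server decides; the capacity-based policy, the `place_server` function, and the way a new VM is named are assumptions made for illustration.

```python
def place_server(local_vm, vms, load, capacity):
    """Pick where to start a new server process.

    load maps VM name -> running servers; capacity is an assumed per-VM limit.
    Returns (decision, vm_name, path); path lists the hops of the request.
    """
    if load[local_vm] < capacity:
        return ("same_vm", local_vm, [f"spawn@{local_vm}"])

    for vm in vms:
        if vm != local_vm and load[vm] < capacity:
            # Request travels spawn -> local proxy -> remote proxy -> spawn.
            path = [f"spawn@{local_vm}", f"proxy@{local_vm}",
                    f"proxy@{vm}", f"spawn@{vm}"]
            return ("existing_vm", vm, path)

    # No room anywhere: ask the cloud manager for a VM, then forward.
    new_vm = f"vm{len(vms) + 1}"   # hypothetical naming scheme
    path = [f"spawn@{local_vm}", "cloud_manager", f"spawn@{new_vm}"]
    return ("new_vm", new_vm, path)
```

For example, `place_server("vm1", ["vm1", "vm2"], {"vm1": 2, "vm2": 1}, 2)` selects the existing VM "vm2" and returns the four-hop path through both proxies.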
Case Study – Elastic Fleet
• A watchdog process monitors the queue length
• Add server to fleet
– Spawn, handshaking
• Make global decisions of elastic fleet
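A minimal sketch of the watchdog idea follows. The thresholds, the `Fleet` class, and `watchdog_tick` are assumptions for illustration; the slides state only that a watchdog monitors queue length and that servers are added to the fleet through spawning and handshaking.

```python
from collections import deque


class Fleet:
    """Hypothetical fleet: a shared request queue and a server count."""

    def __init__(self, servers=1):
        self.servers = servers
        self.queue = deque()

    def add_server(self):
        # In fos this step would involve spawning and a handshake.
        self.servers += 1


def watchdog_tick(fleet, max_queue_per_server=4, max_servers=8):
    """One watchdog check: grow the fleet if the backlog is too long."""
    backlog = len(fleet.queue)
    overloaded = backlog > max_queue_per_server * fleet.servers
    if overloaded and fleet.servers < max_servers:
        fleet.add_server()
        return True
    return False
```

Calling `watchdog_tick` periodically grows the fleet one server at a time until the backlog per server falls under the assumed threshold or the cap is reached.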
Implementation
• Xen para-virtualized machine (PVM) OS
• Run on EC2 or Eucalyptus cloud infrastructure
• Configuration
– 16-machine cluster; each machine has 8 cores running at 3.16 GHz, 8 GB of main memory, and 1 Gbps Ethernet
Result - syscall
Result – fos network stack & app
Result - FS
Related Work
• Traditional microkernels.
– Like: Mach, L4, …
• Distributed OSes.
– Like: Amoeba, Sprite, and Clouds
• Cloud computing infrastructure.
– Like: Google App Engine and MS Azure
Conclusion
• Cloud computing and multicores have created new classes of platforms for application development, along with new challenges in scalability, elasticity, fault tolerance, and programming.
• Fos seeks to surmount these issues by presenting a single system interface to the user and by providing a programming model that allows OS services to scale with demand.
• Fos is scalable and adaptive.