
Multicore Operating Systems

Minsoo Ryu

Department of Computer Science and Engineering, Hanyang University
Real-Time Computing and Communications Lab., Hanyang University


1. Kernels for Small Multiprocessors
2. Advantages and Limitations of SMP Kernels
3. Kernels for Large Multiprocessors
4. Q & A


Considerations for Multiprocessors

Diversity in HW organization
• Shared memory vs. distributed memory
• Bus-based vs. network-based
• Homogeneous vs. heterogeneous

Interactions among processors
• Memory and cache sharing
• Inter-processor communications

[Figure: three example organizations — shared memory & bus-based (UMA), distributed memory & network-based (NUMA), clustered & hybrid (NUMA)]


Classification of Multiprocessor Kernels

For small multiprocessors
• Master-slave organization
• Independent kernels
• SMP (Symmetric Multiprocessing) kernel

For large multiprocessors (more than tens of cores)
• Distributed kernels
• Factored kernels
• Clustered organization


Master-Slave Organization

One copy of the OS runs on a master CPU
• All system calls are redirected to the master CPU
• All the other CPUs run user processes

Problem: the master CPU can become a bottleneck

[Figure: master-slave organization — CPU 1 (master) runs the OS; CPUs 2, 3, and 4 (slaves) run user processes; memory holds one kernel code/data region plus user code/data for each process]
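The redirection of system calls can be pictured as a request queue: a slave CPU packages the call, hands it to the master, and waits; the master loops, picking up requests and running the only copy of the kernel code. Below is a minimal user-space sketch of this idea, with pthreads threads standing in for CPUs; the request format and names (remote_syscall, master_loop) are illustrative and do not come from any real kernel.

```c
#include <pthread.h>

/* Illustrative request format: a call number, one argument, a result slot. */
struct syscall_req {
    int num, arg, result, done;
    struct syscall_req *next;
};

static struct syscall_req *queue;      /* requests waiting for the master */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

/* Runs on a slave CPU: package the system call, hand it over, and wait. */
int remote_syscall(int num, int arg)
{
    struct syscall_req req = { .num = num, .arg = arg };
    pthread_mutex_lock(&qlock);
    req.next = queue;
    queue = &req;
    pthread_cond_broadcast(&qcond);
    while (!req.done)                  /* block until the master answers */
        pthread_cond_wait(&qcond, &qlock);
    pthread_mutex_unlock(&qlock);
    return req.result;
}

/* Runs only on the master CPU: the one place where kernel code executes. */
void *master_loop(void *unused)
{
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (!queue)
            pthread_cond_wait(&qcond, &qlock);
        struct syscall_req *r = queue;
        queue = r->next;
        r->result = r->num + r->arg;   /* stand-in for the real handler */
        r->done = 1;
        pthread_cond_broadcast(&qcond);
        pthread_mutex_unlock(&qlock);
    }
    return unused;
}
```

Every request funnels through master_loop, which is exactly why the master becomes the bottleneck as slaves are added.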


Independent Kernels

Each CPU has its own private OS
• Memory is partitioned and assigned to each OS as private memory
• The n CPUs operate as n independent computers

[Figure: independent kernels — CPU 1 has private OS X and CPU 2 has private OS Y; memory is partitioned so each OS gets its own kernel code/data and user code/data]


Independent Kernel Instances

Each CPU has a different instance of the same OS
• All the CPUs share the OS code and make private copies of only the data

[Figure: independent kernel instances — CPU 1 and CPU 2 each run a private instance of the same OS X; a single copy of the kernel code is shared, while each instance has its own kernel data and user code/data]
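The "shared code, private data" layout can be sketched as kernel functions compiled once and shared by all CPUs, with every mutable kernel structure replicated per CPU and selected through the CPU id. The names below (kernel_data, smp_cpu_id, my_data) are illustrative and not taken from any particular OS.

```c
#include <stdio.h>

#define NCPU 4

/* All mutable kernel state is replicated; the code below is shared by every CPU. */
struct kernel_data {
    int  runnable_threads;    /* this CPU's private scheduler state */
    long free_pages;          /* this CPU's private allocator state */
};

static struct kernel_data percpu[NCPU];   /* one private copy per CPU */

/* Illustrative stand-in for reading the current CPU's id from hardware. */
static int smp_cpu_id(void) { return 0; }

/* Shared kernel code: every CPU runs this same function,
 * but it only ever touches its own copy of the data. */
static struct kernel_data *my_data(void)
{
    return &percpu[smp_cpu_id()];
}

int main(void)
{
    my_data()->free_pages = 1024;
    printf("CPU %d sees %ld free pages\n", smp_cpu_id(), my_data()->free_pages);
    return 0;
}
```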


Advantages and Disadvantages

Advantages
• Better than n separate computers, since it allows all the machines to share a set of hardware resources
• A single kernel crash does not affect the other kernels

Disadvantages
• Applications need to be written in a distributed fashion
  • They cannot exploit the tightly coupled hardware architecture
• There is no sharing of threads; each OS schedules its own threads by itself
• Using shared resources may result in inconsistent results
  • e.g.) a certain block may be present and dirty in multiple buffer caches at the same time


SMP (Symmetric Multiprocessing) Kernels

One copy of the OS in memory, and any CPU can run it
• When a system call is made, the CPU on which the system call was made traps to the kernel and processes the system call

[Figure: SMP kernel — CPU 1 and CPU 2 both run the shared OS X; memory holds one copy of the kernel code and kernel data plus user code/data for each process]


Data Races

However, there can be data races
• Two CPUs can execute simultaneously in the kernel
• They may access the same kernel data simultaneously

[Figure: same SMP layout — both CPUs running the shared OS X can reach the single copy of the kernel data at the same time]
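The race described here is easy to reproduce in user space: two threads stand in for two CPUs executing in the kernel and update the same counter without synchronization, so increments are lost. A minimal sketch (the counter is only a stand-in for any shared kernel data structure):

```c
#include <pthread.h>
#include <stdio.h>

static long kernel_counter;            /* stands in for shared kernel data */

/* Each thread models one CPU executing inside the kernel. */
static void *cpu(void *arg)
{
    for (int i = 0; i < 1000000; i++)
        kernel_counter++;              /* unsynchronized read-modify-write */
    return arg;
}

int main(void)
{
    pthread_t c1, c2;
    pthread_create(&c1, NULL, cpu, NULL);
    pthread_create(&c2, NULL, cpu, NULL);
    pthread_join(c1, NULL);
    pthread_join(c2, NULL);
    /* Expected 2000000; lost updates usually leave it smaller. */
    printf("kernel_counter = %ld\n", kernel_counter);
    return 0;
}
```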


Giant Lock

Simple solution
• Associate a mutex (giant lock) with the OS to make the whole system one big critical region
• When a CPU wants to run the OS, it must acquire the mutex

This solution is not scalable
• If 10% of all run time is spent inside the OS, then 10 CPUs will pretty much saturate the whole system, because kernel execution is serialized and the lock is then held essentially all the time
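A giant lock amounts to wrapping every kernel entry point in the same mutex, so at most one CPU executes kernel code at any moment. Below is a minimal sketch; the system-call numbers and handler names are illustrative.

```c
#include <pthread.h>

/* One lock for the whole kernel: holding it means "this CPU is running the OS". */
static pthread_mutex_t giant_lock = PTHREAD_MUTEX_INITIALIZER;

static long do_read(int fd)  { (void)fd; return 0; }   /* stand-in handlers */
static long do_write(int fd) { (void)fd; return 0; }

/* Every system call, on every CPU, funnels through the same mutex. */
long syscall_entry(int num, int arg)
{
    long ret;
    pthread_mutex_lock(&giant_lock);   /* the whole kernel is one critical region */
    switch (num) {
    case 0:  ret = do_read(arg);  break;
    case 1:  ret = do_write(arg); break;
    default: ret = -1;            break;
    }
    pthread_mutex_unlock(&giant_lock);
    return ret;
}
```

Because only one CPU at a time can get past the lock, kernel work is serialized, which is exactly the assumption behind the 10%/10-CPU saturation argument above.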


Coarse- and Fine-Grained Locks

Better solution
• Split the OS into independent critical regions, each protected by its own lock
  • One CPU can run the scheduler while another CPU is handling a system call
  • Note: great care must be taken to avoid deadlocks

SMP kernels are in widespread use: Unix, Linux, MS Windows, Apple OS X, iOS, …
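With coarse- or fine-grained locking, each independent region of the kernel gets its own lock, so the scheduler and, say, the file system can run on different CPUs at the same time; paths that need several locks must take them in one global order to stay deadlock-free. A minimal sketch, with illustrative subsystem names and lock order:

```c
#include <pthread.h>

/* One lock per independent kernel subsystem instead of one giant lock. */
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fs_lock    = PTHREAD_MUTEX_INITIALIZER;

/* CPU A can be in here ... */
void schedule_next(void)
{
    pthread_mutex_lock(&sched_lock);
    /* pick the next thread to run */
    pthread_mutex_unlock(&sched_lock);
}

/* ... while CPU B is in here at the same time. */
void fs_read_block(long blockno)
{
    (void)blockno;
    pthread_mutex_lock(&fs_lock);
    /* look up the block in the buffer cache */
    pthread_mutex_unlock(&fs_lock);
}

/* A path that needs both locks must take them in one global order
 * (here: sched_lock before fs_lock) on every CPU; two CPUs taking
 * them in opposite orders can deadlock. */
void wake_reader_and_touch_cache(long blockno)
{
    (void)blockno;
    pthread_mutex_lock(&sched_lock);
    pthread_mutex_lock(&fs_lock);
    pthread_mutex_unlock(&fs_lock);
    pthread_mutex_unlock(&sched_lock);
}
```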


Advantages and Limitations of SMP kernels

Advantages: “shared everything” structure
• All OS services and abstractions are shared
  • e.g.) the abstraction of a single flat, global address space
• All HW resources are shared

Limitations
• SMP kernels require restricted (SMP) HW architectures
  • Identical instruction sets across processors
  • Shared memory structure
• SMP kernels do not scale well
  • Lock contention cost
  • Reliance on shared memory


Lock Contention

Memory allocation experiments (MIT, 2009)
• 16-core Intel system with Linux 2.6.24.7 kernel
  • Xeon E7340 CPUs at 2.40 GHz and 16 GB of RAM
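The kind of experiment referred to here can be approximated by a microbenchmark in which every core repeatedly enters the kernel's memory-management path. The sketch below is only illustrative, not the actual MIT benchmark: each iteration maps and unmaps a page, so the kernel's locks are taken from a different core on every iteration, and the wall-clock time can be compared at 1, 2, 4, ... threads.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS on older toolchains */
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

#define NTHREADS 16               /* one worker per core on a 16-core machine */
#define ITERS    100000

/* Each worker repeatedly maps and unmaps a page, so every iteration enters
 * the kernel's memory-management code and contends for the locks inside it. */
static void *worker(void *arg)
{
    for (int i = 0; i < ITERS; i++) {
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;
        munmap(p, 4096);
    }
    return arg;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    puts("done");                 /* time the run at 1, 2, 4, ... threads */
    return 0;
}
```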


Shared Memory and Cache Coherence

Shared memory vs. MSG passing (Microsoft, 2009)
• Shared memory updates lead to large cache miss costs due to the cache coherence mechanism
• MSG passing costs much less than shared memory


Shared Memory and Cache Coherence

SHM updates
• A processing core updates a shared memory variable
• Other processing cores stall due to cache coherence

MSG passing
• Only a server processor accesses the memory variable
• Other processors send/receive values to/from the server
  • There is no cache coherence traffic and no processor stall
  • Each processor may experience some queueing delay, which grows linearly with the number of clients but does not cause processor stalls
  • Memory update cost at the server is low and constant
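The two access patterns can be put side by side: in the shared-memory version every core writes the variable itself, so its cache line bounces between cores, while in the message-passing version only the server core ever touches the variable and each client just posts a request into its own mailbox. The sketch below is a condensed illustration under these assumptions, not code from the cited Microsoft study.

```c
#include <stdatomic.h>

#define NCLIENTS 15

/* --- Shared-memory style: every core updates the variable directly. ---
 * Each write pulls the cache line to the writing core, stalling the others. */
static atomic_long shared_var;

void shm_update(void)
{
    atomic_fetch_add(&shared_var, 1);
}

/* --- Message-passing style: only one server core touches the variable. ---
 * Each client has its own one-slot mailbox on a separate cache line, so
 * clients never contend with each other; they just wait for their reply. */
struct mailbox {
    _Atomic int request;                      /* set by client, cleared by server */
    char pad[64 - sizeof(_Atomic int)];       /* keep mailboxes on separate lines */
};

static struct mailbox mbox[NCLIENTS];
static long server_var;                       /* owned by the server core only */

void msg_update(int client)                   /* runs on a client core */
{
    atomic_store(&mbox[client].request, 1);   /* "send" the request */
    while (atomic_load(&mbox[client].request))
        ;                                     /* queueing delay, no stall on server_var */
}

void server_poll_once(void)                   /* runs on the server core */
{
    for (int c = 0; c < NCLIENTS; c++) {
        if (atomic_load(&mbox[c].request)) {
            server_var++;                     /* cheap, always cache-local update */
            atomic_store(&mbox[c].request, 0);/* ack back to the client */
        }
    }
}
```

Because each mailbox sits on its own cache line, the only line that moves between cores is a client's own mailbox, once per request, while server_var stays resident in the server core's cache.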


Distributed Kernels: The Multikernel Model

Uses ideas from distributed systems
• Treats the machine as a network of independent cores
• Replicates OS state across cores that communicate using messages and share no memory

Microsoft Research and ETH Zurich, ACM SOSP, 2009
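In the multikernel model each core's kernel keeps its own replica of OS state and learns about changes only through messages, never through loads and stores to a shared structure. The toy sketch below illustrates that idea; the channel layout and all names are assumptions made for illustration and are not the Barrelfish API.

```c
#define NCORES 4
#define QSIZE  64

/* Each core keeps a private replica of some piece of OS state. */
struct replica {
    int capability_count;            /* stand-in for replicated kernel state */
};

/* A simple FIFO standing in for an inter-core message channel
 * (one per sender/receiver pair). */
struct channel {
    int msg[QSIZE];
    volatile int head, tail;
};

static struct replica state[NCORES];           /* never shared, core-local */
static struct channel chan[NCORES][NCORES];    /* explicit communication only */

static void send(int from, int to, int msg)
{
    struct channel *c = &chan[from][to];
    c->msg[c->head % QSIZE] = msg;
    c->head++;
}

/* A core updates its own replica, then tells every other core by message. */
void update_state(int core, int delta)
{
    state[core].capability_count += delta;
    for (int other = 0; other < NCORES; other++)
        if (other != core)
            send(core, other, delta);
}

/* Each core applies incoming updates to its own replica when it polls
 * its channels, so the replicas converge without shared-memory writes. */
void poll_messages(int core)
{
    for (int from = 0; from < NCORES; from++) {
        struct channel *c = &chan[from][core];
        while (c->tail < c->head) {
            state[core].capability_count += c->msg[c->tail % QSIZE];
            c->tail++;
        }
    }
}
```

Agreement on the replicated state is reached by exchanging messages, which is the sense in which the machine is treated as a distributed system rather than a shared-memory multiprocessor.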


Factored Kernels: FOS

A new operating system targeting manycore systems
• Each operating system service is factored into a set of communicating servers
  • Space sharing rather than time sharing
• Relies on message passing rather than shared memory

MIT, ACM OSR, 2009


Clustered Organization

Key idea
• Processors are grouped into clusters
• Each cluster runs an independent OS kernel
• Clusters are the unit of scalability

[Figure: clustered organization — the CPUs are grouped into clusters, each cluster runs its own OS instance, and applications run on every CPU within a cluster]


Hive (Stanford, ACM SOSP’95)

The original purpose of Hive was to achieve reliability and fault containment

Each cell runs an SMP kernel


AsyMOS: Asymmetric Independent Kernels

Two types of kernel
• A native OS kernel running on application processors
• LDK (lightweight device kernel) running on device processors