Threads and multi threading

Threads. Cesarano Antonio, Del Monte Bonaventura. Università degli studi di Salerno. 7th April 2014. Operating Systems II.

Transcript of Threads and multi threading

Page 1: Threads and multi threading

Threads

Cesarano Antonio, Del Monte Bonaventura

Università degli studi di Salerno

7th April 2014

Operating Systems II

Page 2: Threads and multi threading

Agenda

• Introduction
• Thread models
• Multithreading: single-core vs multicore
• Implementation
• A Case Study
• Conclusions

Page 3: Threads and multi threading

CPU Trends

Page 4: Threads and multi threading

Introduction: What’s a Thread?

Page 5: Threads and multi threading

Memory: Heavy vs Light processes

Introduction

Page 6: Threads and multi threading

Why should I care about Threads?

Pro:
• Responsiveness
• Resource sharing
• Economy
• Scalability

Cons:
• Hard implementation
• Synchronization
• Critical section, deadlock, livelock…

Introduction

Page 7: Threads and multi threading

Thread Models

Two kinds of Threads

User Threads

Kernel Threads

Page 8: Threads and multi threading

Thread Models: User-level Threads

Implemented in a software library (e.g. Pthreads, Win32 API)

Pro:
• Easy handling
• Fast context switch
• Transparent to the OS
• No new address space, no need to change address space

Cons:
• Do not benefit from multithreading or multiprocessing
• Thread blocked → process blocked

Page 9: Threads and multi threading

Thread Models: Kernel-level Threads

Executed only in kernel mode, managed by the OS (on Linux, children of kthreadd)

Pro:
• Resource aware
• No need to use a new address space
• Thread blocked → another thread is scheduled

Con:
• Slower than user-level threads

Page 10: Threads and multi threading

Thread Models

Thread implementation models:
• Many to one
• One to one
• Many to many

Page 11: Threads and multi threading

Thread Models: Many to one

• The whole process is blocked if one thread is blocked
• Useless on multicore architectures

Page 12: Threads and multi threading

Thread Models: One to one

• Works fine on multicore architectures
• Many kernel threads = high overhead

Page 13: Threads and multi threading

Thread Models: Many to many

• Works fine on multicore architectures
• Less overhead than the “one to one” model

Page 14: Threads and multi threading

Multithreading: Multitasking

[Figure: multitasking on a single core and on a Symmetric Multi-Processor]

Page 15: Threads and multi threading

MultiThreading

Multithreading

Page 16: Threads and multi threading

Multithreading

HyperThreading

Page 17: Threads and multi threading

Multithreading

Page 18: Threads and multi threading

How can We use multithreading architectures?

• Thread Level Parallelism
• Data Level Parallelism

Multithreading

Page 19: Threads and multi threading

Multithreading: Thread Level Parallelism

Page 20: Threads and multi threading

Multithreading: Data Level Parallelism

Page 21: Threads and multi threading

Multithreading: Granularity

Coarse-grained:
• Context switch on a high-latency event
• Very fast thread switching is not required, and individual threads are not slowed down
• Loss of throughput on short stalls, due to pipeline start-up cost

Page 22: Threads and multi threading

Multithreading: Granularity

Fine-grained:
• Context switch on every cycle
• Interleaved execution of multiple threads: it can hide both short and long stalls
• Rarely-stalling threads are slowed down

Page 23: Threads and multi threading

Multithreading: Granularity

Page 24: Threads and multi threading

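Single-core vs Multi-core: Context Switching

    Xthread_ctxtswitch:
        pusha                 # push the running thread's general-purpose registers
        movl %esp, (%eax)     # save ESP through the pointer in EAX (old thread's TCB SP slot)
        movl %edx, %esp       # load ESP from EDX (new thread's saved stack pointer)
        popa                  # pop the new thread's registers from its stack
        ret                   # return into the new thread

[Diagram: CPU holding ESP and Thread 1's registers; Thread 2's saved registers; Thread 1 TCB (SP: ....) marked Running; Thread 2 TCB (SP: ....) marked Ready]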

Page 25: Threads and multi threading

Single-core vs Multi-core: Pushing old context

Step 1 (pusha): Thread 1's registers are pushed onto its own stack.

[Diagram: CPU holding ESP and Thread 1's registers; saved copies of Thread 1's and Thread 2's registers; Thread 1 TCB (SP: ....) marked Running; Thread 2 TCB (SP: ....) marked Ready]

Page 26: Threads and multi threading

Single-core vs Multi-core: Saving old stack pointer

Step 2 (first movl): the current ESP is stored in Thread 1's TCB.

[Diagram: CPU holding ESP and Thread 1's registers; saved copies of Thread 1's and Thread 2's registers; Thread 1 TCB (SP: ....) marked Running; Thread 2 TCB (SP: ....) marked Ready]

Page 27: Threads and multi threading

Single-core vs Multi-core: Changing stack pointer

Step 3 (second movl): ESP is loaded with Thread 2's saved stack pointer.

[Diagram: CPU holding ESP and Thread 1's registers; saved copies of Thread 1's and Thread 2's registers; Thread 1 TCB (SP: ....) marked Ready; Thread 2 TCB (SP: ....) marked Running]

Page 28: Threads and multi threading

Single-core vs Multi-core: Popping off thread #2 old context

Step 4 (popa): Thread 2's registers are popped from its stack into the CPU.

[Diagram: CPU holding ESP and Thread 2's registers; saved copy of Thread 1's registers; Thread 1 TCB (SP: ....) marked Ready; Thread 2 TCB (SP: ....) marked Running]

Page 29: Threads and multi threading

Single-core vs Multi-core: Done: return

Step 5 (ret): RET pops the return address off the stack and loads it into the PC register.

[Diagram: CPU holding ESP and Thread 2's registers; saved copy of Thread 1's registers; Thread 1 TCB (SP: ....) marked Ready; Thread 2 TCB (SP: ....) marked Running]
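The same save-and-restore idea can be tried in user space. The sketch below is not the routine from the slides: it swaps in the `ucontext` API (`getcontext`, `makecontext`, `swapcontext`), which is obsolescent in POSIX but available in glibc, and which saves the current registers and stack pointer and loads another context's. The names `thread1_body`, `thread2_body`, `run_two_contexts` and `trace` are hypothetical.

```cpp
#include <ucontext.h>
#include <cassert>
#include <string>

// Saved contexts play the role of the TCBs on the slides.
static ucontext_t main_ctx, thread1_ctx, thread2_ctx;
static std::string trace;  // records the order in which code runs

static void thread1_body() {
    trace += "1a ";
    swapcontext(&thread1_ctx, &thread2_ctx);  // save thread 1, resume thread 2
    trace += "1b ";
    swapcontext(&thread1_ctx, &thread2_ctx);  // hand the CPU back to thread 2
}

static void thread2_body() {
    trace += "2a ";
    swapcontext(&thread2_ctx, &thread1_ctx);  // save thread 2, resume thread 1
    trace += "2b ";
    // Returning here activates uc_link, i.e. main_ctx.
}

// Prepares a context that runs `body` on its own stack.
static void prepare(ucontext_t* ctx, char* stack, unsigned long size, void (*body)()) {
    getcontext(ctx);
    ctx->uc_stack.ss_sp = stack;
    ctx->uc_stack.ss_size = size;
    ctx->uc_link = &main_ctx;  // where to go when `body` returns
    makecontext(ctx, body, 0);
}

std::string run_two_contexts() {
    static char stack1[64 * 1024];
    static char stack2[64 * 1024];
    trace.clear();
    prepare(&thread1_ctx, stack1, sizeof stack1, thread1_body);
    prepare(&thread2_ctx, stack2, sizeof stack2, thread2_body);
    swapcontext(&main_ctx, &thread1_ctx);  // start thread 1
    return trace;
}
```

Calling `run_two_contexts()` interleaves the two bodies on a single kernel thread; each `swapcontext` call corresponds to one pass through the five steps above.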

Page 30: Threads and multi threading

Problems

Critical Section: code in which a thread A accesses a shared variable that a thread B may access at the same time

Deadlock: a process A waits for a resource held by B, while B waits for a resource held by A

Race Condition: the result of an execution depends on the order in which different threads run
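A shared counter is a small way to see the critical section and the race condition defined above. The sketch below is an illustration, not code from the slides: `SharedCounter`, `WorkerArgs`, `add_many` and `count_with_threads` are hypothetical names, and a Pthreads mutex is assumed as the protection mechanism.

```cpp
#include <pthread.h>
#include <cassert>

// Shared state: one counter and the mutex that guards it.
struct SharedCounter {
    long value;
    pthread_mutex_t lock;
};

struct WorkerArgs {
    SharedCounter* counter;
    int increments;
};

static void* add_many(void* arg) {
    WorkerArgs* w = static_cast<WorkerArgs*>(arg);
    for (int i = 0; i < w->increments; ++i) {
        pthread_mutex_lock(&w->counter->lock);
        ++w->counter->value;  // critical section: read-modify-write on shared data
        pthread_mutex_unlock(&w->counter->lock);
    }
    return nullptr;
}

// Runs `nthreads` workers that each add `increments` to the same counter.
long count_with_threads(int nthreads, int increments) {
    assert(nthreads > 0 && nthreads <= 64);
    SharedCounter counter;
    counter.value = 0;
    pthread_mutex_init(&counter.lock, nullptr);
    WorkerArgs args = {&counter, increments};
    pthread_t tids[64];
    for (int t = 0; t < nthreads; ++t)
        pthread_create(&tids[t], nullptr, add_many, &args);
    for (int t = 0; t < nthreads; ++t)
        pthread_join(tids[t], nullptr);
    pthread_mutex_destroy(&counter.lock);
    return counter.value;
}
```

With the lock in place the result is always `nthreads * increments`. Without the lock/unlock pair, increments from different threads could overwrite each other and the total would depend on the interleaving, which is the race condition.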

Page 31: Threads and multi threading

More Issues

fork() and exec() system calls: should fork() duplicate all threads or not?

Signal handling in multithreaded applications.

Scheduler activation: kernel threads have to communicate with user threads, i.e. through upcalls

Thread cancellation: terminating a thread before it has completed.

• Deferred cancellation: the thread terminates only when it reaches a cancellation point

• Asynchronous cancellation: immediate
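Deferred cancellation can be sketched with Pthreads, where it is the default cancellation type. The example is an assumption-laden illustration: `spin_until_cancelled` and `cancel_worker` are hypothetical names, and the worker's "work" is just a loop that polls for a pending cancellation request.

```cpp
#include <pthread.h>
#include <cassert>

static void* spin_until_cancelled(void* arg) {
    (void)arg;
    int old_type = 0;
    // Deferred is already the default; it is set here only to make the choice explicit.
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);
    for (;;) {
        // ... one unit of work would go here ...
        pthread_testcancel();  // cancellation point: a pending request takes effect here
    }
}

// Starts a worker, requests its cancellation, and reports whether it was cancelled.
bool cancel_worker() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, spin_until_cancelled, nullptr) != 0)
        return false;
    pthread_cancel(tid);  // only records the request; the worker decides when to stop
    void* result = nullptr;
    pthread_join(tid, &result);
    return result == PTHREAD_CANCELED;
}
```

With `PTHREAD_CANCEL_ASYNCHRONOUS` the worker could instead be stopped at any instruction, which is why the deferred form is usually the safer choice when the thread holds locks or other resources.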

Page 32: Threads and multi threading

Designing a thread library

Multiprocessor support

Virtual processor

RealTime support

Memory Management

Provide functions library rather than a module

Portability

No Kernel mode

Page 33: Threads and multi threading

Implementation

POSIX Threads

• POSIX standard for threads: IEEE POSIX 1003.1c
• Library made up of a set of types and procedure calls written in C, for UNIX platforms
• It supports:
  a) Thread management
  b) Mutexes
  c) Condition variables
  d) Synchronization between threads using R/W locks and barriers

Page 34: Threads and multi threading

Implementation

Thread Pool

• A number of threads are kept available in a pool
• When a task arrives, it is assigned to a free thread
• Once a thread completes its task, it returns to the pool and waits for more work
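The pool behaviour described above can be sketched with a task queue guarded by a mutex and a condition variable. This is one possible design rather than the one the authors had in mind: `Task`, `ThreadPool`, `bump` and `run_tasks` are hypothetical names, and tasks are assumed to be plain function pointers taking one argument.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <queue>
#include <vector>

struct Task {
    void (*fn)(void*);
    void* arg;
};

class ThreadPool {
public:
    explicit ThreadPool(int nthreads) : stopping_(false) {
        pthread_mutex_init(&lock_, nullptr);
        pthread_cond_init(&has_work_, nullptr);
        workers_.resize(nthreads);
        for (int i = 0; i < nthreads; ++i)
            pthread_create(&workers_[i], nullptr, &ThreadPool::worker_main, this);
    }

    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        pthread_mutex_lock(&lock_);
        stopping_ = true;
        pthread_cond_broadcast(&has_work_);
        pthread_mutex_unlock(&lock_);
        for (pthread_t& w : workers_) pthread_join(w, nullptr);
        pthread_cond_destroy(&has_work_);
        pthread_mutex_destroy(&lock_);
    }

    // A task arrives: queue it and wake one idle worker.
    void submit(void (*fn)(void*), void* arg) {
        pthread_mutex_lock(&lock_);
        tasks_.push(Task{fn, arg});
        pthread_cond_signal(&has_work_);
        pthread_mutex_unlock(&lock_);
    }

private:
    static void* worker_main(void* self) {
        ThreadPool* pool = static_cast<ThreadPool*>(self);
        for (;;) {
            pthread_mutex_lock(&pool->lock_);
            while (pool->tasks_.empty() && !pool->stopping_)
                pthread_cond_wait(&pool->has_work_, &pool->lock_);  // idle in the pool
            if (pool->tasks_.empty()) {  // stopping and nothing left to do
                pthread_mutex_unlock(&pool->lock_);
                return nullptr;
            }
            Task t = pool->tasks_.front();
            pool->tasks_.pop();
            pthread_mutex_unlock(&pool->lock_);
            t.fn(t.arg);  // run the task outside the lock, then loop back to the pool
        }
    }

    pthread_mutex_t lock_;
    pthread_cond_t has_work_;
    std::queue<Task> tasks_;
    std::vector<pthread_t> workers_;
    bool stopping_;
};

// Example task: atomically increment a shared counter.
static void bump(void* counter) {
    static_cast<std::atomic<int>*>(counter)->fetch_add(1);
}

// Submits `ntasks` tasks to a pool of `nthreads` workers and returns how many ran.
int run_tasks(int nthreads, int ntasks) {
    std::atomic<int> done(0);
    {
        ThreadPool pool(nthreads);
        for (int i = 0; i < ntasks; ++i) pool.submit(bump, &done);
    }  // the destructor waits for all queued tasks
    return done.load();
}
```

Threads are created once and reused, so the cost of `pthread_create` is paid per worker rather than per task.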

Page 35: Threads and multi threading

Implementation: Pthread library base operations

pthread_create() - create and launch a new thread

pthread_exit() - terminate the calling thread

pthread_attr_init() - initialize a thread attributes object with default values

pthread_join() - the calling thread blocks and waits for another thread to finish

pthread_self() - return the ID of the calling thread
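The five calls listed above fit together as in the following sketch. It is an illustrative example only: `SquareJob`, `square` and `square_in_thread` are hypothetical names, and error handling is reduced to assertions.

```cpp
#include <pthread.h>
#include <cassert>

struct SquareJob {
    int input;
    int output;
    pthread_t ran_on;  // ID of the thread that did the work
};

static void* square(void* arg) {
    SquareJob* job = static_cast<SquareJob*>(arg);
    job->output = job->input * job->input;
    job->ran_on = pthread_self();  // ID of the calling (worker) thread
    pthread_exit(job);             // ends this thread; the value is handed to pthread_join
}

int square_in_thread(int x) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);  // attributes object with default values
    SquareJob job = {x, 0, pthread_self()};
    pthread_t tid;
    int rc = pthread_create(&tid, &attr, square, &job);  // create and launch
    pthread_attr_destroy(&attr);
    assert(rc == 0);
    void* result = nullptr;
    pthread_join(tid, &result);  // block until the worker finishes
    assert(result == &job);
    assert(pthread_equal(job.ran_on, tid));
    return job.output;
}
```

Thread IDs are compared with `pthread_equal` rather than `==`, since `pthread_t` is an opaque type.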

Page 36: Threads and multi threading

Implementation Example

N x N Matrix Multiplication

Page 37: Threads and multi threading

Implementation Example

A simple algorithm

    for (int i = 0; i < MATRIX_ELEMENTS; i += MATRIX_LINE) {
        for (int j = 0; j < MATRIX_LINE; ++j) {
            float tmp = 0;
            for (int k = 0; k < MATRIX_LINE; k++) {
                tmp += A[i + k] * B[(MATRIX_LINE * k) + j];
            }
            C[i + j] = tmp;
        }
    }

Page 38: Threads and multi threading

Implementation Example

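SIMD Approach

    // Requires <pmmintrin.h> (SSE3), 16-byte aligned A and B, and MATRIX_LINE divisible by 4.
    transpose(B);  // rows of the transposed B are the columns of the original B
    for (int i = 0; i < MATRIX_LINE; i++) {
        for (int j = 0; j < MATRIX_LINE; j++) {
            __m128 tmp = _mm_setzero_ps();
            for (int k = 0; k < MATRIX_LINE; k += 4) {
                tmp = _mm_add_ps(tmp, _mm_mul_ps(_mm_load_ps(&A[MATRIX_LINE * i + k]),
                                                 _mm_load_ps(&B[MATRIX_LINE * j + k])));
            }
            tmp = _mm_hadd_ps(tmp, tmp);  // two horizontal adds sum the four lanes
            tmp = _mm_hadd_ps(tmp, tmp);
            _mm_store_ss(&C[MATRIX_LINE * i + j], tmp);
        }
    }
    transpose(B);  // restore B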

Page 39: Threads and multi threading

Implementation Example

TLP Approach

    struct thread_params {
        pthread_t id;
        float* a;
        float* b;
        float* c;
        int low;
        int high;
        bool flag;
    };
    ………
    int main(int argc, char** argv){
        int ncores = sysconf(_SC_NPROCESSORS_ONLN);
        int stride = MATRIX_LINE / ncores;  // assumes MATRIX_LINE is divisible by ncores
        for (int j = 0; j < ncores; ++j){
            pthread_attr_t attr;
            pthread_attr_init(&attr);
            thread_params* par = new thread_params;
            par->low = j * stride;
            par->high = j * stride + stride;
            par->a = A;
            par->b = B;
            par->c = C;
            par->flag = false;  // not yet completed
            p[j] = par;         // p[] is the array polled later by main()
            pthread_create(&(par->id), &attr, runner, par);
            // set cpu affinity for thread
            // sched_setaffinity
        }
    }

Page 40: Threads and multi threading

Implementation Example

TLP Approach

    int main(int argc, char** argv){
        ….
        int completed = 0;
        while (true) {
            if (completed >= ncores) break;
            completed = 0;
            usleep(100000);
            for (int j = 0; j < ncores; ++j){
                if (p[j]->flag) completed++;
            }
        }
        ….
    }

    // pthread_create() expects a start routine of type void* (*)(void*)
    void* runner(void* p){
        thread_params* params = (thread_params*) p;
        int low = params->low;
        int high = params->high;
        // unpack other values
        for (int i = low; i < high; i++) {
            for (int j = 0; j < MATRIX_LINE; j++){
                float tmp = 0;
                for (int k = 0; k < MATRIX_LINE; k++){
                    tmp += A[MATRIX_LINE * i + k] * B[(MATRIX_LINE * k) + j];
                }
                C[MATRIX_LINE * i + j] = tmp;  // i is a row index here, so scale by the row length
            }
        }
        params->flag = true;
        pthread_exit(0);
    }
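An alternative to polling a completion flag is to wait for each worker with `pthread_join`. The sketch below is a variation on the slides' approach, not the authors' code: `RowRange`, `multiply_rows` and `parallel_matmul` are hypothetical names, the matrices are assumed to be square and stored row-major, and rows are split so that a remainder is also covered.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

struct RowRange {
    const float* a;
    const float* b;
    float* c;
    int n;     // matrices are n x n, row-major
    int low;   // first row handled by this thread (inclusive)
    int high;  // last row handled by this thread (exclusive)
};

static void* multiply_rows(void* arg) {
    const RowRange* r = static_cast<const RowRange*>(arg);
    for (int i = r->low; i < r->high; ++i) {
        for (int j = 0; j < r->n; ++j) {
            float tmp = 0;
            for (int k = 0; k < r->n; ++k)
                tmp += r->a[r->n * i + k] * r->b[r->n * k + j];
            r->c[r->n * i + j] = tmp;  // each thread writes a disjoint set of rows
        }
    }
    return nullptr;
}

// Computes c = a * b using `nthreads` worker threads.
void parallel_matmul(const float* a, const float* b, float* c, int n, int nthreads) {
    assert(n > 0 && nthreads > 0);
    std::vector<RowRange> ranges(nthreads);
    std::vector<pthread_t> tids(nthreads);
    for (int t = 0; t < nthreads; ++t) {
        // Split the rows as evenly as possible; some ranges may be empty.
        int low = static_cast<int>(static_cast<long>(n) * t / nthreads);
        int high = static_cast<int>(static_cast<long>(n) * (t + 1) / nthreads);
        ranges[t] = RowRange{a, b, c, n, low, high};
        pthread_create(&tids[t], nullptr, multiply_rows, &ranges[t]);
    }
    for (int t = 0; t < nthreads; ++t)
        pthread_join(tids[t], nullptr);  // blocks until thread t is done, no polling needed
}
```

Because the threads write disjoint rows of `c` and only read `a` and `b`, no mutex is needed; `pthread_join` also guarantees that the results are visible to the caller once it returns.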

Page 41: Threads and multi threading

Implementation Performance

[Bar chart comparing the Simple, SIMD, TLP and SIMD&TLP versions on 4 cores and 8 cores; vertical axis from 0 to 9000.]

Page 42: Threads and multi threading

A case study

Using threads in Interactive Systems

• Research by Xerox PARC, Palo Alto
• Analysis of two large interactive systems: Cedar and GVX
• Goals:
  i. identifying paradigms of thread usage
  ii. architecture analysis of thread-based environments
  iii. pointing out the most important properties of an interactive system

Page 43: Threads and multi threading

A case study: Thread model

Mesa language

Multiple, lightweight, pre-emptively scheduled threads in a shared address space; threads may have different priorities

FORK, JOIN, DETACH

Support for condition variables and monitors: critical sections and mutexes

Finer grain for locks: directly on data structures

Page 44: Threads and multi threading

A case study

Three types of thread

1. Eternal: run forever, waiting for cond. var.

2. Worker: perform some computation

3. Transient: short life threads, forked off by long-lived threads

Page 45: Threads and multi threading

A case study

Dynamic analysis

[Bar chart for Cedar and GVX showing # threads idle, fork rate max and # threads max; vertical axis from 0 to 45.]

Switching intervals: (130/sec, 270/sec) vs. (33/sec, 60/sec)

Page 46: Threads and multi threading

A case study

Paradigms of thread usage

• Defer work: forking to reduce latency
  o Printing documents
• Pumps or slack processes: components of a pipeline
  o Preprocessing user input
  o Requests to the X server
• Sleepers and one-shots: wait for some event and then execute
  o Blinking cursor
  o Double click
• Deadlock avoiders: avoid violating lock order constraints
  o Window repainting

Page 47: Threads and multi threading

A case study

Paradigms of thread usage

• Task rejuvenation: recover a service from a bad state, either forking a new thread or reporting the error
  o Avoid fork overhead in the input event dispatcher of Cedar
• Serializers: a thread processing a queue
  o A window system with input events from many sources
• Concurrency exploiters: for using multiple processors
• Encapsulated forks: a mix of the previous paradigms, code modularity

Page 48: Threads and multi threading

A case study

Common Mistakes and Issues

o Timeout hacks to compensate for a missing NOTIFY

o IF instead of WHILE when waiting in monitors

o Handling resource consumption

o Slack processes may need the YieldButNotToMe hack

o Using libraries designed for a single thread in a multithreaded environment: Xlib and XI

o Spurious lock
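The "IF instead of WHILE" mistake listed above has a direct Pthreads counterpart: a condition must be re-checked in a loop after every wake-up. The sketch below uses Pthreads rather than Mesa monitors, and `IntQueue`, `queue_init`, `queue_put`, `queue_take`, `produce_three` and `take_three_sum` are hypothetical names.

```cpp
#include <pthread.h>
#include <cassert>
#include <queue>

// A monitor-like queue: one lock protects the data, one condition signals "not empty".
struct IntQueue {
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    std::queue<int> items;
};

void queue_init(IntQueue* q) {
    pthread_mutex_init(&q->lock, nullptr);
    pthread_cond_init(&q->not_empty, nullptr);
}

void queue_put(IntQueue* q, int value) {
    pthread_mutex_lock(&q->lock);
    q->items.push(value);
    pthread_cond_signal(&q->not_empty);  // the NOTIFY; forgetting it leaves takers asleep
    pthread_mutex_unlock(&q->lock);
}

int queue_take(IntQueue* q) {
    pthread_mutex_lock(&q->lock);
    // WHILE, not IF: the wait may return spuriously, or another thread may have
    // emptied the queue between the signal and this thread re-acquiring the lock.
    while (q->items.empty())
        pthread_cond_wait(&q->not_empty, &q->lock);
    int value = q->items.front();
    q->items.pop();
    pthread_mutex_unlock(&q->lock);
    return value;
}

static void* produce_three(void* arg) {
    IntQueue* q = static_cast<IntQueue*>(arg);
    for (int v = 1; v <= 3; ++v) queue_put(q, v);
    return nullptr;
}

// One producer thread puts 1, 2, 3; the caller takes three values and sums them.
int take_three_sum() {
    IntQueue q;
    queue_init(&q);
    pthread_t producer;
    pthread_create(&producer, nullptr, produce_three, &q);
    int sum = 0;
    for (int i = 0; i < 3; ++i) sum += queue_take(&q);
    pthread_join(producer, nullptr);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return sum;
}
```

With an `if` in `queue_take`, a woken thread could call `front()` on an empty queue; the loop makes the wake-up a hint to re-check the condition rather than a guarantee that it holds.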

Page 49: Threads and multi threading

A case study

Xerox scientists’ conclusions

Interesting difficulties were discovered in both the use and the implementation of multithreaded environments

Starting point for new studies

Page 50: Threads and multi threading

Conclusion