Multithreaded processors ppt

Multithreaded Processors Paper presentation

Goal

Utilization of coarser-grained parallelism by CMPs and multithreaded processors

The focus is on processors designed to simultaneously execute threads of the same or different processes (explicit multithreaded processors).

Explicit multithreaded processors aim to increase the performance (lower execution time) of a multiprogramming workload, while single-threaded, implicit multithreaded, and superscalar processors increase the performance of a single program.

CMP: Chip Multiprocessor (2 or more processors on a single chip)

Multithreaded processors interleave the execution of different threads of control in the same pipeline.

What is it?

●Notion of thread

● Different from a software application thread

● Coarse-grained thread-level parallelism

● Implies a separate logical address space

●Implicit Multithreading

● Find multiple lines of execution in a single sequential program.

●Explicit multithreading

● Multiple PCs, register contexts

● Different from RISC processors

Why do we need it?

• ILP is limited.

• Memory latency: long-latency cycles can be covered with useful work from other threads.

• Divide and branch interlocks leave the CPU idle; that idle time can be covered as well.

• Latency sources: primary cache miss and secondary cache miss.

• Several enabled instructions from different threads may be candidates for execution.

• Context switching in a single-threaded processor is costly.

• Idle hardware can be put to use.
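
The latency-hiding argument can be made concrete with a small utilization model. This is a simplified sketch and not taken from the slides: it assumes every thread alternates between a fixed run of busy cycles and a fixed stall, and that each thread switch has a fixed cost. The names `utilization`, `run_length`, `latency`, and `switch_cost` are illustrative.

```python
def utilization(num_threads, run_length, latency, switch_cost=0):
    """Fraction of cycles spent on useful work under a simplified model.

    Assumptions (not from the slides): each thread alternates between
    run_length busy cycles and a stall of latency cycles, and every
    thread switch costs switch_cost cycles.
    """
    # Time for one round over all threads if nobody has to wait.
    round_time = num_threads * (run_length + switch_cost)
    # A thread cannot run again before its own stall has finished.
    period = max(run_length + latency, round_time)
    return num_threads * run_length / period


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(utilization(n, run_length=10, latency=40, switch_cost=1), 2))
```

Under these assumptions utilization grows roughly linearly with the number of threads until the stall is fully covered, after which only the switch cost limits it.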

Multithreaded Processors: Principal Approaches

●Techniques

● Fast context switch (how?)

●Interleaved multithreading technique

● An instruction from a different thread is issued every cycle

●Blocked multithreading technique

● A thread continues executing until an event forces a context switch

●Simultaneous multithreading

● Simultaneously issues multiple instructions from multiple threads (as in a superscalar processor)

Taken from [2]. Survey of processors with explicit multithreading.

Interleaved multithreading (fine-grained)

• The processor switches to a different thread after each instruction fetch (IF)

• A context switch occurs after every clock cycle

• Eliminates data and control hazards, because consecutive instructions in the pipeline belong to different threads

• Improves overall performance (execution time)

• Requires at least as many threads as pipeline stages

• Single-thread performance degrades

• Two techniques to overcome this:

• Dependence lookahead technique (CRAY MTA)

• Interleaving technique
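
The requirement of at least as many threads as pipeline stages can be illustrated with a toy simulation. This is a minimal sketch under a stated assumption: a thread may issue its next instruction only after its previous one has left a pipeline of `pipeline_depth` stages, and ready threads are picked round-robin. The function name and parameters are illustrative, not from the source.

```python
def interleaved_utilization(num_threads, pipeline_depth, cycles):
    """Fraction of cycles in which some thread issues an instruction.

    Assumption: a thread has at most one instruction in the pipeline,
    so after issuing it is not ready again for pipeline_depth cycles.
    """
    ready_at = [0] * num_threads  # first cycle each thread may issue again
    next_thread = 0               # round-robin starting point
    issued = 0
    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + pipeline_depth
                next_thread = (t + 1) % num_threads
                issued += 1
                break  # at most one instruction issues per cycle
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, interleaved_utilization(n, pipeline_depth=4, cycles=400))
```

With fewer threads than stages the pipeline has empty slots; with one thread it issues only once every `pipeline_depth` cycles, which is the single-thread degradation noted above.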

CRAY MTA

• Interleaved multithreaded VLIW processor

• Uses an explicit dependence lookahead technique

• A 3-bit field encodes the lookahead

• Supports 128 distinct threads

• Hides memory latency

• VLIW

• Each 64-bit instruction consists of 3 operations

• <M-op, A-op, C-op>, in priority order from high to low
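
The general idea of dependence lookahead can be sketched as follows. This is a simplified illustration, not the actual CRAY MTA rule: it assumes a hypothetical instruction representation of `(destination, sources)` tuples, considers only read-after-write dependences, and caps the value at 7 because that is the largest number a 3-bit field can hold.

```python
MAX_LOOKAHEAD = 7  # largest value that fits in a 3-bit field


def lookahead_values(instructions):
    """For each instruction, count how many following instructions of the
    same thread can issue before one that reads its result.

    instructions: list of (dest_register, source_registers) tuples
    (a hypothetical format chosen for this example).
    """
    result = []
    for i, (dest, _) in enumerate(instructions):
        count = 0
        for _, sources in instructions[i + 1:]:
            if dest in sources or count == MAX_LOOKAHEAD:
                break
            count += 1
        result.append(count)
    return result


if __name__ == "__main__":
    program = [
        ("r1", ()),        # r1 = ...
        ("r2", ("r1",)),   # r2 = f(r1), depends on the previous instruction
        ("r3", ()),        # independent
        ("r4", ("r2",)),   # depends on the second instruction
    ]
    print(lookahead_values(program))
```

A larger lookahead value lets one thread keep several instructions in flight, which reduces the number of threads needed to keep the pipeline busy.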

Blocked multithreading (coarse-grained)

• A thread continues execution until a context switch is forced

• A single thread can proceed at full speed

• Fewer threads are needed than with interleaved multithreading

• Context switch events

• Switch-on-load

• Switch-on-store

• Switch-on-branch

• Switch-on-cache-miss

• Switch-on-signals(interrupts)

• Conditional switch
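
A switch-on-cache-miss policy can be sketched with a toy simulator. The trace format is an assumption made for this example: each thread is a string in which `c` is an ordinary instruction and `m` is a load that misses the cache. The miss latency and switch cost are illustrative parameters, and utilization is measured up to the cycle in which the last instruction issues.

```python
def run_blocked(traces, miss_latency, switch_cost):
    """Simulate blocked multithreading with switch-on-cache-miss.

    traces: one string per thread; 'c' = compute, 'm' = load that misses.
    Returns the fraction of cycles spent issuing instructions.
    """
    n = len(traces)
    pc = [0] * n        # next instruction index per thread
    ready_at = [0] * n  # cycle at which a blocked thread is ready again
    cycle = busy = current = 0
    while any(pc[t] < len(traces[t]) for t in range(n)):
        unfinished = [t for t in range(n) if pc[t] < len(traces[t])]
        ready = [t for t in unfinished if ready_at[t] <= cycle]
        if not ready:
            # Every unfinished thread is waiting on a miss: stay idle.
            cycle = min(ready_at[t] for t in unfinished)
            continue
        if current not in ready:
            # Forced context switch to the next ready thread, round-robin.
            current = min(ready, key=lambda t: (t - current) % n)
            cycle += switch_cost
        op = traces[current][pc[current]]
        pc[current] += 1
        cycle += 1
        busy += 1
        if op == "m":
            ready_at[current] = cycle + miss_latency
    return busy / cycle


if __name__ == "__main__":
    trace = "cccmcccm"
    print(run_blocked([trace], miss_latency=10, switch_cost=1))
    print(run_blocked([trace, trace], miss_latency=10, switch_cost=1))
```

In this sketch a thread runs at full speed between misses, and a second thread fills part of the stall that would otherwise leave the processor idle.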

MIT Sparcle

• Context switch only during a remote cache miss

• Small latencies are handled by the compiler.

• Implementation of fast context switching

• Also uses multiple register contexts and PCs

Simultaneous multithreading (SMT)

• A mix of superscalar and multithreading techniques

• All hardware contexts are active at once, so they compete for resources

• Issues multiple instructions from multiple threads each cycle

• Both TLP and ILP come into play

• Slots are filled with instructions from different threads, and multiple issue slots are filled in each cycle.

• Resource organization

• Resource sharing

• Resource replication
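
How SMT fills issue slots can be shown with a small calculation. This is a minimal sketch with made-up input data: `ready_per_cycle[c][t]` is assumed to be the number of independent instructions thread `t` has ready in cycle `c`. A single-threaded superscalar can only draw on thread 0, while SMT draws on all threads up to the issue width.

```python
def average_ipc(ready_per_cycle, issue_width, smt):
    """Average instructions issued per cycle.

    ready_per_cycle: list of per-cycle lists; entry t is the number of
    ready instructions of thread t (illustrative data, not measurements).
    smt: if True, slots may be filled from all threads; otherwise only
    from thread 0, as in a single-threaded superscalar.
    """
    total = 0
    for ready in ready_per_cycle:
        available = sum(ready) if smt else ready[0]
        total += min(issue_width, available)
    return total / len(ready_per_cycle)


if __name__ == "__main__":
    ready = [
        [2, 3, 1, 2],
        [0, 4, 2, 1],  # thread 0 is stalled in this cycle
        [3, 0, 0, 5],
    ]
    print(average_ipc(ready, issue_width=8, smt=False))
    print(average_ipc(ready, issue_width=8, smt=True))
```

The cycle in which thread 0 is stalled issues nothing on the superscalar, but SMT still fills most of its slots from the other threads.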

SMT Alpha 21164 processor

• Simulations conducted on an 8-threaded, 8-issue superscalar

• 3 floating-point units and 6 integer units are assumed

• Fetch policy

• Throughput

• 6.64 IPC on the SPEC92 benchmark

Taken from [2]. Survey of processors with explicit multithreading.

Comparison

Chip Multiprocessors

1. Multiple processors on a single chip.

2. Every unit is duplicated and works independently.

3. The latency problem remains in multiple-issue cycles.

4. Every part of a processor is duplicated, so it is easier to implement.

Multithreaded Processors

1. Multithreading comes into play.

2. Multiple threads are under execution, so multiple PCs and register sets are needed.

3. Latencies arising in one stream are filled by another thread, unlike in RISC architectures.

4. Hardware is either shared or replicated, so the design is more complex.

References

1. Theo Ungerer, Borut Robic and Jurij Silc. (2002) Multithreaded Processors. The Computer Journal, Vol. 45, No. 3.

2. Theo Ungerer, Borut Robic and Jurij Silc. (2003) A Survey of Processors with Explicit Multithreading. ACM Computing Surveys, Vol. 35, No. 1, March 2003, pp. 29-63.