Survey of multicore architectures Marko Bertogna Scuola Superiore S.Anna, ReTiS Lab, Pisa, Italy.

Summary

CELL processor

Reconfigurable devices

Software-hardware co-design

Parallel programming problems: data dependencies, process synchronization, memory barriers, locking mechanisms

Language extensions for parallel programming

Real-time multiprocessor scheduling

Cell processor

A Cell Processor

Cell History

Cell basic concepts

Cell synergy

Cell Chip

Cell features

Cell Processor Components

Synergistic Processor Element (SPE)

SPE

SPE details

Element Interconnect Bus (EIB)

EIB: Data topology

Example: 8 concurrent transactions

Theoretical peak operations

Cell BE performance

Why is the Cell processor so fast?

CELL software environment

System Level Simulator

SPE management library

CELL parallelism

Typical CELL software development flow

ARM’s MPCore

PicoArray (by PicoChip)

PicoArray scaling

FPGA and Reconfigurable devices

Field Programmable Gate Arrays

SRAM-based matrix of integrated elements whose interconnections can be programmed statically or even dynamically

The basic block is the Logic Element (LE)

Chip capacities range from 1k to 1000k LEs

Each LE is typically composed of logic gates, LUTs, flip-flops and latches

Optimized CAD tools or pre-bound (already placed and routed) design libraries are needed
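The LUT is what makes an LE programmable. A k-input LUT is a 2^k-entry truth table held in configuration SRAM, and the inputs simply select one of its bits. The minimal C sketch below models a hypothetical 4-input LUT whose output feeds a flip-flop. It is a simplified illustration and does not describe any specific vendor's LE. All type and function names (`lut4_config`, `lut4_program`, `logic_element`, and so on) are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-input LUT: bit i of the 16-bit configuration word is the
 * output produced when the four inputs encode the binary pattern i. */
typedef uint16_t lut4_config;

/* Pack four 1-bit inputs into a pattern 0..15 (in0 is the least significant bit). */
unsigned lut4_pattern(int in0, int in1, int in2, int in3)
{
    return (unsigned)((in0 & 1) | ((in1 & 1) << 1) | ((in2 & 1) << 2) | ((in3 & 1) << 3));
}

/* Combinational evaluation: the input pattern selects one configuration bit. */
int lut4_eval(lut4_config cfg, unsigned pattern)
{
    assert(pattern < 16u);
    return (int)((cfg >> pattern) & 1u);
}

/* "Programming" the LUT: tabulate any 4-input Boolean function f. */
lut4_config lut4_program(int (*f)(int, int, int, int))
{
    lut4_config cfg = 0;
    for (unsigned i = 0; i < 16u; i++) {
        if (f((int)(i & 1u), (int)((i >> 1) & 1u), (int)((i >> 2) & 1u), (int)((i >> 3) & 1u)))
            cfg |= (lut4_config)(1u << i);
    }
    return cfg;
}

/* Simplified LE: a LUT whose output is captured by a D flip-flop. */
typedef struct {
    lut4_config cfg; /* LUT contents (configuration SRAM) */
    int q;           /* flip-flop state */
} logic_element;

/* One rising clock edge: the flip-flop stores the current LUT output. */
int le_clock(logic_element *le, unsigned pattern)
{
    le->q = lut4_eval(le->cfg, pattern);
    return le->q;
}

/* Two example functions to load into the LUT. */
int and4(int a, int b, int c, int d) { return a & b & c & d; }
int xor4(int a, int b, int c, int d) { return a ^ b ^ c ^ d; }
```

Programming the LUT with a 4-input AND gives the configuration word 0x8000, because only pattern 15 outputs 1. A 4-input XOR gives 0x6996. Changing the implemented function therefore means rewriting 16 configuration bits, not rewiring gates.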

FPGA

CSL organization and Basic Logic Element

Altera’s Stratix IV basic block: the Adaptive Logic Module (ALM)

Flexibility vs efficiency

Reconfigurable devices advantages

Efficiency AND flexibility

Shorter time to market

Easier upgrade

Lower cost (at production scale)

Reusable IP

Customizable interface

Reconfigurable devices parameters

Block granularity

Coarse grained: functional units, processor cores, memory tiles

Fine grained: gate and register level

Density

Reconfiguration time: Compile-Time Reconfiguration (CTR) or Run-Time Reconfiguration (RTR)

Partial or total reprogramming

Triscend’s A7S chip

Example: multiplier on Altera’s Stratix IV

Typical FPGA software development environment

FPGA-optimized module library

Design flow: IO Editor, Generate (file.h), Bind (placement and route, file.csl), Config (file.cfg), Download

Typical FPGA module library

Altera’s Nios II

Nios II is a soft-core processor IP that can be downloaded into an Altera FPGA, providing the functionality of a real RISC CPU

Logic elements are programmed to behave like the gates of a classic ASIC processor

Different Nios II versions are available: a fast, full-featured but larger core (Nios II/f); a medium-sized core (Nios II/s); and a compact core that is slower and has limited functionality (Nios II/e)

Nios II core

Selecting Nios II e/s/f

Example of a Nios II Processor system

Final global layout

Soft-core processors and FPGAs

Possible to have multiple cores on a single chip

Customizable hardware can be used to coordinate the various cores

Build and test a whole multicore system in less time

Detect and solve bottlenecks without needing to repeatedly return to the integration phase

Co-design problems with FPGAs

A task may be executed by a (soft-core or ASIC) processor or may be entirely implemented in hardware on the reconfigurable logic

“Programming in Space” versus “Programming in Time”

Centralized vs distributed computing

Sequential vs parallel programming

Interconnect network

What is a task in hardware?

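Software programming (C):

c = a + b;
result = c / 2;

Assembler expansion (ARM), 5 operations:

ldr r0, a
ldr r1, b
add r0, r0, r1
mov r0, r0, LSR #1
str r0, result

Hardware implementation: a and b feed an adder (+) whose output c passes through a shifter to produce result. For unsigned operands, dividing by 2 is a one-bit right shift, so no divider is needed.

All in one clock cycle!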
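The slide's equivalence can be checked in software. For unsigned operands, dividing by 2 gives the same value as a one-bit logical right shift. The hardware therefore needs only an adder whose output is routed, shifted by one position, to the result. The C sketch below compares the two formulations. The function names are hypothetical.

```c
#include <stdint.h>

/* Software version of the task, as written on the slide. */
uint32_t task_software(uint32_t a, uint32_t b)
{
    uint32_t c = a + b;       /* addition (wraps modulo 2^32 on overflow) */
    uint32_t result = c / 2;  /* division by the constant 2 */
    return result;
}

/* Hardware-style formulation: adder output shifted right by one bit.
 * In a circuit, a fixed one-bit shift is just a re-routing of wires. */
uint32_t task_datapath(uint32_t a, uint32_t b)
{
    return (a + b) >> 1;
}

/* Exhaustively compares both formulations for all a, b < limit.
 * Returns 1 if they always agree, 0 otherwise. */
int versions_agree(uint32_t limit)
{
    for (uint32_t a = 0; a < limit; a++)
        for (uint32_t b = 0; b < limit; b++)
            if (task_software(a, b) != task_datapath(a, b))
                return 0;
    return 1;
}
```

The shortcut holds only for unsigned values. In C, signed division truncates toward zero (-3 / 2 is -1), whereas an arithmetic right shift rounds toward minus infinity (it gives -2). A signed datapath therefore needs a small correction before the shift.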

Conclusions

FPGAs are interesting devices for multicore systems developers

Valid benchmark for comparing classic serial programming methods with parallel computing approaches

Help reduce time-to-market for next-generation multicore systems

Provide common platforms that can easily reproduce any architecture (given a proper VHDL/Verilog description)