Cell Broadband Engine Architecture

Bardia Mahjour

ENCM 515

March 2007

Agenda

Introduction
History
Applications
Architecture
Features
Some Statistics
Programming Model
CBEA as DSP
Comparison with TigerSHARC
Conclusion

Introduction

Single-chip multiprocessor: 9 processors built into a single die

Designed to meet needs that arose in areas such as:

Cryptography
Graphics transformations and lighting
Physics
Fast Fourier Transforms (FFT)
Matrix operations
Compute-intensive scientific tasks

Goals:

Power-efficient, cost-effective, high-performance processing
Wide range of applications, including game consoles

IBM XL family of compilers (XL C/C++)

Cell die photo courtesy of Thomas Way, IBM Burlington

History

A joint venture by Sony, Toshiba, and IBM (STI)

Official design phase started in March 2001

The three companies spent 4 years and US$400M designing and developing Cell

First commercial use: Sony’s PlayStation 3, in November 2006

Applications

Console video games: PlayStation 3

Home cinema: Toshiba’s HDTV

Embedded applications: medical imaging, aerospace, telecommunication, defense, etc. (Mercury Computer Systems, Inc.)

Supercomputing: Roadrunner

Blade servers

Architecture

PowerPC Processor Element (PPE) - 64-bit PowerPC RISC core (can run an OS)

Synergistic Processor Elements (SPEs) - Each element acts as a DSP. The Cell has 8 of them.

Element Interconnect Bus (EIB)

Memory Interface Controller (MIC)

Cell Broadband Engine Interface (BEI)

Features

The PPE has a pipeline 10 stages deep

Each SPE has:

a 128 x 128-bit register file (128 registers, each 128 bits wide)
a floating-point unit
two fixed-point units
a VMX-like vector arithmetic unit
a Local Store
a DMA controller
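
To make the register file figure concrete, here is a minimal sketch in portable C. It assumes the "128x128" on the slide means 128 registers of 128 bits, and that a register holds four 32-bit floats. The names `vec4f`, `register_file_bytes`, and `vec4f_madd` are invented for illustration and are not part of any Cell SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the slide: 128 registers, each 128 bits wide. */
enum { NUM_REGISTERS = 128, REGISTER_BITS = 128 };

/* Hypothetical model of one 128-bit register viewed as four 32-bit floats. */
typedef struct {
    float lane[REGISTER_BITS / 32];
} vec4f;

/* Total register file capacity in bytes: 128 * 128 / 8. */
static size_t register_file_bytes(void)
{
    return (size_t)NUM_REGISTERS * REGISTER_BITS / 8;
}

/* Lane-wise multiply-add, r = a * b + c, written as a plain loop.
 * On a SIMD unit the four lanes would be handled by one instruction. */
static vec4f vec4f_madd(vec4f a, vec4f b, vec4f c)
{
    vec4f r;
    for (size_t i = 0; i < sizeof r.lane / sizeof r.lane[0]; i++)
        r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
    return r;
}
```

Under these assumptions the whole register file is 2 KB, small next to the Local Store, which is why data is staged through the Local Store by DMA rather than kept in registers.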

Some Statistics

Observed clock speed: > 4 GHz

Peak performance (single precision): > 256 GFLOPS

Peak performance (double precision): > 26 GFLOPS

Local store size per SPE: 256 KB

Area: 221 mm²

Technology: 90 nm SOI

Total number of transistors: 234M
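
The single-precision figure can be roughly reconstructed from other numbers in this deck. The sketch below is an estimate under stated assumptions, not a figure from the slides: only the 8 SPEs are counted, each 128-bit register holds four 32-bit lanes, each lane completes one multiply-add per cycle (counted as 2 operations), and the clock is 4 GHz. The function names are hypothetical.

```c
#include <assert.h>

/* Peak rate in GFLOPS = units * lanes * flops per lane per cycle * clock (GHz). */
static double peak_gflops(int units, int lanes,
                          int flops_per_lane_per_cycle, double clock_ghz)
{
    assert(units > 0 && lanes > 0 &&
           flops_per_lane_per_cycle > 0 && clock_ghz > 0.0);
    return units * lanes * flops_per_lane_per_cycle * clock_ghz;
}

/* Assumed inputs: 8 SPEs, 128 / 32 = 4 lanes per register,
 * one multiply-add (2 operations) per lane per cycle, 4 GHz clock. */
static double cell_single_precision_peak(void)
{
    return peak_gflops(8, 128 / 32, 2, 4.0);
}
```

With these inputs the estimate is 8 x 4 x 2 x 4 = 256 GFLOPS. The PPE's contribution and the double-precision figure are not modeled here.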

Programming Model

Function Offload Model

Device Extension Model

Computational Acceleration Model

Streaming Models

Shared-memory Multi-processor Model

Asymmetric Thread Runtime Model

User-Mode Thread Model

SPE Overlay

Function Offload Model

Remote Procedure Call (RPC)

/* file hello.idl */

interface greeting
{
    [sync] idl_id_t hello([in] int nbytes,
                          [in, size_is(nbytes)] char message[]);
}

/* file hello.c (PPE side) */

#include <string.h>   /* strlen */
#include <stub.h>

int main(void)
{
    char *str = "Hi, from the Cell!";

    /* Send the terminating '\0' too, so the SPE can print the string with %s. */
    hello(strlen(str) + 1, str);
    return 0;
}

/* file spu_hello.c (SPE side) */

#include <stdio.h>
#include <stub.h>

idl_id_t hello(int nbytes, char msg[])
{
    printf("SPE: %s\n", msg);
    return 0;
}
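
What the RPC stub does on the caller's behalf can be illustrated with a single-process simulation in portable C. This is a sketch of the general technique (marshal the arguments, dispatch by procedure ID, run the real body), not the code the IDL compiler actually generates. `rpc_request`, `HELLO_ID`, `hello_stub`, `spe_dispatch`, `spe_hello`, and `spe_output` are hypothetical names, and the transfer to an SPE is replaced by a direct function call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { HELLO_ID = 1, MAX_PAYLOAD = 64 };

/* Hypothetical request format: what a stub might copy to the SPE. */
typedef struct {
    int  func_id;               /* which remote procedure to run */
    int  nbytes;                /* number of valid bytes in payload */
    char payload[MAX_PAYLOAD];  /* marshalled [in] argument */
} rpc_request;

static char spe_output[MAX_PAYLOAD + 8];  /* stands in for the SPE's stdout */

/* "SPE side": the real procedure body. */
static int spe_hello(int nbytes, const char msg[])
{
    snprintf(spe_output, sizeof spe_output, "SPE: %.*s", nbytes, msg);
    return 0;
}

/* "SPE side": unpack a request and call the matching procedure. */
static int spe_dispatch(const rpc_request *req)
{
    switch (req->func_id) {
    case HELLO_ID: return spe_hello(req->nbytes, req->payload);
    default:       return -1;   /* unknown procedure */
    }
}

/* "PPE side": stub with the argument list the caller expects. */
static int hello_stub(int nbytes, const char message[])
{
    rpc_request req;
    if (nbytes < 0 || nbytes > MAX_PAYLOAD)
        return -1;              /* argument does not fit the request buffer */
    req.func_id = HELLO_ID;
    req.nbytes = nbytes;
    memcpy(req.payload, message, (size_t)nbytes);  /* marshal the [in] array */
    return spe_dispatch(&req);  /* a real stub would transfer req to an SPE */
}
```

Calling `hello_stub((int)strlen(s), s)` leaves the formatted greeting in `spe_output`. The caller sees an ordinary function call, which is the point of the function offload model.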

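Thread Runtime Model

speid_t spe_create_thread(spe_gid_t gid,
                          spe_program_handle_t *spe_program_handle,
                          void *argp, void *envp,
                          unsigned long mask, int flags);

Example PPE Code:

#include <stddef.h>   /* NULL */
#include <libspe.h>

#define NUM_SPES 8

extern spe_program_handle_t spe_code;   /* handle to the SPE program */

int main(void)
{
    speid_t spe_ids[NUM_SPES];
    spe_gid_t gid = 0;                  /* 0 selects a default thread group */
    int i, status;

    /* Start one thread per SPE, with no arguments and no affinity restriction. */
    for (i = 0; i < NUM_SPES; i++)
        spe_ids[i] = spe_create_thread(gid, &spe_code, NULL, NULL, -1, 0);

    /* Wait for every SPE thread to finish before exiting. */
    for (i = 0; i < NUM_SPES; i++)
        spe_wait(spe_ids[i], &status, 0);

    return 0;
}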
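
The example passes NULL for argp, so every SPE thread starts with no information about which part of a problem it owns. One common pattern is to give each thread a descriptor of its share of the data. The sketch below shows only the partitioning arithmetic in portable C. `work_slice` and `make_slice` are hypothetical names and are not part of libspe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread descriptor that argp could point to. */
typedef struct {
    size_t first;   /* index of the first element for this SPE */
    size_t count;   /* number of elements for this SPE */
} work_slice;

/* Split `total` elements across `num_spes` threads as evenly as possible.
 * The first (total % num_spes) threads receive one extra element. */
static work_slice make_slice(size_t total, size_t num_spes, size_t spe_index)
{
    assert(num_spes > 0 && spe_index < num_spes);
    size_t base  = total / num_spes;
    size_t extra = total % num_spes;
    work_slice s;
    s.count = base + (spe_index < extra ? 1 : 0);
    s.first = spe_index * base + (spe_index < extra ? spe_index : extra);
    return s;
}
```

On the PPE side, the address of the i-th slice would be passed as argp in place of NULL. The SPE program would then have to fetch the descriptor from main memory into its Local Store, which is not shown here.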

CBEA as DSP

Strictly speaking, Cell is a microprocessor

Designed to bridge the gap between conventional and special-purpose processors

Handles heavy digital signal processing workloads (3D graphics, 48 MPEG-2 channels, etc.)

Meets most of the requirements of an ideal DSP

Comparison with TigerSHARC

Size requirement

Power consumption and heat generation

Supports floating-point ops in hardware

Bandwidth and data-width

Avoids resource dependencies

Scalability

Ease of programming

Conclusion

The Cell Broadband Engine Architecture describes an extremely powerful, scalable, and fast processor. It is not purely a digital signal processor; however, the wide range of applications it is suited for includes DSP. Furthermore, the requirements of DSP applications were the rationale behind many of CBEA’s design and architectural decisions.

