Configuring a Large-Scale GALS System
M.M. Khan*, J. Navaridas†, L.A. Plana*, M. Luján*,
J.V. Woods*, J. Miguel-Alonso† and S.B. Furber*
*School of Computer Science, The University of Manchester, UK
†University of The Basque Country, Spain
SpiNNaker
• Objectives
  – High-performance
  – Robust
  – Low-power
SpiNNaker CMP
• System RAM
• Boot ROM
• MC Router
• Sys. Controller
• Ethernet
• SDRAM
• 20 Proc. Nodes
Processing Node
• ARM968E-S
• Comm. Ctlr.
• Interrupt Ctlr.
• DMA Ctlr.
• Timer
• TCM (100K)
Communication Network
• MC Router
• Packets
  – MC (multicast)
  – P2P (point-to-point)
  – NN (nearest-neighbour)
• 1 Gb/s inter-chip
• 6 Gb/s per node
• Six two-way inter-chip links
*L.A. Plana et al. An On-Chip and Inter-Chip Communications Network for the Spinnaker Massively-Parallel Neural Net Simulator. In Proc. Second ACM/IEEE International Symposium on Networks-on-Chip (NoCS 2008), pages 215–216, 2008.
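The MC router routes multicast packets by matching each packet's routing key against a table of key/mask entries. As a rough illustration only, the Python sketch below models such a lookup; the first-match priority, the default route through the opposite link, and the names `RouteEntry` and `route_multicast` are assumptions made for this example, not the chip's actual table format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

NUM_LINKS = 6  # six two-way inter-chip links per chip


@dataclass
class RouteEntry:
    key: int
    mask: int
    links: Set[int] = field(default_factory=set)  # outgoing inter-chip links 0-5
    cores: Set[int] = field(default_factory=set)  # local processing nodes


def route_multicast(table: List[RouteEntry], packet_key: int,
                    arrival_link: Optional[int]) -> Tuple[Set[int], Set[int]]:
    """Return (links, cores) that should receive a multicast (MC) packet."""
    for entry in table:  # assumption: the first matching entry wins
        if (packet_key & entry.mask) == entry.key:
            return set(entry.links), set(entry.cores)
    if arrival_link is None:
        # Assumption: a locally generated packet with no match is dropped.
        return set(), set()
    # Assumed default route: pass straight through to the opposite link.
    return {(arrival_link + 3) % NUM_LINKS}, set()


table = [RouteEntry(key=0x1000, mask=0xFF00, links={0, 1}, cores={3})]
print(route_multicast(table, 0x10AB, arrival_link=2))  # ({0, 1}, {3})
print(route_multicast(table, 0x2000, arrival_link=2))  # ({5}, set())
```

Masking lets one table entry cover a whole range of keys, which keeps the table small when many neurons share the same destinations.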
Performance
• 64K CMPs
• > 1M ARM968 cores
• 256 tera IPS computing power
• >8 TB memory
• 6 Gb/s/Node Comm. NoC (spike channel)
• 1 Gb/s System NoC (synaptic channel)
• 10^9 neurons in real time
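The headline figures are consistent with the per-chip description: 64K chips with 20 processing nodes each gives just over 1.3 million cores. The arithmetic below checks this and derives per-core and per-chip figures; reading "8 TB" as 8 TiB is an assumption of the sketch.

```python
CHIPS = 64 * 1024               # "64K CMPs"
NODES_PER_CHIP = 20             # "20 Proc. Nodes" per CMP
TOTAL_IPS = 256e12              # "256 tera IPS"
TOTAL_MEMORY_BYTES = 8 * 2**40  # ">8 TB", read here as 8 TiB (assumption)

cores = CHIPS * NODES_PER_CHIP
ips_per_core = TOTAL_IPS / cores
sdram_per_chip_mib = TOTAL_MEMORY_BYTES / CHIPS / 2**20

print(cores)               # 1310720 cores, i.e. "> 1M ARM968"
print(ips_per_core / 1e6)  # 195.3125 million instructions per second per core
print(sdram_per_chip_mib)  # 128.0 MiB of memory per chip
```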
Fault-tolerance
• Redundancy
• Fault-detection and Isolation
• Fault-recovery
• Minimal single points of failure
• Run-time configuration
• Run-time recovery
• Run-time application loading
Low-power
• Hardware
  – Asynchronous communication
  – Low-power ARM968
• Software
  – Asynchronous event-driven model
Standard Application Model
• Sleepy processors
• Event-driven application
• No scheduler
• No software threads
• Only ISRs
• Driven by interrupts
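The application model (no scheduler, no threads, only ISRs) can be illustrated with a toy Python model in which handlers stand in for ISRs and the core goes back to sleep once no interrupts are pending. The class and the interrupt names `timer` and `packet` are hypothetical; on the chip, sleeping would be a hardware wait-for-interrupt rather than a method returning.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple


class EventDrivenCore:
    """Toy interrupt-driven core: no scheduler and no threads, only ISRs."""

    def __init__(self) -> None:
        self.isrs: Dict[str, Callable[[Any], None]] = {}
        self.pending: Deque[Tuple[str, Any]] = deque()
        self.sleeps = 0  # how many times the core has gone back to sleep

    def register_isr(self, irq: str, handler: Callable[[Any], None]) -> None:
        self.isrs[irq] = handler

    def raise_interrupt(self, irq: str, payload: Any = None) -> None:
        self.pending.append((irq, payload))

    def wake(self) -> None:
        """Service every pending interrupt in arrival order, then sleep."""
        while self.pending:
            irq, payload = self.pending.popleft()
            handler = self.isrs.get(irq)
            if handler is not None:  # interrupts without an ISR are ignored
                handler(payload)
        self.sleeps += 1  # nothing left to do: back to sleep


log = []
core = EventDrivenCore()
core.register_isr("timer", lambda tick: log.append(("timer", tick)))
core.register_isr("packet", lambda key: log.append(("packet", key)))
core.raise_interrupt("timer", 1)
core.raise_interrupt("packet", 0x42)
core.wake()
print(log)  # [('timer', 1), ('packet', 66)]
```

All work happens inside handlers, so a core with no pending events consumes no cycles, which is where the power saving comes from.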
Configuration Process-I
• Minimal Boot-ROM code
• POST + chip component initialization
• Batch mode
[Flowchart: POST → load boot code in TCM → select monitor processor → configure interrupts → monitor? yes: configure chip, then go to sleep; no: go to sleep]
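The per-chip boot flow can be sketched as follows. Two assumptions are made for illustration: a node that fails POST is simply left out (isolated), and the monitor is the lowest-numbered node that passed POST; the real selection mechanism is not described here. `configure_chip` is a hypothetical name.

```python
from typing import Dict, List, Optional


def configure_chip(post_ok: List[bool]) -> Dict[str, object]:
    """Walk one chip's processing nodes through Configuration Process-I.

    post_ok[i] is True if node i passed its power-on self-test (POST).
    """
    healthy = [i for i, ok in enumerate(post_ok) if ok]
    failed = [i for i, ok in enumerate(post_ok) if not ok]  # isolated
    log: List[str] = []
    for i in healthy:
        log.append(f"node {i}: load boot code into TCM")
    # Assumption: the lowest-numbered healthy node becomes the monitor.
    monitor: Optional[int] = healthy[0] if healthy else None
    for i in healthy:
        log.append(f"node {i}: configure interrupts")
        if i == monitor:
            log.append(f"node {i}: configure chip")  # monitor only
        log.append(f"node {i}: sleep")
    return {"monitor": monitor, "failed": failed, "log": log}


result = configure_chip([False, True, True])
print(result["monitor"], result["failed"])  # 1 [0]
```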
Configuration Process-II
• Event-driven Model
• Real-time Configuration
• Processors sleep between events
[Flowchart (host chip and other chips): recovery; host system comm.; frame + packet comm. / packet comm.; assign (0, 0) / assign (x, y); configure router; status to host chip; accumulated status to host; yes/no branches]
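Coordinate assignment spreads outward from the host chip, which takes (0, 0). A minimal sketch, assuming a width × height wrap-around mesh in which the six links have the (dx, dy) offsets listed in `LINK_OFFSETS` (an assumed link ordering): each chip derives its label only from the neighbour that contacted it and the link used. `build_torus` is a hypothetical fixture whose chip IDs are deliberately unrelated to position, so the labels are not known in advance.

```python
from collections import deque
from typing import Dict, List, Tuple

# Assumed (dx, dy) offset for each of the six links: E, NE, N, W, SW, S.
LINK_OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 0), (-1, -1), (0, -1)]


def assign_coordinates(neighbours: Dict[int, List[int]], host: int,
                       width: int, height: int) -> Dict[int, Tuple[int, int]]:
    """Label every chip with (x, y), starting from (0, 0) at the host chip.

    neighbours[c][k] is the chip reached from chip c over link k.
    """
    labels = {host: (0, 0)}
    frontier = deque([host])
    while frontier:
        chip = frontier.popleft()
        x, y = labels[chip]
        for link, (dx, dy) in enumerate(LINK_OFFSETS):
            nxt = neighbours[chip][link]
            if nxt not in labels:  # first message to arrive wins
                labels[nxt] = ((x + dx) % width, (y + dy) % height)
                frontier.append(nxt)
    return labels


def build_torus(width: int, height: int):
    """Hypothetical fixture: chip IDs numbered in reverse of position."""
    ids = {(x, y): width * height - 1 - (y * width + x)
           for y in range(height) for x in range(width)}
    neighbours = {ids[(x, y)]: [ids[((x + dx) % width, (y + dy) % height)]
                                for dx, dy in LINK_OFFSETS]
                  for (x, y) in ids}
    return ids, neighbours


ids, neighbours = build_torus(4, 4)
labels = assign_coordinates(neighbours, host=ids[(2, 1)], width=4, height=4)
print(labels[ids[(3, 1)]])  # (1, 0): one hop east of the host chip
```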
Flood-fill Mechanism
• Event-driven model
• Droplets (data blocks) sent to the origin chip(s)
• A pipelined wave of data from the origin(s) to the other chips
[Animations: flood-fill with 1 Ethernet connection and with 2 Ethernet connections; animations from http://physics-animations.com/Physics/English/int_ref.htm#Wlb]
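The flood-fill wave can be approximated by a breadth-first spread over the chip mesh: in each step, every chip that received the data in the previous step passes it to its neighbours. A minimal sketch, assuming a wrap-around mesh with the six link offsets in `LINK_OFFSETS` and one hop per step; one origin stands for a single Ethernet connection and two origins for two. `pipeline_steps` gives the idealized time for a stream of blocks to follow the first one, ignoring link contention. Both function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed (dx, dy) offset for each of the six links: E, NE, N, W, SW, S.
LINK_OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 0), (-1, -1), (0, -1)]


def flood_fill(width: int, height: int,
               origins: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (steps until the last chip is reached, chips reached)."""
    reached = set(origins)
    frontier: List[Tuple[int, int]] = list(reached)
    steps = 0
    while frontier:
        next_frontier = []
        for x, y in frontier:
            for dx, dy in LINK_OFFSETS:
                chip = ((x + dx) % width, (y + dy) % height)
                if chip not in reached:
                    reached.add(chip)
                    next_frontier.append(chip)
        if next_frontier:
            steps += 1
        frontier = next_frontier
    return steps, len(reached)


def pipeline_steps(wave_steps: int, num_blocks: int) -> int:
    """Idealized pipelining: block k leaves one step after block k - 1."""
    return wave_steps + num_blocks - 1


one_eth = flood_fill(32, 32, [(0, 0)])
two_eth = flood_fill(32, 32, [(0, 0), (16, 16)])
print(one_eth, two_eth)
```

Adding a second origin can only shorten the wave, since every chip's distance to the nearest origin either stays the same or drops.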
Flood-fill Mechanism
• Various mechanisms
  – Broadcast
  – Forward to 5 chips
  – Forward to 3 chips
  – Forward to 2 chips
• Performance vs. robustness
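The performance vs. robustness trade-off can be put in rough numbers. A sketch, under the assumptions that forwarding to N chips means each chip forwards a newly received block on N of its six links, that the origin sends on all six, and that every chip is eventually reached; `flood_fill_traffic` is a hypothetical name. More forwarding gives each chip more copies (so it can still be reached if a link or neighbour fails) at the cost of more link traffic.

```python
from typing import Tuple

LINKS = 6  # six two-way inter-chip links per chip


def flood_fill_traffic(num_chips: int, fanout: int) -> Tuple[int, float]:
    """Return (messages sent per block, mean copies received per chip)."""
    if not 1 <= fanout <= LINKS:
        raise ValueError("fanout must be between 1 and 6")
    # The origin sends on all links; every other chip forwards on `fanout`.
    messages = LINKS + fanout * (num_chips - 1)
    return messages, messages / num_chips


chips = 256 * 256
for name, fanout in [("bcast", 6), ("5 fwd", 5), ("3 fwd", 3), ("2 fwd", 2)]:
    print(name, flood_fill_traffic(chips, fanout))
```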
Evaluation
• SystemC system-level model
• Cycle-accurate
• Instruction-accurate
• 129,706 cycles for configuration process-I
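To relate the cycle count to wall-clock time, divide it by the clock frequency. The clock rate is not stated here, so the 200 MHz used below is purely an assumed value for illustration.

```python
def cycles_to_ms(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to milliseconds at the given clock frequency."""
    return cycles / clock_hz * 1e3


# 200 MHz is an assumed clock rate, not a figure from the evaluation.
print(round(cycles_to_ms(129706, 200e6), 3))  # 0.649
```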
Evaluation
[Figure a) Impact of system size: 1 Ethernet connection, 8 KB application + 16 KB data. x-axis: system size (32x32, 64x64, 128x128, 256x256); y-axis: cycles (thousands); series: 2msg, 3msg, 5msg, bcast.]
Evaluation
[Figure b) Impact of data size: x-axis: application + data size (32KB + 32KB, 32KB + 64KB, 32KB + 128KB, 32KB + 256KB, 32KB + 512KB); y-axis: booting time (millions of cycles); series: 2msg, 3msg, 5msg, bcast.]
Conclusions