Sketching (in) Hardware
Jonathan Bachrach + Huy Vo + Andrew Waterman + Christopher Celio
Patrick Li + Ben Keller + Palmer Dabbelt + Sebastian Mirolo + John Wawrzynek + Krste Asanovic +
many more
faculty @ EECS UC Berkeley
cofounder @ Otherlab
July 21, 2013
I Have a Hardware Sketching Dream 1
i want to sketch
arbitrary hardware building blocks
bigger blocks from smaller blocks
all the way down to digital logic
[Figure: building blocks: pwm, radio, cpu, r/c servo, usb, i2c, memctlr, eth, quad decoder]
Sketching All The Way Down 2
Can sketch both audio scripts and engines
Can delay decision of what’s script and what’s engine
[Figure: layers: Audio Scripting, Audio Engine, DSP Code]
Can Sketch Truly Reusable Modules 3
sketch as succinct specification as generator
parameterized by numbers, types, functions
abstract data types
procedural construction
Open Source and Networkable 4
open source
complete library of all components
apt-get interface
common interface
[Figure: component library: pwm, radio, cpu, r/c servo, lcd driver, usb, i2c, memctlr, eth, filter, quad decoder, accelerator]
Want Powerful + Inexpensive Logic Substrate 5
[Figure: dataflow graph of logic operators (add, sub, and, or, not, lt, eq, mux, reg, rnd, eat) mapped onto the logic substrate]
fast clock rates
scalable parallelism
fast compilation
automatically mapped
logic, blocks, chips
sketchable
State of Art 6
Specification
C: too high level, not enough parallelism
Verilog: clumsy and longwinded, minimal abstraction
Simulink: limited parameterization, WYSIWYG wiring
limited reusability! lots of manual steps!
Realization
Network of DSPs: limited hardware choices, hard to meet timing
FPGA: slow to compile for, no virtualization
ASIC: complex, expensive
tedious to program, slow to compile
today 7
chisel: design hw like software, soup to nuts
DREAMER: new highly programmable hardware fabric, fast, cheap and scalable
Chisel is ... 8
Best of hardware and software design ideas
Embedded within Scala language to leverage mindshare and language design
Not Scala -> Verilog
Algebraic construction and wiring
Hierarchical, object oriented, and functional construction
Abstract data types and interfaces
Bulk connections
Multiple targets
  Simulation and synthesis
  Memory IP is target-specific
[Figure: single source (Chisel), multiple targets: C++ for CPU, Verilog for FPGA, Verilog for ASIC]
The Scala Programming Language 9
Compiled to JVM
  Good performance
  Great Java interoperability
  Mature debugging, execution environments
Object Oriented
  Factory Objects, Classes
  Traits, overloading etc
Functional
  Higher order functions
  Anonymous functions
  Currying etc
Extensible
  Domain Specific Languages (DSLs)
Primitive Datatypes 10
Chisel has 3 primitive datatypes
  UInt – Unsigned Integer
  SInt – Signed Integer
  Bool – Boolean value
Can do arithmetic and logic with these datatypes
Example Literal Constructions
val sel = Bool(false)
val a = UInt(25)
val b = SInt(-35)
where val is a Scala keyword used to declare variables whose values won’t change
Aggregate Data Types 11
Bundle
User-extendable collection of values with named fields
Similar to structs
class MyFloat extends Bundle {
  val sign = Bool()
  val exponent = UInt(width=8)
  val significand = UInt(width=23)
}
Vec
Create indexable collection of values
Similar to array
val myVec = Vec(5){ SInt(width=23) }
Abstract Data Types 12
The user can construct new data types
Allows for compact, readable code
Example: Complex numbers
  Useful for FFT, Correlator, other DSP
  Define arithmetic on complex numbers
class Complex(val real: SInt, val imag: SInt)
    extends Bundle {
  def + (b: Complex): Complex =
    new Complex(real + b.real, imag + b.imag)
  ...
}
val a = new Complex(SInt(32), SInt(-16))
val b = new Complex(SInt(-15), SInt(21))
val c = a + b
Example 13
class GCD extends Module {
  val io = new Bundle {
    val a = UInt(INPUT, 16)
    val b = UInt(INPUT, 16)
    val z = UInt(OUTPUT, 16)
    val valid = Bool(OUTPUT)
  }
  val x = Reg(resetVal = io.a)
  val y = Reg(resetVal = io.b)
  when (x > y) {
    x := x - y
  } .otherwise {
    y := y - x
  }
  io.z := x
  io.valid := y === UInt(0)
}
[Figure: GCD block with UFix inputs a and b, UFix output z, and Bool output valid]
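The register updates in the GCD module can be checked in software before committing to hardware. What follows is a minimal plain-Scala sketch, not Chisel code: GcdModel, State, step, and run are hypothetical names, and the model assumes the registers start at a and b with a > 0 and b >= 0.

```scala
// Plain-Scala software model of the GCD slide; this is not Chisel code.
object GcdModel {
  // Register state of the circuit: x and y.
  final case class State(x: Int, y: Int)

  // One clock cycle, mirroring the when/.otherwise block:
  // if x > y then x := x - y, otherwise y := y - x.
  def step(s: State): State =
    if (s.x > s.y) State(s.x - s.y, s.y) else State(s.x, s.y - s.x)

  // Clock the model until y reaches 0 (the `valid` condition).
  // Returns the value of x (the GCD) and the number of cycles taken.
  // Assumption: a > 0 and b >= 0, so the loop always terminates.
  def run(a: Int, b: Int): (Int, Int) = {
    require(a > 0 && b >= 0, "model assumes a > 0 and b >= 0")
    var s = State(a, b)
    var cycles = 0
    while (s.y != 0) {
      s = step(s)
      cycles += 1
    }
    (s.x, cycles)
  }
}
```

For example, GcdModel.run(48, 18) returns (6, 5): x holds 6 when y reaches 0, five cycles after the registers are loaded.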
Valid Wrapper 14
class Valid[T <: Data](dtype: T) extends Bundle {
  val data = dtype.clone
  val valid = Bool()
  override def clone = new Valid(dtype)
}
class GCD extends Module {
  val io = new Bundle {
    val a = UInt(INPUT, 16)
    val b = UInt(INPUT, 16)
    val out = new Valid(UInt(OUTPUT, 16))
  }
  ...
  io.out.data := x
  io.out.valid := y === UInt(0)
}
[Figure: Valid bundle with a Bool valid field and a data field of type T]
Function Filters 15
abstract class Filter[T <: Data](dtype: T) extends Module {
  val io = new Bundle {
    val in = new Valid(dtype).asInput
    val out = new Valid(dtype).asOutput
  }
}
class FunctionFilter[T <: Data](f: T => T, dtype: T) extends Filter(dtype) {
  io.out.valid := io.in.valid
  // f has type T => T, so it is applied to the data field; valid passes through
  io.out.data := f(io.in.data)
}
[Figure: function f between a valid/data input and a valid/data output (Bool valid, UFix data)]
Clipping Filter 16
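def clippingFilter[T <: Num](limit: Int, dtype: T) =
  // a named parameter is needed here: with `_`, only max(-limit, _) would become the function
  new FunctionFilter((x: T) => min(limit, max(-limit, x)), dtype)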
[Figure: clip between a valid/data input and a valid/data output (Bool valid, UFix data)]
Shifting Filter 17
def shiftingFilter[T <: Num](shift: Int, dtype: T) =
new FunctionFilter(_ >> shift, dtype)
[Figure: shift between a valid/data input and a valid/data output (Bool valid, UFix data)]
Chained Filter 18
class ChainedFilter[T <: Num](dtype: T) extends Filter(dtype) {
  // built with the shiftingFilter and clippingFilter functions defined on the previous slides
  val shift = shiftingFilter(2, dtype)
  val clipper = clippingFilter(1 << 7, dtype)
  io.in <> shift.io.in
  shift.io.out <> clipper.io.in
  clipper.io.out <> io.out
}
[Figure: ChainedFilter: valid/data input into Shift, Shift output into clip, clip output to valid/data output]
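The shift-then-clip chain above can be mirrored in ordinary software to see what values it produces. This is a plain-Scala sketch on Int values, not Chisel: FilterModel, clip, shift, and chained are hypothetical names, and the valid signal is left out because it passes through unchanged.

```scala
// Plain-Scala model of the data path through ChainedFilter; not Chisel code.
object FilterModel {
  // Clamp x into [-limit, limit], mirroring min(limit, max(-limit, x)).
  def clip(limit: Int): Int => Int =
    x => math.min(limit, math.max(-limit, x))

  // Arithmetic right shift by a fixed amount, mirroring x >> shift.
  def shift(amount: Int): Int => Int =
    x => x >> amount

  // Same order and constants as ChainedFilter: shift by 2, then clip to +/- (1 << 7).
  val chained: Int => Int = shift(2).andThen(clip(1 << 7))
}
```

For example, FilterModel.chained(1000) is 128 (250 after the shift, then clipped), and FilterModel.chained(40) is 10 (inside the limit, so only shifted).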
Functional Composition 19
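Map(ins, x => x * y)
[Figure: ins[0], ins[1], ins[2] each multiplied by y in parallel]
Chain(n, in, x => f(x))
[Figure: in passed through three copies of f in series]
Reduce(ins, Max)
[Figure: three Max nodes combining ins[0], ins[1], ins[2], ins[3]]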
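The three composition patterns have direct counterparts in Scala's standard collections. The sketch below is plain Scala on Int values, not Chisel: ComposeModel, mapAll, chain, and reduceAll are hypothetical names chosen for illustration.

```scala
// Plain-Scala counterparts of Map, Chain and Reduce; not Chisel code.
object ComposeModel {
  // Map: apply the same function to every input independently.
  def mapAll(ins: Seq[Int], f: Int => Int): Seq[Int] =
    ins.map(f)

  // Chain: feed the input through n copies of f in series.
  def chain(n: Int, in: Int, f: Int => Int): Int =
    (0 until n).foldLeft(in)((acc, _) => f(acc))

  // Reduce: combine all inputs with one binary operator.
  // Assumption: ins is non-empty and op is associative (as Max is),
  // so the grouping of the operator nodes does not change the result.
  def reduceAll(ins: Seq[Int], op: (Int, Int) => Int): Int =
    ins.reduce(op)
}
```

For example, ComposeModel.chain(3, 1, _ + 2) is 7, and ComposeModel.reduceAll(Seq(4, 9, 2, 7), _ max _) is 9.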
Generator 20
// x followed by n-1 successively registered (delayed) copies of x
def delays[T <: Data](x: T, n: Int): List[T] =
  if (n <= 1) List(x) else x :: delays(Reg(x), n-1)
// multiply each tap by its coefficient and sum the products
def FIR[T <: Num](hs: Seq[T], x: T): T =
  (hs, delays(x, hs.length)).zipped.map( _ * _ ).reduce( _ + _ )
// extends Module rather than Filter, since it declares its own io bundle
class TstFIR extends Module {
  val io = new Bundle{ val x = SInt(INPUT, 8); val y = SInt(OUTPUT, 8) }
  val h = Array(SInt(1), SInt(2), SInt(4))
  io.y := FIR(h, io.x)
}
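The FIR generator computes y[n] as the sum of hs(k) * x[n - k] over the taps. A plain-Scala reference model makes that arithmetic easy to check against the generated hardware. This is a sketch, not Chisel: FirModel and fir are hypothetical names, and it assumes the delay registers hold 0 before the first sample arrives.

```scala
// Plain-Scala reference model of the FIR generator; not Chisel code.
object FirModel {
  // y(n) = sum over k of hs(k) * xs(n - k).
  // Assumption: samples before the start of xs count as 0,
  // i.e. the delay registers are taken to reset to 0.
  def fir(hs: Seq[Int], xs: Seq[Int]): Seq[Int] =
    xs.indices.map { n =>
      hs.indices.map { k =>
        if (n - k >= 0) hs(k) * xs(n - k) else 0
      }.sum
    }
}
```

With the coefficients from TstFIR, FirModel.fir(Seq(1, 2, 4), Seq(1, 0, 0, 0)) yields Seq(1, 2, 4, 0): an impulse reads the coefficients back out in order.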
Chisel Audio Support 21
Flo and Dbl data types and ops
Add FP support in C++ backend
Audio harness with mics, speakers, and controls
Emulated Korg Monotron 22
Monotron is a portable classic analog synth
Built out of SawWave, LFO, mixer, and VCF
Use laptop / C++ for emulation
Use BCF-2000 USB based mixer for controls
Chiseled Korg Monotron 23
class Monotron extends Module {
  val io = new Bundle {
    val swof = Dbl(INPUT);
    val lfof = Dbl(INPUT); val lfoi = Dbl(INPUT);
    val vcfc = Dbl(INPUT); val vcfq = Dbl(INPUT);
    val out = Dbl(OUTPUT);
  }
  val lfo = io.lfoi * SawWave(io.lfof);
  val vco = SawWave(io.swof + lfo)
  val vcf = VCF(io.vcfc, io.vcfq, vco);
  io.out := vcf
}
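[Figure: Monotron signal path: the LFO (inputs LFOF, LFOI) is scaled (*) and added (+) to SWOF to drive the VCO, which feeds the VCF (inputs VCFC, VCFQ)]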
Wiring All The Way Down 24
Can write both audio scripts and engines in Chisel
Can choose which part is baked into hardware
For example, can map entire DSP to FPGA or ASIC
[Figure: layers: Audio Scripting, Audio Engine, DSP Code]
Chisel Graph Execution on DREAMER 25
spatial fabric of graph execution tiles
map piece of graph to each core
have network route intertile dataflow values
use dataflow scheduling to hide latency
coarser grained high level chisel instructions
[Figure: dataflow graph of logic operators (add, sub, and, or, not, lt, eq, mux, reg, rnd, eat) spread across execution tiles]
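The dataflow firing rule behind this kind of scheduling can be sketched in a few lines: a node fires as soon as all of its operands are available. The code below is a plain-Scala illustration only, not a description of DREAMER's actual scheduler. DataflowModel, Node, and run are hypothetical names, and tiles and the network are not modeled.

```scala
// Plain-Scala sketch of the dataflow firing rule; not DREAMER's implementation.
object DataflowModel {
  // A graph node: an operator applied to the values of its named input nodes.
  final case class Node(name: String, inputs: Seq[String], op: Seq[Int] => Int)

  // Repeatedly fire every node whose operands are all available.
  // Returns the final values and the node names fired in each step.
  // Nodes whose operands never arrive are left unfired.
  def run(nodes: Seq[Node],
          sources: Map[String, Int]): (Map[String, Int], Seq[Seq[String]]) = {
    var values = sources
    var pending = nodes
    var waves = Vector.empty[Seq[String]]
    var progress = true
    while (pending.nonEmpty && progress) {
      val (ready, blocked) = pending.partition(_.inputs.forall(values.contains))
      progress = ready.nonEmpty
      if (progress) {
        // All ready nodes read the same snapshot of values, then publish results.
        values = values ++ ready.map(n => n.name -> n.op(n.inputs.map(values)))
        waves = waves :+ ready.map(_.name)
        pending = blocked
      }
    }
    (values, waves)
  }
}
```

Nodes that fire in the same step have no dependency on one another, which is the parallelism a spatial fabric could exploit by placing them on different tiles.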
DREAMER Workflow 26
[Figure: workflow stages: chisel => graph => netlist => layout => execution]
DREAMER Properties 27
efficient to compile to – 10-100x faster than FPGA
efficient to run – nearly as fast as FPGAs
quick to probe any signal – no recompile necessary
easily scalable – multiple chips
easy to map large designs – auto FAME + nice DRAM interface
additional facilities
  debugging and tracing
  activity counters for energy
  fault injection
[Figure: dataflow graph of logic operators]
FPGA Mapping Opportunity 28
FPGAs have great density and economies of scale
program FPGA with DREAMER once
then throw away Xilinx tools
match DSP + BRAM density
map to few BRAMs using port scheduling
double pump BRAM for extra ports
[Figure: FPGA resources (BRAM, DSP, LUTs, Registers); flow labels: dreamer.scala, chisel, dreamer.v, xilinx tools, bitstream, Zynq; cpu.scala, chisel, cpu.dm, DREAMER on Zynq, cpu emulator]
Chisel is Real 29
Digital Circuits Written in Chisel
Chisel is Open Source 30
chisel.eecs.berkeley.edu
BSD License
complete set of documentation
one goal is creation of library of high level and reusable components
Chisel Contains Library of Modules 31
queues, pipe,
priority mux, decoders, encoders,
fixed-priority arbiters, round-robin arbiters,
popcount, scoreboards
ROMs, RAMs, CAMs, TLB, caches, prefetcher,
integer ALUs, LFSR, Booth multiplier, iterative divider
IEEE-754/2008 floating-point units
RISC-V 32
fifth Berkeley RISC ISA
open source specification
fast functional simulator
boots linux
lots of open source implementations
Teaching Computer Architecture with Sodor 33
[Figure: Sodor datapath diagrams for the 1 stage, 2 stage, and 5 stage pipelines, with microcode and out of order variants also listed]
UC Berkeley Classes 34
2x CS152 – Undergraduate Computer Architecture
  Sodor
  Multicore and Vector
2x CS250 – VLSI System Design
  Processors
  Image Processing
1x CS294-88 – Declarative Design Seminar
  High Level Specification
  Automated Design Space Exploration
Outside Projects 35
NOC generator – MSR
Monte Carlo Simulator – TU Kaiserslautern
Precision Timed Machine (PRET) – Edward Lee’s Group
Chisel-Q – Quantum Backend – John Kubiatowicz’s Group
Conclusions 36
sketching all the way down
powerful new hardware substrate
truly open source reusable hardware
printable electronics ready
funding
Project Isis: DoE Award DE-SC0003624.
Par Lab: Microsoft (Award #024263) and Intel (Award #024894) funding and by matching funding by U.C. Discovery (Award #DIG07-10227). Additional support came from Par Lab affiliates Nokia, NVIDIA, Oracle, and Samsung.
ASPIRE: DARPA PERFECT program, Award HR0011-12-2-0016.