Summary Of Course Projects
SETIAWAN SOEKAMTO PUTRA
SUMMARY OF COURSE PROJECTS
MASTER OF ELECTRICAL AND COMPUTER ENGINEERING
ILLINOIS INSTITUTE OF TECHNOLOGY
DECEMBER 2010 GRADUATE
CONTENTS
• 32-bit Pipelined CPU
• MC68K-Based Monitor Program
• Pipelined MIPS Processor with hazard handler and data forwarding
• Simple Mesh-Like and Ring-Like Network on Chip Design
• Small office network design
• 4-bit 10T adder circuit with dual-Vt logic design
• Single-ended 6T vs. standard 6T SRAM bitcell design
• QR Matrix Factorization
• Electro Active Polymer Energy Harvesting Design
• Advanced Encryption Standard Hardware Design
SPRING 2009
• Introduction to VLSI Design
  • 32-bit Pipelined CPU
  • Multiplier with accumulator and pipeline optimization
• Microcomputer
  • MC68K-Based Monitor Program
• Advanced Computer Architecture
  • Pipelined MIPS Processor with hazard handler and data forwarding
32-BIT PIPELINED CPU
• Hardware Description Language
  • Verilog
• Tools
  • Compiler: Cadence Verilog-XL
  • Logic synthesis: Synopsys Design Compiler
  • Simulation: Cadence SimVision, Mentor Graphics ModelSim
  • Place and route: Cadence SoC Encounter
• Objectives
  • Execute the ASIC flow for this implementation using Verilog
  • Verify with RTL, post-synthesis, and post-place-and-route simulation
  • Determine maximum frequency, area, delay, and power
32-BIT PIPELINED CPU
• 32-bit memory file
• Eight ALU functions, including multiplication, addition, subtraction, OR, AND, XOR, and XNOR
• M: multiplicand, N: multiplier
• Multiplier:
  • Radix 2^r produces N/r partial products
  • A radix-4 Booth-encoded multiplier reduces the number of partial products (N/2 vs. N)
  • A Wallace tree reduces the number of logic levels required to perform the summation
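The partial-product saving stated above can be checked with a small software model. The following Python sketch is an illustration only, not the project's Verilog: it recodes a two's-complement multiplier into radix-4 Booth digits, so an N-bit multiplier yields N/2 partial products. The function names and the default 32-bit width are assumptions made for this example.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier into radix-4 Booth digits in {-2..2}."""
    if bits % 2:
        raise ValueError("bits must be even")
    y = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    y <<= 1                             # append the implicit bit y[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        triplet = (y >> i) & 0b111      # bits y[i+1], y[i], y[i-1]
        # Booth digit = -2*y[i+1] + y[i] + y[i-1]
        digit = -2 * ((triplet >> 2) & 1) + ((triplet >> 1) & 1) + (triplet & 1)
        digits.append(digit)
    return digits


def booth_multiply(multiplicand, multiplier, bits=32):
    """Return (product, number of partial products) using radix-4 Booth recoding."""
    digits = booth_radix4_digits(multiplier, bits)
    # Each digit selects 0, +/-M or +/-2M, weighted by 4**k.
    partial_products = [d * multiplicand * 4 ** k for k, d in enumerate(digits)]
    return sum(partial_products), len(partial_products)


product, count = booth_multiply(-60, 8)
print(product, count)  # -480 16
```

For a 32-bit multiplier the model forms 16 partial products, compared with 32 for a bit-at-a-time scheme.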
32-BIT PIPELINED CPU
• Results
  • Maximum frequency: 40 MHz < f < 41 MHz
32-BIT PIPELINED CPU
• Case studies:
  • Case 1: Replace the ALU multiplier with a multiplier-accumulator (MAC), which is useful for implementing DSP functions
  • Case 2: Pipeline optimization
• MAC benefit: reduces the number of instructions needed to compute the final result of a sum-of-products function.
• Pipeline optimization is applied by inserting registers on the critical path (in this case, the MAC unit).
32-BIT PIPELINED CPU
• Case 1
32-BIT PIPELINED CPU
• Case 1 results
• Case 2 results
32-BIT PIPELINED CPU
• Case 2: decision on where to place the registers
32-BIT PIPELINED CPU
• Provided:
  • Multiplier-accumulator block diagram
  • Simple CPU design written in Verilog
  • All required tools
• Implementation
  • Construct the aforementioned unit in Verilog and modify the design to fit the new unit
  • Insert a number of registers for pipelining
• Design functionality test
  • Verify in simulation that the function F = (-10)*5 + (-60)*2 + (-60)*8 outputs the correct result
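As a reference value for the functionality test above, the sum of products can be modelled in software. This Python sketch is only a behavioural illustration of a multiply-accumulate loop, not the Verilog MAC unit; the function name and the 32-bit wrap-around are assumptions.

```python
def mac_sequence(pairs, width=32):
    """Accumulate a*b for each (a, b) pair, wrapping to a signed `width`-bit value."""
    mask = (1 << width) - 1
    acc = 0
    for a, b in pairs:
        acc = (acc + a * b) & mask  # one multiply-accumulate step
    # Reinterpret the wrapped bit pattern as a two's-complement number.
    return acc - (1 << width) if acc >> (width - 1) else acc


F = mac_sequence([(-10, 5), (-60, 2), (-60, 8)])
print(F)  # -650
```

The three products are -50, -120, and -480, so the expected accumulator value is -650.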
32-BIT PIPELINED CPU
• Results
32-BIT PIPELINED CPU
• Additional analysis results
  • Finding the maximum frequency
  • Expected maximum frequency of the design: 58 MHz
  • Frequency vs. area vs. power consumption
MC68K-BASED MONITOR PROGRAM
• Instructor: Dr. Jafar Saniie
• Requirements/Specifications
  • Construct a simple monitor program for the MC68000 processor that lets the user perform common memory and register accesses and provides basic exception handlers.
• Language
  • 68000 assembly language
• Tools
  • Easy68K editor/assembler/simulator
MC68K-BASED MONITOR PROGRAM
• Monitor program flowchart
MC68K-BASED MONITOR PROGRAM
• Monitor program system diagram
MC68K-BASED MONITOR PROGRAM
• Includes a command interpreter that checks and validates user input.
• Monitor debugger commands:
  • MEMD: Memory Display
  • MEMS: Memory Set
  • SORT: Memory Sort
  • FILL: Memory Fill
  • MOVE: Memory Move
  • MEMM: Memory Modify
  • FIND: Block Memory Search
  • REGM: Register Modify
  • REGD: Register Display
  • RUNS: Execute program at specified location
MC68K-BASED MONITOR PROGRAM
• Monitor debugger exception-handling commands:
  • TBUS: Bus Error Exception
  • TADD: Address Error
  • TILL: Illegal Instruction Exception
  • TPRI: Privilege Violation
  • TDIV: Division by Zero
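The command interpreter's check-and-validate step can be pictured with a short sketch. It is written in Python for readability, whereas the project itself is in 68000 assembly. The mnemonics come from the lists above; the argument syntax (hexadecimal values with an optional "$" prefix) and the function name are assumptions, since the slides do not give the command formats.

```python
# Mnemonics taken from the command lists above.
COMMANDS = {
    "MEMD", "MEMS", "SORT", "FILL", "MOVE", "MEMM", "FIND", "REGM", "REGD", "RUNS",
    "TBUS", "TADD", "TILL", "TPRI", "TDIV",
}


def parse_command(line):
    """Return (mnemonic, [int arguments]) or raise ValueError on invalid input."""
    tokens = line.strip().upper().split()
    if not tokens:
        raise ValueError("empty command")
    mnemonic, args = tokens[0], tokens[1:]
    if mnemonic not in COMMANDS:
        raise ValueError(f"unknown command: {mnemonic}")
    values = []
    for arg in args:
        try:
            # Hypothetical convention: hex numbers, optionally written as $1000.
            values.append(int(arg.lstrip("$"), 16))
        except ValueError:
            raise ValueError(f"bad hex argument: {arg}") from None
    return mnemonic, values


print(parse_command("memd $1000 $10FF"))  # ('MEMD', [4096, 4351])
```

Rejecting unknown mnemonics and malformed arguments before dispatch keeps each command handler free of input checking.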
MC68K-BASED MONITOR PROGRAM
• Results (a subset of the 17 commands implemented)
  • Register display
  • Memory display
  • Command interpreter
HIGH-PERFORMANCE PIPELINED MIPS PROCESSOR
• MIPS (Microprocessor without Interlocked Pipeline Stages) is a reduced instruction set computer (RISC) instruction set architecture (ISA).
• Instructor: Prof. Jia Wang
• Requirements/Specifications
  • Design a MIPS processor with pipelining, data forwarding, and hazard handling capabilities.
  • Run RTL simulation to verify the functionality
• Language
  • VHDL
• Tools
  • ModelSim PE 6.5
  • MARS 3.6 MIPS Simulator
• Provided:
  • Data memory unit design
  • Testbench code
HIGH-PERFORMANCE PIPELINED MIPS PROCESSOR
• Data width: 32-bit
• 5-stage pipeline
  • Instruction Fetch
  • Instruction Decode
  • Execute
  • Memory Access
  • Write-Back
• Main modules
  • Program counter (PC)
  • Control Unit
  • ALU Control Unit
  • Register File
  • ALU
  • Instruction Memory
  • Data Memory
  • Hazard Detection Unit
  • Forwarding Unit
• Branch hazard
  • Branch calculation occurs in the Instruction Decode stage
  • A branch miss costs only one stall cycle.
• Data hazard
  • Stall if the data being written will be used by the next instruction
• Data forwarding
  • Result data is used immediately rather than being written back to the register file first.
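The forwarding and stall decisions listed above can be sketched as two small functions. This is a Python illustration rather than the project's VHDL, and the conditions follow the common textbook formulation for a 5-stage MIPS pipeline (forward from EX/MEM or MEM/WB, stall on a load followed by a dependent instruction). That formulation is an assumption, since the slides do not show the units' logic, and all signal and function names are hypothetical.

```python
def forward_select(id_ex_rs, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Choose the source of one ALU operand: newest matching result wins."""
    # Register 0 is hard-wired to zero in MIPS, so it is never forwarded.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"
    return "REGFILE"


def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Stall one cycle when a load's destination is read by the next instruction."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


# add $8, ... followed immediately by an instruction reading $8:
print(forward_select(8, True, 8, False, 0))   # EX/MEM
# lw $9, ... followed immediately by an instruction reading $9:
print(load_use_stall(True, 9, 9, 3))          # True
```

Checking the EX/MEM stage first ensures that the most recent value is used when both pipeline registers hold a write to the same register.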
HIGH-PERFORMANCE PIPELINED MIPS PROCESSOR
• MIPS architecture
HIGH-PERFORMANCE PIPELINED MIPS PROCESSOR
• Test program (running on MARS 3.6)
HIGH-PERFORMANCE PIPELINED MIPS PROCESSOR
• Result
FALL 2009
• Hardware/Software Co-Design
  • Simple Mesh-Like Network on Chip Design
  • Simple Ring-Like Network on Chip Design
• Introduction to Computer Network
  • Design of a 2-story small office computer network
HARDWARE/SOFTWARE CO-DESIGN
• Projects:
  • Network on chip prototype design with three nodes
  • Simple Mesh-Like Network on Chip Design
NETWORK ON CHIP PROTOTYPE DESIGN WITH THREE NODES
• Instructor: Prof. Jia Wang
• Specifications
  • Three-node NoC architecture in a partially connected mesh topology
  • Three processing elements and three routers.
  • Queue system: FIFO
• Language
  • SystemC running on Visual C++
• Tools
  • Microsoft Visual C++
NETWORK ON CHIP PROTOTYPE DESIGN WITH THREE NODES
• Three-node NoC system diagram
• Third node function (called PE_dumpbox)
  • It receives all packets that cannot be processed by the destination processing unit due to overloading in the network
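The role of PE_dumpbox can be illustrated with a bounded FIFO that diverts packets once it is full. The sketch below is in Python rather than the project's SystemC, and the buffer depth, class name, and method names are assumptions. The slides state only that overflow packets end up in PE_dumpbox, not how the redirection is implemented.

```python
from collections import deque


class Router:
    """Router with a bounded FIFO; packets that do not fit go to the dump box."""

    def __init__(self, depth, dumpbox):
        self.fifo = deque()
        self.depth = depth        # hypothetical buffer depth
        self.dumpbox = dumpbox    # stands in for PE_dumpbox

    def receive(self, packet):
        """Queue the packet, or divert it when the buffer is overloaded."""
        if len(self.fifo) < self.depth:
            self.fifo.append(packet)
            return True
        self.dumpbox.append(packet)
        return False

    def deliver(self):
        """Hand the oldest queued packet to the processing element."""
        return self.fifo.popleft() if self.fifo else None


dumpbox = []
router = Router(depth=2, dumpbox=dumpbox)
accepted = [router.receive(p) for p in ("p0", "p1", "p2")]
print(accepted, dumpbox)  # [True, True, False] ['p2']
```

Diverting instead of dropping makes every overflow observable, which helps when tracing an overload in simulation.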
NETWORK ON CHIP PROTOTYPE DESIGN WITH THREE NODES
• Results
  • Overload in the Router 1 network buffer at cycle 3
  • The third processing unit, PE_dumpbox, receives the packet
MESH-LIKE NETWORK ON CHIP PROTOTYPE DESIGN
• Specifications
  • A simple mesh-like NoC architecture.
  • Each router has one processing element (PE).
  • Queue system: FIFO
  • 4-by-4 matrix-like size
• Language
  • SystemC
• Tools
  • Microsoft Visual C++
MESH-LIKE NETWORK ON CHIP PROTOTYPE DESIGN
• Simple NoC architecture
MESH-LIKE NETWORK ON CHIP PROTOTYPE DESIGN
• Results
  • Generated packets
  • The result shows that packets are delivered
MESH-LIKE NETWORK ON CHIP PROTOTYPE DESIGN
• Results
  • Delays occur because only one packet is delivered to a processing element (PE) at a time
MESH-LIKE NETWORK ON CHIP PROTOTYPE DESIGN
• Benefit and drawback:
  • Packets reach the destination address in fewer hops, which reduces contention and increases the average bit rate.
  • The design becomes more complex and more wires are needed.
INTRODUCTION TO COMPUTER NETWORK
• Project:
  • Design a prototype of a 2-story small office computer network capable of serving 20 users, with three department LANs, four servers, and wireless Internet
• Language
  • N/A
• Tools
  • Microsoft Visio
SMALL OFFICE NETWORK DESIGN
• Proposed configurations
  • IP address allocation
SMALL OFFICE NETWORK DESIGN
• Proposed configurations
  • Design topology
SMALL OFFICE NETWORK DESIGN
• Office layout (2nd floor and 1st floor)
  • Colored arrows show how cables are managed
SPRING 2010
• Advanced VLSI
  • 4-bit 10T adder circuit with dual-Vt logic design
• High Performance VLSI IC System
  • Single-ended 6T vs. standard 6T SRAM bitcell design comparison
• QR Factorization
  • Implementing the QR factorization algorithm in C
4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• Project:
  • 4-bit 10T adder circuit with dual-Vt logic design
• Specifications
  • The adder circuit is based on: J. Lin, M. Sheu, and C. Ho. A Novel High-Speed and Energy Efficient 10-Transistor Full Adder Design. IEEE Trans. on Circuits and Systems, May 2007.
  • Adder: cascaded ripple-carry adders
  • Technology node: 45 nm (FreePDK)
  • Voltage: 1.1 V @ 25 MHz
  • Performance measurements (delay and power consumption) for the 10T adder circuit using high-threshold-voltage (high-Vt), low-Vt, and dual-Vt transistors
• Tools
  • Cadence Virtuoso Schematic Design
  • Synopsys HSPICE Simulator
  • NanoSim Simulator
4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• High Vt vs. low Vt
• Full adder design (1-bit)
  • Complementary and level restoring carry logic (CLRCL)
4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• Full adder design (1-bit) critical path
  • Dual-Vt: low-Vt transistors are used on the critical path for speed, and high-Vt transistors elsewhere for low leakage
  • The NMOS in the multiplexer and the PMOS in the inverter are low-Vt transistors
4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• Logic equations
  • Sum = (A XNOR B)·Cin + (A XOR B)·Cin_bar
  • Cout = (A XOR B)·Cin + (A XNOR B)·A
• Design components
  • Inverter (left) and multiplexer (right)
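The two logic equations above can be checked exhaustively against ordinary binary addition. The Python sketch below models only the Boolean behaviour, not the transistor-level circuit; the function names are assumptions, and the 4-bit wrapper mirrors the cascaded ripple-carry arrangement named in the specifications.

```python
from itertools import product


def full_adder_10t(a, b, cin):
    """Evaluate the slide's Sum and Cout equations for single-bit inputs."""
    xor_ab = a ^ b
    xnor_ab = xor_ab ^ 1
    s = (xnor_ab & cin) | (xor_ab & (cin ^ 1))  # Sum
    cout = (xor_ab & cin) | (xnor_ab & a)       # Cout
    return s, cout


def ripple_add4(x, y, cin=0):
    """Chain four 1-bit adders; return (4-bit sum, carry out)."""
    result, carry = 0, cin
    for i in range(4):
        s, carry = full_adder_10t((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Every input combination must agree with integer addition.
for a, b, cin in product((0, 1), repeat=3):
    total = a + b + cin
    assert full_adder_10t(a, b, cin) == (total & 1, total >> 1)

print(ripple_add4(9, 8))  # (1, 1), i.e. 17 = 0b10001
```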
4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• 1-bit full adder (consisting of multiplexers and inverters) and its symbol
• 4-bit full adder
4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• Methodology
  • Use combinations of input vectors to measure delay and power consumption
  • Delay: switching delay between the least significant bit (bit 0) and the most significant bit (bit 3)
  • Power: average and maximum power during simulation
• Results
  • Delay (in seconds)
  • [Bar chart: high-to-low and low-to-high delay for High-VT, Low-VT, and Dual-VT; vertical axis from 0.00E+00 to 4.00E-10 s]
4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• Results
  • Power consumption (in watts)
  • [Bar chart: average power (avgpwr) for High-VT, Low-VT, and Dual-VT; vertical axis from 0.00E+00 to 6.00E-05 W]
  • [Second bar chart for High-VT, Low-VT, and Dual-VT; vertical axis from 0.00E+00 to 5.00E-04 W]
4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• Results
4-BIT 10T ADDER CIRCUIT WITH DUAL-VT LOGIC DESIGN
• Issue
  • Voltage degradation, particularly with high-Vt transistors or at high frequency (> 125 MHz), because pass transistors deliver a weak 1 (NMOS) or a weak 0 (PMOS).
SINGLE-ENDED 6T VS. STANDARD 6T SRAM BITCELL DESIGN
• Specifications
  • Design from: J. Singh, et al. Single Ended 6T SRAM with Isolated Read-Port for Low-Power Embedded Systems. IEEE. 2009.
  • Technology node: 45 nm
  • Use: high-Vt MOSFETs
• Tools
  • Cadence Virtuoso Schematic Design
  • Synopsys HSPICE Simulator
SINGLE-ENDED 6T VS. STANDARD 6T SRAM BITCELL DESIGN
• Background
  • SRAM consumes the majority of the die area
  • Dynamic power: read and write activity
  • Static power: retaining the stored logic value
• Benefits/drawbacks of single-ended SRAM
  • Faster reading of logic '1'
  • One bit line (no complementary bit-bar line), which reduces wiring
  • More delay in writing '1' due to the weak-1 behavior of the NMOS pass transistor (but around 85% of writes are zero writes)
  • Role of the isolated read port: prevents the bitcell content from being exposed during reads
  • Considerably lower power dissipation, better read SNM
SINGLE-ENDED 6T VS. STANDARD 6T SRAM BITCELL DESIGN
• Standard 6T SRAM
  • Read: precharge BL and BL*, WordLine = 1
  • Write: assert the new value on BL and BL*, WordLine = 1
• Transistor sizing:
  • Access transistor: medium
  • Pull-up transistor: weak
  • Pull-down transistor: strong
SINGLE-ENDED 6T VS. STANDARD 6T SRAM BITCELL DESIGN
• Standard SRAM Design (using Cadence Virtuoso)
SINGLE-ENDED 6T VS. STANDARD 6T SRAM BITCELL DESIGN
• Single-Ended SRAM Design
SINGLE-ENDED 6T VS. STANDARD 6T SRAM BITCELL DESIGN
• Comparison results
  • Write delay (0 to 0.5 Vdd or 1 to 0.5 Vdd)
  • "…around 85% of the instruction write bits are "0," and over 90% of the data write bits are "0."…" (quoted from [3])
[3] Y. Chang, F. Lai, C. Yang. Zero-Aware Asymmetric SRAM Cell for Reducing Cache Power in Writing Zero. IEEE Trans. on VLSI Systems, Vol. 12, No. 8, August 2004.
SINGLE-ENDED 6T VS. STANDARD 6T SRAM BITCELL DESIGN
• Comparison results
  • Power consumption comparison
SINGLE-ENDED 6T VS. STANDARD 6T SRAM BITCELL DESIGN
• Noise margin
QR MATRIX FACTORIZATION
• Purpose:
  • Implement the QR factorization algorithm in C
• Specifications
  • Written in C under Red Hat OS
• QR factorization
  • A matrix decomposition method for solving linear problems or equations without inverting the left-hand-side matrix.
  • Applicable to: an m-by-n matrix A
  • Decomposition: A = QR, where Q is an orthogonal matrix of size m-by-m and R is an upper triangular matrix
  • The QR decomposition provides an alternative way of solving the system of equations Ax = b without inverting the matrix A. Because Q is orthogonal, Q^T Q = I, so Ax = b is equivalent to Rx = Q^T b, which is easier to solve since R is triangular.
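The solve-by-factorization idea above can be sketched in a few lines. The example is written in Python for brevity, whereas the project itself is in C, and it uses modified Gram-Schmidt on a square, full-rank matrix. This is one common way to compute QR and is an assumption here, since the transcript does not preserve the algorithm the project used. Function names are hypothetical.

```python
import math


def qr_gram_schmidt(a):
    """Factor a square, full-rank matrix (list of rows) into Q and upper-triangular R."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]  # columns of A
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Remove the component of v along the k-th orthonormal column.
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(n))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(n)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        if r[j][j] == 0.0:
            raise ValueError("matrix is rank deficient")
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def solve_qr(a, b):
    """Solve Ax = b via Rx = Q^T b and back substitution."""
    q, r = qr_gram_schmidt(a)
    n = len(b)
    y = [sum(q[i][j] * b[i] for i in range(n)) for j in range(n)]  # y = Q^T b
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(r[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / r[i][i]
    return x


x = solve_qr([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
print([round(v, 6) for v in x])  # [0.8, 1.4]
```

Back substitution on the triangular R costs on the order of n^2 operations, which is why no explicit inverse of A is needed.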
QR MATRIX FACTORIZATION
• Algorithm
QR MATRIX FACTORIZATION
• Result
FALL 2010
• Electro Active Polymer Energy Harvesting
• Advanced Encryption Standard
ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• EAP circuitry converts mechanical energy to electrical energy when the polymer is stretched under a bias voltage.
• EAP material: VHB 4905 tape and carbon grease
ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• Previous prototype:
  • Charge management IC: TI's bq2000
  • Li-ion battery: 3 V, 45 mAh
  • Application: TI's eZ430-F2013
  • Boost converter to supply the biasing voltage (5 V to 1.5 kV): EMCO Q15N-5
• Drawbacks
  • High energy consumption
  • EAP output power is too small to even turn on the battery charging circuit (which needs 20.6 mA)
• Solutions
  • EAP material efficiency
    • Higher capacitance
  • A battery and circuit that can store small amounts of energy without requiring much energy to operate
  • Apply a low biasing voltage to eliminate the boost converter
ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• Simulation model using Simulink
  • Circuit model parameters:
    • EAP model parameters, input voltage (battery), and output capacitor Co
ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• Simulation model using Simulink
  • EAP model parameters:
    • Cidle, Cforced, force frequency f (how often the EAP is stretched)
  • Absolute-value function to create an always-positive sine waveform from the original sine wave
ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• Simulation result
ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• Prototype:
  • Battery charging: Cymbet CBC5300
  • Battery: 2xCBC050 (3x50uAh) at 3.5 V output
  • Capability to harvest at 1.05 V
  • PCB layout tool: Altium Designer
  • Application: MSP430-F2274 with CC2500 2.4 GHz RF transceiver
ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• Input power
  • Tested using a voltage generator at 1.042 V
  • Current drawn was 529 µA
• Output power
  • Power for the RF transceiver and MSP430
  • Power for an additional load (tested using a 330 kΩ resistor)
ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• Power for the RF transceiver and MSP430
  • Depends on how often the device transmits data
  • Transmission period set to 5 seconds
  • Based on TI's SLAA378C documentation, for a 5-second period between transmissions, the expected average current consumption is 8.4 µA.
  • Voltage is approx. 3.2 V
• Power for the load
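The measured figures quoted above translate into power with P = V × I. The Python sketch below performs only that arithmetic on values stated in the slides; the variable and function names are assumptions. The load power is left out because the voltage across the 330 kΩ resistor is not stated.

```python
V_IN = 1.042      # V, voltage generator setting
I_IN = 529e-6     # A, measured input current
V_RF = 3.2        # V, approximate supply for the RF transceiver and MSP430
I_RF = 8.4e-6     # A, expected average current for a 5-second transmit period


def power_watts(voltage, current):
    """Electrical power P = V * I."""
    return voltage * current


p_in = power_watts(V_IN, I_IN)
p_rf = power_watts(V_RF, I_RF)
print(f"input power: {p_in * 1e6:.1f} uW")        # input power: 551.2 uW
print(f"RF + MSP430 power: {p_rf * 1e6:.2f} uW")  # RF + MSP430 power: 26.88 uW
```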
ELECTRO ACTIVE POLYMER ENERGY HARVESTING DESIGN
• Efficiency
  • Pstore is the power stored in the battery.
  • η = (equation not preserved in this transcript)
  • Note that Pstore is roughly averaged from the battery charging profile.
  • Also note that the battery still had some charge during the experiment.
• Battery charging profile for CBC050
ADVANCED ENCRYPTION STANDARD HARDWARE DESIGN
• Variant AES with 512-bit and 1024-bit keys
• Area and power consumption comparison with 128-bit and 256-bit AES keys
• CMOS technology: 45 nm
• Operating voltage: 1.1 V @ 100 MHz
• Verilog language
• Tools:
  • Synthesis: Synopsys Design Compiler
  • Simulation: ModelSim
• Find the relationship between key size and implemented hardware area and power consumption.
ADVANCED ENCRYPTION STANDARD HARDWARE DESIGN
• Longer key size:
  • More secure
  • More iteration rounds
  • Equation (1) (not preserved in this transcript)
  • More power and area
• Rijndael algorithm (flowchart)
  • Inputs: Plaintext and Cipher Key; Key Expansion produces RoundKey[0] and RoundKey[i]
  • Initial round: AddRoundKey
  • Normal round: SubBytes, ShiftRows, MixColumns, AddRoundKey; repeated with i = i + 1 while i < number of rounds
  • Final round: SubBytes, ShiftRows, AddRoundKey
  • Output: Ciphered Text
ADVANCED ENCRYPTION STANDARD HARDWARE DESIGN
• Block view of AES operation
  • The plaintext (bytes 0 to 15) is XORed with the first round key (bytes 0 to 15) to form the state block, filled column by column:
    0 4 8 12
    1 5 9 13
    2 6 10 14
    3 7 11 15
  • SubBytes replaces each byte with its S-box value:
    S0 S4 S8 S12
    S1 S5 S9 S13
    S2 S6 S10 S14
    S3 S7 S11 S15
  • State block after ShiftRows:
    S0 S4 S8 S12
    S5 S9 S13 S1
    S10 S14 S2 S6
    S15 S3 S7 S11
  • MixColumns a(x) is applied per column, giving the state block after MixColumns:
    M0 M4 M8 M12
    M5 M9 M13 M1
    M10 M14 M2 M6
    M15 M3 M7 M11
  • The result is XORed with the next round key and is then ready for the next round:
    K0 K4 K8 K12
    K1 K5 K9 K13
    K2 K6 K10 K14
    K3 K7 K11 K15
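The byte layout in the block view above can be reproduced with a short sketch. It is written in Python rather than the project's Verilog and covers only the state layout, ShiftRows, and AddRoundKey; SubBytes and MixColumns are omitted. Function names are assumptions.

```python
def to_rows(state):
    """View a 16-byte, column-major state as four rows of four bytes."""
    return [[state[r + 4 * c] for c in range(4)] for r in range(4)]


def shift_rows(state):
    """Rotate row r of the state left by r positions (the AES ShiftRows step)."""
    rows = to_rows(state)
    shifted = [row[r:] + row[:r] for r, row in enumerate(rows)]
    # Flatten back to column-major order.
    return [shifted[r][c] for c in range(4) for r in range(4)]


def add_round_key(state, round_key):
    """XOR each state byte with the matching round-key byte."""
    return [s ^ k for s, k in zip(state, round_key)]


# Using the byte indices 0..15 as stand-ins for S0..S15:
for row in to_rows(shift_rows(list(range(16)))):
    print(row)
# [0, 4, 8, 12]
# [5, 9, 13, 1]
# [10, 14, 2, 6]
# [15, 3, 7, 11]
```

The printed rows match the "after ShiftRows" block shown in the slide.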
ADVANCED ENCRYPTION STANDARD HARDWARE DESIGN
• Block diagram
  • Inputs: Cipher_key (to the Key Expansion Module) and Plain_text
  • Datapath blocks: Mux, AddRoundKey, SubBytes and ShiftRows, MixColumns, Mux, AddRoundKey, Mux
  • Initial value (zero)
  • Output: Ciphered_text
ADVANCED ENCRYPTION STANDARD HARDWARE DESIGN
• Results: power
  Design    Dynamic power (mW)   Static power (mW)   Total power (mW)
  AES128    3.3574               0.2971603           3.6545603
  AES256    3.9442               0.3341722           4.2783722
  AES512    5.0289               0.409219            5.438119
  AES1024   5.6042               0.5053051           6.1095051
  • Linear fit of total power (mW): f(x) = 0.85245812 x + 2.73899385, R² = 0.985616267025268
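The trend line quoted with the power results can be reproduced from the tabulated totals by ordinary least squares. The Python sketch below assumes that x is the position of each design on the chart's horizontal axis (1 for AES128 through 4 for AES1024); that reading is an inference, but it yields the slope, intercept, and R² quoted in the slide. The function name is hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope*x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot


positions = [1, 2, 3, 4]  # AES128, AES256, AES512, AES1024 (assumed x values)
total_mw = [3.6545603, 4.2783722, 5.438119, 6.1095051]
slope, intercept, r2 = linear_fit(positions, total_mw)
print(f"{slope:.6f} {intercept:.6f} {r2:.6f}")  # 0.852458 2.738994 0.985616
```

Because x is the chart position rather than the key length in bits, the fit says that total power rises by about 0.85 mW per doubling of key size across these four designs.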
ADVANCED ENCRYPTION STANDARD HARDWARE DESIGN
• Results: area
  • [Chart: implemented area; vertical axis from 52500 to 97500, units not given]