
Kris Gaj

Office hours: Monday, 7:30-8:30 PM Thursday, 7:30-8:30 PM

Research and teaching interests:
• cryptography
• computer arithmetic
• VLSI design and testing

Contact: Science & Technology II, room 223

[email protected], [email protected],

(703) 993-1575

ECE 645

Part of:

MS in EE

MS in CpE

Digital Systems Design – required course
Other concentration areas – elective course

Certificate in VLSI Design/Manufacturing

PhD in IT

PhD in ECE

Spring 2006 Enrollment as of January 23, 2006

MS in CpE: 7
MS in EE: 6
BS in CpE: 1
PhD in ECE: 1
PhD in IT: 1
MS in ISA: 1
NDG: 1

DIGITAL SYSTEMS DESIGN

Concentration advisor: Ken Hintz

1. ECE 545 Introduction to VHDL – K. Gaj, K. Hintz, project, VHDL, Aldec/Synplicity/Xilinx and ModelSim/Synopsys

2. ECE 645 Computer Arithmetic: HW and SW Implementation – K. Gaj, project, VHDL, Aldec/Synplicity/Xilinx and ModelSim/Synopsys

3. ECE 586 Digital Integrated Circuits – D. Ioannou

4. ECE 681 VLSI Design Automation – T. Storey, project/lab, back-end design with Synopsys tools

[Figure: design levels (algorithmic, register-transfer, gate, transistor, layout, devices) and the courses that address them: ECE 545 Introduction to VHDL, ECE 645 Computer Arithmetic, ECE 586 Digital Integrated Circuits, ECE 681 VLSI Design Automation, ECE 684 MOS Device Electronics, ECE 584 Semiconductor Device Fundamentals]

Prerequisites

ECE 545 Introduction to VHDL

or

Permission of the instructor, granted assuming that you know
• VHDL or Verilog
• a high-level programming language (preferably C)

Course web page

ECE web page → Courses → Course web pages → ECE 645

http://teal.gmu.edu/courses/ECE645/index.htm

Computer Arithmetic

Lecture
• Homework 15 %
• Midterm exam 1 (in class) 20 %
• Midterm exam 2 (take-home) 15 %

Project
• Project 1 20 %
• Project 2 30 %

Advanced digital circuit design course covering

Efficient
• addition and subtraction
• multiplication
• division and modular reduction
• exponentiation

Integers
• unsigned and signed

Real numbers
• fixed point
• single and double precision floating point

Elements of the Galois field GF(2^n)
• polynomial base

Lecture topics (1)

INTRODUCTION

1. Applications of computer arithmetic algorithms

2. Number representation
• Unsigned Integers
• Signed Integers
• Fixed-point real numbers
• Floating-point real numbers
• Elements of the Galois Field GF(2^n)

ADDITION AND SUBTRACTION

1. Basic addition, subtraction, and counting

2. Carry-lookahead, carry-select, and hybrid adders

3. Adders based on Parallel Prefix Networks

MULTIOPERAND ADDITION

1. Carry-save adders

2. Wallace and Dadda Trees

3. Adding multiple signed numbers

MULTIPLICATION

1. Tree and array multipliers

2. Sequential multipliers

3. Multiplication of signed numbers and squaring

DIVISION

1. Basic restoring and non-restoring sequential dividers

2. SRT and high-radix dividers

3. Array dividers

FLOATING POINT AND GALOIS FIELD ARITHMETIC

1. Floating-point units

2. Galois Field GF(2^n) units

Similar courses at other universities

• University of California, Santa Barbara, Behrooz Parhami, ECE252B: Computer Arithmetic.

• University of Massachusetts, Amherst, Israel Koren, ECE666: Digital Computer Arithmetic.

• Lehigh University, Michael Schulte, ECE496: High-Speed Computer Arithmetic.

• Worcester Polytechnic Institute, Berk Sunar, EE-579 V Computer Arithmetic Circuits.

• Stanford University, Michael Flynn, EE486: Advanced Computer Arithmetic.

• University of California, Davis, Vojin Oklobdzija, ECE278: Computer Arithmetic for Digital Implementation.

New in this course

• real-life project based on VHDL or Verilog HDL

• operations in the Galois Field (with the application in cryptography and communications)

Possible topics for a Scholarly Paper or Research Project for the CpE & EE students

Advanced Computer Arithmetic
• Square root
• Exponential and logarithmic functions
• Trigonometric functions
• Hyperbolic functions

• Fault-Tolerant Arithmetic
• Low-Power Arithmetic
• High-Throughput Arithmetic

Three Curriculum Options

All options: 2 core courses + 4 required courses

MS Thesis Option: 2 elective courses + ECE 799 Master’s Thesis (6 cr. hrs)
Research Project Option: 3 elective courses + ECE 798 Research Project + Scholarly paper
Scholarly Paper Option: 4 elective courses + Scholarly paper

Literature (1)

Required textbook:

Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.

Recommended textbooks:

Milos D. Ercegovac and Tomas Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2004.

Israel Koren, Computer Arithmetic Algorithms, 2nd edition, A. K. Peters, Natick, MA, 2002.

Literature (2)

VHDL books (used in ECE 545 in Fall 2005)

1. Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.

2. Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004.

Literature (3)

Supplementary books:

1. E. E. Swartzlander, Jr., Computer Arithmetic, vols. I and II, IEEE Computer Society Press, 1990.

2. Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, Chapter 14, Efficient Implementation, CRC Press, Inc., 1998.

3. Christof Paar, Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields, VDI Verlag, 1994.

Literature (3)

Proceedings of conferences
• ARITH - International Symposium on Computer Arithmetic
• ASIL - Asilomar Conference on Signals, Systems, and Computers
• ICCD - International Conference on Computer Design
• CHES - Workshop on Cryptographic Hardware and Embedded Systems

Journals and periodicals
• IEEE Transactions on Computers, in particular special issues on computer arithmetic: 8/70, 6/73, 7/77, 4/83, 8/90, 8/92, 8/94
• IEEE Transactions on Circuits and Systems
• IEEE Transactions on Very Large Scale Integration
• IEE Proceedings: Computer and Digital Techniques
• Journal of VLSI Signal Processing

Homework

• reading assignments (main textbook + articles)

• analysis of hardware and software algorithms and implementations

• design of small hardware units using VHDL or Verilog

Optional assignments

Possibility of trading analysis vs. design vs. coding

Midterm exams

Exam 1 - 2 hrs 30 minutes, in class: multiple choice + short problems

Exam 2 - 48 hrs, take-home: analysis and design of arithmetic units using VHDL or Verilog HDL

Practice exams on the web

Tentative days of exams:
Exam 1 - Monday, March 27
Exam 2 - Saturday-Sunday, May 6-7

Project (1)

Project I (20% of grade)

Design and comparative analysis of fast adders (several hundred bits long)

Optimization criteria:
• minimum latency
• maximum throughput
• minimum area
• minimum product latency · area
• maximum ratio throughput/area
• scalability

Similar for all students
Done individually

Final report due: Monday, March 20

Project (2)

Project II (30% of grade)

Fast
• multiplication
• squaring
• division
• modular reduction, or
• modular exponentiation
of long unsigned or signed integers

or

Fast
• addition or
• multiplication
of floating-point numbers

Written report & oral presentation: Monday, May 15

Project II (rules)

• Real life application

• Requirements derived from the analysis of the application

• Typically both hardware and software design

• Several project topics proposed on the web

• You can choose a project topic by yourself

• Can be done in a group of 1-3 students

Project II (rules)

• Cooperation (but not exchange of code) between teams is encouraged

• Every team works on a slightly different problem

• Project topics should be more complex for larger teams

Project

Hardware
• VHDL (or Verilog) code
• Latency and/or throughput
• Area
• Scalability

Software
• High level language (C preferred)
• Execution time
• Memory requirements
• Scalability

Degrees of freedom and possible trade-offs

[Figure: trade-offs among speed, area, power, and testability, labeled with the courses ECE 645, ECE 682, and ECE 586, 681]

Degrees of freedom and possible trade-offs

[Figure: speed (latency, throughput) versus area]

Timing parameters

parameter | definition | units | pipelining
latency | time from input to output | ns | bad
throughput | number of output bits per time unit | Mbits/s | good
delay | time from point to point | ns |
clock period | time from rising edge to rising edge of clock | ns | good
clock frequency | 1 / clock period | MHz | good
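To make the relationship between these parameters concrete, here is a small Python sketch. It assumes a unit that accepts one operand set and produces one result per clock cycle; the function name `timing` and all numbers are hypothetical and chosen only for illustration.

```python
def timing(clock_period_ns: float, pipeline_stages: int, output_bits: int):
    """Return (clock frequency in MHz, latency in ns, throughput in Mbits/s).

    Assumption: one result of `output_bits` bits leaves the unit every clock cycle.
    """
    clock_frequency_mhz = 1e3 / clock_period_ns             # 1 / clock period
    latency_ns = pipeline_stages * clock_period_ns          # time from input to output
    throughput_mbits_s = output_bits * clock_frequency_mhz  # output bits per time unit
    return clock_frequency_mhz, latency_ns, throughput_mbits_s


# Hypothetical 64-bit unit: unpipelined at 20 ns vs. 4 pipeline stages at 6 ns each
# (the pipelined clock period is assumed to include register overhead).
unpipelined = timing(20.0, 1, 64)
pipelined = timing(6.0, 4, 64)
```

With these made-up numbers the pipelined version has a longer latency but a higher throughput and clock frequency, which is the trade-off the "pipelining" column summarizes.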

Project technologies

semi-custom Application Specific Integrated Circuits and Field Programmable Gate Arrays

Levels of design description

Algorithmic level

Register Transfer Level (level of description most suitable for synthesis)

Logic (gate) level

Circuit (transistor) level

Physical (layout) level

Register Transfer Logic (RTL) Design Description

[Figure: registers driven by a common clock, with blocks of combinational logic between them]

RTL Block Synthesis*

1. Write RTL HDL code
2. Simulate the HDL. OK? If no, go back and revise the code
3. Synthesize RTL code to gates, producing a gate-level netlist with estimated area and estimated timing
4. Constraints met? If no, go back and revise
5. Gate-level testing. OK? If no, go back and revise
6. Proceed with backend processing

*Simplified design flow

VHDL Design Styles

• structural: components and interconnects
• dataflow: concurrent statements
• behavioral (algorithmic): sequential statements
  • Registers
  • State machines
  • Test benches

[Figure marks the subset most suitable for use in this course]

CAD software available at GMU (1)

VHDL simulators

• Aldec Active-HDL (under Windows)
  • available in the FPGA Lab, S&T II, room 203
  • student edition can be purchased on an individual basis ($59.95 + S&H)
    http://www.aldec.com/education/students/

• ModelSim (under Unix)
  • available from all PCs in the ECE educational labs using an X-terminal emulator
  • available remotely from home using a fast Internet connection


Tools used for logic synthesis

FPGA synthesis
• Synplicity Synplify Pro (under Windows)
• Xilinx XST (under Windows)

ASIC synthesis
• Synopsys Design Compiler (under Unix)
  • available from all PCs in the ECE educational labs using an X-terminal emulator
  • available remotely from home using a fast Internet connection


Tools used for implementation (mapping, placing & routing) in the FPGA technology

• Xilinx ISE (under Windows)

How to learn VHDL for synthesis by yourself?

• Lecture slides for ECE 545 from Fall 2005

• Sundar Rajan, Essential VHDL: RTL Synthesis Done Right, S & G Publishing, 1998.

• Volnei A. Pedroni, Circuit Design with VHDL, The MIT Press, 2004.

• Individual or small-group hands-on sessions with the TA

• Practice, Practice, Practice!!!

Testbench

[Figure: a non-synthesizable testbench wrapping a synthesizable design entity, which may have several architectures (Architecture 1, Architecture 2, ..., Architecture N)]

Design Environment

[Figure: test vectors (inputs) are applied to both the HDL design (VHDL or Verilog) and a reference model (C); the actual results are compared with the expected results]

Primary applications (1)

Execution units of general purpose microprocessors

• Integer units: integers (8, 16, 32, 64 bits)
• Floating point units: real numbers (32, 64 bits)


Digital signal and digital image processing

Real numbers (fixed-point or floating point)

e.g., digital filters, Discrete Fourier Transform, Discrete Hilbert Transform

• General purpose DSP processors
• Specialized circuits


Coding

Elements of the Galois fields GF(2^n) (4-64 bits)

• Error detection codes
• Error correcting codes

Secret-key (Symmetric) Cryptosystems

[Figure: Alice encrypts and Bob decrypts messages sent over the network; both use the same key of Alice and Bob, K_AB]


Cryptography

Secret key cryptography

• Integers (16, 32 bits): IDEA, RC6, MARS
• Elements of the Galois field GF(2^n) (4, 8 bits): Twofish, Rijndael
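As an illustration of arithmetic on elements of GF(2^8) in polynomial basis, here is a minimal Python sketch of shift-and-XOR multiplication. It assumes the reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), which is the one Rijndael uses; other ciphers and codes use other polynomials. The function name `gf256_mul` is illustrative only.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two elements of GF(2^8) given as 8-bit integers (polynomial basis).

    `poly` is the reduction polynomial; 0x11B is an assumption matching Rijndael.
    """
    result = 0
    for _ in range(8):
        if b & 1:          # current bit of b set: add (XOR) the shifted copy of a
            result ^= a
        b >>= 1
        a <<= 1            # multiply a by x
        if a & 0x100:      # degree reached 8: reduce modulo the field polynomial
            a ^= poly
    return result


product = gf256_mul(0x57, 0x83)
```

Addition in this field is a plain XOR, so the whole multiplier needs only shifts and XORs, which is why such units are cheap in hardware.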

Cipher | Main operations | Auxiliary operations
MARS | MUL32, 2 x ROL32, S-box 9x32 | XOR, ADD/SUB32
RC6 | 2 x SQR32, 2 x ROL32 | XOR, ADD/SUB32
Twofish | 96 S-box 4x4, 24 MUL GF(2^8) | XOR, ADD32
Serpent | 8 x 32 S-box 4x4 | XOR
Rijndael | 16 S-box 8x8, 24 MUL GF(2^8) | XOR

Public Key (Asymmetric) Cryptosystems

[Figure: Alice encrypts with the public key of Bob, K_B, and sends the message over the network; Bob decrypts with the private key of Bob, k_B]

RSA as a trap-door one-way function

Encryption (PUBLIC KEY): $M \rightarrow C = f(M) = M^e \bmod N$

Decryption (PRIVATE KEY): $C \rightarrow M = f^{-1}(C) = C^d \bmod N$

$N = P \cdot Q$, where P, Q are large prime numbers

$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$

RSA keys

PUBLIC KEY: { e, N }

PRIVATE KEY: { d, P, Q }

$N = P \cdot Q$

$e \cdot d \equiv 1 \pmod{(P-1)(Q-1)}$

P, Q - large prime numbers
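The key relations above can be checked with a toy Python sketch. The primes, exponent, and message below are tiny illustrative values chosen for this example (real keys use primes hundreds of digits long), and the modular inverse relies on the three-argument `pow` with a negative exponent, available in Python 3.8 and later.

```python
# Toy parameters, far too small for real use (assumed for illustration only).
P, Q = 61, 53
N = P * Q                      # public modulus
phi = (P - 1) * (Q - 1)

e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent: e * d = 1 mod (P-1)(Q-1)

M = 65                         # message, must be smaller than N
C = pow(M, e, N)               # encryption with the public key {e, N}
recovered = pow(C, d, N)       # decryption with the private key {d, P, Q}
```

Decryption returns the original message because raising to the power e and then d amounts to raising to a power congruent to 1 modulo (P-1)(Q-1).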


Cryptography

Public key cryptography

• Long integers (1000-2000 bits): RSA, DSS, Diffie-Hellman
• Elements of the Galois field GF(2^n) (150-250 bits): Elliptic Curve Cryptosystems

Topic 1

Application: modern secret-key ciphers, candidates for the new Advanced Encryption Standard (AES):
• MARS developed by IBM
• RC6 developed at MIT

Function: 32-bit unsigned multiplication and squaring modulo $2^{32}$

Optimization:
• maximum throughput
• minimum latency
• minimum area

Environment: hardware, software for 8-bit processors

$C = A \cdot B \bmod 2^{32}$, $C = A^2 \bmod 2^{32}$
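For the software side of this topic, one possible approach on an 8-bit processor is to build the 32-bit product from 8-bit x 8-bit partial products. The Python sketch below models that idea; the function names are hypothetical, and it is a reference model under that assumption, not a prescribed solution.

```python
MASK32 = 0xFFFFFFFF


def mul32_bytewise(a: int, b: int) -> int:
    """Compute A * B mod 2^32 using only 8-bit x 8-bit partial products."""
    a_bytes = [(a >> (8 * i)) & 0xFF for i in range(4)]
    b_bytes = [(b >> (8 * j)) & 0xFF for j in range(4)]
    acc = 0
    for i in range(4):
        # Partial products with i + j >= 4 are multiples of 2^32 and vanish
        # after the reduction, so only 10 of the 16 products are needed.
        for j in range(4 - i):
            acc += (a_bytes[i] * b_bytes[j]) << (8 * (i + j))
    return acc & MASK32


def sqr32_bytewise(a: int) -> int:
    """Compute A^2 mod 2^32 with the same byte-wise scheme."""
    return mul32_bytewise(a, a)
```

Reduction modulo 2^32 costs nothing beyond discarding the upper half of the product, which is why this operation is attractive in both hardware and software.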

Topic 2

Application: digital filters

Function: 64-bit signed multiplier-accumulator (MAC) accumulating at least 256 partial products

Environment: hardware, software for a general purpose DSP or microprocessor

Optimization:
Hardware – maximum throughput, limited area
Software – minimum execution time, limited memory

$C = \sum_{i=1}^{256} A_i \cdot B_i$

Topic 3

Application: general purpose microprocessor

Function: multiplication of two 64-bit signed numbers + division of a 128-bit number by a 64-bit number

Environment: hardware, software for a 64-bit processor without multiplication and division built in

Optimization:
Hardware – minimum latency, maximum throughput, limited area
Software – minimum execution time, limited memory

$C = A \cdot B$, $C = A / B$

Topic 4

Application: modern public-key ciphers
• RSA
• Diffie-Hellman
• Elliptic Curve Cryptosystems

Function: modular exponentiation $C = M^E \bmod N$, where M, N are arbitrary 768-bit numbers and $E = 2^{16}+1$

Optimization:
Hardware – minimum latency, limited area
Software – minimum execution time, limited memory

Environment: hardware, software for 32-bit or 8-bit processors

$C = A^E \bmod N$
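One common way to compute such an exponentiation is left-to-right square-and-multiply, sketched below in Python as a possible reference model. The function name is hypothetical and this is only one of several algorithms a project could choose. For E = 2^16 + 1 the exponent has just two 1 bits, so the loop performs 16 non-trivial squarings and one non-trivial extra multiplication.

```python
def modexp_left_to_right(base: int, exponent: int, modulus: int) -> int:
    """Compute base^exponent mod modulus by scanning exponent bits from the top."""
    result = 1 % modulus
    for bit in bin(exponent)[2:]:              # most significant bit first
        result = (result * result) % modulus   # square for every exponent bit
        if bit == "1":
            result = (result * base) % modulus  # multiply only where the bit is 1
    return result


E = 2**16 + 1
```

A hardware design would replace each `%` with a dedicated modular multiplication unit; the bit-scanning control structure stays the same.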

Topic 5

Application: general purpose microprocessor or digital signal processor

Function: floating point addition and multiplication according to ANSI/IEEE 754

Environment: hardware, software for a 32-bit processor without floating point operations

Optimization:
Hardware – minimum latency, maximum throughput, limited area
Software – minimum execution time, limited memory

$Z = X + Y$, $Z = X \cdot Y$

Famous computer arithmetic bugs and flaws

Learn to deal with approximations

• In digital arithmetic one has to come to grips with approximation and questions like:
  – When is approximation good enough?
  – What margin of error is acceptable?

• Be aware of the applications you are designing the arithmetic circuit or program for.

• Analyze the implications of your approximation.

Calculators

$u = \sqrt{\sqrt{\cdots\sqrt{2}}}$ (10 times) $= 1.000\,677\,131$

$v = 2^{1/1024} = 1.000\,677\,131$

$x = (((u^2)^2)\cdots)^2$ (10 times) $= 1.999\,999\,963$

$x' = u^{1024} = 1.999\,999\,973$

$y = (((v^2)^2)\cdots)^2$ (10 times) $= 1.999\,999\,983$

$y' = v^{1024} = 1.999\,999\,994$

Hidden digits in the internal representation of numbers
Different algorithms give slightly different results

Very good accuracy
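The calculator experiment can be imitated in Python with the `decimal` module. The sketch below assumes a machine that keeps exactly 10 significant digits and no hidden digits, so its results will not match the calculator figures digit for digit; it only shows that repeated squaring and direct exponentiation need not agree once rounding is involved.

```python
from decimal import Decimal, getcontext

getcontext().prec = 10   # assumption: 10 significant digits, no hidden digits

# u: square root of 2 taken 10 times, an approximation of 2^(1/1024)
u = Decimal(2)
for _ in range(10):
    u = u.sqrt()

# x: square u 10 times, rounding after every step
x = u
for _ in range(10):
    x = x * x

# x_prime: raise u to the 1024th power in a single operation
x_prime = u ** 1024

deviation_x = abs(x - 2)
deviation_x_prime = abs(x_prime - 2)
```

Both results land close to 2, and comparing `deviation_x` with `deviation_x_prime` shows how much the choice of algorithm matters at a given working precision.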

Consequences of bad approximations

Example: Failure of Patriot Missile (1991 Feb. 25)

Source http://www.math.psu.edu/dna/455.f96/disasters.html

An American Patriot Missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile. The Scud struck an American Army barracks, killing 28.

Cause, per GAO/IMTEC-92-26 report: “software problem” (inaccurate calculation of the time since boot)

Specifics of the problem: time in tenths of a second, as measured by the system’s internal clock, was multiplied by 1/10 to get the time in seconds. Internal registers were 24 bits wide.

$1/10 = (0.0001\,1001\,1001\,1001\,1001\,100)_2$ (chopped to 24 b)

Error $\approx (0.1100\,1100)_2 \times 2^{-23} \approx 9.5 \times 10^{-8}$

Error in 100-hr operation period $\approx 9.5 \times 10^{-8} \times 100 \times 60 \times 60 \times 10 = 0.34$ s

Distance traveled by Scud $= (0.34\ \text{s}) \times (1676\ \text{m/s}) \approx 570$ m

This put the Scud outside the Patriot’s “range gate”. Ironically, the fact that the bad time calculation had been improved in some (but not all) code parts contributed to the problem, since it meant that inaccuracies did not cancel out.
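The arithmetic behind these figures can be reproduced exactly with Python's `fractions` module. The sketch assumes, as the binary expansion on the slide shows, that 23 bits were kept after the binary point; the variable names are illustrative.

```python
from fractions import Fraction

FRACTION_BITS = 23   # bits after the binary point in the chopped constant (assumed from the slide)

exact = Fraction(1, 10)
# Chop (truncate) 1/10 to FRACTION_BITS fractional bits.
chopped = Fraction(int(exact * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = exact - chopped            # error per tenth-of-a-second tick

ticks = 100 * 60 * 60 * 10                  # tenths of a second in 100 hours
clock_error_s = float(error_per_tick * ticks)
distance_m = clock_error_s * 1676           # Scud speed in m/s, as quoted above
```

The per-tick error is tiny, but it accumulates linearly with uptime, so the drift only becomes dangerous after the system has been running for many hours.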

Example: Explosion of Ariane Rocket (1996 June 4)

Source http://www.math.psu.edu/dna/455.f96/disasters.html

Unmanned Ariane 5 rocket launched by the European Space Agency veered off its flight path, broke up, and exploded only 30 seconds after lift-off (altitude of 3700 m)

The $500 million rocket (with cargo) was on its 1st voyage after a decade of development costing $7 billion

Cause: “software error in the inertial reference system”

Specifics of the problem: a 64 bit floating point number relating to the horizontal velocity of the rocket was being converted to a 16 bit signed integer

An SRI* software exception arose during conversion because the 64-bit floating point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767)
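The failure mode, a conversion whose result does not fit the target type, can be sketched in Python as follows. This is a hypothetical illustration, not the flight software: the function names are invented here, and the saturating variant is shown only as one possible way a conversion could be protected.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising an error if it does not fit."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Convert to a 16-bit signed integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

A call such as `to_int16_checked(40000.0)` raises an exception, which corresponds to the situation described above when nothing handles that exception.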

Consequences of bad approximations

Pentium bug (1)

October 1994: Thomas Nicely, Lynchburg College, Virginia, finds an error in his computer calculations and traces it back to the Pentium processor

November 7, 1994: First press announcement, Electronic Engineering Times

Late 1994: Tim Coe, Vitesse Semiconductor, presents an example with the worst-case error

c = 4 195 835 / 3 145 727

Pentium = 1.333 739 06...
Correct result = 1.333 820 44...
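The size of the error in the division example 4 195 835 / 3 145 727 can be checked with a few lines of Python. The flawed quotient is copied from the slide (truncated as shown there); the residual test at the end is a commonly used self-check and is included here as an illustration, with variable names chosen for this sketch.

```python
x, y = 4195835, 3145727

correct = x / y                 # quotient on a correctly working divider
pentium = 1.33373906            # flawed result as quoted on the slide (truncated)

relative_error = (correct - pentium) / correct

# On a correct divider this residual is zero or negligibly small.
residual = x - (x / y) * y
```

The relative error is on the order of 6 x 10^-5, so the flawed quotient is wrong already in its fifth significant digit, far worse than normal double-precision rounding.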

Pentium bug (2)

Intel admits “subtle flaw”

November 30, 1994: Intel’s white paper about the bug and its possible consequences
• Intel - average spreadsheet user affected once in 27,000 years
• IBM - average spreadsheet user affected once every 24 days

Replacements based on customer needs

December 20, 1994: Announcement of no-questions-asked replacements

Pentium bug (3)

Error traced back to the look-up table used by the radix-4 SRT division algorithm

2048 cells, 1066 non-zero values {-2, -1, 1, 2}

5 non-zero values not downloaded correctly to the lookup table due to an error in the C script
