Fpu Project Report
CARLETON UNIVERSITY
ELEC 4907
Design of a 32-bit RISC Microprocessor with Floating Point Unit
Design of a Floating Point Unit
Author: Adam Parsons
Supervisor: M. Shams
S/N: 100653270
April 5, 2010
Department of Electronics
2009-2010
Microprocessor Design April 5, 2010
Abstract
This fourth year project presents and examines the design of a microprocessor.
The project is to design a 32-bit RISC microprocessor with a floating point unit. The
design presented includes contributions from Zain Zia, Chaiya See-toh, and Adam
Parsons.
This report covers the topics of professional engineering practices as well as
project management techniques, but it centers mainly on the microprocessor and its
design. It provides background and information on the microprocessor and its
importance to today’s society.
The more technical portion of the report focuses heavily upon the Floating Point
Unit, which can be viewed as a coprocessor to the microprocessor that was designed.
It starts by explaining how a microprocessor operates, which is
then followed by a more in-depth study of how a floating point unit is designed and
operated.
Furthermore, the results of the successful digital design testing are presented
and explained, with suggestions for improvements and further optimization techniques.
Acknowledgements
My immediate thanks go to Maitham Shams (project supervisor), for his constant
guidance. Under his instruction for this project, I have gained valuable skills that can be
applied in the workplace.
I would also like to thank my group members Chaiya See-toh and Zain Zia,
without whom this project would not have been completed. Their patience and
dedication to hard work made this project a success, and they were indeed a true
pleasure to work with.
I would also like to thank all those I have met in my abundance of years at
Carleton University. You have all kept me on the right track, as you constantly remind
me of things I had often forgotten. I would also like to thank the creators of
ASICWORLD.com, as well as AJDESIGNER.com, for without their guidance, I would be
lost in the language of Verilog and floating point calculations.
Most of all I would like to thank my parents, who patiently stood by me in all my
years of studies, although they don’t always understand what I am supposed to be
learning.
April 2010
Adam Parsons
This is for those who are patient.
We’re here for the long haul.
Table of Contents
Abstract
Acknowledgements
Table of Figures
Table of Equations
Table of Tables
List of Abbreviations
1.0 Introduction
1.1 Purpose
1.1.1 Motivation
1.1.2 Applications
1.2 Report Overview
2.0 Health and Safety
2.1 Engineering Professionalism
2.2 Project Management
3.0 Project Overview
3.1 Design Specifications
3.2 Design Methodology
4.0 Background of Floating Point Representation
4.1 Floating Point Unit
4.2 Addition and Subtraction
4.2.1 Addition
4.2.2 Subtraction
4.3 Multiplication and Division
4.3.1 Multiplier
4.3.2 Division
4.4 Float to Integer
4.5 Integer to Float
4.6 Power Approximation
4.7 Square-Root
4.8 Floating Point Control Unit
5.0 Digital Testing
5.1 Structural Analysis
5.2 Timing Analysis
5.3 Implementation
6.0 Concluding Remarks
6.1 Summary of Project Accomplishments
6.2 Considerations for Future Work
References
Appendix A: Verilog Design Code
Addition Module
Subtraction Module
Normalization Module
24-bit Addition Module
Multiplication Module
Division Module
Floating Point to Integer Conversion Module
Integer to Floating Point Conversion Module
Power Module
Square Root Module
Control Module
Appendix B: Digital Testing Results
Standard Case Waveforms
Corner Case Tables
Table of Figures
Figure 1: Project Schedule
Figure 2: Processor Overview
Figure 3: Workload Partitioning Chart
Figure 4: Floating Point Binary
Figure 5: Floating Point Block Diagram
Figure 6: Addition/Subtraction Module
Figure 7: Carry Look-Ahead Adder
Figure 8: Two's Complement
Figure 9: Multiplier and Divider Module
Figure 10: Multiplication Algorithm
Figure 11: Multiplication Block Diagram
Figure 12: Division Block Diagram
Figure 13: Division Algorithm
Figure 14: Float to Integer Block
Figure 15: Integer to Float Diagram
Figure 16: Log2 vs IEEE Estimate
Figure 17: Power Unit
Figure 18: Square Root Unit
Figure 19: Floating Point Control Unit
Figure 20: Altera DE2 Implementation
Table of Equations
Equation 1
Equation 2
Equation 3
Equation 4
Equation 5
Equation 6
Equation 7
Equation 8
Equation 9
Equation 10
Table of Tables
Table 1: IEEE-754 Special Representations
Table 2: Log Estimate Error
Table 3: Log Estimate Error Correction
Table 4: Standard Test Case
Table 5: Special Test Cases
Table 6: Fast Timing Analysis
Table 7: Slow Timing Analysis
List of Abbreviations
CPU Central Processing Unit
RISC Reduced Instruction Set Computer
FPGA Field Programmable Gate Array
OPCODE Operational Code
ALU Arithmetic Logic Unit
FPU Floating Point Unit
MIPS Microprocessor without Interlocked Pipeline Stages
NaN Not a Number
INF Infinity
FMAX Maximum Frequency
TCO Clock Output Time
TH Hold Time
TSU Clock Setup Time
Chapter 1
1.0 Introduction
The purpose of this report is to present and examine the design of a
microprocessor. The project is to design a 32-bit RISC microprocessor with a floating
point unit. The design presented includes contributions from Zain Zia, Chaiya See-toh,
and Adam Parsons.
1.1 Purpose
Microprocessors are extremely small electronic devices built on an integrated
circuit. They are the cornerstone upon which today’s automated systems are built. Most
notably, the microprocessor is used in the common computer, be it a PC or a
Mac. There are many more applications in the modern world, and there is often a
microprocessor designed specifically for each task. Their uses range from simple
household devices such as washing machines and mobile phones to the automatic
check-in booths in airports.
1.1.1 Motivation
As the microprocessor becomes more integrated into every aspect of daily life, it
becomes more important to understand the design and implementation of the device.
This allows for improvements and optimizations in order to maintain a competitive
marketplace, as well as a constant progression of modern technology. Modern
applications of microprocessors require them to be fast, precise, and designed with
minimal hardware.
1.1.2 Applications
The 32-bit RISC microprocessor with floating point unit is a more specialized
device, but it still maintains a wide range of possible implementations. It can store and
manipulate large data sets, and handle real number calculations that may be necessary
in the field. These applications would tend to be directed to math-intensive operations,
such as data processing.
Its more specialized functionality provides faster and more accurate
outputs than a general-purpose microprocessor. Because of this specialization, the
processor is often best implemented as part of a multi-core processing set. This
particular processor can be implemented within web controllers, graphics processors, as
well as mobile GPS devices.
1.2 Report Overview
Chapter 2 outlines the engineering project as a whole. This ranges from the
Health and Safety concerns involved with designing a microprocessor to the
appropriate procedures taken to ensure that the respective Health and Safety
requirements are met. It also addresses the engineering professionalism pertaining to
the project, through project management, workload partitioning, as well as workplace
synergy.
Chapter 3 introduces the more technical aspects of the
microprocessor and its design. This chapter gives an overview of the project,
providing background information regarding the microprocessor, as well as design
specifications, and the partitioning of the actual microprocessor components in relation
to each project member.
The specialized main topic of the project is presented within Chapter 4. For
this specific report it provides in-depth technical details regarding the floating point
unit. The individual modules of the device are explained, along with the algorithms and
optimizations that were used to produce a high performing floating point unit.
In Chapter 5 the results from the digital design testing are displayed and
analyzed. This chapter also contains explanations for performance analysis and
performance restrictions of the floating point unit.
Chapter 6 concludes the report by summarizing the project’s work and
accomplishments, and possible applications for the 32-bit RISC microprocessor with
floating point unit, or even just the floating point coprocessor. This chapter also
states proposals for future improvements to be made to the processor.
Chapter 2
2.0 Health and Safety
Microprocessors are relatively safe devices to operate, but within the computer
design lab it is still important to follow and respect general health and safety principles
as regulated by the Carleton University Health-And-Safety document. Some of the
relevant health and safety principles from the document include:
• usage of personal protective equipment at all times,
• using the equipment only for its designed purpose,
• keeping the lab supervisor informed of any unsafe condition,
• keeping track of the location and correct use of safety equipment,
• determining potential hazards and appropriate safety precautions before
beginning new operations.
As the microprocessor was implemented and tested on the ALTERA DE2
Development Board, extra precautions had to be considered to ensure a safe
work environment. The following measures ensure that the board operates within its
normal operating conditions while maintaining the health and safety of all project
members.
• Automatic testing was incorporated to check the integrity of the following units
before the first execution: system’s Memory Units (RAM and ROM), Input and
Output signal processing circuitry, the Arithmetic Logic Unit (ALU), Control Unit,
and Registers.
• Software was developed that monitors electrical parameters such as Current or
Voltage in the Circuit at predetermined time intervals. When a fault is
sensed, it sends a signal to the board which halts further execution and
terminates the program. This circuitry continually tests for proper supply voltage
to the microprocessor.
• Overcurrent is an abnormal current greater than the full load value of the circuit.
This can occur due to short circuits or overload currents in any unit.
• Overload is an overcurrent which persists long enough to cause dangerous
overheating. This can occur during a long start time, during multiple restarts in a
short interval, or if the normal duty cycle of the processor is exceeded.
• An Alarm Signal is generated by the board and the program execution is halted if
an overload occurs.
• The board was implemented so that failure to execute the
program disconnects the Voltage Source to prevent any unintended leakage of Current.
• An asynchronous Reset Signal for the Microprocessor was designed as a manual
override to reset all units in case of a danger of overload.
• The Microprocessor is designed so that the algorithm cannot be altered by anyone
except the designers themselves.
2.1 Engineering Professionalism
To meet the requirements for professionalism in engineering, all engineers must
abide by the Professional Engineers Act (PEA), and the Professional Engineers Ontario
(PEO) Code of Ethics. As engineering is a self-regulated profession that applies its
code of ethics rigorously, it is of utmost importance that we follow the principles of fairness,
integrity and honesty.
During the project design there were minimal ethical dilemmas from a
professional standpoint. As the project work was fairly separate for each individual,
there were never any conflicting points of view, as we all trusted each other to be
working to the best of our respective abilities. Professional conduct was
maintained at all times, as the only reasonable way for this project to be
completed was for each group partner to operate without impeding the work flow of the
other group members.
The only major difficulty was meeting the specific preset deadlines
outlined in the project proposal. The proposal may have set an unreasonable
timeline for the group to keep pace with. The difficulty may also have been caused by our minimal
communication outside of our weekly meetings, although contact was maintained
through emails so as to keep each other up to date with status reports and questions
regarding project difficulties or confusion.
Although the development of a microprocessor offers few
opportunities for unprofessional behavior, none occurred that impeded the quality
of work or the professional decisions that had to be made for the completion of the project.
Each group member’s professional responsibility aided in meeting each member’s
individually designated goals. It also enabled the achievement of the group’s goal,
which was to successfully design a microprocessor.
2.2 Project Management
Several project management techniques were used in order to coordinate,
manage and perform the project.
Weekly group meetings with Prof. M. Shams kept the objectives and progress of
the design project clear. It was here that we could clarify any individual misconceptions about
the design of the project with the supervisor. This portion of the project management
was fairly relaxed, which was important so that group members were not intimidated by the supervisor.
The relatively loose supervision encouraged the group’s members to
improve communication with each other, instead of being completely autonomous with
very little knowledge of each other’s involvement in the project.
Open communication was encouraged (via email/phone) to enable the clear flow of
design concepts and ideas. This also promoted the project’s success, since when any group
member arrived at a difficult design decision or had any other difficulty, either of the
other group members was able to assist.
The ability to perform the project is not something that truly falls under the project
management of the group. This ability rests heavily upon the individual group member,
as the software required to complete the project is available in several laboratories
within the Department of Electronics at Carleton University; a free web version of the
software was also available for use at home. The performance expectations were clearly
set out in the initial project proposal, as shown in Figure 1 below.
Figure 1: Project Schedule
The partitioning of the workload relating to the project was decided during one
of the initial group meetings supervised by M. Shams. Each portion
was selected by, or agreed upon through compromise among, the individual group members so as to
encourage each individual to work in the field that sparked the most personal interest,
which would therefore increase workflow productivity.
Chapter 3
3.0 Project Overview
Before discussing the more technical side of the design of a 32-bit RISC
microprocessor with floating point unit, it is important to have a clear overview of the
components of a microprocessor. A simple microprocessor is built from five basic
integrated blocks as shown in Figure 2. These are:
● Inputs/Outputs
● Memory
● Datapath
● Control Unit
● Arithmetic Logic Unit
Figure 2: Processor Overview
Figure 2 shows the organization of the microprocessor, which is
consistent throughout all types of processors. Every processor performs the same basic
functions of fetching, decoding, and executing, which require all five
blocks.
The processor receives instructions from the Memory, which is responsible for
storing the instruction sets as well as data sets. The flow of data between the Memory
and the processor follows the implementation of the Datapath. The Datapath interprets
the instruction signals between the Control Unit, Memory, as well as the Input/Output
devices. This interpretation of data is regulated by the Control Unit’s output signals,
which then branch to the Input/Output devices. The input and output devices usually
consist of hardware such as a keyboard or a graphics display.
3.1 Design Specifications
The microprocessor design requires the implementation of a memory and a register
unit, which temporarily store data within the microprocessor. The memory was given a
specified size of 512 x 32 bits. The size of each register in the microprocessor is specified
as 32 bits. The standard set of instruction classes to be performed by the
microprocessor was also specified. A description of these classes follows.
• R-type Instruction – Arithmetic Instructions (Addition, Subtraction,
Multiplication and Division of two operands) and Logical Instructions (A
Comparison of two operands).
• Branch Instruction – Makes a jump to the provided Memory address by
comparing two operands. Operands are compared for equality and if they are
equal the branch is executed.
• Load Instruction – Loads a data word from Memory into one of the specified
registers in the processor.
• Store Instruction – Stores a data word from a specified register into the specified
Memory address.
3.2 Design Methodology
The microprocessor was designed using the Verilog Hardware Description Language
(Verilog HDL). This allows the designer to work within a single
language for design, testing, and synthesis of the overall
microprocessor design. The Quartus II software was used to compile and simulate the
Verilog HDL code, as it connects fairly easily to the ALTERA DE2 development boards
on which the design must be implemented.
The design was partitioned into three distinct portions as mentioned in Section
2, as well as shown in Figure 3.
Figure 3: Workload Partitioning Chart
The design follows the Von Neumann architecture, which follows the standard
FETCH, DECODE, EXECUTE pattern of microprocessors. This architecture allows
the instructions and data to be stored within the same memory. It
was chosen for its highly optimized instruction set, high
performance implementations, programmability (programs are easy to express), and
reduction in the required hardware. It achieves this by sharing the functional units while
also implementing pipelining; as a result, a chip with a smaller silicon area and a lower
operating power can be fabricated.
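The fetch, decode, execute cycle over a single shared memory can be illustrated with a minimal sketch. It is written in Python for readability rather than in the project's Verilog, and the opcode names, tuple encoding, and four-register file are hypothetical choices made for this illustration, not the project's instruction format.

```python
# Toy Von Neumann machine: instructions and data live in ONE memory list.
# The opcode names and tuple encoding are hypothetical, chosen for readability;
# they are not the project's actual instruction format.

def run(memory, max_steps=100):
    """Run a fetch-decode-execute loop until HALT and return the registers."""
    regs = [0, 0, 0, 0]
    pc = 0
    for _ in range(max_steps):
        instr = memory[pc]            # FETCH the word addressed by the PC
        pc += 1
        op = instr[0]                 # DECODE the instruction class
        if op == "LOAD":              # EXECUTE: register <- memory word
            regs[instr[1]] = memory[instr[2]]
        elif op == "STORE":           # memory word <- register
            memory[instr[2]] = regs[instr[1]]
        elif op == "ADD":             # R-type: rd <- ra + rb
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
        elif op == "BEQ":             # branch only if the operands are equal
            if regs[instr[1]] == regs[instr[2]]:
                pc = instr[3]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    return regs


program = [
    ("LOAD", 0, 6),     # address 0: r0 <- memory[6]
    ("LOAD", 1, 7),     # address 1: r1 <- memory[7]
    ("ADD", 2, 0, 1),   # address 2: r2 <- r0 + r1
    ("STORE", 2, 8),    # address 3: memory[8] <- r2
    ("HALT",),          # address 4: stop
    0,                  # address 5: unused
    5, 7, 0,            # addresses 6-8: data words in the same memory
]

memory = list(program)
registers = run(memory)
print(registers, memory[8])
```

The program and its operands sit in the same list, which is the defining property of the architecture described above; a Harvard-style design would instead keep two separate memories.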
Chapter 4
4.0 Background of Floating Point Representation
Many basic microprocessors are unable to handle real number arithmetic and support
only integer manipulations. Real number manipulation allows processors to
handle rational numbers, as well as approximations of irrational numbers. This is very important for data
analysis and manipulation of various signals within Digital Signal Processing (DSP)
devices.
An important part of handling real numbers is scientific notation, which is a way
of writing real numbers that may be too large to be conveniently expressed in decimal
notation. This notation is presented as
$[\text{fraction}] \times 10^{[\text{exponent}]}$
$[\text{real}] \times 10^{[\text{integer}]}$
More often than not, scientific notation is expressed in its normalized format,
in which the most significant digit of the real number is the only
digit to the left of the decimal point. This allows for easy comparison of the
magnitudes of two numbers, as they are expressed solely within the exponent of the
notation.
Examples of real numbers:
$11/5 = 2.2_{\text{ten}}$
$\pi \approx 3.141593_{\text{ten}}$
$5.73_{\text{ten}} \times 10^{-4}$ (normalized scientific notation)
$235.9722 \times 10^{8}$ (scientific notation)
The floating point representation of real binary values allows microprocessors to
manipulate real numbers. This notation deals with the fractions created by real numbers
through the placement of binary points [1] as well as scientific notation.
Examples of real binary numbers:
$110111.11_{\text{two}} = 55.75$
$1011_{\text{two}} \times 2^{3}$ (scientific notation)
$1.0001_{\text{two}} \times 2^{-7}$ (normalized scientific notation)
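The binary examples above can be checked with a short sketch. It is written in Python purely for illustration, and the helper name `binary_real_to_float` is an assumption introduced here, not something from the project. Each bit to the right of the binary point carries a weight of a negative power of two.

```python
def binary_real_to_float(text):
    """Convert a binary string such as '110111.11' to its decimal value."""
    whole, _, frac = text.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    # Bits after the binary point weigh 2^-1, 2^-2, 2^-3, ...
    for position, bit in enumerate(frac, start=1):
        if bit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {bit!r}")
        value += int(bit) * 2.0 ** -position
    return value


print(binary_real_to_float("110111.11"))            # 55.75
print(binary_real_to_float("1011") * 2 ** 3)        # scientific notation
print(binary_real_to_float("1.0001") * 2.0 ** -7)   # normalized scientific notation
```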
There are different formats for handling floating point binary, such as the MIPS and
IEEE-754 standards. In the design of a floating point unit, both require specific sizes
for the exponent and the fraction. The sizes of the exponent and fraction (commonly
referred to as the mantissa) are determined by the size of the fixed word. A large exponent
would be ideal for a large range of numbers, while a larger fraction allows for
a more precise representation of the numbers within the reduced range. For a 32-bit
word neither of these is much of a problem, as there is a relatively large range with
capabilities of significant precision.
[1] Binary point is the binary term for a decimal point, as we are now working in binary notation instead of
decimal notation.
The MIPS floating point representation was designed by MIPS Technologies:
$(-1)^{\text{sign}} \times [\text{fraction}] \times 2^{[\text{exponent}]}$
With the 32-bit MIPS representation, floating point binary is expressed as:
Bit(s)    31    30-23                22-0
Field     S     EXPONENT [8 bits]    FRACTION [23 bits]
Figure 4: Floating Point Binary
This format allows for 23 bits to express the fraction, with 8 bits expressing the
exponent. The exponent is stored with a bias of 127: the stored 8-bit value minus 127 gives
the true exponent, which allows exponents of roughly -127 to +127 to be represented.
MIPS may not have many limitations, but it is not the best representation of floating
point numbers for binary computing. A more commonly used standard is the IEEE-754
representation of floating point binary.
(-1)^sign x [1 + fraction] x 2^[exponent]
It uses the same 32-bit layout as the MIPS format, but it assumes that the fraction is
always normalized, which allows the most significant bit to be implied. This hidden bit
makes the significand effectively 24 bits long instead of 23 bits.
This format is preferred over the MIPS format mainly because it allows special
representations of certain values, such as Inf and NaN, to prevent interrupts.
Value          Exponent  Fraction  Binary
Zero           Zero      Zero      0000000000000000000000000000000
Signaling NaN  255       nonzero   1111111100000000000000000000001
Quiet NaN      255       nonzero   1111111110000000000000000000000
Infinity       255       Zero      1111111100000000000000000000000
Table 1: IEEE-754 Special Representations
These special representations do not cover overflow and underflow exceptions.
Overflow occurs when the exponent is too large to be represented, while underflow
occurs when the exponent is too negative (the magnitude too small) to be represented.
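As a minimal sketch (not part of the original design), the special representations in Table 1 can be recognised in software by splitting a 32-bit word into its three fields. The function names are hypothetical, and treating the top fraction bit as the quiet/signaling flag is an assumption read off the bit patterns in Table 1.

```python
def decode_fields(word):
    """Split a 32-bit IEEE-754 word into (sign, exponent, fraction)."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF      # 8-bit biased exponent
    fraction = word & 0x7FFFFF          # 23-bit fraction
    return sign, exponent, fraction


def classify(word):
    """Name the special representation of a word, or 'Finite' otherwise."""
    _, exponent, fraction = decode_fields(word)
    if exponent == 255:
        if fraction == 0:
            return "Infinity"
        # Assumption: top fraction bit set marks a quiet NaN, as in Table 1.
        return "Quiet NaN" if fraction & 0x400000 else "Signaling NaN"
    if exponent == 0 and fraction == 0:
        return "Zero"
    return "Finite"
```

For example, the word 0x40A00000 (the value 5) decodes to sign 0, exponent 129 and fraction 0x200000, and is classified as finite.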
4.1 Floating Point Unit
The floating point unit designed in this project utilizes the IEEE-754 format for
design optimization. The actual unit performs the standard ALU operations, as well as a
few extra operations that can only be done in floating point format. These operations
include:
● Addition
● Subtraction
● Multiplication
● Division
● Power
● Square Root
● Floating Point to Integer
● Integer to Floating Point
Many of the algorithms that were utilized throughout the design of the floating
point unit were created through basic arithmetic that can be done by hand.
Figure 5: Floating Point Block Diagram
4.2 Addition and Subtraction
The addition and subtraction modules follow very similar algorithms, as it is very
easy to switch between the two functions. The two functions were nevertheless kept as
separate modules rather than combined, to ease a pipelined implementation in which
multiple instructions can be issued before the earlier algorithms complete.
The two algorithms follow the same basic steps:
Step 1:
Compare the exponents of the two numbers and shift the smaller number to the
right until the exponents match
The shift gives the two numbers the same exponent, which enables them to be
easily added together with a basic arithmetic adder/subtractor of the kind
used in an ALU.
Step 2:
Add or subtract the significands
The specific addition or subtraction module is called according to the
instruction being executed.
Step 3:
Normalize the sum by shifting right or left
Normalization of the sum adjusts for overflow or underflow of the significand.
This must be done because every floating point number is kept normalized to
maintain the consistency of the arithmetic algorithms.
Step 4:
Round the significand
Rounding the significand can be done to increase accuracy, but it was decided
that it would reduce the operational speed of the device for little gain, given
the relatively high accuracy already provided by a 23-bit mantissa. Truncation
was performed instead, to maintain the high speed at which the unit can
operate.
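The four steps above can be sketched as a behavioural model in Python. This is an illustration under stated assumptions rather than the Verilog design itself: both operands are taken to be positive and normalized, exponent overflow is not checked, and the name fp_add_truncate is hypothetical.

```python
def fp_add_truncate(a, b):
    """Add two positive, normalized single-precision words, truncating the result."""
    exp_a, exp_b = (a >> 23) & 0xFF, (b >> 23) & 0xFF
    sig_a = (a & 0x7FFFFF) | 0x800000   # restore the hidden leading one
    sig_b = (b & 0x7FFFFF) | 0x800000

    # Step 1: shift the operand with the smaller exponent to the right.
    if exp_a >= exp_b:
        exp = exp_a
        sig_b >>= exp_a - exp_b
    else:
        exp = exp_b
        sig_a >>= exp_b - exp_a

    # Step 2: add the aligned significands (the sum may need 25 bits).
    total = sig_a + sig_b

    # Step 3: normalize; a carry into bit 24 means shift right once.
    if total & 0x1000000:
        total >>= 1
        exp += 1

    # Step 4: truncation; bits shifted out above are simply dropped.
    return (exp << 23) | (total & 0x7FFFFF)
```

With the standard test inputs of this report, fp_add_truncate(0x40A00000, 0x3F400000) (5 + 0.75) returns 0x40B80000, which is the bit pattern of 5.75.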
Figure 6: Addition/Subtraction Module
4.2.1 Addition
The addition of the significands can be done, for the sake of simplicity, with a
basic ripple-carry adder. However, a Carry Look-Ahead Adder (CLA) produces results
faster, as it calculates both the “propagate” and the “generate” signals for each group
to avoid waiting for the carry to ripple through the group. The generate signal indicates
that a bit position produces a carry on its own, and is formed by passing the two input
bits through an AND gate. In parallel, the propagate signal determines whether an
incoming carry will be passed along, and is formed by passing the two input bits through
an OR gate (the Verilog in Appendix A uses an XOR gate, which is equivalent for the
carry logic).
In this project a 24-bit CLA was used to increase the speed of the function.
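The generate/propagate logic can be illustrated with a small Python model of one 4-bit look-ahead group. It follows the standard carry look-ahead equations and uses XOR for the propagate signal, as the fourbitadder listing in Appendix A does; the function name cla4 is hypothetical.

```python
def cla4(a, b, carry_in):
    """4-bit carry look-ahead addition; returns (4-bit sum, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate: a AND b
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # propagate: a XOR b

    # Every carry is written directly in terms of g, p and carry_in,
    # so no carry has to wait for the one before it.
    c0 = carry_in
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4
```

Six such groups chained through their carry outputs give the 24-bit width used for the significands.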
Figure 7: Carry Look-Ahead Adder
4.2.2 Subtraction
The subtraction of the significands reuses the CLA from the previous module. As the
difference between addition and subtraction is minimal, it was straightforward to
change the addition module into a subtraction module.
The only technical change from addition to subtraction is that the mantissa of the
subtrahend is converted into a negative value through two's complement manipulation.
Figure 8: Two's Complement
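As a small illustration of this step, the Python sketch below subtracts two aligned 24-bit significands by adding the two's complement of the subtrahend. The names and the fixed 24-bit width are assumptions for the example; a carry out of 1 indicates that no borrow occurred.

```python
WIDTH = 24
MASK = (1 << WIDTH) - 1


def twos_complement(x):
    """Invert every bit and add one, kept to WIDTH bits."""
    return (~x + 1) & MASK


def subtract_via_adder(minuend, subtrahend):
    """Return (difference, carry_out) using only an addition."""
    total = minuend + twos_complement(subtrahend)
    return total & MASK, (total >> WIDTH) & 1
```

For 5 - 0.75 the aligned significands are 0xA00000 and 0x180000, and the result 0x880000 is the significand of 4.25.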
4.3 Multiplication and Division
As with the Addition/Subtraction modules, the Multiplication and Division
modules follow similar steps when dealing with floating point notation.
Step 1: Add or subtract the exponents without bias
The exponents are added (for multiplication) or subtracted (for division),
just as if this were done by hand, and the bias is then reapplied.
Step 2: Manipulation of the significands
Multiplication or division of the significands is done at this stage,
where a separate module is called to perform the specified
operation.
Step 3: Check for normalization and overflow
As binary multiplication of two significands produces an output as wide as
the sum of the widths of the inputs, it is important to check that the
product/quotient is normalized, and to check the exponent for overflow.
Step 4: Rounding or truncation
Due to the large size of the mantissa, as well as for the sake of
speed, truncation was chosen, as rounding was deemed unnecessary for a
floating point number that already holds such precision.
Step 5: Set the sign
The sign is set by passing the two sign bits through an XOR gate to
produce the appropriate value.
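The five steps can be sketched for multiplication as a behavioural Python model. It is a hedged illustration, not the floatmul module: inputs are assumed to be normalized, the overflow flag is a simplified range check, and the function name is hypothetical.

```python
def fp_mul_truncate(a, b):
    """Multiply two normalized single-precision words; returns (word, overflow)."""
    # Step 1: add the exponents without bias, then reapply the bias.
    exp = ((a >> 23) & 0xFF) - 127 + ((b >> 23) & 0xFF) - 127 + 127

    # Step 2: multiply the 24-bit significands (hidden bit restored).
    sig_a = (a & 0x7FFFFF) | 0x800000
    sig_b = (b & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b             # up to 48 bits wide

    # Step 3: the product lies in [1, 4); renormalize if it reached 2.
    if product & (1 << 47):
        product >>= 1
        exp += 1
    overflow = exp > 254 or exp < 1

    # Step 4: truncate to 23 fraction bits.
    fraction = (product >> 23) & 0x7FFFFF

    # Step 5: the sign is the XOR of the two sign bits.
    sign = ((a >> 31) ^ (b >> 31)) & 1
    return (sign << 31) | ((exp & 0xFF) << 23) | fraction, overflow
```

For the standard test inputs, fp_mul_truncate(0x40A00000, 0x3F400000) (5 x 0.75) returns the word 0x40700000, the bit pattern of 3.75, with the overflow flag clear.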
Figure 9: Multiplier and Divider Module
4.3.1 Multiplier
There are several algorithms for multiplication, but the “rolled out” (unrolled)
binary multiplier was used because, like the addition/subtraction modules, it was the
most relatable and the clearest to understand and explain.
A simple binary multiplier performs a shift and a summation for each bit of the
multiplier. This can be implemented within a loop to conserve space within the chip
design, which produces a synchronous circuit that relies upon 24 clock edges to
complete.
The “rolled out” version implements the same basic algorithm, but instead of the
synchronous loop each stage is laid out in hardware, producing the product in far fewer
than 24 clock edges. This format also allows for easier implementation of pipelining
circuitry to support multiple function calls simultaneously.
Step 1: Check the multiplier bit [n]
Step 2: If the multiplier bit [n] holds a value of 1, the multiplicand is added to the
product and the sum is placed within the product register
Step 3: Shift the multiplicand left by 1 bit
Step 4: Shift the multiplier right by 1 bit
Step 5: Check whether the loop has stepped through each multiplier bit; if not, step
to the next bit (n+1) and repeat.
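The loop form of these five steps can be written as a short Python sketch. The function name and the bits parameter are hypothetical; the unrolled hardware version lays out each pass of this loop as its own stage.

```python
def shift_add_multiply(multiplicand, multiplier, bits=24):
    """Shift-and-add multiplication of two unsigned integers."""
    product = 0
    for _ in range(bits):               # Step 5: one pass per multiplier bit
        if multiplier & 1:              # Steps 1-2: test the current bit
            product += multiplicand     # add the multiplicand into the product
        multiplicand <<= 1              # Step 3: shift multiplicand left
        multiplier >>= 1                # Step 4: shift multiplier right
    return product
```

For example, shift_add_multiply(13, 11, bits=4) returns 143.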
Figure 11: Multiplication Block Diagram
4.3.2 Division
The division algorithm mirrors the multiplication algorithm and can be
implemented in a very similar manner. The divider implemented here differs from the
multiplier in that it was kept as an iterative loop.
Step 1: Check the remainder
Step 2a: If the remainder is greater than or equal to zero, the quotient is shifted left
by 1 bit and the new LSB is set to a value of one.
Step 2b: If the remainder is less than zero, the quotient is shifted left by 1 bit, the
new LSB is set to a value of zero, and the remainder is restored.
Step 3: Check whether the loop has stepped through each remainder bit; if not,
step to the next bit (n+1) and repeat.
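These steps correspond to the textbook restoring division algorithm, sketched below in Python. The trial subtraction of the divisor before the remainder is checked is assumed from the standard algorithm rather than stated in the steps above, and the names are hypothetical.

```python
def restoring_divide(dividend, divisor, bits=24):
    """Restoring division of unsigned integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit, then try subtracting the divisor.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        remainder -= divisor
        if remainder >= 0:
            quotient = (quotient << 1) | 1      # Step 2a: new LSB is one
        else:
            remainder += divisor                # Step 2b: restore the remainder
            quotient = quotient << 1            # new LSB is zero
    return quotient, remainder
```

For example, restoring_divide(100, 7, bits=8) returns (14, 2).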
Figure 12: Division Block Diagram
Figure 13: Division Algorithm
The loop was kept because, with the unrolled multiplication algorithm already
built, the looped divider would provide a useful comparison during simulations
and timing analysis.
4.4 Float to Integer
The float to integer unit was designed primarily for use within the Power Module.
It separates the value held by the 23-bit fraction into an integer, a numerator and a
denominator.
It does this by placing the fraction into a shift register that is twice as large as an
integer register (2 x 32 bits), to maximize the size of the integers that can be produced.
In order to produce an integer the exponent must be zero; therefore the large register is
shifted left or right according to the value of the exponent until the exponent is
zero. If the exponent is too large for the shift register to handle, then the register is
shifted to the far right or the far left and the exponent is adjusted accordingly.
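A minimal Python sketch of the shifting idea follows. It assumes that the 64-bit register holds the binary point between its upper and lower 32-bit halves, so that the upper half is the integer and the lower half is a numerator over a denominator of 2^32. That split, the function name, and the omission of the sign and of the out-of-range exponent adjustment are all assumptions for illustration.

```python
def float_to_int_parts(word):
    """Split a normalized single-precision word into
    (integer, numerator, denominator), ignoring the sign bit."""
    exp = ((word >> 23) & 0xFF) - 127           # unbiased exponent
    sig = (word & 0x7FFFFF) | 0x800000          # 1.fraction as a 24-bit integer

    # Place the significand so that the register value equals sig / 2^23
    # when the binary point sits between bit 32 and bit 31.
    reg = sig << (32 - 23)

    # Shift by the exponent so that the effective exponent becomes zero.
    if exp >= 0:
        reg <<= exp
    else:
        reg >>= -exp
    reg &= (1 << 64) - 1                        # keep to the 2 x 32-bit register

    integer = reg >> 32                         # upper 32 bits
    numerator = reg & 0xFFFFFFFF                # lower 32 bits
    return integer, numerator, 1 << 32
```

For 5.75 (0x40B80000) this gives an integer of 5 and a numerator of 0xC0000000, that is 0.75 x 2^32.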
Figure 14: Float to Integer Block
The numerator and denominator are formed by stepping through the bottom
segment (32 bits) of the shift register, while counting the value of the bits. As the bits
are counted they follow the equations
\[ [\ldots] = \sum_{0}^{x} \frac{1}{2^{x}} \]
Equation 1
\[ fraction(x) = \frac{Numerator}{Denominator} = \begin{cases} \dfrac{Numerator}{Denominator}, & x = 0 \\ [\ldots] + \dfrac{Numerator}{Denominator}, & x = 1 \end{cases} \]
Equation 2
4.5 Integer to Float
Figure 15: Integer to Float Diagram
The Integer to Float Module accepts its input in signed binary integer format
and normalizes the integer, which provides it with an exponent value of its own. The
importance of normalization was previously discussed in Section 4.0.
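The normalization step can be sketched in Python as follows. This is an illustrative model, not the module's Verilog: the function name is hypothetical, and integers wider than 24 bits are truncated rather than rounded, in keeping with the truncation used elsewhere in the unit.

```python
def int_to_float_word(n):
    """Convert a signed integer to a single-precision word by normalization."""
    if n == 0:
        return 0
    sign = 1 if n < 0 else 0
    magnitude = -n if n < 0 else n

    # The position of the leading one is the unbiased exponent.
    msb = magnitude.bit_length() - 1

    # Shift so that the leading one lands on bit 23 (the hidden bit).
    if msb > 23:
        sig = magnitude >> (msb - 23)       # truncate the low bits
    else:
        sig = magnitude << (23 - msb)

    return (sign << 31) | ((msb + 127) << 23) | (sig & 0x7FFFFF)
```

For example, int_to_float_word(5) returns 0x40A00000, the word used for the value 5 in the standard test case.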
4.6 Power Approximation
The Power Module of the FPU was initially to use a recursive algorithm, but this
looping approach raised more issues than were acceptable for determining the power of
a floating point number.
The first issue was the fact that the loop was in fact “a loop”. As floating point
representation handles large real numbers, it would be unwise to loop for extremely
large numbers with large exponents. The loop method would prove far too slow
for floating point representation.
The second issue was the difficulty in raising a number to a real power (for
example 2.52^3.194). The looped algorithm had initially dealt only with integer
exponents, but with real exponents the situation became more difficult to handle.
The first issue was addressed by changing the Power Module into a Power
Approximation Module. The Power Approximation Module uses the IEEE-754 binary
representation of a 32-bit floating point number, read as an integer (Xinteger), in its
estimation of log2(X).
\[ \log_2(X) \approx \frac{X_{integer}}{2^{23}} - 127 \]
Equation 3
This approximation method is fairly accurate for its speed.
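Equation 3 can be tried out directly in Python by reinterpreting the bits of a single-precision value as an integer. The helper names are hypothetical and the struct module is used only to obtain the bit pattern.

```python
import math
import struct


def float_bits(x):
    """Return the IEEE-754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def log2_estimate(x):
    """Estimate log2(x) from the bit pattern, following Equation 3."""
    return float_bits(x) / 2**23 - 127


def log2_error(x):
    """Difference between the true log2 and the estimate."""
    return math.log2(x) - log2_estimate(x)
```

For X = 5 the bit pattern is 1084227584 and the estimate is 2.25, against a true value of about 2.3219.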
Figure 16: Log2 vs IEEE Estimate
However, when the logarithmic value is manipulated further, much of the precision
is lost in comparison to the actual value.
Real                  Estimate                             “Lossy” Estimate
X = 5                 5
                      Xinteger = 1084227584                1084227584
Y = Log2(X) = 2.3219  Y = Xinteger/2^23 - 127 = 2.25       2
Z = 2*Y = 4.6439      Z = 2*Y = 4.5                        4
2^Z = 25              (Z + 127) x 2^23 = 1103101952        1065353220
                      XFloat = 16                          1
Table 2: Log Estimate Error
This issue can be resolved by scaling the value of Xinteger up by a few places
before passing it through the logarithmic estimate function. In this
implementation of the algorithm Xinteger was scaled by two places (multiplied by 100),
and the results can be seen in the table below.
Real                  Estimate                                  “Lossy” Estimate
X = 5                 Xinteger*100 = 108422758400               108422758400
Y = Log2(X) = 2.3219  Y = (Xinteger/2^23) - (127*100) = 225     225
Z = 2*Y = 4.6439      Z = 2*Y = 450                             450
2^Z = 25              ((Z + 127*100) x 2^23)/100 = 1103101952   1103101952
                      XFloat = 24                               24
Table 3: Log Estimate Error Correction
The accuracy of the power module's estimate is greatly increased by this
scaling. It can be further improved by scaling the initial Xinteger by several
more places.
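The effect of the scaling can be reproduced with integer-only arithmetic in Python. This is a sketch under assumptions: the function name and the scale parameter are hypothetical, and the example squares X as in the tables (Z = 2*Y).

```python
def square_via_log_estimate(x_integer, scale=100):
    """Square a value using the integer log2 estimate on its bit pattern.

    x_integer is the IEEE-754 bit pattern read as an integer; the result is
    the bit pattern of the estimated square.
    """
    # Scaled log2 estimate; integer division discards what the scale cannot hold.
    y = (x_integer * scale) // 2**23 - 127 * scale
    z = 2 * y                                   # squaring doubles the logarithm
    # Undo the estimate to return to a bit pattern.
    return (z + 127 * scale) * 2**23 // scale
```

With scale=100 and the bit pattern of 5 (1084227584) the function returns 1103101952, the figure in Table 3, which decodes to 24. With scale=1 the fractional part of the logarithm is lost and the same sketch returns 1098907648, which decodes to 16.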
The second issue was resolved by utilizing the Float to Integer Converter
Module. This module converts the real binary exponent into a more easily manipulated
integer format.
\[ \left( Integer + \frac{Numerator}{Denominator} \right) \times 10^{Exponent} \]
Equation 4
With the logarithmic estimate provided, the manipulation into a power module
becomes as simple as multiplication and division of an integer.
Example:
\[ Power = \left( \log_2[X_{integer}] \right) Y_{real} \]
Equation 5
\[ Fraction = \frac{Y_{numerator}}{Y_{denominator}} \times \log_2[X_{integer}] \]
Equation 6
\[ Integer = Y_{integer} \times \log_2[X_{integer}] \]
Equation 7
\[ Power = 2^{Integer} + 2^{Fraction} \]
Equation 8
These calculations appear in the block diagram in Figure 17, which shows the
flow of the individual steps of the Power Approximation Module.
Figure 17: Power Unit
The power block is incapable of handling exponents outside the range of
-4.2950e+009 to +4.2950e+009, as these numbers are too large for the algorithm to
operate properly.
4.7 Square-Root
There are several iterative methods (e.g. Newton's Method) for
developing the square-root estimate of a binary real number. The issue, once again, was
that these methods take several iterations. For this reason, the Square-Root Module
utilizes the same method of logarithmic approximation as the Power Module.
It is much faster than the Power Module, as it does not rely upon the Float to
Integer Converter. It simply follows the formulas:
\[ half\_log = \frac{1}{2} \times \log_2(X_{integer}) \]
Equation 9
\[ squareroot = 2^{half\_log} \]
Equation 10
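Equations 9 and 10 can be sketched in Python using the same bit-pattern estimate of log2. This is an unscaled floating point illustration with hypothetical helper names, so its results differ slightly from those of the hardware unit.

```python
import struct


def float_bits(x):
    """IEEE-754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_float(word):
    """Single-precision value represented by a 32-bit integer."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


def sqrt_estimate(x):
    """Estimate sqrt(x) by halving the estimated log2 (Equations 9 and 10)."""
    half_log = 0.5 * (float_bits(x) / 2**23 - 127)      # Equation 9
    # Equation 10: map the halved logarithm back to a bit pattern.
    return bits_float(int((half_log + 127) * 2**23))
```

For example, sqrt_estimate(4.0) returns 2.0 exactly, while sqrt_estimate(5.0) returns 2.25 against a true value of about 2.2361.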
Figure 18: Squareroot Unit
4.8 Floating Point Control Unit
The Floating Point Control Unit is the most vital portion of the coprocessor, as it is
responsible for organizing the various operations of the coprocessor. This is done by
handling only six opcode signals, each representing the specific module called to
produce an output value. The control module handles the input instructions and checks
for special cases. Although the IEEE-754 floating point representation was designed to
handle certain special cases, it was deemed better to err on the side of caution.
Figure 19: Floating Point Control Unit
The exceptions the control unit is designed to catch are cases in which the
inputs or outputs would clearly be zeros, NaNs or Infs.
For example:
Input x Zero = Zero
Input + Inf = Inf
Input/Zero = NaN
After the control unit checks for special cases, it calls the individual modules
in the event that the predetermined opcode is received.
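The three example rules can be expressed as a small Python check that runs before an operation is dispatched. This is a hedged sketch: the opcode strings, constants and function names are hypothetical, NaN inputs are assumed to produce NaN, and the returned words follow Table 1.

```python
QUIET_NAN = 0x7FC00000


def is_zero(word):
    return (word & 0x7FFFFFFF) == 0


def is_inf(word):
    return (word & 0x7FFFFFFF) == 0x7F800000


def is_nan(word):
    return (word & 0x7F800000) == 0x7F800000 and (word & 0x7FFFFF) != 0


def special_case(op, a, b):
    """Return the result word for a special case, or None to run the module."""
    if is_nan(a) or is_nan(b):
        return QUIET_NAN                        # assumption: NaN in, NaN out
    if op == "mul" and (is_zero(a) or is_zero(b)):
        return 0                                # Input x Zero = Zero
    if op == "add" and (is_inf(a) or is_inf(b)):
        return a if is_inf(a) else b            # Input + Inf = Inf
    if op == "div" and is_zero(b):
        return QUIET_NAN                        # Input/Zero = NaN
    return None
```

A return value of None means that no special case applies and the operation module selected by the opcode should be called.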
Chapter 5
5.0 Digital Testing
After the complete coprocessor was designed, the overall digital testing began.
Two types of digital design testing were carried out on the design: structural
analysis and timing analysis.
5.1 Structural Analysis
Structural testing is a form of testing in which specific inputs are applied to the
circuit. These inputs gauge the range of the design and detect flaws within it.
This differs from functional testing because in structural testing the design is
known, and so points along the designated testing paths can be probed.
The first test case, shown in the table below, is a standard test case that is
comfortably within the operational range of the floating point unit's parameters. This
test case shows that the floating point unit operates properly under reasonable
conditions.
Standard case: (Input1 = 5, Input2 = 0.75)
      Real Value  Floating Point Value                “FPU” Value
A     5           0_10000001_01000000000000000000000  5
B     0.75        0_01111110_10000000000000000000000  0.75
Add   5.75        0_10000001_01110000000000000000000  5.75
Sub   4.25        0_10000001_00010000000000000000000  4.25
Mul   3.75        0_10000000_11100000000000000000000  3.75
Div   6.6667      0_10000001_10101010101010101010100  6.6667
Pow   3.3437      0_10000000_10101110000101000111101  3.3599
SQRT  2.2361      0_10000000_00011110101110000101000  2.2399
Table 4: Standard Test Case
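The bit strings in Table 4 can be checked with a short Python helper that strips the underscore separators and reinterprets the 32 bits as a single-precision value. The function name is hypothetical.

```python
import struct


def decode_bits(text):
    """Decode a 'sign_exponent_fraction' bit string into its float value."""
    word = int(text.replace("_", ""), 2)
    return struct.unpack(">f", struct.pack(">I", word))[0]
```

For example, decode_bits("0_10000001_01110000000000000000000") returns 5.75, the Add row of the table.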
More specific cases were also used to test the corners of the design. A few
results of the specific cases that were used are shown in the table below:
            A   B    ADD  SUB   MUL  DIV  POWER
Real Value  5   5    10   0     25   1    3125
FPU Value   -   -    10   0     25   1    2560
Real Value  5   -5   0    10    -25  -1   3.1605e-018
FPU Value   -   -    0    10    -25  -1   NaN
Real Value  5   0    5    5     0    NaN  1
FPU Value   -   -    5    5     0    NaN  1
Real Value  5   Inf  Inf  -Inf  Inf  0    Inf
FPU Value   -   -    Inf  -Inf  Inf  0    Inf
Table 5: Special Test Cases
Several more cases were tested, with the results given in Appendix B.
These cases test the corners of the design, which range from the smallest numbers the
FPU should be able to handle all the way to the largest.
5.2 Timing Analysis
Two versions of timing analysis were used on the digital design. The first,
shown in Table 6, is the Fast Model Timing Analyzer. The second is the Slow Model
Timing Analyzer, shown in Table 7. The fast timing model uses the best-case timing
model of the fastest device to analyze and report the fastest delays of the design's
timing characteristics, while the slow timing model uses the worst-case scenario for
the design's timing characteristics.
Type                           Time                             From                                             To
Worst-case tsu                 4.702 ns                         opcode[0]                                        subB[30]
Worst-case tco                 11.824 ns                        mulA[30]                                         valueout[30]
Worst-case tpd                 10.560 ns                        B[31]                                            valueout[29]
Worst-case th                  4.808 ns                         A[0]                                             mulA[0]
Worst-case Minimum tco         4.231 ns                         floatmul:floatmulA|e[25]                         valueout[3]
Worst-case Minimum tpd         4.286 ns                         opcode[1]                                        valueout[7]
Fast Model Clock Setup: 'clk'  4.88 MHz (period = 204.804 ns)   Power:power|float2int:float2pow|denominator[29]  Power:power|normFr[0]
Table 6: Fast Timing Analysis
The maximum operating frequency under the Fast Timing Model is a slow 4.88 MHz,
while under the Slow Timing Model it is an even slower 2.21 MHz. Table 6 clearly shows
that the Float to Integer Module used within the Power module is by far the slowest
module, and that it greatly limits the highest operating frequency of the device.
Type                           Time                             From                                           To
Worst-case tsu                 9.168 ns                         opcode[0]                                      subB[30]
Worst-case tco                 24.432 ns                        mulB[24]                                       valueout[30]
Worst-case tpd                 20.806 ns                        B[31]                                          valueout[29]
Worst-case th                  9.778 ns                         A[0]                                           mulA[0]
Slow Model Clock Setup: 'clk'  2.21 MHz (period = 453.352 ns)   Power:power|float2int:float2pow|numerator[18]  Power:power|normFr[0]
Table 7: Slow Timing Analysis
The timing analysis was then repeated without the Power module's Float to Integer
Converter, which produced a maximum frequency of 88.75 MHz in the Slow Model
Analysis and an fmax of 199.80 MHz in the Fast Timing Analysis. In this configuration
the slowest clock setup time was due to the Subtractor Module needing to form a two's
complement before the operand passes through the binary adder. Although removing the
two's complement would turn the subtractor into another floating point adder, curiosity
took over, and the experiment resulted in impressive improvements in speed: the Slow
Model Analyzer produced an fmax of 144.7 MHz, while the Fast Model Analyzer produced
more than double that speed, with an fmax of 320.82 MHz.
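The relationship between the clock periods and the frequencies reported in Tables 6 and 7 is simply fmax = 1/period, which the short Python check below illustrates. The function name is hypothetical.

```python
def fmax_mhz(period_ns):
    """Maximum clock frequency in MHz for a clock period given in ns."""
    return 1000.0 / period_ns
```

The 204.804 ns period of the fast model corresponds to about 4.88 MHz, and the 453.352 ns period of the slow model to about 2.21 MHz.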
5.3 Implementation
The coprocessor was implemented on the ALTERA DE2 development board, as
shown in Figure 20. Due to the limited inputs provided by the board, it was impractical
to create a complex means of setting the input values of the device for live testing.
Instead, a set of preset inputs was assigned for the purpose of presentation.
Figure 20: Altera DE2 Implementation
The switches determined the opcode, and so set the operation to be
performed by the device. The push buttons were set as the reset input of the device,
for when a new opcode was to be entered into the board. The outputs were displayed
on both the small LCD and the 18 LEDs located above the switches.
Because the size of the LCD did not allow the floating point unit to display
large real numbers, the 18 LEDs displayed the output in floating point binary
format. The eight green LEDs showed the exponent value, while the rest
displayed a truncated version of the mantissa.
Chapter 6
6.0 Concluding Remarks
This concluding chapter gives a brief review of the project and emphasizes
a few key points that developed during the course of the year.
6.1 Summary of Project Accomplishments
The coprocessor was successfully designed and implemented on the ALTERA
DE2 development board, using 32-bit data registers.
The addition and subtraction modules utilized the fastest basic binary addition
algorithm. The multiplication module is optimized for the ability to be pipelined, while
the divider utilizes a slow looping algorithm. The Power module used IEEE logarithmic
estimation to improve performance, but was slowed down considerably by the Float to
Integer Converter that it requires to fully operate.
The digital design was put under test and analyzed to optimize its performance
characteristics. There were a few small bugs, but the floating point unit
successfully passed the rigorous digital device testing, although it is noticeably slow
compared with commercial FPUs, which operate at speeds around 250 MHz.
6.2 Considerations for Future Work
There are still many possibilities for a faster Floating Point Unit
coprocessor. Improving fmax within the Float to Integer Module would increase the
speed by a minimum factor of four, and with improvements in the speed of the two's
complement in the Subtractor, the worst-case maximum operating frequency would
come somewhat close to the standard operating frequency of commercial FPUs.
The multiplier is ready to be pipelined, and several tests are required to see how
well the coprocessor would combine with the regular 32-bit RISC microprocessor.
Appendix A: Verilog Design Code
Addition Module
module adder (A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output finish,overflow;
  reg [7:0] expLarge,diffreg;
  reg [23:0] shift,noshift,out;
  reg sign,snorm;
  wire signA,signB,expoverflow;
  wire [24:0] addoutput,normLarge;
  wire [7:0] expA,expB,diff,expNorm;
  wire [23:0] fractionA,fractionB,normout,shiftout;
  // significands: 23-bit fraction with the hidden leading one restored
  assign fractionA[22:0]=A[22:0];
  assign fractionB[22:0]=B[22:0];
  assign fractionA[23]=1;
  assign fractionB[23]=1;
  assign expA=A[30:23];
  assign expB=B[30:23];
  assign diff=diffreg;
  assign signA=A[31];
  assign signB=B[31];
  //1.0 ALU Difference and shift
  SHIFTR8 SHIFT8(shift[23:0],shiftout[23:0],diff);
  always@(posedge clk or posedge rst) begin
    if(rst) begin
      shift<=24'b0; noshift<=24'b0; expLarge<=8'b0;
      diffreg<=8'b0; sign<=1'b0; snorm<=1'b0;
    end
    else if(start)begin
      if(expA==expB)begin
        // equal exponents: no alignment shift needed
        shift<=fractionA; noshift<=fractionB; expLarge<=expA;
        diffreg<=8'b0; sign<=signA; snorm<=1'b1;
      end
      else if(expA>expB)begin
        // B has the smaller exponent, so B is shifted
        shift<=fractionB; noshift<=fractionA; expLarge<=expA;
        diffreg<=expA-expB; sign<=signA; snorm<=1'b1;
      end
      else if(expB>expA)begin
        // A has the smaller exponent, so A is shifted
        shift<=fractionA; noshift<=fractionB; expLarge<=expB;
        diffreg<=expB-expA; sign<=signB; snorm<=1'b1;
      end
      else begin
        shift<=shift; noshift<=noshift; expLarge<=expLarge;
        diffreg<=diffreg; sign<=sign; snorm<=1'b0;
      end
    end
    else begin
      // hold state while idle
      shift<=shift; noshift<=noshift; expLarge<=expLarge;
      diffreg<=diffreg; sign<=sign; snorm<=1'b0;
    end
  end
  // Add Significands
  bitadder add(noshift,shiftout,1'b0,addoutput);
  // Normalize
  normalizer addnorm(expLarge,addoutput,expNorm,normLarge,clk,rst,expoverflow,snorm,fnorm);
  // check for overflow?
  assign overflow=expoverflow;
  // output exponent
  assign OUT[30:23]=expNorm;//expNorm;
  // output truncated mantissa
  assign OUT[22:0]=normLarge[22:0];
  // output sign
  assign OUT[31]=sign;
  assign finish=fnorm;
endmodule
Subtraction Module
module subtractor (A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output overflow,finish;
  reg [7:0] expLarge,diffreg;
  reg [23:0] shift,noshift,out;
  reg sign,snorm;
  wire signA,signB,expoverflow,fnorm;
  wire [24:0] suboutput,normLarge;
  wire [7:0] expA,expB,diff,expNorm;
  wire [23:0] fractionA,fractionB,normout,shiftout,shiftout1;
  // significands: 23-bit fraction with the hidden leading one restored
  assign fractionA[22:0]=A[22:0];
  assign fractionA[23]=1;
  assign fractionB[22:0]=B[22:0];
  assign fractionB[23]=1;
  assign expA=A[30:23];
  assign expB=B[30:23];
  assign diff=diffreg;
  assign signA=A[31];
  assign signB=B[31];
  //1.0 ALU Difference and shift
  SHIFTR8 SHIFT8sub(shift[23:0],shiftout[23:0],diff);
  always@(posedge clk or posedge rst) begin
    if(rst) begin
      shift<=24'b0; noshift<=24'b0; expLarge<=8'b0;
      diffreg<=8'b0; sign<=1'b0; snorm<=1'b0;
    end
    else if(start)begin
      if(expA==expB)begin
        shift<=fractionA; noshift<=fractionB; expLarge<=expA;
        diffreg<=8'b0; sign<=1'b0; snorm<=1'b1;
      end
      else if(expA>expB)begin
        shift<=fractionB; noshift<=fractionA; expLarge<=expA;
        diffreg<=expA-expB; sign<=1'b0; snorm<=1'b1;
      end
      else if(expB>expA)begin
        shift<=fractionA; noshift<=fractionB; expLarge<=expB;
        diffreg<=expB-expA; sign<=1'b1; snorm<=1'b1;
      end
      else begin
        shift<=shift; noshift<=noshift; expLarge<=expLarge;
        diffreg<=diffreg; sign<=sign; snorm<=1'b0;
      end
    end
    else begin
      // hold state while idle
      shift<=shift; noshift<=noshift; expLarge<=expLarge;
      diffreg<=diffreg; sign<=sign; snorm<=1'b0;
    end
  end
  //2.0 Add Significands
  // this is the slowest part by 100MHz i blame the INV
  // NOTE: each slice below is negated on its own, so the +1 carry does not
  // propagate across slice boundaries; the full-width form is kept in the
  // commented-out line for reference.
  wire [23:0] negtemp;
  assign negtemp[23:20]=~shiftout[23:20]+1'b1;
  assign negtemp[19:15]=~shiftout[19:15]+1'b1;
  assign negtemp[14:12]=~shiftout[14:12]+1'b1;
  assign negtemp[11:8]=~shiftout[11:8]+1'b1;
  assign negtemp[7:4]=~shiftout[7:4]+1'b1;
  assign negtemp[3:0]=~shiftout[3:0]+1'b1;
  bitadder sub(noshift,negtemp,1'b0,suboutput);
  //bitadder sub(noshift,(~shiftout+1'b1),1'b0,suboutput);
  // Normalize
  normalizer addnorm(expLarge,suboutput,expNorm,normLarge,clk,rst,expoverflow,snorm,fnorm);
  // check for overflow?
  assign overflow=expoverflow;
  // output exponent
  assign OUT[30:23]=expNorm;//expNorm;
  // output truncated mantissa
  assign OUT[22:0]=normLarge[22:0];
  // output sign
  assign OUT[31]=sign;
  assign finish=fnorm;
endmodule
Normalization Module
module normalizer(expin,in,expout,out,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [7:0]expin;
  input [24:0]in;
  output [23:0]out;
  output [7:0] expout;
  output finish,overflow;
  reg active,first;
  reg [24:0] regF,fregF;
  reg [8:0] regE,fregE;   // 9 bits so that exponent overflow lands in bit 8
  always@(posedge clk or posedge rst)begin
    if(rst)begin
      regF<=25'b0; regE<=9'b0;   // clear all nine exponent bits
      fregF<=25'b0; fregE<=9'b0;
      active<=1'b0; first<=1'b0;
    end
    else if(start)begin
      if(!first)begin
        // first cycle: load the operand and its exponent
        fregF<=fregF; fregE<=fregE;
        regF<=in[24:0]; regE<={1'b0,expin};
        active<=1'b1; first<=1'b1;
      end
      else if(regF[24]==1'b1)begin
        // carry out of the significand: shift right
        regF<=regF>>1'b1;
        regE<=regE+1'b1; // Increment Exponent
        active<=1'b1; first<=1'b1;
      end
      else if(regF[23]==1'b0 && regF[24]==1'b0)begin
        //shift left
        regF<=regF<<1'b1;
        regE<=regE-1'b1; // Decrement Exponent
        active<=1'b1; first<=1'b1;
      end
      else begin
        // normalized: latch the final values
        regE<=regE; regF<=regF;
        fregE<=regE; fregF<=regF;
        active<=1'b0; first<=1'b1;
      end
    end
    else begin
      regE<=regE; regF<=regF;
      fregE<=fregE; fregF<=fregF;
      active<=1'b0; first<=1'b0;
    end
  end
  assign out=fregF[23:0];
  assign expout=fregE[7:0];
  assign overflow=fregE[8];   // ninth exponent bit flags exponent overflow
  assign finish=~active;
endmodule
24-bit Addition Module
module bitadder(addinA,addinB,carryin,sum);
  input [23:0] addinA,addinB;
  input carryin;
  output [24:0] sum;
  wire carryout1,carryout2,carryout3,carryout4,carryout5,carryout6;
  wire [3:0] sum1,sum2,sum3,sum4,sum5,sum6;
  // six 4-bit look-ahead groups chained through their carry outputs
  fourbitadder adder1(addinA[3:0],addinB[3:0],carryin,sum1,carryout1);
  fourbitadder adder2(addinA[7:4],addinB[7:4],carryout1,sum2,carryout2);
  fourbitadder adder3(addinA[11:8],addinB[11:8],carryout2,sum3,carryout3);
  fourbitadder adder4(addinA[15:12],addinB[15:12],carryout3,sum4,carryout4);
  fourbitadder adder5(addinA[19:16],addinB[19:16],carryout4,sum5,carryout5);
  fourbitadder adder6(addinA[23:20],addinB[23:20],carryout5,sum6,carryout6);
  assign sum[24] = carryout6;
  assign sum[23:20] = sum6;
  assign sum[19:16] = sum5;
  assign sum[15:12] = sum4;
  assign sum[11:8] = sum3;
  assign sum[7:4] = sum2;
  assign sum[3:0] = sum1;
  assign test=addinA+addinB;
endmodule
4-bit Addition Module
module fourbitadder(addinA,addinB,carryin,sum,carryout);
  input [3:0] addinA,addinB;
  input carryin;
  output [3:0] sum;
  output carryout;
  wire [3:0] generation,propagation;
  wire [2:0] carrybit;
  assign sum[0] = propagation[0]^carryin;
  assign generation = addinA&addinB;
  assign propagation = addinA^addinB;
  // look-ahead carries into bits 1, 2 and 3
  assign carrybit[0] = generation[0]|(propagation[0]&carryin);
  assign carrybit[1] = generation[1]|(generation[0]&propagation[1])|(propagation[0]&propagation[1]&carryin);
  assign carrybit[2] = generation[2]|(generation[1]&propagation[2])|(generation[0]&propagation[1]&propagation[2])|(propagation[0]&propagation[1]&propagation[2]&carryin);
  // group carry-out, needed by bitadder to chain the 4-bit groups
  assign carryout = generation[3]|(generation[2]&propagation[3])|(generation[1]&propagation[2]&propagation[3])|(generation[0]&propagation[1]&propagation[2]&propagation[3])|(propagation[0]&propagation[1]&propagation[2]&propagation[3]&carryin);
  assign sum[3:1] = propagation[3:1]^carrybit[2:0];
endmodule
Multiplication Module
module floatmul(A,B,OUT,clk,rst,overflow,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output finish,overflow;
  reg active;
  reg [47:0] Mplier,Mcand,product,d,e;
  reg [7:0] counter;
  wire [23:0] fractionA,fractionB;
  wire [7:0] expA,expB;
  wire [8:0] expsum;

  assign expA=A[30:23]-127;
  assign expB=B[30:23]-127;
  // significands with the implicit leading 1 restored
  assign fractionA={1'b1,A[22:0]};
  assign fractionB={1'b1,B[22:0]};
  // adding exponents without bias
  assign expsum = ((A[30:23]-127)+(B[30:23]-127))+127;
  // check for overflow
  assign overflow = expsum[8];

  // multiplying significands
  always@(posedge clk)begin
    if(rst)begin
      d=0;
      e=0;
      active=1'b0;
    end
    else if(start) begin
      active=1'b1;
      d={({32{fractionA[1]}}&fractionB)&({32{fractionA[0]}}&fractionB),({32{fractionA[1]}}&fractionB)^({32{fractionA[0]}}&fractionB)}; e[0]=d[0];
      d={({32{fractionA[2]}}&fractionB)&d[32:1],({32{fractionA[2]}}&fractionB)^d[32:1]}; e[1]=d[0];
      d={({32{fractionA[3]}}&fractionB)&d[32:1],({32{fractionA[3]}}&fractionB)^d[32:1]}; e[2]=d[0];
      d={({32{fractionA[4]}}&fractionB)&d[32:1],({32{fractionA[4]}}&fractionB)^d[32:1]}; e[3]=d[0];
      d={({32{fractionA[5]}}&fractionB)&d[32:1],({32{fractionA[5]}}&fractionB)^d[32:1]}; e[4]=d[0];
      d={({32{fractionA[6]}}&fractionB)&d[32:1],({32{fractionA[6]}}&fractionB)^d[32:1]}; e[5]=d[0];
      d={({32{fractionA[7]}}&fractionB)&d[32:1],({32{fractionA[7]}}&fractionB)^d[32:1]}; e[6]=d[0];
      d={({32{fractionA[8]}}&fractionB)&d[32:1],({32{fractionA[8]}}&fractionB)^d[32:1]}; e[7]=d[0];
      d={({32{fractionA[9]}}&fractionB)&d[32:1],({32{fractionA[9]}}&fractionB)^d[32:1]}; e[8]=d[0];
      d={({32{fractionA[10]}}&fractionB)&d[32:1],({32{fractionA[10]}}&fractionB)^d[32:1]}; e[9]=d[0];
      //-----------10-----------
      d={({32{fractionA[11]}}&fractionB)&d[32:1],({32{fractionA[11]}}&fractionB)^d[32:1]}; e[10]=d[0];
      d={({32{fractionA[12]}}&fractionB)&d[32:1],({32{fractionA[12]}}&fractionB)^d[32:1]}; e[11]=d[0];
      d={({32{fractionA[13]}}&fractionB)&d[32:1],({32{fractionA[13]}}&fractionB)^d[32:1]}; e[12]=d[0];
      d={({32{fractionA[14]}}&fractionB)&d[32:1],({32{fractionA[14]}}&fractionB)^d[32:1]}; e[13]=d[0];
      d={({32{fractionA[15]}}&fractionB)&d[32:1],({32{fractionA[15]}}&fractionB)^d[32:1]}; e[14]=d[0];
      d={({32{fractionA[16]}}&fractionB)&d[32:1],({32{fractionA[16]}}&fractionB)^d[32:1]}; e[15]=d[0];
      d={({32{fractionA[17]}}&fractionB)&d[32:1],({32{fractionA[17]}}&fractionB)^d[32:1]}; e[16]=d[0];
      d={({32{fractionA[18]}}&fractionB)&d[32:1],({32{fractionA[18]}}&fractionB)^d[32:1]}; e[17]=d[0];
      d={({32{fractionA[19]}}&fractionB)&d[32:1],({32{fractionA[19]}}&fractionB)^d[32:1]}; e[18]=d[0];
      //---------20-----------
      d={({32{fractionA[20]}}&fractionB)&d[32:1],({32{fractionA[20]}}&fractionB)^d[32:1]}; e[19]=d[0];
      d={({32{fractionA[21]}}&fractionB)&d[32:1],({32{fractionA[21]}}&fractionB)^d[32:1]}; e[20]=d[0];
      d={({32{fractionA[22]}}&fractionB)&d[32:1],({32{fractionA[22]}}&fractionB)^d[32:1]}; e[21]=d[0];
      d={({32{fractionA[23]}}&fractionB)&d[32:1],({32{fractionA[23]}}&fractionB)^d[32:1]}; e[22]=d[0];
      //---again!!! for N+1 iterations or good luck
      d={({32{fractionA[23]}}&fractionB)&d[32:1],({32{fractionA[23]}}&fractionB)^d[32:1]}; e[22]=d[0];
      //---------
      e[47:23]=d;
      active=1'b0;
    end
    else begin
      d=0;
      e=0;
      active=1'b0;
    end
  end

  // truncation
  // output the mantissa
  assign OUT[22:0]=e[45:22];//e[45:23];//46:22
  // output exponent
  assign OUT[30:23]=expsum[7:0];
  // set the sign
  // xor the signs together
  assign OUT[31]={A[31] ^ B[31]};
  assign finish=~active;
endmodule
Division Module
module floatdiv(A,B,OUT,clk,rst,overflow,start,finish);//floatdiv
  input clk,rst,start;
  input [31:0] A,B;
  output [31:0] OUT;
  output overflow,finish;
  wire [7:0] expA,expB;
  wire [8:0] expsub;
  assign expA=A[30:23]-127;
  assign expB=B[30:23]-127;
  reg active;
  reg [46:0] remainder,divisorreg;//46:0
  reg [23:0] quotientreg,outreg;
  reg [7:0] counter;

  // subtracting exponents without bias
  assign expsub =((A[30:23]-127)-(B[30:23]-127))+127;
  // check for overflow
  assign overflow = expsub[8];

  // the divider starts here
  always@(posedge clk or posedge rst) begin
    if(rst)begin
      remainder<={22'b0,1'b1,A[22:0]};
      quotientreg<=24'b0;
      divisorreg<={1'b1,B[22:0],23'b0};
      counter<=7'b0;
      active<='b0;
      outreg<=24'b0;
    end
    else if(start)begin
      if(counter<25)begin//25
        remainder<=remainder-divisorreg;
        if(remainder[46])begin
          // shift quotient to the left
          quotientreg<={quotientreg[22:0],1'b0};
        end
        else begin// restore if less than zero
          remainder<=remainder+divisorreg;
          // shift quotient to the left
          quotientreg<={quotientreg[22:0],1'b1};
        end
        // shift divisor to the right
        divisorreg<={1'b0,divisorreg[46:1]};
        counter<=counter+1'b1;
        active<=1'b1;
        outreg<=outreg;
      end
      else begin
        quotientreg<=quotientreg;
        divisorreg<=divisorreg;
        remainder<=remainder;
        counter<=counter;
        active<=1'b0;
        outreg<=quotientreg;
      end
    end
    else begin
      quotientreg<=quotientreg;
      outreg<=outreg;
      divisorreg<=divisorreg;
      remainder<=remainder;
      counter<=counter;
      active<=1'b0;
    end
  end

  assign OUT[30:23]=expsub[7:0];
  assign finish=~active;
  assign OUT[22:0]=outreg[22:0];
  // set the sign
  // xor the signs together
  assign OUT[31]={A[31] ^ B[31]};
endmodule
Floating Point to Integer Conversion Module
module float2int(IN,clk,rst,integerOUT,numerator,denominator,sign,INTexp,start,finish);
  input [31:0] IN;//the_float
  input clk,rst,start;
  output [31:0] integerOUT,numerator,denominator;
  output sign,finish;
  output [7:0] INTexp;
  wire signed [7:0] diff;
  wire [7:0] expIN; // nets are unsigned by default
  wire [63:0] fraction;
  wire [31:0] fractionIN,integerIN,integerOUT;
  reg active;
  reg [31:0] bincount,denominator,numerator;
  reg [63:0] fractionshift;
  reg [7:0] counter,intexp;

  assign fraction[31:9]=IN[22:0];
  assign fraction[32]=1;
  assign expIN=IN[30:23];
  assign diff=expIN-127; // normalize the exponent
  assign integerIN=fractionshift[63:32];
  assign fractionIN=fractionshift[31:0];
  assign integerOUT=fractionshift[63:32]; //the integer
  assign sign=IN[31]; //the positive/negative sign
  assign INTexp=intexp-127;
  assign finish=~active;

  //shift A into integer and fraction
  always@(posedge rst or posedge clk)begin
    if (rst) begin
      fractionshift<=64'b0;
      intexp<=8'b0;
    end
    else if(expIN<= 159 && expIN>= 95)begin
      if(expIN<127)begin
        fractionshift<=fraction>>(-diff);
        intexp<=expIN+(-diff);
      end
      else begin
        fractionshift<=fraction<<diff;
        intexp<=expIN-diff;
      end
    end
    else if(expIN>159)begin // for a large integer
      fractionshift<=fraction<<31;
      intexp<=expIN-5'b11111; // decrement exponent
    end
    else if(expIN<95)begin // for a small fraction
      fractionshift<=fraction>>31;
      intexp<=expIN+5'b11111; // increment exponent
    end
    else begin
      fractionshift<=fraction;
      intexp<=intexp;
    end
  end

  // find the numerator and denominator integers of the floating point
  // by adding the fractions, e.g. 1/2+1/4+1/8 = 0.875 = 7/8
  always@(posedge clk or posedge rst)begin
    if(rst)begin
      counter<=32;//0
      bincount<=1;
      numerator<=1'b0;
      denominator<=1'b1;
      active<=1'b0;
    end
    else if(start)begin
      if(counter>0)begin
        counter<=counter-1'b1;
        bincount<=bincount*2'b10;
        active<=1'b1;
        if(fractionIN[counter])begin //cross multiplying
          denominator<=bincount*denominator;
          numerator<=bincount*numerator+denominator;
        end
        else begin
          numerator<=numerator;
          denominator<=denominator;
        end
      end
      else begin
        counter<=counter;
        bincount<=bincount;
        numerator<=numerator;
        denominator<=denominator;
        active<=1'b0;
      end
    end
  end
endmodule
Integer to Floating Point Conversion Module
module INT2FLOAT(in,out,clk,rst,start,finish);
  input clk,rst,start;
  input [31:0] in;
  output [31:0] out;
  output finish;
  reg [64:0] shiftreg,fshiftreg;
  reg [7:0] shiftexp,fshiftexp;
  reg active,first,sign;

  always@(posedge clk or posedge rst)begin
    if (rst)begin
      shiftreg<=65'b0;
      shiftexp<=8'b10111111;// 159=8'b10011111 //191=10111111
      active<=1'b0;
      first<=1'b1;
      fshiftreg<=65'b0;
      fshiftexp<=8'b10001110;
      sign<=1'b0;
    end
    else if(start)begin
      if(first)begin
        if(in[31])begin// if negative
          shiftreg[31:0]<=~in[31:0]+1'b1;
          sign<=1'b1;
        end
        else begin
          shiftreg[31:0]<=in[31:0];
          sign<=1'b0;
        end
        shiftexp<=shiftexp;
        fshiftreg<=fshiftreg;
        fshiftexp<=fshiftexp;
        active<=1'b1;
        first<=1'b0;
      end
      else if(!shiftreg[64])begin
        shiftreg<=shiftreg<<1'b1;
        shiftexp<=shiftexp-1'b1;
        fshiftreg<=fshiftreg;
        fshiftexp<=fshiftexp;
        active<=1'b1;
        first<=1'b0;
        sign<=sign;
      end
      else begin
        shiftreg<=shiftreg;
        shiftexp<=shiftexp;
        fshiftreg<=shiftreg;
        fshiftexp<=shiftexp;
        active<=1'b0;
        first<=1'b0;
        sign<=sign;
      end
    end
    else begin
      shiftreg<=shiftreg;
      shiftexp<=shiftexp;
      fshiftreg<=fshiftreg;
      fshiftexp<=fshiftexp;
      active<=active;
      first<=first;
      sign<=sign;
    end
  end

  assign out={sign,fshiftexp[7:0],fshiftreg[63:41]};
  assign finish=~active;
endmodule
Power Module
module Power(A,B,OUT,clk,rst,start,finish);
  input clk,rst,start;
  input [31:0] A,B;
  output finish;
  output [31:0] OUT;
  wire [63:0] log,mullog,mullog2;
  wire [31:0] integerOUT,numerator,denominator,OUTmul;
  wire [7:0] expPow;
  wire ffloat;
  reg [63:0] normInt,normFr;
  reg [31:0] check,checkout;
  reg active;

  float2int float2pow(B,clk,rst,integerOUT,numerator,denominator,sign,expPow,start,ffloat);

  assign log=(A*100)/8388608-127*100; // convert to log A/(2^(23))-127;
  assign mullog=(numerator*log/denominator); //+log apply to the power of B
  assign mullog2=log*integerOUT; //separately include the integer

  //check for invalids
  always@(posedge clk or posedge rst)begin
    if(rst)begin
      normInt<=63'b0;
      normFr<=63'b0;
      active<=1'b1;
    end
    else if(ffloat)begin
      if(A<=10'd1065353216)begin // if negative or zero
        normInt<=32'b1111111100000000000000000000000;// NaN
        normFr<=32'b0;
        active<=1'b0;
      end
      else if(numerator==0)begin
        normFr<=32'b0;
        normInt<=((mullog2+127*100)*8388608)/100;
        active<=1'b0;
      end
      else if(integerOUT==0)begin
        normInt<=32'b0;
        normFr<=((mullog+127*100)*8388608)/100;
        active<=1'b0;
      end
      else begin //because it can't do log2(0)
        normInt<=((mullog2+127*100)*8388608)/100; // convert from log (A+127)*(2^(23));
        normFr<=((mullog+127*100)*8388608)/100;
        active<=1'b0;
      end
    end
    else begin
      normInt<=normInt;
      normFr<=normInt;
      active<=1'b1;
    end
  end

  always@(posedge clk or posedge rst)begin
    if(rst)begin
      check<=32'b0;
      checkout<=32'b0;
    end
    else if(!active)begin
      if(B[31]==1'b1)begin
        check<=normFr+normInt;
        checkout<={check[31],(~check[30:23]+1'b1),check[22:0]};
      end
      else begin
        check<=check;
        checkout<=normFr+normInt;
      end
    end
    else begin
      check<=check;
      checkout<=checkout;
    end
  end

  assign OUT=checkout;
  assign finish=~active;
endmodule
Square Root Module
module SQRT(A,OUT,clk,rst);
  input [31:0] A;
  input clk,rst;
  output [31:0] OUT;
  wire [63:0] logrt,mulrt;
  reg [31:0] normIntr;

  assign logrt=(A*100)/8388608-127*100; // convert to log A/(2^(23))-127;
  assign mulrt=logrt/2; // apply the root

  always@(posedge clk or posedge rst)begin
    if(rst)
      normIntr<=0;
    else if(A<=10'd1065353216) // if negative or zero
      normIntr<=32'b1111111100000000000000000000000;// NaN
    else
      normIntr<=((mulrt+127*100)*8388608)/100; // convert from log (A-127)*(2^(23));
  end

  assign OUT=normIntr;
endmodule
Control Module
module Control(opcode,A,B,clk,rst,valueout);
  input clk,rst;
  input [2:0] opcode;
  input [31:0] A,B;
  output [31:0] valueout;
  reg [31:0] OUT;
  reg [31:0] addA,addB,subA,subB,divA,divB,mulA,mulB,powA,powB,sqrtA;
  reg sdiv,spow,sadd,ssub,smul,ssqrt,finish;
  wire [31:0] addOUT,subOUT,OUTdiv,OUTmul,OUTpow,root;
  // declare constants
  wire [31:0] Inf,NaN,Zero,One;
  wire /*fpow,fdiv,fadd,fsub,fmul,fsqrt,*/addof,subof,mulof,divof;
  assign Inf=32'b1111111100000000000000000000000;
  assign NaN=32'b1111111110000000000000000000000;
  assign One=32'b0011111110000000000000000000000;
  assign Zero=32'b0000000000000000000000000000000;

  adder addition(addA,addB,addOUT,clk,rst,addof,sadd,fadd); //A+B
  subtractor subtraction(subA,subB,subOUT,clk,rst,subof,ssub,fsub); //A-B
  floatmul floatmulA(mulA,mulB,OUTmul,clk,rst,mulof,smul,fmul);// A*B
  floatdiv floatdivA(divA,divB,OUTdiv,clk,rst,divof,sdiv,fdiv);// A/B
  Power power(powA,powB,OUTpow,clk,rst,spow,fpow);//A^B
  SQRT squareroot(sqrtA,root,clk,rst);

  // check for Zeros NaN & INFs inputs
  // check for Special Case Statements
  always@(posedge clk or posedge opcode)begin
    // opcode case statements
    case(opcode)
      0: begin
        sdiv<=1'b0;spow<=1'b0;sadd<=1'b0;ssub<=1'b0;smul<=1'b0;ssqrt<=1'b0;
        addA<=1'b0;addB<=1'b0;subA<=1'b0;subB<=1'b0;divA<=1'b0;divB<=1'b0;mulA<=1'b0;mulB<=1'b0;powA<=1'b0;powB<=1'b0;sqrtA<=1'b0;
        OUT<=NaN;
      end
      // For the Adder ==========================================
      1: begin
        if(A[30:0]==Zero[30:0])
          OUT<=B;
        else if(B[30:0]==Zero[30:0])
          OUT<=A;
        else if(A==Inf || B==Inf)
          OUT<=Inf[30:0];
        else if(A[30:0]==B[30:0] && A[31]!=B[31]) //A+(-A) or (-A)+A
          OUT<=Zero;
        else if(A[31]==1'b1 && B[31]==1'b0)begin //-A+B = B-A
          subB<={1'b0,A[30:0]};
          subA<=B;
          OUT<=subOUT;
        end
        else if (A[31]==1'b0 && B[31]==1'b1)begin //A+-B = A-B
          subB<={1'b0,B[30:0]};
          subA<=A;
          ssub<=1'b1;
          OUT<=subOUT;
        end
        // both operands negative; the printed condition repeated the A+-B test
        else if (A[31]==1'b1 && B[31]==1'b1)begin //-A + -B = -(A+B)
          addB<={1'b0,B[30:0]};
          addA<={1'b0,A[30:0]};
          OUT<={1'b1,addOUT[30:0]};
          sadd<=1'b1;
        end
        else begin
          addA<=A;
          addB<=B;
          sadd<=1'b1;
          OUT<=addOUT;
        end
      end
      // For the Subtractor =====================================
      2: begin
        if(A==B)
          OUT<=Zero;// just make it zero
        else if(A[30:0]==Zero[30:0])
          OUT<={~B[31],B[30:0]};
        else if(B[30:0]==Zero[30:0])
          OUT<=A;
        else if(A[31]==1'b1 && B[31]==1'b0)begin //-A - B = -(B+A)
          addA<={1'b0,A[30:0]};
          addB<=B;
          sadd<=1'b1;
          OUT<={1'b1,addOUT[30:0]};
        end
        else if (A[31]==1'b0 && B[31]==1'b1)begin //A - -B = A+B
          addB<={1'b0,B[30:0]};
          addA<=A;
          sadd<=1'b1;
          OUT<={1'b0,addOUT[30:0]};
        end
        else if (A[31]==1'b1 && B[31]==1'b1)begin //- A - -B = B-A
          subA<={1'b0,B[30:0]};
          subB<={1'b0,A[30:0]};
          ssub<=1'b1;
          OUT<=subOUT;
        end
        else begin
          subA<=A;
          subB<=B;
          ssub<=1'b1;
          OUT<=subOUT;
        end
      end
      // For the Multiplier =====================================
      3: begin
        // a zero operand gives zero and an infinite operand gives infinity;
        // the printed listing had these two results swapped
        if (A[30:0]==Zero[30:0]|| B[30:0]==Zero[30:0]) //if(A*Zero)
          OUT<={{A[31]^B[31]},Zero[30:0]}; // Zero
        else if(A[30:0]==Inf[30:0]||B[30:0]==Inf[30:0]) //if(A*Inf)
          OUT<={{A[31]^B[31]},Inf[30:0]}; // Inf
        else begin
          mulA<=A;
          mulB<=B;
          smul<=1'b1;
          OUT<=OUTmul;
        end
      end
      // For the Divider ========================================
      4: begin
        // varieties of Zero or NaN or Inf
        if(A[30:0]==Zero[30:0])
          OUT<=Zero; // Zero
        else if(B[30:0]==Zero[30:0])
          OUT<={{A[31]^B[31]},Inf[30:0]}; // Inf
        else if(B[30:0]==Inf[30:0]) //if(A/Inf)
          OUT<=Zero; // Zero
        else if(A[30:0]==B[30:0]) // 1
          OUT[31:0]<={{A[31]^B[31]},One[30:0]};//One
        else begin
          divA<=A;
          divB<=B;
          sdiv<=1'b1; // non-blocking, consistent with the rest of the block
          OUT<=OUTdiv;
        end
      end
      // For the Power ==========================================
      5: begin
        if(A[31])
          OUT<=NaN;
        else if (A[30:0]==Zero[30:0]) // +/- Zero
          OUT<=Zero;
        if(B[30:0]==Zero[30:0])
          OUT<=One;
        else begin
          powA<=A;
          powB<=B;
          spow<=1'b1;
          if(fpow)
            OUT<=OUTpow;
        end
      end
      // For the SquareRoot =====================================
      6: begin
        if(A[31])
          OUT<=NaN;
        else if (A[30:0]==Zero[30:0]) // +/- Zero
          OUT<=Zero;
        else begin
          sqrtA<=A;
          OUT<=root;
        end
      end
      // Default Case ===========================================
      default: begin
        sdiv<=1'b0;spow<=1'b0;sadd<=1'b0;ssub<=1'b0;smul<=1'b0;ssqrt<=1'b0;
        addA<=1'b0;addB<=1'b0;subA<=1'b0;subB<=1'b0;divA<=1'b0;divB<=1'b0;mulA<=1'b0;mulB<=1'b0;powA<=1'b0;powB<=1'b0;sqrtA<=1'b0;
        OUT<=NaN;
      end
    endcase
  end

  // output the output value
  assign valueout=OUT;
endmodule
Appendix B: Digital Testing Results
Standard Case Waveforms
[Waveform figures: Addition, Subtraction, Multiplication, Division, Power, Square-root]
Corner Case Tables
        Real Value        Floating Point Value                   “FPU” Value
A       SMALLEST          0_00000001_00000000000000000000000     5.8774717e-39
B       5                 0_10000001_01000000000000000000000     5
Add     5                 0_10000001_01000000000000000000000     5
Sub     5                 0_10000001_01000000000000000000000     5
Mul     2.9387e-038       0_00000011_11000000000000000000000     8.2284604e-38
Div     1.1755e-039       0_11111111_11001100110011001100101     INF
Pow*    7.0138e-192       0_01110011_10000101000111101011100     3.7109374e-4
SQRT    2.6484e-096       0_01000000_00000000000000000000000     1.0842021e-19

        Real Value        Floating Point Value                   “FPU” Value
A       LARGEST           0_11111110_11111111111111111111110     3.4028232e+38
B       5                 0_10000001_01000000000000000000000     5
Add                       0_11111110_11111111111111111111110     3.4028232e+38
Sub                       1_11111110_11111111111111111111110     3.4028232e+38
Mul     1.7014e+039       0_00000000_01111111111111111111100*    Overflow(INF)
Div     6.8056e+037       0_11111100_11001100110011001100100     7.6563520e+37
Pow     4.5624e+192       1_01111101_11110011001100110011001     -4.8749998e-1
SQRT    2.1360e+096       0_10111110_11111101011100001010001     1.8354509e+19

        Real Value        Floating Point Value                   “FPU” Value
A       -SMALLEST         1_00000001_00000000000000000000000     -5.8774717e-39
B       5                 0_10000001_01000000000000000000000     5
Add     5                 0_10000001_01000000000000000000000     5
Sub     5                 1_10000001_01000000000000000000000     5
Mul     -2.9387e-038      1_00000011_01000000000000000000000     -8.2284604e-38
Div     -1.1755e-039      1_11111111_11001100110011001100101     -INF
Pow     -7.0138e-192      1_10001000_00000000000000000000000     -5.1200000e+2
SQRT    NaN               0_11111111_10000000000000000000000     NaN

        Real Value        Floating Point Value                   “FPU” Value
A       -LARGEST          1_11111110_11111111111111111111110     -3.4028232e+38
B       5                 0_10000001_01000000000000000000000     5
Add     -3.4028232e+38    1_11111110_11111111111111111111110     -3.4028232e+38
Sub     -3.4028232e+38    1_11111110_11111111111111111111110     -3.4028232e+38
Mul     -1.7014e+039      1_00000000_01111111111111111111100*    Overflow(INF)
Div     -6.8056e+037      1_11111100_11001100110011001100100     7.6563520e+37
Pow     4.5624e+192       0_01111101_11110011001100110011001     4.8749998e-1
SQRT    NaN               0_11111111_10000000000000000000000     NaN

* Note: the corner cases are too large for the power unit algorithm to handle.
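The floating point column in the corner case tables is written as sign_exponent_fraction, with an 8-bit exponent biased by 127 and a 23-bit fraction. As a rough aid for reading those bit patterns in simulation, the sketch below decodes a normalized pattern into a real number. The helper fp32_to_real and the module name tb_decode_fp32 are hypothetical and do not appear in the project files; exponent fields of all zeros or all ones (denormals, infinities, NaN) are deliberately not handled.

```verilog
// Illustrative sketch only; fp32_to_real and tb_decode_fp32 are assumed names.
module tb_decode_fp32;

  // Decode a normalized single-precision pattern:
  // value = (-1)^sign * (1 + fraction/2^23) * 2^(exponent - 127)
  function real fp32_to_real;
    input [31:0] bits;
    real magnitude;
    begin
      magnitude = (1.0 + bits[22:0] / 8388608.0) * (2.0 ** (bits[30:23] - 127.0));
      fp32_to_real = bits[31] ? -magnitude : magnitude;
    end
  endfunction

  initial begin
    // operand B from the tables: exponent field 129, fraction 0.25,
    // so the magnitude is 1.25 * 2^2
    $display("B   = %g", fp32_to_real(32'b0_10000001_01000000000000000000000));
    // the Div result pattern from the LARGEST table, for comparison with its "FPU" Value column
    $display("Div = %g", fp32_to_real(32'b0_11111100_11001100110011001100100));
    $finish;
  end
endmodule
```

Keeping the decode in a function makes it easy to add further rows from the tables as extra $display calls.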