Floating Point Numbers
![Page 1: Floating Point Numbers](https://reader035.fdocuments.net/reader035/viewer/2022072015/56813028550346895d95b3ee/html5/thumbnails/1.jpg)
Floating Point Numbers
Muddsar Jamil
CS 147
![Page 2: Floating Point Numbers](https://reader035.fdocuments.net/reader035/viewer/2022072015/56813028550346895d95b3ee/html5/thumbnails/2.jpg)
Introduction & Representation
Floating point provides the ability to represent very large numbers as well as very small numbers.
Example: 1 trillion needs 40 bits to the left of the radix point.
Retaining only as much precision as needed increases calculation efficiency.
A great deal of extra hardware is required to store and manipulate numbers with 80 bits or more.
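The 40-bit figure for one trillion can be checked with a few lines of Java. This is only a quick sketch; the class and method names (`TrillionBits`, `bitLength`) are made up for illustration.

```java
class TrillionBits {
    // Number of binary digits needed to write a positive long (no leading zeros).
    static int bitLength(long n) {
        return Long.SIZE - Long.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        long trillion = 1_000_000_000_000L;
        // Bits to the left of the radix point for the integer 10^12.
        System.out.println(bitLength(trillion));
        // The binary digits themselves.
        System.out.println(Long.toBinaryString(trillion));
    }
}
```

Since 2^39 is below one trillion and 2^40 is above it, the count comes out to 40.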
![Page 3: Floating Point Numbers](https://reader035.fdocuments.net/reader035/viewer/2022072015/56813028550346895d95b3ee/html5/thumbnails/3.jpg)
Computer Representation of Floating Point Numbers
_ _ _ _ . _ _ _ _ _ _ _ _ _ _ _ _
Sign bit (0 = +, 1 = -), then a 3-bit exponent, then three base-16 digits (4 bits each)
This format makes for easier comparison: =, ≠, ≤, ≥
Example: Convert $(358)_{10}$ into the above format to be used as a floating point number.
Java: float x = 358.0f;
![Page 4: Floating Point Numbers](https://reader035.fdocuments.net/reader035/viewer/2022072015/56813028550346895d95b3ee/html5/thumbnails/4.jpg)
Example Continued
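First step is to convert 358 from base 10 to base 16, using Horner's method (repeated division by 16):
$358 / 16 = 22$, remainder 6
$22 / 16 = 1$, remainder 6
$1 / 16 = 0$, remainder 1
Reading the remainders from last to first: $(358)_{10} = (166)_{16}$
Next, convert to floating point and normalize:
$(166)_{16} = (166.)_{16} \times 16^{0}$
Normalize: $(.166)_{16} \times 16^{3}$
The exponent is 3, but we represent it in excess 4:
$(011)_2 = (+3)_{10}$, plus the excess $(100)_2 = (+4)_{10}$, gives $(111)_2$
Result: 0 1 1 1 . 0 0 0 1 0 1 1 0 0 1 1 0
Sign: 0 (+). Exponent: 1 1 1 (3 in excess 4). Fraction: 0001 0110 0110 (hex digits 1, 6, 6).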
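The worked example above can be sketched in Java. This is a minimal illustration of the slide's 16-bit teaching format, not of IEEE 754; the class `ToyFloat16` and its methods are hypothetical names, and the sketch assumes an integer whose magnitude fits in three hex digits.

```java
class ToyFloat16 {
    // Hypothetical helper: packs an integer into the 16-bit teaching format
    // (1 sign bit, 3-bit excess-4 exponent, three base-16 fraction digits).
    // Assumes 1 <= |value| <= 0xFFF so the magnitude fits in three hex digits.
    static int encode(int value) {
        int magnitude = Math.abs(value);
        if (magnitude < 1 || magnitude > 0xFFF) {
            throw new IllegalArgumentException("magnitude must be in 1..4095");
        }
        // The exponent is the number of hex digits: 0x166 = (.166)_16 x 16^3.
        int exponent = 0;
        for (int m = magnitude; m > 0; m /= 16) {
            exponent++;
        }
        // Left-justify the digits so the first fraction digit is non-zero.
        int fraction = magnitude << (4 * (3 - exponent));
        int sign = value < 0 ? 1 : 0;
        int biased = exponent + 4; // excess 4: an exponent of 3 is stored as 111
        return (sign << 15) | (biased << 12) | fraction;
    }

    // Formats the 16 bits as sign, exponent, and fraction groups.
    static String toBits(int word) {
        String s = String.format("%16s", Integer.toBinaryString(word & 0xFFFF))
                         .replace(' ', '0');
        return s.charAt(0) + " " + s.substring(1, 4) + " . "
                + s.substring(4, 8) + " " + s.substring(8, 12) + " " + s.substring(12);
    }

    public static void main(String[] args) {
        System.out.println(toBits(encode(358))); // 0 111 . 0001 0110 0110
    }
}
```

Counting hex digits to get the exponent only works because the input is an integer; handling fractional inputs would need a different normalization step.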
![Page 5: Floating Point Numbers](https://reader035.fdocuments.net/reader035/viewer/2022072015/56813028550346895d95b3ee/html5/thumbnails/5.jpg)
Fractional -> Fixed Point Conversion
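Convert $(XYZ.375)_{10}$ to binary.
First, convert XYZ using Horner's method.
Next, convert $(.375)_{10}$ as follows, multiplying by 2 and recording the integer part at each step:
$.375 \times 2 = 0.75$ → 0 (most significant bit)
$.75 \times 2 = 1.5$ → 1
$.5 \times 2 = 1.0$ → 1 (least significant bit)
So $(.375)_{10} = (.011)_2$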
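The repeated-multiplication steps for the fractional part can be sketched as follows. The names `FractionToBinary` and `fractionToBinary` are illustrative, and the `maxBits` cutoff is an assumption added here because many decimal fractions (0.1, for example) never terminate in binary.

```java
class FractionToBinary {
    // Converts a fraction in [0, 1) to a binary string by repeated doubling.
    // Each doubling yields one bit: the integer part of the product.
    // Stops when the remainder reaches zero or after maxBits bits.
    static String fractionToBinary(double fraction, int maxBits) {
        if (fraction < 0.0 || fraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1)");
        }
        StringBuilder bits = new StringBuilder(".");
        for (int i = 0; i < maxBits && fraction != 0.0; i++) {
            fraction *= 2;
            if (fraction >= 1.0) {
                bits.append('1');   // integer part is 1: record it and drop it
                fraction -= 1.0;
            } else {
                bits.append('0');   // integer part is 0
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(fractionToBinary(0.375, 12)); // .011
    }
}
```

The first bit produced is the most significant, matching the order of the steps in the example.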
![Page 6: Floating Point Numbers](https://reader035.fdocuments.net/reader035/viewer/2022072015/56813028550346895d95b3ee/html5/thumbnails/6.jpg)
IEEE 754 Floating Point Standard
Created in 1985 to ensure standard representation among different systems.
Most new architectures support IEEE 754.
Two formats:
Single precision: 1 sign bit, 8 exponent bits, 23 fraction bits = 32 bits total
Double precision: 1 sign bit, 11 exponent bits, 52 fraction bits = 64 bits total
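To see the single-precision layout on a real value, a Java `float` can be split into its three fields with `Float.floatToIntBits`. This is a small sketch with illustrative names (`FloatFields` and its methods). Note that IEEE 754 is a base-2 format whose stored exponent is biased by 127 in single precision, a detail the slides do not spell out.

```java
class FloatFields {
    // Top bit: the sign.
    static int sign(float x) {
        return Float.floatToIntBits(x) >>> 31;
    }

    // Next 8 bits: the stored (biased) exponent.
    static int exponentField(float x) {
        return (Float.floatToIntBits(x) >>> 23) & 0xFF;
    }

    // Low 23 bits: the fraction.
    static int fractionField(float x) {
        return Float.floatToIntBits(x) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float x = 358.0f;
        System.out.printf("bits     = 0x%08X%n", Float.floatToIntBits(x));
        System.out.println("sign     = " + sign(x));
        System.out.println("exponent = " + exponentField(x)); // stored value, bias included
        System.out.printf("fraction = 0x%06X%n", fractionField(x));
    }
}
```

For 358.0f the stored exponent is 135, that is 8 plus the bias of 127, since 358 lies between 2^8 and 2^9.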
![Page 7: Floating Point Numbers](https://reader035.fdocuments.net/reader035/viewer/2022072015/56813028550346895d95b3ee/html5/thumbnails/7.jpg)
IEEE 754 Representations
Can represent (among others):
Non-zero, normalized numbers
Clean zero: all 0s in exponent and fraction; the sign bit can be 0 or 1, to represent +0 or -0
Infinity / overflow: exponent contains all 1s, fraction is all 0s; the sign bit can be 0 or 1
NaN: exponent contains all 1s, fraction is non-zero; produced by operations such as 0 / 0 or Sqrt(-1)
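These special bit patterns can be inspected directly in Java, whose `float` type follows the single-precision layout. The sketch below uses an illustrative helper, `hex`, to print the raw bits. `Float.floatToIntBits` reports every NaN as one canonical pattern, so the NaN output shows just one of the many possible non-zero fractions.

```java
class SpecialValues {
    // Raw single-precision bit pattern as 8 hex digits.
    static String hex(float x) {
        return String.format("0x%08X", Float.floatToIntBits(x));
    }

    public static void main(String[] args) {
        System.out.println("+0   " + hex(0.0f));                    // exponent and fraction all 0s
        System.out.println("-0   " + hex(-0.0f));                   // same, with the sign bit set
        System.out.println("+Inf " + hex(Float.POSITIVE_INFINITY)); // exponent all 1s, fraction 0
        System.out.println("-Inf " + hex(Float.NEGATIVE_INFINITY));
        System.out.println("ovfl " + hex(Float.MAX_VALUE * 2));     // overflow gives +Inf
        System.out.println("NaN  " + hex(0.0f / 0.0f));             // exponent all 1s, fraction non-zero
        System.out.println(Float.isNaN((float) Math.sqrt(-1.0)));   // true
        System.out.println(0.0f == -0.0f);                          // true: the two zeros compare equal
    }
}
```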