$$x = (-1)^s \times m \times 2^e \tag{4.50}$$
The significand $m$ is a sequence of $p$ bits $b_0 b_1 \ldots b_{p-1}$, where $b_i = 0$ or $b_i = 1$, with an implied binary point (analogous to a decimal point) between bits $b_0$ and $b_1$. Thus, the value of $m$ is calculated as:

$$m = \sum_{i=0}^{p-1} b_i \, 2^{-i} \tag{4.51}$$
For double precision arithmetic, the standard defines $p = 53$, $e_{\max} = 1023$, and $e_{\min} = -1022$. The number $x$ is represented as a 64-bit quantity with a 1-bit sign $s$, an 11-bit biased exponent $E$, and a 52-bit fractional mantissa $f$ composed of the bit string $b_1 b_2 \ldots b_{52}$. Since the exponent can always be selected such that $1 \le m < 2$ (and thus $b_0 = 1$), the value of $b_0$ is constant and does not need to be stored in the binary representation.

| Bit 63 | Bits 62 ... 52 | Bits 51 ... 0 |
|---|---|---|
| sign $s$ | biased exponent $E$ | fractional mantissa $f$ |

The integer value of the 11-bit biased exponent $E$ is calculated from the unbiased exponent $e$ as:

$$E = e + 1023 \tag{4.52}$$
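
The field layout and the exponent bias can be checked directly on a machine that uses IEEE doubles. The following is a minimal Python sketch, not part of the original text: the helper names `decode_double` and `normalized_value` are hypothetical, and the reconstruction formula is only valid for normalized numbers (biased exponent between 1 and 2046).

```python
import math
import struct

def decode_double(x: float):
    """Split a double into its (sign, biased_exponent, fraction) bit fields."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # bit 63
    biased_exponent = (bits >> 52) & 0x7FF # bits 62..52
    fraction = bits & ((1 << 52) - 1)      # bits 51..0
    return sign, biased_exponent, fraction

def normalized_value(sign: int, biased_exponent: int, fraction: int) -> float:
    """Rebuild a normalized double; assumes 1 <= biased_exponent <= 2046."""
    # The leading significand bit is implied to be 1 and is not stored.
    significand = 1.0 + fraction / 2.0**52
    return (-1.0) ** sign * math.ldexp(significand, biased_exponent - 1023)

print(decode_double(1.0))    # (0, 1023, 0)
print(decode_double(-2.5))   # (1, 1024, 1125899906842624)
```

Because every step (integer-to-float conversion of a 52-bit fraction, division by a power of two, and `math.ldexp`) is exact for normalized inputs, `normalized_value(*decode_double(v))` returns `v` unchanged for any normalized `v`.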
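The standard divides the set of representable numbers into the following five categories, distinguished by the biased exponent $E$ and the fraction bits $b_1 b_2 \ldots b_{52}$ (written $f$ as a whole), with $s$ denoting the sign bit:

1. NaN (not a number): $E = 2047$ and $f \neq 0$.
2. Infinity ($\pm\infty$): $E = 2047$ and $f = 0$.
3. Normalized numbers: $1 \le E \le 2046$, with value

$$x = (-1)^s \times \left(1 + \sum_{i=1}^{52} b_i \, 2^{-i}\right) \times 2^{E - 1023} \tag{4.53}$$

4. Denormalized numbers: $E = 0$ and $f \neq 0$, with value

$$x = (-1)^s \times \left(\sum_{i=1}^{52} b_i \, 2^{-i}\right) \times 2^{-1022} \tag{4.54}$$

5. Zero ($\pm 0$): $E = 0$ and $f = 0$.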
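
The five classes of IEEE doubles (NaN, infinity, normalized, denormalized, and zero) can be told apart from the exponent and fraction fields alone. The following Python sketch illustrates this under the assumption that the host uses IEEE 754 binary64 for `float`; the function name `classify_double` and the returned labels are hypothetical, chosen only for this example.

```python
import struct

def classify_double(x: float) -> str:
    """Return the IEEE 754 class of a double based on its bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if biased_exponent == 0x7FF:
        # All exponent bits set: NaN if any fraction bit is set, else infinity.
        return "nan" if fraction else "infinity"
    if biased_exponent == 0:
        # All exponent bits clear: denormalized if any fraction bit is set, else zero.
        return "denormalized" if fraction else "zero"
    return "normalized"

for value in (float("nan"), float("-inf"), 1.0, 5e-324, -0.0):
    print(repr(value), classify_double(value))
```

The sign bit plays no role in the classification, so `-0.0` and `0.0` fall into the same class, as do `float("inf")` and `float("-inf")`.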
Table 4.2 summarizes all of the representable double precision numbers. The binary representation is presented with spaces separating the four 16-bit subsets of the 64-bit value, and a separator symbol between the sign bit, exponent bits, and mantissa bits. The numbers in the first column refer to the aforementioned five categories of representable numbers.

[Table 4.2: Representable double precision numbers. Rows are grouped by category in the order 1, 2, 3, 4, 5, 4, 3, 2; binary representations with sign bit 1 come first, followed by those with sign bit 0.]

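
The binary layout described for Table 4.2 (four 16-bit groups separated by spaces, with a separator between the sign, exponent, and mantissa bits) can be reproduced for any double. The Python sketch below is an illustration only: the helper name `format_bits` is hypothetical, the `|` separator is an assumption, and the sample values are representative endpoints chosen for this example rather than the rows of the original table.

```python
import struct
import sys

def format_bits(x: float, sep: str = "|") -> str:
    """Format the 64 bits of a double with field separators and 16-bit grouping."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    digits = format(bits, "064b")
    out = []
    for i, digit in enumerate(digits):
        if i in (1, 12):          # after the sign bit and after the 11 exponent bits
            out.append(sep)
        elif i in (16, 32, 48):   # boundaries between the four 16-bit groups
            out.append(" ")
        out.append(digit)
    return "".join(out)

samples = [
    ("-infinity", float("-inf")),
    ("-largest normalized", -sys.float_info.max),
    ("-smallest normalized", -sys.float_info.min),
    ("-smallest denormalized", -5e-324),
    ("+0", 0.0),
    ("+smallest denormalized", 5e-324),
    ("+smallest normalized", sys.float_info.min),
    ("+largest normalized", sys.float_info.max),
    ("+infinity", float("inf")),
]
for label, value in samples:
    print(f"{label:24s} {format_bits(value)}")
```

For example, `format_bits(1.0)` yields `0|01111111111|0000 0000000000000000 0000000000000000 0000000000000000`.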
It is possible that the result of an operation on two normalized numbers will not itself be representable as a normalized number. Consider two normalized numbers $x$ and $y$ that differ by less than the smallest normalized magnitude. Clearly, $x \neq y$. However, in finite precision normalized floating point arithmetic the computed result is $x - y = 0$: the exact difference is too small to be represented as a normalized number and is therefore rounded to the value of 0 [128, pp. 23-24].
The use of denormalized numbers ensures that the relationship

$$x = y \iff x - y = 0 \tag{4.55}$$

holds for all finite floating point numbers $x$ and $y$.
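
The effect of denormalized numbers on subtraction can be observed with two small normalized operands. The Python sketch below uses illustrative values chosen for this example (they are not taken from the cited reference), and `flush_to_zero` is a hypothetical helper that only models arithmetic restricted to normalized numbers; it is not how any particular hardware mode is implemented.

```python
import math
import sys

tiny = sys.float_info.min   # smallest positive normalized double, 2**-1022
x = 1.5 * tiny              # normalized
y = tiny                    # normalized
diff = x - y                # exact result 2**-1023 lies below the normalized range

def flush_to_zero(value: float) -> float:
    """Model normalized-only arithmetic: nonzero results below `tiny` become 0."""
    return 0.0 if 0.0 < abs(value) < tiny else value

print(x != y)                              # True
print(diff == math.ldexp(1.0, -1023))      # True: stored exactly as a denormalized number
print(flush_to_zero(diff) == 0.0)          # True: the difference vanishes without denormals
```

With denormalized numbers available, `x != y` and `x - y != 0` agree; in the normalized-only model the two operands are distinct yet their difference is 0.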
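The IEEE standard can represent $2^{64} - 2^{54} \approx 1.8 \times 10^{19}$ normalized numbers, but only $2^{53} - 2 \approx 9.0 \times 10^{15}$ denormalized numbers. Denormalized numbers are generally not encountered in routine calculations. The ratio of denormalized to normalized numbers is approximately $1/2046 \approx 4.9 \times 10^{-4}$.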
Furthermore, the denormalized numbers are not uniformly distributed
throughout the representable floating point space; rather, they occupy
two contiguous groups on either side of 0. Certain operations,
however, such as root finding, iteratively generate numbers that are
increasingly close to 0. Therefore it is important to allow for the
possibility of encountering denormalized numbers when creating robust
arithmetic software.
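
The counts of normalized and denormalized doubles follow from the field widths (1 sign bit, 11 exponent bits, 52 fraction bits), and a simple iteration shows how quickly a sequence converging to 0 leaves the normalized range. The Python sketch below is an illustration under those assumptions; the constant and function names are hypothetical, and repeated halving stands in for a generic iteration that approaches 0.

```python
import sys

EXPONENT_PATTERNS = 2**11   # all values of the 11-bit biased exponent
FRACTION_PATTERNS = 2**52   # all values of the 52-bit fraction

# Normalized: both signs, every exponent except all-zeros and all-ones, any fraction.
normalized_count = 2 * (EXPONENT_PATTERNS - 2) * FRACTION_PATTERNS
# Denormalized: both signs, exponent all-zeros, any nonzero fraction.
denormalized_count = 2 * (FRACTION_PATTERNS - 1)
ratio = denormalized_count / normalized_count

def halvings_until_denormal(start: float = 1.0) -> int:
    """Count how many times `start` is halved before it drops below the normalized range."""
    value, steps = start, 0
    while value >= sys.float_info.min:
        value /= 2.0
        steps += 1
    return steps

print(normalized_count == 2**64 - 2**54)   # True
print(denormalized_count == 2**53 - 2)     # True
print(ratio)                               # about 4.9e-4
print(halvings_until_denormal())           # 1023
```

Starting from 1.0, the 1023rd halving produces $2^{-1023}$, the first denormalized value in the sequence, so code that iterates toward 0 can reach this range after a modest number of steps.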