Next: 4.8.5 Implementation of rounded Up: 4.8 Rounded interval arithmetic Previous: 4.8.3 Comparison of two   Contents   Index

4.8.4 Hardware rounding for rounded interval arithmetic

Since floating-point numbers are represented with finite precision, many values must be rounded to a representable bit pattern [4]. The IEEE-754 standard defines four rounding modes [10]: 1) round to nearest (the default mode); 2) round to positive infinity; 3) round to negative infinity; and 4) round to zero. We can examine the effects of these rounding modes by rounding values that lie between two adjacent exactly representable floating-point numbers:
$\displaystyle X_1 = +2.0000000000000009 = 0\!\cdot\!10000000000\!\cdot\!0000 \; 0000000000000000 \; 0000000000000000 \; 0000000000000010$
$\displaystyle X_2 = +2.0000000000000013 = 0\!\cdot\!10000000000\!\cdot\!0000 \; 0000000000000000 \; 0000000000000000 \; 0000000000000011$

which differ only in the last bit of the mantissa.
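
The two bit patterns above can be checked directly. The following Python sketch is illustrative only: the helper name `double_fields` is hypothetical, and `math.nextafter` assumes Python 3.9 or later. It splits a double into its sign, exponent, and mantissa fields and tests whether $ X_1$ and $ X_2$ are adjacent.

```python
import math
import struct

def double_fields(x):
    """Return (sign, exponent, mantissa) bit strings of an IEEE-754 double."""
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = f"{raw:064b}"
    # 1 sign bit, 11 exponent bits, 52 mantissa bits.
    return bits[0], bits[1:12], bits[12:]

# Both sums are exact, since 2 * 2**-51 and 3 * 2**-51 fit in the mantissa.
X1 = 2.0 + 2 * 2.0**-51
X2 = 2.0 + 3 * 2.0**-51

for name, x in (("X1", X1), ("X2", X2)):
    sign, exponent, mantissa = double_fields(x)
    print(name, sign, exponent, mantissa)

# X2 should be the next representable double above X1.
adjacent = math.nextafter(X1, math.inf) == X2
print("adjacent:", adjacent)
```

Negating either value changes only the first field returned by `double_fields`.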

Since IEEE-754 double precision represents the mantissa with 52 bits, exactly representing the three uniformly spaced intermediate values $ X_1 + \frac{1}{4}(X_2-X_1)$, $ X_1 + \frac{1}{2}(X_2-X_1)$, and $ X_1 + \frac{3}{4}(X_2-X_1)$ would require two additional bits in the mantissa, as shown in Table 4.4. To represent the negative values $ -X_1$ and $ -X_2$, only the sign bit is changed from 0 to 1; the exponent and mantissa bit patterns remain the same. The round to zero mode is not shown in the table since it is not relevant to our application. Round to zero is equivalent to round to negative infinity for positive values, and to round to positive infinity for negative values.
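
The stated equivalence for round to zero can be illustrated with exact rational arithmetic. This is a minimal sketch, not hardware rounding: it assumes all values lie in $ [2, 4)$, where adjacent doubles are spaced $ 2^{-51}$ apart, and the names `ULP`, `round_toward_zero`, `round_toward_neg_inf`, and `round_toward_pos_inf` are hypothetical.

```python
import math
from fractions import Fraction

# Assumed spacing of adjacent doubles with magnitude in [2, 4).
ULP = Fraction(1, 2**51)

def round_toward_zero(v):
    # Truncate the value, measured in units of ULP, toward zero.
    return math.trunc(v / ULP) * ULP

def round_toward_neg_inf(v):
    return math.floor(v / ULP) * ULP

def round_toward_pos_inf(v):
    return math.ceil(v / ULP) * ULP

X1 = 2 + 2 * ULP
# X1, the three intermediate values, and X2.
samples = [X1 + Fraction(k, 4) * ULP for k in range(5)]

positive_ok = all(round_toward_zero(v) == round_toward_neg_inf(v) for v in samples)
negative_ok = all(round_toward_zero(-v) == round_toward_pos_inf(-v) for v in samples)
print(positive_ok, negative_ok)
```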


Table 4.4: Comparison of rounding modes (adapted from [4])

| Actual value | Mantissa (52 bits) | Extra bits (+2) | Round to nearest: rounded mantissa | Round to nearest: represented value | Round to $ +\infty$: rounded mantissa | Round to $ +\infty$: represented value |
|---|---|---|---|---|---|---|
| $ +2.0 \ldots 009$ | $ 00 \ldots 010$ | 00 | $ 00 \ldots 010$ | $ +2.0 \ldots 009$ | $ 00 \ldots 010$ | $ +2.0 \ldots 009$ |
| $ +2.0 \ldots 010$ | $ 00 \ldots 010$ | 01 | $ 00 \ldots 010$ | $ +2.0 \ldots 009$ | $ 00 \ldots 011$ | $ +2.0 \ldots 013$ |
| $ +2.0 \ldots 011$ | $ 00 \ldots 010$ | 10 | $ 00 \ldots 010$ | $ +2.0 \ldots 009$ | $ 00 \ldots 011$ | $ +2.0 \ldots 013$ |
| $ +2.0 \ldots 012$ | $ 00 \ldots 010$ | 11 | $ 00 \ldots 011$ | $ +2.0 \ldots 013$ | $ 00 \ldots 011$ | $ +2.0 \ldots 013$ |
| $ +2.0 \ldots 013$ | $ 00 \ldots 011$ | 00 | $ 00 \ldots 011$ | $ +2.0 \ldots 013$ | $ 00 \ldots 011$ | $ +2.0 \ldots 013$ |
| $ -2.0 \ldots 009$ | $ 00 \ldots 010$ | 00 | $ 00 \ldots 010$ | $ -2.0 \ldots 009$ | $ 00 \ldots 010$ | $ -2.0 \ldots 009$ |
| $ -2.0 \ldots 010$ | $ 00 \ldots 010$ | 01 | $ 00 \ldots 010$ | $ -2.0 \ldots 009$ | $ 00 \ldots 010$ | $ -2.0 \ldots 009$ |
| $ -2.0 \ldots 011$ | $ 00 \ldots 010$ | 10 | $ 00 \ldots 010$ | $ -2.0 \ldots 009$ | $ 00 \ldots 010$ | $ -2.0 \ldots 009$ |
| $ -2.0 \ldots 012$ | $ 00 \ldots 010$ | 11 | $ 00 \ldots 011$ | $ -2.0 \ldots 013$ | $ 00 \ldots 010$ | $ -2.0 \ldots 009$ |
| $ -2.0 \ldots 013$ | $ 00 \ldots 011$ | 00 | $ 00 \ldots 011$ | $ -2.0 \ldots 013$ | $ 00 \ldots 011$ | $ -2.0 \ldots 013$ |

| Actual value | Mantissa (52 bits) | Extra bits (+2) | Round to $ -\infty$: rounded mantissa | Round to $ -\infty$: represented value |
|---|---|---|---|---|
| $ +2.0 \ldots 009$ | $ 00 \ldots 010$ | 00 | $ 00 \ldots 010$ | $ +2.0 \ldots 009$ |
| $ +2.0 \ldots 010$ | $ 00 \ldots 010$ | 01 | $ 00 \ldots 010$ | $ +2.0 \ldots 009$ |
| $ +2.0 \ldots 011$ | $ 00 \ldots 010$ | 10 | $ 00 \ldots 010$ | $ +2.0 \ldots 009$ |
| $ +2.0 \ldots 012$ | $ 00 \ldots 010$ | 11 | $ 00 \ldots 010$ | $ +2.0 \ldots 009$ |
| $ +2.0 \ldots 013$ | $ 00 \ldots 011$ | 00 | $ 00 \ldots 011$ | $ +2.0 \ldots 013$ |
| $ -2.0 \ldots 009$ | $ 00 \ldots 010$ | 00 | $ 00 \ldots 010$ | $ -2.0 \ldots 009$ |
| $ -2.0 \ldots 010$ | $ 00 \ldots 010$ | 01 | $ 00 \ldots 011$ | $ -2.0 \ldots 013$ |
| $ -2.0 \ldots 011$ | $ 00 \ldots 010$ | 10 | $ 00 \ldots 011$ | $ -2.0 \ldots 013$ |
| $ -2.0 \ldots 012$ | $ 00 \ldots 010$ | 11 | $ 00 \ldots 011$ | $ -2.0 \ldots 013$ |
| $ -2.0 \ldots 013$ | $ 00 \ldots 011$ | 00 | $ 00 \ldots 011$ | $ -2.0 \ldots 013$ |
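
The entries of Table 4.4 can be reproduced in software. The sketch below emulates the three rounding modes with exact rational arithmetic rather than switching the hardware rounding mode. It assumes all values lie in $ [2, 4)$ in magnitude, so that representable numbers are integer multiples of $ 2^{-51}$; the names `ULP`, `round_to_grid`, and `label` are hypothetical.

```python
import math
from fractions import Fraction

# Assumed spacing of adjacent doubles with magnitude in [2, 4).
ULP = Fraction(1, 2**51)
X1 = 2 + 2 * ULP   # mantissa ends in ...010
X2 = X1 + ULP      # mantissa ends in ...011

def round_to_grid(v, mode):
    """Round the exact value v to a multiple of ULP under the given mode."""
    q = v / ULP
    if mode == "nearest":
        n = round(q)        # Fraction rounds ties to the even integer
    elif mode == "up":
        n = math.ceil(q)    # round to positive infinity
    elif mode == "down":
        n = math.floor(q)   # round to negative infinity
    else:
        raise ValueError(f"unknown mode: {mode}")
    return n * ULP

def label(v):
    """Name a rounded result as +/-X1 or +/-X2 for printing."""
    sign = "-" if v < 0 else "+"
    return sign + ("X1" if abs(v) == X1 else "X2")

for sign in (1, -1):
    for k in range(5):
        v = sign * (X1 + Fraction(k, 4) * ULP)
        row = [label(round_to_grid(v, m)) for m in ("nearest", "up", "down")]
        print(f"{sign:+d} * (X1 + {k}/4 ulp):", *row)
```

The tie at the midpoint goes to $ X_1$ under round to nearest because its mantissa ends in 0, the even choice.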

For a given unlimited-precision floating-point value $ x$, which may not be exactly representable under IEEE-754 (i.e., its mantissa may require more than 52 bits), we want to construct the tightest possible interval $ [x_\ell,x_u]$ such that the lower bound $ x_\ell$ is the largest representable number not greater than $ x$, and the upper bound $ x_u$ is the smallest representable number not less than $ x$:

$\displaystyle x_\ell \leq x \leq x_u\;.$     (4.58)

This condition is satisfied by rounding to negative infinity when calculating the lower bound, and rounding to positive infinity when calculating the upper bound. Note that if $ x$ is exactly representable, then $ x_\ell = x = x_u$ .
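
As an illustration of condition (4.58), the following sketch builds the enclosure $ [x_\ell,x_u]$ for an exact rational $ x$. It does not change the hardware rounding mode; instead it rounds to nearest and then steps to the neighboring double where needed, which yields the same bounds as the two directed roundings. The function name `enclose` is hypothetical, `math.nextafter` assumes Python 3.9 or later, and $ x$ is assumed to lie within the finite double range.

```python
import math
from fractions import Fraction

def enclose(x):
    """Return doubles (x_l, x_u) with x_l <= x <= x_u, as tight as possible.

    x is an exact Fraction assumed to lie within the finite double range.
    """
    nearest = float(x)          # a double adjacent to (or equal to) x
    exact = Fraction(nearest)   # exact rational value of that double
    if exact == x:
        # x is representable, so the interval degenerates to a point.
        return nearest, nearest
    if exact < x:
        # The conversion landed below x: it is the lower bound.
        return nearest, math.nextafter(nearest, math.inf)
    # The conversion landed above x: it is the upper bound.
    return math.nextafter(nearest, -math.inf), nearest

x = Fraction(1, 10)             # 0.1 is not exactly representable in binary
x_l, x_u = enclose(x)
print(x_l, x_u)
print(Fraction(x_l) <= x <= Fraction(x_u))
```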


December 2009