Topics: Numerical Analysis - Rounding Error - Normalised Form of Real Numbers
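The floating-point form $fl(y)$ of a real number written in normalised form,

$$y = \pm 0.d_1 d_2 \ldots d_k d_{k+1} \ldots \times 10^n, \qquad 1 \le d_1 \le 9, \quad 0 \le d_i \le 9,$$

is obtained by terminating the fractional part (the mantissa) after $k$ digits. There are two ways to do this:
- Truncate ("chopping"): keep only the digits up to the position that the word size allows (i.e. up to $d_k$), so $fl(y) = \pm 0.d_1 d_2 \ldots d_k \times 10^n$
- Round:
- If $d_{k+1} \ge 5$, we add $1$ to $d_k$, "rounding up"
- If $d_{k+1} < 5$, we ignore the digits from $d_{k+1}$ onward, "rounding down"
$k$ depends on the size of the computer word.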
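The two termination rules can be sketched in Python with the standard `decimal` module (an illustrative choice, not something the notes prescribe). The helper `fl` is hypothetical: it normalises $y$ to $0.d_1 d_2 \ldots \times 10^n$, then keeps $k$ mantissa digits. `ROUND_DOWN` simply drops the extra digits (chopping), while `ROUND_HALF_UP` adds 1 to $d_k$ exactly when $d_{k+1} \ge 5$.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def normalise(y: Decimal) -> tuple[Decimal, int]:
    """Return (m, n) with y = m * 10**n and 0.1 <= |m| < 1 (y nonzero)."""
    n = y.adjusted() + 1          # adjusted() = exponent of the leading digit
    return y.scaleb(-n), n

def fl(y, k: int, mode: str = "round") -> Decimal:
    """Hypothetical helper: k-digit floating-point form of y, chopped or rounded."""
    if mode not in ("chop", "round"):
        raise ValueError("mode must be 'chop' or 'round'")
    y = Decimal(str(y))
    m, n = normalise(y)
    quantum = Decimal(1).scaleb(-k)   # 10**-k: keep k digits after the point
    rounding = ROUND_DOWN if mode == "chop" else ROUND_HALF_UP
    return m.quantize(quantum, rounding=rounding).scaleb(n)

print(fl(3.14159265, 5, "chop"))   # 3.1415
print(fl(3.14159265, 5))           # 3.1416
```

With $k = 5$, the sixth mantissa digit of $0.314159265 \times 10^1$ is 9, so chopping and rounding give different results.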
Bit Assignation and Range
Floating point numbers use specific bits of the computer word for specific purposes. Normally, there’s 1 bit dedicated to the sign, several bits for the exponent, and some more for the mantissa (the fractional part of the number).
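As a small illustration of this layout, the sketch below splits a word (given as a string of bits) into its three fields. The function name `split_word` and the parameter `m` (the number of exponent bits) are hypothetical; the sketch assumes the order sign bit, then exponent bits, then mantissa bits, reading left to right.

```python
def split_word(word: str, m: int) -> tuple[int, str, str]:
    """Split a bit string into (sign bit, exponent bits, mantissa bits).

    Assumed layout: [1 sign bit][m exponent bits][remaining mantissa bits].
    """
    if not word or set(word) - {"0", "1"}:
        raise ValueError("word must be a non-empty string of 0s and 1s")
    if not 0 < m < len(word) - 1:
        raise ValueError("need at least one exponent bit and one mantissa bit")
    sign = int(word[0])
    exponent_bits = word[1:1 + m]
    mantissa_bits = word[1 + m:]
    return sign, exponent_bits, mantissa_bits

# A made-up 8-bit word with a 3-bit exponent field:
print(split_word("10110110", 3))   # (1, '011', '0110')
```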
Range and Exponent
The size of the word defines the range of the exponent of the floating point number:
Total digits | Range of the exponent | Exponent |
---|---|---|
to |
…where:
- : number of digits for the exponent
- : number of digits for the mantissa
- : decimal form of the number in the exponent part of the word
Mantissa Bits
Note that the mantissa section of the word must always have a $1$ in its first position (reading left to right), since the first digit of a normalised mantissa cannot be zero. For the same reason, the mantissa section can never be all zeros.
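The leading 1 can always be achieved by shifting: each leading zero removed from the fraction lowers the exponent by one, so the value is unchanged (e.g. $0.0011_2 \times 2^3 = 0.11_2 \times 2^1 = 1.5$). The hypothetical helper below sketches this, assuming a binary fraction $0.b_1 b_2 \ldots$ times $2^e$ and a fixed-width field padded with trailing zeros.

```python
def normalise_bits(frac_bits: str, exponent: int) -> tuple[str, int]:
    """Rewrite 0.frac_bits * 2**exponent so the first mantissa bit is 1.

    The field width is kept by padding with zeros on the right. An all-zero
    fraction has no 1 to shift forward, so zero cannot be normalised this way.
    """
    if "1" not in frac_bits:
        raise ValueError("an all-zero mantissa cannot be normalised")
    shift = frac_bits.index("1")          # number of leading zeros
    normalised = frac_bits[shift:].ljust(len(frac_bits), "0")
    return normalised, exponent - shift

print(normalise_bits("0011", 3))   # ('1100', 1)
```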
Final Number
The final number as represented on the computer is given by:
…where:
- : the bit corresponding to the sign
- : the exponent (i.e. )
- : the decimal number in the mantissa section
- : the base for the numbering system
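A minimal decoding sketch for base 2 follows. It rests on assumptions that are not stated in the notes: the mantissa bits are read as the binary fraction $0.b_1 b_2 \ldots b_t$, the exponent field is read as an unsigned integer $c$, and the actual exponent is $c$ minus a bias supplied by the caller. The function `decode` and its parameters `m` and `bias` are hypothetical.

```python
def decode(word: str, m: int, bias: int) -> float:
    """Value of a [sign | m exponent bits | mantissa bits] word, assuming
    x = (-1)**s * (0.b1 b2 ... bt)_2 * 2**(c - bias).
    """
    s = int(word[0])
    c = int(word[1:1 + m], 2)       # decimal form of the exponent field
    mantissa_bits = word[1 + m:]
    if not mantissa_bits.startswith("1"):
        raise ValueError("mantissa must start with 1 (normalised form)")
    # Binary fraction 0.b1 b2 ... bt: bit i (from 0) is worth 2**-(i + 1)
    M = sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(mantissa_bits))
    return (-1) ** s * M * 2.0 ** (c - bias)

# Made-up 8-bit words, 3 exponent bits, assumed bias of 4:
print(decode("01011010", 3, 4))   # 1.25  (0.625 * 2**1)
print(decode("11101100", 3, 4))   # -3.0  (-0.75 * 2**2)
```

Making `bias` an explicit parameter keeps the sketch neutral about which exponent convention a given machine uses.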
Example
For instance, take the following number as represented in bits (sign, exponent and mantissa fields, left to right):

1 1 0 0 1 1 1 0 0 1 0 1 0 0 0 0

In this case, we have that:
Thus, the number is: