The general name for the Decimal Point is Radix Point
Computers store the sign, exponent and mantissa of a floating-point number
Mantissa is also called Fraction
Biasing
It is the process of offsetting numbers in a series by a fixed value (offset)
Assume we have 4 bits to store the exponent of a floating-point number
Using 4 bits we can represent 16 unique values
Exponents can be positive and negative so the range is equally divided
Now we can represent numbers ranging from -8 to 7
Next we find the largest magnitude in the series (in our case 8, from -8); this is the bias
This value is added to all the numbers in the series
This will give us a new series with numbers ranging from 0 to 15
Using the new series, negative exponents can also be stored as non-negative (unsigned) values
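The biasing steps above can be sketched in a few lines. This is a minimal illustration for the 4-bit example only; the names `to_biased` and `from_biased` are made up for this sketch.

```python
EXPONENT_BITS = 4
BIAS = 2 ** (EXPONENT_BITS - 1)  # 8, the offset used in the 4-bit example


def to_biased(exponent: int) -> int:
    """Store a signed exponent (-8..7) as an unsigned value (0..15)."""
    if not -BIAS <= exponent <= BIAS - 1:
        raise ValueError("exponent does not fit in 4 bits")
    return exponent + BIAS


def from_biased(stored: int) -> int:
    """Recover the signed exponent from its stored (biased) value."""
    return stored - BIAS


for e in (-8, -1, 0, 7):
    # Prints the signed exponent next to its 4-bit stored pattern.
    print(e, format(to_biased(e), "04b"))
```

The smallest exponent maps to `0000` and the largest to `1111`, so stored exponents compare correctly as plain unsigned integers.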
Assume we have 10 bits to store floating point numbers
1 bit for sign, 4 bits for exponent and 5 bits for mantissa
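A small sketch of how such a 10-bit word could be split into its fields. It assumes the fields are laid out from the most significant bit as sign, exponent, mantissa (the order in which the notes list them); `split_fields` is a hypothetical helper name.

```python
def split_fields(word: int) -> tuple[int, int, int]:
    """Split a 10-bit word into (sign, exponent, mantissa).

    Assumed layout, most significant bit first: 1 sign, 4 exponent, 5 mantissa.
    """
    if not 0 <= word < (1 << 10):
        raise ValueError("word must fit in 10 bits")
    sign = (word >> 9) & 0b1
    exponent = (word >> 5) & 0b1111
    mantissa = word & 0b11111
    return sign, exponent, mantissa


print(split_fields(0b0_1011_10100))  # (0, 11, 20)
```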
Normalization
The process of representing a floating-point number in scientific notation
Explicit Normalization
Move radix point to the LHS of the most significant 1 in the bit sequence
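Formula: $(-1)^{S} \times 0.M \times 2^{E - \text{bias}}$, where $S$ is the sign bit, $M$ the mantissa bits and $E$ the stored (biased) exponent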
The last 1 is dropped since the machine does not have space to store it
Converting to Decimal:
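As a rough illustration of explicit normalization, Python's `math.frexp` returns a value in the same 0.1xxx form: a fraction `m` with 0.5 <= |m| < 1 and an exponent `e` such that `x == m * 2**e`. The decoder below is a sketch that assumes the 10-bit layout from earlier (bias 8, 5 mantissa bits); `decode_explicit` and the sample value 5.75 are chosen for this example only.

```python
import math


def decode_explicit(sign: int, exponent: int, mantissa: int,
                    bias: int = 8, mantissa_bits: int = 5) -> float:
    """Decode an explicitly normalized value: (-1)^S * 0.M * 2^(E - bias)."""
    fraction = mantissa / (1 << mantissa_bits)  # interpret the bits as 0.M
    return (-1) ** sign * fraction * 2 ** (exponent - bias)


# 5.75 is 101.11 in binary, i.e. 0.10111 * 2**3 after normalization.
m, e = math.frexp(5.75)
print(m, e)  # 0.71875 3

# Stored fields: mantissa 10111, exponent 3 + 8 = 11.
print(decode_explicit(0, 11, 0b10111))  # 5.75
```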
Implicit Normalization
Move radix point to the RHS of the most significant 1 in the bit sequence
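Formula: $(-1)^{S} \times 1.M \times 2^{E - \text{bias}}$, where $S$ is the sign bit, $M$ the mantissa bits and $E$ the stored (biased) exponent
Implicit normalization allows values to be stored with higher precision, because the leading 1 is not stored and frees one extra mantissa bit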
Converting to Decimal:
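A minimal sketch of decoding an implicitly normalized value, again assuming the 10-bit layout from earlier (bias 8, 5 mantissa bits). The function name `decode_implicit` and the sample value 5.875 are illustrative assumptions.

```python
def decode_implicit(sign: int, exponent: int, mantissa: int,
                    bias: int = 8, mantissa_bits: int = 5) -> float:
    """Decode an implicitly normalized value: (-1)^S * 1.M * 2^(E - bias)."""
    significand = 1 + mantissa / (1 << mantissa_bits)  # the leading 1 is implied
    return (-1) ** sign * significand * 2 ** (exponent - bias)


# 5.875 is 101.111 in binary, i.e. 1.01111 * 2**2 after normalization.
# Stored fields: mantissa 01111, exponent 2 + 8 = 10.
print(decode_implicit(0, 10, 0b01111))  # 5.875
```

The same value written as 0.101111 * 2**3 would need six mantissa bits, so with only five bits the explicit form cannot hold it exactly.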
IEEE 754 Standard
Name | Common Name | Significant bits | Exponent bits | Exponent Bias |
---|---|---|---|---|
binary16 | Half Precision | 11 | 5 | 15 |
binary32 | Single Precision | 24 | 8 | 127 |
binary64 | Double Precision | 53 | 11 | 1023 |
binary128 | Quadruple Precision | 113 | 15 | 16383 |
binary256 | Octuple Precision | 237 | 19 | 262143 |
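Significant Bits (significand precision): stored Mantissa bits + 1 implicit leading bit; the sign bit is stored separately
Most programming languages provide Single and Double Precision Floats (for example, float and double in C)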
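To see the difference between the two formats, a value can be rounded to single precision and back. This sketch assumes the usual case where Python's `float` is a binary64 value and uses the standard `struct` module; `to_single` is a hypothetical helper name.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (binary64) to binary32 and convert it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(0.1)             # 0.1
print(to_single(0.1))  # 0.10000000149011612
print(to_single(0.5))  # 0.5 (exactly representable in both formats)
```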
When 5 bits are reserved for exponent we have 32 unique combinations (0-31)
With a bias of 15, the stored values 0 to 31 correspond to exponents -15 to 16
In the IEEE 754 standard the exponent patterns all 0s and all 1s are reserved
So the range of the exponent becomes -14 to 15
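The same reasoning can be applied to any exponent width. This is a sketch, with `exponent_range` as a made-up helper name; it uses the bias values listed in the IEEE 754 table, which follow the pattern 2^(k-1) - 1 for k exponent bits.

```python
def exponent_range(exponent_bits: int) -> tuple[int, int, int]:
    """Return (bias, smallest, largest) exponent for normal numbers."""
    bias = 2 ** (exponent_bits - 1) - 1
    max_stored = 2 ** exponent_bits - 1  # the all-1s pattern, reserved
    e_min = 1 - bias                     # stored value 0 (all 0s) is reserved
    e_max = (max_stored - 1) - bias
    return bias, e_min, e_max


for name, bits in [("binary16", 5), ("binary32", 8), ("binary64", 11)]:
    print(name, exponent_range(bits))
# binary16 (15, -14, 15)
# binary32 (127, -126, 127)
# binary64 (1023, -1022, 1023)
```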
Exponent | Mantissa | Represents | |
---|---|---|---|
All 0s | All 0s | ±0 | |
All 1s | All 0s | ±Infinity | |
Neither all 0s nor all 1s | Any value | Implicit Normal Form | |
All 0s | Non-zero | Fractional Form (subnormal) | |
All 1s | Non-zero | NaN | Exception Handling |
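The reserved exponent patterns can be checked on real single precision values. The sketch below uses the standard `struct` module to read the bits of a binary32 value (8 exponent bits, 23 stored mantissa bits); `classify_single` and its return labels are names invented for this example.

```python
import struct


def classify_single(x: float) -> str:
    """Classify x by its binary32 exponent and mantissa bit patterns."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    exponent = (bits >> 23) & 0xFF  # 8 exponent bits
    mantissa = bits & 0x7FFFFF      # 23 stored mantissa bits
    if exponent == 0:
        return "zero" if mantissa == 0 else "subnormal"
    if exponent == 0xFF:
        return "infinity" if mantissa == 0 else "nan"
    return "normal"


for value in (0.0, 1.0, 1e-40, float("inf"), float("nan")):
    print(value, classify_single(value))
```

The sign bit is ignored here, so `-0.0` and `-inf` get the same labels as their positive counterparts.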
Precision
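Decimal Precision: $\log_{10}(2^{p}) = p \log_{10} 2$ decimal digits, where $p$ is the number of significand bits
Single Precision Floats: $24 \log_{10} 2 \approx 7.22$ decimal digits
Double Precision Floats: $53 \log_{10} 2 \approx 15.95$ decimal digits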
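The decimal precision of a binary format can be estimated from its significand width, since p binary digits carry about p * log10(2) decimal digits. A quick sketch, with `decimal_digits` as a hypothetical helper name and the widths taken from the IEEE 754 table:

```python
import math


def decimal_digits(significand_bits: int) -> float:
    """Approximate decimal digits of precision: log10(2 ** p)."""
    return significand_bits * math.log10(2)


print(round(decimal_digits(24), 2))  # 7.22  (single precision)
print(round(decimal_digits(53), 2))  # 15.95 (double precision)
```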
What is the difference between float and double? - Stack Overflow