The general name for the decimal point, valid in any base, is the radix point
Computers store a floating-point number as three fields: sign, exponent and mantissa
The mantissa is also called the fraction

Biasing

Biasing is the process of offsetting every number in a series by a fixed value (the bias, or offset)

Assume we have 4 bits to store the exponent of a floating-point number
Using 4 bits we can represent 16 unique values
Exponents can be positive or negative, so the range is divided equally between them
Now we can represent exponents ranging from -8 to 7

Next we take the magnitude of the most negative number in the series (in our case 8)
This value (the bias) is added to every number in the series
This gives a new series with numbers ranging from 0 to 15
In the new series, negative exponents are stored as non-negative values
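A minimal sketch of the 4-bit biasing scheme described above, assuming a bias of 8 as in these notes; encode_exponent and decode_exponent are illustrative names, not from any library.

```python
EXP_BITS = 4
# Bias of 8, as in the notes (IEEE 754 itself uses 2**(k - 1) - 1, i.e. 7 for 4 bits)
BIAS = 2 ** (EXP_BITS - 1)

def encode_exponent(e: int) -> int:
    """Map a signed exponent in -8..7 to an unsigned stored value in 0..15."""
    if not -BIAS <= e <= BIAS - 1:
        raise ValueError(f"exponent {e} does not fit in {EXP_BITS} bits")
    return e + BIAS

def decode_exponent(stored: int) -> int:
    """Undo the bias to recover the signed exponent."""
    return stored - BIAS

for e in (-8, -1, 0, 7):
    stored = encode_exponent(e)
    print(e, stored, format(stored, "04b"))
```

Running the loop prints `-8 0 0000`, `-1 7 0111`, `0 8 1000` and `7 15 1111`: the most negative exponent becomes the all-0s pattern and the largest becomes all 1s.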

Assume we have 10 bits to store a floating-point number
1 bit for the sign, 4 bits for the exponent and 5 bits for the mantissa
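A small sketch, assuming the 10-bit layout just described (sign | exponent | mantissa, from most to least significant bit), of how the three fields could be packed into and unpacked from one integer; pack and unpack are hypothetical helper names.

```python
SIGN_BITS, EXP_BITS, MANT_BITS = 1, 4, 5  # the assumed 10-bit layout

def pack(sign: int, exponent_field: int, mantissa_field: int) -> int:
    """Concatenate sign | exponent | mantissa into one 10-bit integer."""
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if not 0 <= exponent_field < 2 ** EXP_BITS:
        raise ValueError("exponent field out of range")
    if not 0 <= mantissa_field < 2 ** MANT_BITS:
        raise ValueError("mantissa field out of range")
    return (sign << (EXP_BITS + MANT_BITS)) | (exponent_field << MANT_BITS) | mantissa_field

def unpack(word: int) -> tuple[int, int, int]:
    """Split a 10-bit integer back into (sign, exponent field, mantissa field)."""
    sign = word >> (EXP_BITS + MANT_BITS)
    exponent_field = (word >> MANT_BITS) & (2 ** EXP_BITS - 1)
    mantissa_field = word & (2 ** MANT_BITS - 1)
    return sign, exponent_field, mantissa_field

word = pack(0, 0b1011, 0b10110)
print(format(word, "010b"))  # 0101110110
```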

Normalization

Normalization is the process of representing a floating-point number in (binary) scientific notation

Explicit Normalization

Move the radix point to the LHS of the most significant 1 in the bit sequence, giving the form 0.1xxx × 2^e
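Formula:
$(-1)^{S} \times 0.M \times 2^{E - \text{bias}}$
where S is the sign bit, M is the stored mantissa (its first bit is the most significant 1) and E is the stored (biased) exponent

Mantissa bits beyond the 5 available (e.g. a trailing 1) are dropped since the machine has no space to store them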
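A minimal sketch of explicit normalization on a positive binary string, assuming the 5-bit mantissa above; normalize_explicit is a hypothetical helper and 101.101 is an illustrative input, not an example taken from these notes.

```python
MANT_BITS = 5

def normalize_explicit(bits: str) -> tuple[str, int]:
    """Rewrite a positive binary number such as '101.101' as 0.1xxxx × 2^e.

    Returns the stored mantissa (truncated to MANT_BITS) and the true exponent e.
    Raises ValueError for zero, which has no most significant 1.
    """
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    first_one = digits.index("1")  # position of the most significant 1
    # The radix point moves to just before (LHS of) the most significant 1
    exponent = len(int_part) - first_one
    significant = digits[first_one:]
    # The leading 1 is stored; bits that do not fit are dropped, short ones are padded
    mantissa = significant[:MANT_BITS].ljust(MANT_BITS, "0")
    return mantissa, exponent

print(normalize_explicit("101.101"))  # ('10110', 3)
```

For 101.101 the normalized form is 0.101101 × 2^3, but only 5 mantissa bits fit, so the last 1 is lost.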
Converting to Decimal:
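Converting back to decimal applies (-1)^S × 0.M × 2^(E − bias). This sketch assumes the 10-bit layout and the bias of 8 from the biasing example, and ignores any reserved exponent patterns; decode_explicit is a hypothetical name.

```python
EXP_BITS, MANT_BITS = 4, 5
BIAS = 8  # bias from the 4-bit exponent example

def decode_explicit(word: int) -> float:
    """Value = (-1)^S × 0.M × 2^(E - bias) for the assumed 10-bit explicit format."""
    sign = word >> (EXP_BITS + MANT_BITS)
    exponent_field = (word >> MANT_BITS) & (2 ** EXP_BITS - 1)
    mantissa_field = word & (2 ** MANT_BITS - 1)
    fraction = mantissa_field / 2 ** MANT_BITS  # 0.M as a number in [0, 1)
    return (-1) ** sign * fraction * 2 ** (exponent_field - BIAS)

print(decode_explicit(0b0_1011_10110))  # 5.5
```

The word 0 1011 10110 decodes to 0.10110 × 2^3 = 5.5, whereas the original value 101.101 was 5.625: the dropped bit is exactly the lost 0.125.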

Implicit Normalization

Move the radix point to the RHS of the most significant 1 in the bit sequence, giving the form 1.xxx × 2^e

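Formula:
$(-1)^{S} \times 1.M \times 2^{E - \text{bias}}$

Implicit normalization stores values with higher precision: the leading 1 is implied rather than stored, so every mantissa bit holds a bit after it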
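The same illustrative input, normalized implicitly under the same 5-bit mantissa assumption; normalize_implicit is a hypothetical helper that differs from an explicit version only in where the radix point lands and in not storing the leading 1.

```python
MANT_BITS = 5

def normalize_implicit(bits: str) -> tuple[str, int]:
    """Rewrite a positive binary number such as '101.101' as 1.xxxxx × 2^e.

    The leading 1 is implied, so only the bits after it are stored.
    Raises ValueError for zero, which has no most significant 1.
    """
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    first_one = digits.index("1")  # position of the most significant 1
    # The radix point moves to just after (RHS of) the most significant 1
    exponent = len(int_part) - first_one - 1
    after_leading_one = digits[first_one + 1:]
    mantissa = after_leading_one[:MANT_BITS].ljust(MANT_BITS, "0")
    return mantissa, exponent

print(normalize_implicit("101.101"))  # ('01101', 2)
```

Here 101.101 becomes 1.01101 × 2^2 and all five bits after the leading 1 fit, so nothing is dropped.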

Converting to Decimal:
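Decoding an implicitly normalized word restores the hidden 1 before scaling: (-1)^S × 1.M × 2^(E − bias). As before, this sketch assumes the 10-bit layout with bias 8 and ignores reserved exponent patterns; decode_implicit is a hypothetical name.

```python
EXP_BITS, MANT_BITS = 4, 5
BIAS = 8  # bias from the 4-bit exponent example

def decode_implicit(word: int) -> float:
    """Value = (-1)^S × 1.M × 2^(E - bias) for the assumed 10-bit implicit format."""
    sign = word >> (EXP_BITS + MANT_BITS)
    exponent_field = (word >> MANT_BITS) & (2 ** EXP_BITS - 1)
    mantissa_field = word & (2 ** MANT_BITS - 1)
    fraction = 1 + mantissa_field / 2 ** MANT_BITS  # 1.M, with the implied 1 restored
    return (-1) ** sign * fraction * 2 ** (exponent_field - BIAS)

print(decode_implicit(0b0_1010_01101))  # 5.625
```

With the same 10 bits, the implicit form recovers 101.101 = 5.625 exactly, which is the extra precision gained by not storing the leading 1.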

IEEE 754 Standard

Name       Common Name          Significant bits   Exponent bits   Exponent bias
binary16   Half Precision       11                 5               15
binary32   Single Precision     24                 8               127
binary64   Double Precision     53                 11              1023
binary128  Quadruple Precision  113                15              16383
binary256  Octuple Precision    237                19              262143
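A quick consistency check of the table, assuming (as in IEEE 754) that the significant-bit count includes the implicit leading 1: each format's total width is sign + exponent + stored mantissa, and each bias equals 2^(k−1) − 1 for k exponent bits.

```python
# (significant bits p, exponent bits k, exponent bias) copied from the table
FORMATS = {
    "binary16": (11, 5, 15),
    "binary32": (24, 8, 127),
    "binary64": (53, 11, 1023),
    "binary128": (113, 15, 16383),
    "binary256": (237, 19, 262143),
}

for name, (p, k, bias) in FORMATS.items():
    stored_mantissa = p - 1  # one significant bit is the implicit leading 1
    total = 1 + k + stored_mantissa  # sign + exponent + stored mantissa
    assert bias == 2 ** (k - 1) - 1, name
    print(f"{name}: {total} bits, bias 2^{k - 1} - 1 = {bias}")
```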

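Significant bits: stored mantissa bits + 1 implicit leading bit (e.g. 23 + 1 = 24 for single precision); the sign bit is stored separately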
Most programming languages provide single precision (e.g. float in C and Java) and double precision (double) floats
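To see the three stored fields of a real single precision float, the standard struct module can reinterpret the 4 bytes of an IEEE 754 binary32 value as an integer. float32_fields is a hypothetical helper name; 5.625 is an illustrative input.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, stored mantissa) of x rounded to binary32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8 exponent bits, bias 127
    mantissa = bits & 0x7FFFFF  # 23 stored bits; the leading 1 is implicit
    return sign, exponent, mantissa

print(float32_fields(5.625))  # (0, 129, 3407872)
```

5.625 = 1.01101 × 2^2, so the stored exponent is 2 + 127 = 129 and the mantissa field is 01101 followed by 18 zeros.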

When 5 bits are reserved for the exponent (as in half precision) there are 32 unique combinations (0 to 31)
IEEE 754 uses a bias of 2^(5-1) - 1 = 15, so stored values 0 to 31 correspond to true exponents -15 to 16
In the IEEE 754 standard the exponent patterns all 0s and all 1s are reserved for special values
So the range of the exponent for normal numbers becomes -14 to 15
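The same range calculation generalizes to any exponent width k, assuming the IEEE 754 bias of 2^(k−1) − 1 and the two reserved patterns; normal_exponent_range is a hypothetical helper name.

```python
def normal_exponent_range(exp_bits: int) -> tuple[int, int]:
    """True exponent range of normal numbers for an IEEE 754 exponent of exp_bits bits."""
    bias = 2 ** (exp_bits - 1) - 1
    lowest_field = 1  # all 0s (0) is reserved for zero and fractional form
    highest_field = 2 ** exp_bits - 2  # all 1s is reserved for infinity and NaN
    return lowest_field - bias, highest_field - bias

print(normal_exponent_range(5))  # (-14, 15)
print(normal_exponent_range(8))  # (-126, 127)
```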

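Exponent                    Mantissa   Represents
All 0s                      All 0s     Zero (±0)
All 1s                      All 0s     Infinity (±∞)
Neither all 0s nor all 1s   Any value  Implicit normal form (normal number)
All 0s                      Non-zero   Fractional form (denormalized/subnormal number)
All 1s                      Non-zero   NaN (used for exception handling)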
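The special-value rules can be tried on real half precision (binary16) bit patterns: 5 exponent bits and 10 stored mantissa bits. classify_half is a hypothetical helper; struct's "e" format code, which packs and unpacks IEEE 754 binary16, is used only to show the actual value of each pattern.

```python
import struct

def classify_half(bits: int) -> str:
    """Classify a 16-bit binary16 pattern by its exponent and mantissa fields."""
    exponent = (bits >> 10) & 0x1F  # 5 exponent bits
    mantissa = bits & 0x3FF  # 10 stored mantissa bits
    if exponent == 0:
        return "zero" if mantissa == 0 else "fractional (subnormal)"
    if exponent == 0x1F:
        return "infinity" if mantissa == 0 else "NaN"
    return "implicit normal"

for pattern in (0x0000, 0x7C00, 0x3C00, 0x0001, 0x7E00):
    value = struct.unpack(">e", pattern.to_bytes(2, "big"))[0]
    print(hex(pattern), classify_half(pattern), value)
```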

Precision

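Decimal Precision: p significant bits give roughly p × log10(2) ≈ 0.301 × p decimal digits
Single Precision Floats: 24 × log10(2) ≈ 7.22, i.e. about 7 significant decimal digits
Double Precision Floats: 53 × log10(2) ≈ 15.95, i.e. about 15 to 16 significant decimal digits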
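The decimal precision of a format with p significant bits can be estimated as log10(2^p) = p × log10(2). This sketch assumes p = 24 for single and p = 53 for double precision, as in the IEEE 754 table; decimal_digits is a hypothetical name.

```python
import math

def decimal_digits(significant_bits: int) -> float:
    """Approximate decimal digits of precision: log10(2^p) = p × log10(2)."""
    return significant_bits * math.log10(2)

print(f"single: {decimal_digits(24):.2f}")  # single: 7.22
print(f"double: {decimal_digits(53):.2f}")  # double: 15.95
```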
