Floating point numbers are a way of representing real numbers inside computers. In other words, floating point numbers, more commonly known as floats, are numbers with a fractional part.
Early digital computers all used integer arithmetic and could not represent fractions. This led to large inaccuracies when computing complex problems. The first programmable computer to use floating point numbers was the Z1 in 1938, which could perform 24-bit floating point arithmetic.
Real numbers are represented in computers in exponential form. This form has two parts, the mantissa and the exponent. The mantissa holds the significant digits of the number, while the exponent gives the power of the base by which the mantissa is multiplied to get the actual value.
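This split can be seen directly in Python, whose standard library can take a float apart into its mantissa and exponent (base 2, as computers use internally):

```python
import math

# math.frexp splits a float x into a mantissa m and an exponent e
# such that x == m * 2**e, with 0.5 <= |m| < 1.
m, e = math.frexp(6.5)
print(m, e)            # 0.8125 3
print(m * 2**e)        # 6.5, reconstructed from the two parts
```

Here 6.5 is stored, conceptually, as 0.8125 × 2³.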
All of these are perfectly good concepts, but with computers there lies a caveat in the form of memory. Computer memory is limited, and both the range of values a floating point number can take and its precision depend on it. So computer scientists limited the precision and range of floats so that they could cover the most commonly used value ranges while still representing them with acceptable accuracy.
These limits are imposed by specifying how many bits are used to represent the mantissa and the exponent. But there was little consensus among scientists on what the limits should be. Every computer ended up with its own implementation of floating point numbers, and this badly affected portability.
When dealing with floating point arithmetic, programs would often crash when exceptions arose. All of this led to the formation of a common standard for representing floats in computers. The standard was put forward by the Institute of Electrical and Electronics Engineers and is described in detail in the next section.
IEEE Floating point format
The Institute of Electrical and Electronics Engineers introduced the IEEE Standard for Floating-Point Arithmetic in 1985. Also known as IEEE 754, it fixed the myriad problems that had existed with floating point representation and computation, and it became a universal standard that most later computers used.
The IEEE floating point format defines the following:
- Arithmetic formats, i.e. the formats used to represent binary and decimal floating point values.
- Interchange formats, i.e. how the data should be encoded for exchange so that it can be transferred efficiently without using excessive bandwidth.
- Rounding rules: computers have finite precision, and during operations it is often necessary to round numbers to represent the results properly. The IEEE format defines standards for performing this rounding.
- Operations: the IEEE format defines how operations should be performed on floating point numbers.
- Exception handling: the IEEE format defines how exceptions should be handled when they arise.
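The need for rounding rules is easy to see in practice. A short Python sketch of what finite precision does to a familiar sum:

```python
# 0.1 and 0.2 have no exact binary representation, so each is stored
# as the nearest representable value and the rounding error shows up
# in the sum.
a = 0.1 + 0.2
print(a)                     # 0.30000000000000004
print(a == 0.3)              # False
print(abs(a - 0.3) < 1e-9)   # True: compare with a tolerance instead
```

This is why exact equality comparisons on floats are usually avoided in favour of a small tolerance.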
A major revision of the standard, known as IEEE 754-2008, was finalized in 2008 after a seven-year revision process; it has since been superseded by IEEE 754-2019.
The IEEE-754 format clearly defines what values can be represented. They are explained in detail below.
- Finite numbers, which can be binary or decimal, consist of three integers: the sign (s), the significand (c), and the exponent (q). The value represented is (-1)^s × c × b^q, where b is the base.
- Two infinities: +inf and -inf
- Two kinds of NaN, quiet and signaling. NaN stands for Not a Number; NaNs are produced when exceptional conditions such as invalid operations arise.
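These categories can be observed by unpacking the raw bits of a value. The sketch below assumes Python's struct module and the standard binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits); the helper name `fields` is just for illustration:

```python
import math
import struct

def fields(x):
    """Split an IEEE 754 double into its (sign, exponent, fraction) bit fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # reinterpret as uint64
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)      # 52-bit fraction
    return sign, exponent, fraction

print(fields(1.0))       # (0, 1023, 0): the exponent bias is 1023
print(fields(-2.0))      # (1, 1024, 0): sign bit set for negative values
print(fields(math.inf))  # (0, 2047, 0): all-ones exponent, zero fraction
s, e, f = fields(math.nan)
print(e == 2047 and f != 0)  # True: NaN has all-ones exponent, nonzero fraction
```

Infinities and NaNs share the all-ones exponent pattern; the fraction field distinguishes between them.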