A (hopefully) simple stab at the difference between floating-point and fixed-point numbers

Since Pro Tools 10 introduced the 32-bit floating-point file format, I thought it would be a good time to give a (hopefully simple) explanation of why the 32-bit floating-point format yields so much more dynamic range than the conventional 24- or 16-bit fixed-point formats.

Note: The text below is illustrative and does not deal with the implementation of floating-point arithmetic in DAWs.

Fixed-point and floating-point systems are two different ways of interpreting a sequence of digits as a value. Let’s have a look:

Fixed-point numbers

In the decimal system a sequence of digits like ‘123’ is usually interpreted like this:

value = a*10^0 + b*10^1 + c*10^2 + …

where a is the rightmost digit, b the one to its left, and so on. For example:

123 = 1*10^2 (100) + 2*10^1 (20) + 3*10^0 (3)

If I introduce a decimal point, the digits to the right of it are multiplied by powers of 10 with negative exponents:

1.23 = 1*10^0 (1) + 2*10^-1 (0.2) + 3*10^-2 (0.03)
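To make the digit-by-digit sum concrete, here is a minimal Python sketch. The function name `positional_value` is my own invention for illustration, and I use exact fractions so that the computer's own binary floating-point rounding does not get in the way.

```python
from fractions import Fraction

def positional_value(digits: str) -> Fraction:
    """Evaluate a decimal digit string such as '123' or '1.23' term by term."""
    whole, _, frac = digits.partition(".")
    total = Fraction(0)
    # Digits left of the point: the rightmost one is multiplied by 10^0.
    for power, digit in enumerate(reversed(whole)):
        total += int(digit) * Fraction(10) ** power
    # Digits right of the point get negative exponents: 10^-1, 10^-2, ...
    for position, digit in enumerate(frac, start=1):
        total += int(digit) * Fraction(10) ** -position
    return total

print(positional_value("123"))   # 123
print(positional_value("1.23"))  # 123/100
```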

Assume now that I have 5 digits (in the binary system these would be your bits) to represent a number, using the above system. Fixed-point means that I have to choose once, for all numbers, where my decimal point is going to be. So I can either represent small numbers with many digits after the decimal point (e.g. 1.1234, 9.4324) or large numbers with few digits after the decimal point (e.g. 1000.2, 9500.9).

In fixed-point arithmetic, choosing the position of the decimal point sets the upper and lower limits of the values you can represent with your digits – your dynamic range. The resolution, i.e. the step between two neighbouring values, is the same for all encoded values.
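As a rough illustration of this trade-off, the following Python sketch lists the step size and the largest value for each position of the decimal point in a 5-digit number. The names `TOTAL_DIGITS` and `fixed_point_range` are hypothetical helpers of mine, not part of any audio software.

```python
from decimal import Decimal

TOTAL_DIGITS = 5  # the five digits used in the text

def fixed_point_range(fraction_digits: int):
    """Return (smallest step, largest value) for a given decimal point position."""
    step = Decimal(1).scaleb(-fraction_digits)        # e.g. 0.0001 for 4 fraction digits
    largest = Decimal(10 ** TOTAL_DIGITS - 1) * step  # all five digits set to 9
    return step, largest

for fraction_digits in range(TOTAL_DIGITS):
    step, largest = fixed_point_range(fraction_digits)
    print(f"{fraction_digits} digits after the point: step {step}, max {largest}")
```

Moving the decimal point to the left buys a finer step at the cost of a smaller maximum, and the other way round.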

Floating-point numbers

Instead of the representation used above I will now use a different way to encode digits into values:

value = significand * 10^exponent

Here, the significand is a sequence of digits with an implied decimal point after the first one. I need to choose how many of my 5 digits I will use for the significand and how many for the exponent.

Let’s imagine that I use 4 digits for the former and 1 for the latter. This means that, using the same 5 digits that I have used before, I can now encode non-zero values from

0.001 * 10^0 = 0.001

to

9.999 * 10^9 = 9999000000

This range of values is not accessible using 5-digit fixed-point: if I chose to have no digit after the decimal point, for example, I could only encode values from 0 to 99999.
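The toy format described here (4 digits of significand, 1 digit of exponent) can be sketched in a few lines of Python. This is only an illustration of the decimal example in this post, not of how real 32-bit floats are laid out, and `decode_toy_float` is a made-up name.

```python
from fractions import Fraction

def decode_toy_float(digits: str) -> Fraction:
    """Decode 5 digits: the first four are the significand, the last is the exponent."""
    if len(digits) != 5 or not digits.isdecimal():
        raise ValueError("expected exactly five decimal digits")
    significand = Fraction(int(digits[:4]), 1000)  # implied point after the first digit
    exponent = int(digits[4])
    return significand * Fraction(10) ** exponent

smallest = decode_toy_float("00010")  # 0.001 * 10^0
largest = decode_toy_float("99999")   # 9.999 * 10^9
print(float(smallest), int(largest))  # 0.001 9999000000
```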

To sum up, floating-point arithmetic interprets your digits (or the bits in your computer) differently than fixed-point does, which lets you encode a much larger range of values (or, in your audio file, a larger dynamic range).
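To put a number on "larger", here is a small Python sketch that expresses the ratio between the largest and the smallest non-zero value of the two 5-digit toy formats in decibels, using the usual 20 * log10 convention for amplitude ratios. The helper `dynamic_range_db` is my own, and the figures apply to the decimal toy formats from this post only, not to real 16-, 24- or 32-bit audio files.

```python
import math

def dynamic_range_db(smallest_nonzero: float, largest: float) -> float:
    """Ratio of largest to smallest non-zero value, in decibels (20 * log10)."""
    return 20 * math.log10(largest / smallest_nonzero)

fixed_db = dynamic_range_db(1, 99999)           # 5-digit fixed-point, no fraction digits
floating_db = dynamic_range_db(0.001, 9.999e9)  # 4-digit significand, 1-digit exponent
print(round(fixed_db), round(floating_db))      # 100 260
```

Note that moving the decimal point in the fixed-point format does not change its ratio: the largest value is always 99999 times the smallest step.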

As I said at the beginning, this text is meant to explain the basic idea in the decimal system – the principle is the same in the binary one.

Signing off,

Norbert