Encoder Front Page


Fixed point math

Kevin Ross


[This is a post I made to the list server last month. The basis of the discussion is how to do navigation on a 'small' computer. One proposal was to use floating point. My reply was that floating point can be useful, but is also very slow. I have attempted to describe how a fixed point number works, and specifically how to modify your thinking a little bit to perhaps do your math in Base256 or above, rather than Base10].

Sent: Sunday, July 23, 2000 10:05 PM
Subject: Re: [SeattleRobotics] IEEE floating point
> How do you think you will do navigation without floating point?  If one
> wheel turns a quarter of a turn more than the other, what direction are you
> heading?  How can you traverse a random path and then return directly (as
> opposed to retracing your steps)?  It takes trig functions, and they don't
> work well when rounded to the nearest integer.

It is quite possible to do navigation using something other than floating
point. I hope I didn't tout the floating point representation of a number
too much, as it has lots of extremely serious drawbacks. The first and most
important is that most floating point software routines are very, very, very
slow! In reality, the best way to write software is to NOT use floating
point, unless your target machine has floating point hardware.

The reason most people end up using floating point is that we are used to
dealing with decimal based numbers. It is intuitive for us, since that is
what we learned in school. In reality, though, there is absolutely nothing
magical about base 10 numbers, and base 10 numbers really suck on a
differently based machine such as a microcontroller. I was going to say
binary, but that isn't really true. Most microcontrollers are really byte
sized. Some are word sized. Some are long word (32-bit) sized. If you
could write your equations all in base 256 (or base 65536 for a 16-bit
machine), you would find the math works out pretty well!

If you take a couple of steps back, you might realize that the only thing
that floating point numbers help you with is the ability to do math on
numerical values with ranges that exceed the maximum integer size. For
example, with floating point, I can multiply 1,234,567 by 0.1234567. This
is, however, not very accurate because I can't represent the entire range of
numbers in a 32-bit value. I will get something that is close to the answer.
Floating point is mostly used to work with numbers of different magnitudes.
It helps because the calculations keep the magnitude of each value as an
integral part of the calculation. It costs you accuracy, however, when the
magnitudes of the values are drastically different.

Fixed point is definitely a much faster way to handle these calculations.
Remember, fixed point, in reality, is just integer math. You just need to
keep the factoring straight and have enough digits of accuracy. 'Digits' is a
poor choice of words, since it implies base 10; bits would be a better
term. This might require writing some math routines that keep your values in
extremely large integers, and to have libraries that do extremely large
integer math. This might not be as bad as you think, but will require you to
re-learn some of your old human 10-finger math.

Note that a common mistake is to assume that you can just factor everything
by a large enough magnitude so the decimal point goes away, making
everything just an integer. The answer is a resounding 'well, sort of!'. The
problem is that many multiplication operations are really a combination of
multiplication and division. If there is a fractional part, then you are
really doing a division. Yes, you could scale things up by some large
factor, do the division, then divide the result back down. However, you lose
accuracy by doing so, plus you will find that some operations won't work
correctly. Division, for example, on a shifted value may get you the wrong
result. You could carefully factor all of your operations to avoid such
errors, but it is difficult to get them straight.

Fixed point is a good way to go, but does require more thought than just
integer math.

Gory details:

Let's think for a moment about what a decimal number really means. You have
an integer and a fractional part. Most people understand the integer part,
but really never give the fractional part much thought, other than to
use it like they were told to in grade school. Integer, or
whole, numbers are easy. I assume you realize that 1 + 1 == 2, 1 + 2 == 3,
and so on.

The fractional part is a little more interesting. As the name implies, it is
a fraction, which is a division. For example:

1.5 can be written as 1 1/2 which can be written as 1 + 1/2 or (1 + 5/10).

I call this out because I am going to mess with your head a little. I would
like to explain how to implement a fixed point number system. It actually
isn't too hard to write, but it can be a little bit difficult to
conceptualize. The problem is we are so fixed on Base10 numbers that it is
difficult to think of anything else. Even when you get yourself thinking in
other number bases, you will find that you always come back to Base10.
That's OK, everyone does, including me. I don't know my hexadecimal
multiplication tables either!

For the first trick, I am going to start talking about numbers that only
have two digits. The whole part and the fractional part. I am going to start
writing numbers using a colon instead of a period. For example, 1.5 just
became 1:5

1:5 means that there is a whole part with value 1, and a fractional part
with value :5. Note that the colon is going to be attached to the fractional
part. Why? Because the fractional part is a stand-in for 5/10, or 5/Base10.
As you will see in a while, that Base10 is important.

Our two digit world has a range from 0:0 to 9:9. These map, for the moment,
to Base10, which means 1:5 times 2:0 == 3:0, just like before. For the
moment, assume that a 1:5 requires a 16-bit word to hold the value. The high
byte is the 1, the low byte is the fractional :5

To do math on this number is pretty easy, but will require some additional
support in the form of routines to keep the fractional and whole parts
straight.

Addition:   2:3 + 4:1 = 6:4   (add the fractions, add the wholes)
            6:8 + 1:5 = 8:3   (add the fractions, add the wholes plus a carry)
            9:7 + 3:3 = 3:0   (add the fractions, add the wholes plus a carry,
wrap the high digit)

Note that in the example above, everything works just like unsigned decimal.
Since there are only the digits 0-9, when the result is greater than 10, the
high digit is lost, and the value wraps. Overflowing an integer works the
same way. Subtraction, just like addition, also pays attention to the
possible borrow. In either case, if the fractional part over or underflows,
then it carries/borrows from the whole part.

Multiplication turns out to be a little tricky, and requires some thought. I
already hinted at the idea that the fractional part really implies a
division. If you think about it, 2:0 * 2:5 is really the same as 2:0 * (2 +
5/10). This simple example shows that there is some division required. Why?
Fractional parts of numbers are just that, fractions. The fractional part of
a number, such as the :5, is really a representation of 5 / (Number Base).
In normal notation, the number after the decimal place is X / Base 10.
(Note: in standard mathematical notation, the base for division is
adjusted to ensure that the base is correct for the number of digits. For
example, 2.125 is really 2 + 125/1000. The rule is that the fraction on the
right is always < 1. In our example, we are limited to 1 digit, which allows
me to simplify this into BaseX.)

By doing this with a single digit, I am able to simplify this operation into
a single equation for multiplication. Since you are at the moment stuck on
the fact that a single digit would appear to limit your range from 0:0 to
9:9, I will hint that you are wrong: the upper limit is really much
larger, and actually depends on the hardware that you are using. A 68HC11
would have a 255:255 limit, a 16-bit machine would do 65535:65535, and of
course a 32-bit machine would allow for even higher. However, even 255:255
is really quite a large range. In fact, there are 65536 possible values.

For this set of equations, you need to keep in mind that I can cheat and
factor 5 * 2.1 into 5 * 2 + 5 * 0.1. Do the math, it works! To do this with
single digits, I am going to use 4 letters so you can see how the formula
develops. A, :B, C, and :D are the 4 digits: A:B form the first number, C:D
the second. Note that the :B and :D represent the fractional parts.

Start with the sample equation A:B * C:D

Factor this out as:

(A:B * C) + (A:B * :D)

Note that the fractional :D was separated from C. Not too bad. Now,
separating A:B yields the following

(A * C) + (:B * C) + (A * :D) + (:B * :D)

This goofy looking thing is the factored-out multiplication of A:B * C:D, but
there are a few interesting things to note. First of all, these terms
require a 2 digit result. If you think about it, multiplying two single byte
values may require a full 2 bytes to hold the result. Most hardware with
multiply operations can handle this. The lower byte of these 2 bytes is your
result, the upper byte is a carry byte. How to handle this is coming right
up.

Another thing to note: All terms with a ':' represent fractions. In other
words, the final value needs to be divided by the Base. For example, assume
our base is 10, and (:B * C) ends up being 21, then the result is a
fractional number 2:1.

Here comes an example: 1:3 * 4:5

(1 * 4) + (:3 * 4) + (1 * :5) + (:3 * :5)
(4) + (3/10 * 4) + ( 1 * 5/10) + (3 / 10 * 5 / 10)

Note how we needed to divide all terms with a ':' by the base. Note that
the last term is going to generate a division by BaseX ^ 2. That's OK, we
will truncate the lower digit.

4:0 + 1:2 + 0:5 + (15 / 100)
4:0 + 1:2 + 0:5 + (0:15)
5:7 + (0:15)
5:8    (5.85 truncated)

Implementing this in Base10 isn't too bad, and might be something you feel
really good about doing. However, if you want these routines to go faster,
then you should consider doing this in whatever data size your favorite CPU
wants to use. For example, you should consider doing 256:256 on your 68HC11.
Why? Well, you can eliminate some steps if you do!

Specifically, if you use Base256, then you can cheat and ignore several
divisions. In fact, you don't need any! This is very good since the
divisions by the base turn out to be the expensive part of this. Since you
are doing byte sized multiplications followed by a byte sized divide, you
can basically ignore the divide and just grab the result out of the 16-bit
result of the multiplication. The lower byte is the fraction, the upper byte
is the whole. If you work this out by hand, you will see what I mean.

Other bases have the same quality. If you have a 16-bit machine, then you
should have a 32-bit result from your multiply, etc.

Division takes longer to explain and I have to go to bed now!