
Floating point

Colin Walls

Posted Aug 28, 2012

Nowadays, most embedded systems are built using 32-bit CPUs. These devices give plenty of scope for performing the arithmetical processing required for various applications. Calculations can be performed on signed or unsigned integers and 32 bits gives a good range of values: +/- 2 billion or up to 4 billion respectively. Extending to 64 bits is reasonably straightforward.

If you need to stray outside of these ranges of values or perform more sophisticated operations, then you need to think in terms of floating point and this presents a range of new challenges …

The concept of a floating point number is simple enough – the value is stored as two integers: the mantissa and the exponent. The number represented is the mantissa multiplied by 2 to the power of the exponent. Typically, these two integers are stored in bit fields in a 32-bit word, but higher precision variants are also available. The most common format is IEEE 754-1985.

The clear benefit of using floating point is the wide range of values that may be represented, but this comes at a cost:

Performance. Floating point operations take a lot of time compared with integers. If the processing is done in software, the execution time can be very long indeed. Hardware floating point units speed up operations to a reasonable extent.

Precision. Because of the way that values are represented in floating point, a value may not be exactly what you expect. For example, you may anticipate a variable having the value 5.0, but it is actually 4.999999. This need not be a problem, but care is needed when coding with floating point.

Obviously, code like this would be foolish:

if (x == 3.0)
...

as x may never be precisely 3.0.

Similarly, coding a loop like this might produce unexpected results:

for (x=0.0; x<5.0; x++)
...

You would expect the loop to be performed 5 times, for x values 0.0, 1.0, 2.0, 3.0 and 4.0. This might work, but it is quite possible that an extra iteration will occur, with x being 4.999999.

The solution is to use an integer loop counter:

for (i=0,x=0.0; i<5; i++,x++)
...

Broadly speaking, floating point should only be used if it is essential and only after every creative way to do the calculations using integers has been investigated and eliminated.


About Colin Walls

I have over twenty-five years' experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, I am a member of the marketing team of the Mentor Graphics Embedded Systems Division, and am based in the UK. Away from work, I have a wide range of interests including photography and trying to point my two daughters in the right direction in life. Learn more about Colin, including his go-to karaoke song and the best parts of being British: http://go.mentor.com/3_acv Visit The Colin Walls Blog

Comments 1

Interesting post! Rather different from my perspective, coming originally from computational simulation where floating-point is standard. Just a couple of minor points:

Your "for" loop, as written, would be perfectly fine with floating-point numbers. The mantissa for an integer is just the same binary bits as the integer, so it can be exactly represented -- until it gets so large that you run out of mantissa bits for it. And so all the arithmetic in computing that loop will happen exactly. It's still a bad idea to write a loop like that, though, because someone's going to come along and rewrite it to something like "for (x=0.0; x<0.5; x+=0.1)" -- and, since 1/10th is not exactly representable in binary, they will get roundoff errors and may well get an extra iteration when they do that.

The more-fun case is if you tried to loop to a very large integer. Once you get large enough, there aren't bits left in the mantissa to represent the ones digit, and so "x++" becomes a no-op. (Well, first it adds 2 because of rounding, but after doing that a while it then becomes a no-op.) Then you have an infinite loop!

Because of things like that and the accumulation of errors in general, it's really better to write your revised loop with something like "for (i=0; i<5; i++) {x=i*0.1; ...}" rather than having a long sequence of "x+=0.1" increment operations that will accumulate roundoff errors.

Brooks Moses
6:12 AM Aug 30, 2012
