Undefined behavior and other delights of (bad) C programming

It started a couple of weeks ago, when I received an email from Sandeep Vasant from Ahmedabad University in India. For reasons that he has yet to reveal, he was having trouble with some code like this:

int a=10, b=20, c=0;
c = a++ + a++ + b++ + b++ + ++a + ++b;

He tried this with one compiler and the resulting values of a, b and c were 13, 23 and 96 respectively. He was satisfied with this result. Then he tried a different compiler, which yielded a final value for c of 98, which he found confusing.

I started looking into this, certain that the explanation was simple …

As I work for a company that sells embedded software development tools, I have a natural interest in programming languages and their quirks, even if the code is not specifically embedded. I believe that embedded developers are often more interested in what is going on “behind the scenes” than their desktop computer software development counterparts.

My first thought was that, although the precedence of the + and ++ operators is clear and the functionality of the pre-increment and post-increment versions of ++ is unambiguous, the order of evaluation of the operators is not defined. Maybe they are evaluated left to right or perhaps right to left. So, instead of using a compiler, I decided to work it out by hand. Going left to right, I got 10 + 11 + 20 + 21 + 13 + 23 = 98; going the other way, I got 21 + 11 + 21 + 22 + 11 + 12 = 98. So it made no difference. In both cases I got the result that Sandeep had been confused by.

Now it was time to try a compiler [I used CodePad]. I wrote this:

int a, b, c;
a=10, b=20, c=0; // original code
c = a++ + a++ + b++ + b++ + ++a + ++b;
printf("%d %d %d\n", a, b, c);
a=10, b=20, c=0; // sequence of sub-expressions reversed
c = ++b + ++a + b++ + b++ + a++ + a++;
printf("%d %d %d\n", a, b, c);

The results were:
13 23 92
13 23 96

Now I was confused, as neither result seemed correct. I rewrote the code:

a=10, b=20, c=0; // explicit left to right evaluation
c = a++;
c += a++;
c += b++;
c += b++;
c += ++a;
c += ++b;
printf("%d %d %d\n", a, b, c);
a=10, b=20, c=0; // explicit right to left evaluation
c = ++b;
c += ++a;
c += b++;
c += b++;
c += a++;
c += a++;
printf("%d %d %d\n", a, b, c);

I was much happier with the results this time:
13 23 98
13 23 98

So what is happening here? I consulted my colleague Jon Roelofs, who provided a straightforward explanation: the order of evaluation is unspecified. It could be right to left or left to right, but it could also be any other order that the compiler felt was appropriate. When the evaluation of sub-expressions has side effects on the same variable [like the increment operators here], the results are undefined. Needless to say, coding an algorithm which has an undefined result is rather pointless. Some compilers give a warning in this situation [GCC's -Wsequence-point and Clang's -Wunsequenced, for example].

Undefined behavior only arises when the same variable is modified more than once, or modified and separately read, within a single expression with no sequence point in between. For example:

a = b++ + c++ + d++;

does not exhibit undefined behavior.

Out of interest Jon suggested that I try this code:

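#include <stdio.h>

int foo(void) { return 0; }
int bar(void) { return 1; }
int baz(void) { return 2; }

int main(void)
{
    /* the three calls may be made in any order */
    printf("%d %d %d\n", foo(), bar(), baz());
    return 0;
}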

The result I got was unsurprising:
0 1 2

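He pointed out that, had each function also printed a line of text, those three lines could have come out in any order, because the order in which function arguments are evaluated is unspecified; the numeric data will always be displayed last and in the correct order.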

There are numerous examples of this kind of challenge. Again, Jon suggested that I try this:

c = a+++b;

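Is this treated as c = a++ + b; or c = a + ++b; ? It must be the former. Even though it looks as if it could have gone either way, the C language standard nails it: the compiler always forms the longest possible token as it reads the source [the “maximal munch” rule], so +++ is read as ++ followed by +. So this is not undefined behavior.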

Without wishing to sound superior, I think that it would be very unlikely that I would encounter this problem in “real” code that I had written. This is because, as I started out writing assembly language, I am naturally inclined to keep my statements in C very simple and, hence, do not introduce such complexity.


About Colin Walls

I have over twenty-five years' experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, I am a member of the marketing team of the Mentor Graphics Embedded Systems Division, and am based in the UK. Away from work, I have a wide range of interests including photography and trying to point my two daughters in the right direction in life.


Comments 4

There are three things to consider here: (1) "Order of evaluation", which applies to sub-expressions and is unspecified in C (and C++), except in a few specific circumstances. (2) "Precedence", which applies to operators and is rigorously defined in the language specifications. (3) "Associativity", also rigorously defined, which also applies to operators and acts as a tie-breaker when precedence alone is not enough. The difference between (1) and the combination of (2) and (3) takes some getting to grips with, but is essential for a proper understanding of either language. I agree with you, Colin, about sticking to the KISS principle. The most prevalent C "sin" in the examples in your article is the (ab)use of embedded assignments in expressions. I *never* permit myself the dubious luxury of *any* embedded assignments in my code, even though the ++ operators were specifically invented for that kind of use.

Peter Bushell
12:27 PM Aug 4, 2014

Interesting way to put it Peter - thinking of a ++ as being a bit like a =. If I saw a = on the RHS of an assignment, even if valid, I would be wary and want to simplify the code.

Colin Walls
7:54 AM Aug 5, 2014

Some high-level languages define evaluation order, e.g. for Lisp it is always left-to-right and such ambiguities cannot happen. C cannot do this since the goal is to squeeze out the last CPU cycle. Whether this still makes sense today is questionable, considering some mass-targeted operating systems are becoming more and more bloated on purpose, in order to sell more powerful hardware. And they are still coded in C.

Antonio Bonifati
10:42 AM Dec 5, 2014

You make a good point Antonio.

Colin Walls
8:31 AM Dec 8, 2014
