C Programming Pitfalls

©2005 Check Point Software Technologies Ltd. Proprietary & Confidential

Uri Goren, Kernel Stateful Enforcement I/S, December 2005


Part I: C Macros

And how they can damage your code


Problems Caused by Macros

This presentation shows several ways in which macros can cause unexpected problems in your code.

Goals:
– Make you scared. You should think twice before using macros.
– If you do write macros, help you do it safely.
– If you run into macro-related trouble, help you figure out what's wrong.

I have suggested solutions to each problem. But:
– Each solution solves just one problem. You may need to combine several solutions.
– Some solutions don't completely solve even one problem.
– Some can't be implemented in some cases.
– Some solutions contradict other solutions.
– Some solutions make your code ugly.

So DON'T use this as a programming guide. Just use it to know the risks.


Operator Order #1

Here's a nice macro:
    #define double(x) x+x

Let's try to use it:
    int a=5;
    printf("a=%d a*6=%d\n", a, double(a)*3);

Code after preprocessing:
    printf("a=%d a*6=%d\n", a, a+a*3);

Result: 20 instead of 30.

Solution:
    #define double(x) (x+x)


Operator Order #2

Let's try this one:
    #define sqr(x) (x*x)

And use it:
    int a=3, b=2;
    printf("sqr(a+b)=%d\n", sqr(a+b));

Code after preprocessing:
    printf("sqr(a+b)=%d\n", (a+b*a+b));

Result: 11 instead of 25.

Solution:
    #define sqr(x) ((x) * (x))


Multiple Evaluation

Here's the fixed version of the previous macro:
    #define sqr(x) ((x)*(x))

How many squares need to be added to get 1000?
    int n=1, sum=0;
    while (sum<1000) sum += sqr(n++);

After preprocessing:
    sum += ((n++)*(n++));

Result: n is incremented twice in each iteration. (Modifying n twice in one expression is also undefined behavior in C.)

Solutions:
– Don't pass parameters with side effects to macros.
– Don't evaluate twice. This may be hard without using braces (a block with a temporary variable), which isn't possible when the macro returns a value.
  • With gcc, it's possible. But we're writing cross-platform.
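One portable way to get single evaluation for a value-returning helper is an ordinary function instead of a macro. The following is a minimal sketch, assuming every target compiler accepts C99 "static inline"; the names sqr_int and squares_needed are hypothetical and not part of the original code.

```c
/* Hypothetical replacement for the sqr() macro: the argument is
 * evaluated exactly once, at the call site, before the function runs. */
static inline int sqr_int(int x)
{
    return x * x;
}

/* Counts how many squares (1*1 + 2*2 + ...) must be added
 * before the running sum reaches limit. */
static int squares_needed(int limit)
{
    int n = 1, sum = 0, count = 0;

    while (sum < limit) {
        sum += sqr_int(n++);    /* n is incremented once per iteration */
        count++;
    }
    return count;
}
```

Calling squares_needed(1000) walks n through 1, 2, 3, ... one step at a time, which is what the loop on the slide intended.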


Multiple / Late Evaluation

We implement pools, and have this macro:
    #define move(srcpool, dstpool, num) { \
        move_the_elements; \
        srcpool->size -= num; \
        dstpool->size += num; \
    }

Let's move everything from pool1 to pool2:
    move(pool1, pool2, pool1->size);

Code after preprocessing:
    move_the_elements;
    pool1->size -= pool1->size;   /* Set to 0: OK */
    pool2->size += pool1->size;   /* Add 0: Bug! */

Result: pool2->size isn't changed.

Solution:
    #define move(srcpool, dstpool, num) { \
        int move_num = num; \
        move_the_elements; \
        srcpool->size -= move_num; \
        dstpool->size += move_num; \
    }


Variable Name Conflicts

Here's one:
    #define add_square(x, a) { /* Add a*a to x */ \
        int s = a; /* Avoid multiple eval */ \
        x += s*s; \
    }

Let's try to use it:
    int s, t;
    add_square(s, t);

After preprocessing:
    { int s = t; s += s*s; }

Result: the caller's 's' is shadowed, and isn't changed.

Solutions:
– Don't define variables in macros. But this contradicts previous suggestions.
– Compiler warnings about shadowing (as if we look at them).
– Use more descriptive names. This only reduces the chances.
– Use names "nobody uses", with many ___underscores.
  • Everybody uses names which "nobody uses".


Flow Control #1

Here's a useful macro:
    #define dbgprint(msg) if (dbgflag) printf(msg);

And let's use it:
    if (x==3)
        dbgprint("very bad\n");
    else
        dbgprint("very good\n");

After preprocessing:
    if (x==3)
        if (dbgflag) printf("very bad\n");;
    else
        if (dbgflag) printf("very good\n");;

Result: because of the extra ";", the compiler won't relate the "else" to the "if": compilation error.

Solution:
– Make sure to use braces around the conditional statement, even if there's only one. But as the macro writer, you can't ensure this.
– Omit the ";" from the macro and leave it to the caller.


Flow Control #2

Look at this debugging macro:
    #define dbgprint(msg) if (dbgflag) printf(msg)

Suppose we use it this way:
    if (x==3)
        dbgprint("very bad\n");
    else
        dbgprint("very good\n");

After preprocessing:
    if (x==3)
        if (dbgflag) printf("very bad\n");
    else
        if (dbgflag) printf("very good\n");

Result: whose "else" is it? The else will be related to if (dbgflag), not to if (x==3): wrong results.

Solutions:
– Use braces with the if. But the macro writer doesn't control it.
– Use one of these structures:
  • #define dbgprint(msg) if (!dbgflag) {} else printf(msg)
  • #define dbgprint(msg) do {if (dbgflag) printf(msg);} while(0)


Macro / Symbol Name Collision

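In some header, we define a nice macro:
    #define sum(a, b) ((a)+(b))

In some unrelated code, which just happens to include this header:
    int sum(int *values, int count);

Result: the name sum, followed by "(", is treated as the macro: compilation error. (A plain variable such as "int sum = 3;" would survive, because a function-like macro is expanded only when its name is followed by "(". An object-like macro, e.g. "#define sum 0", would break that too.)

Solution: prefix the name with the component name:
    #define cp_math_sum(a, b) ((a)+(b))
– Quite annoying: the name is too long.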


Part II: Integer Arithmetic Pitfalls


Preview

Do we really understand integers?
– We take mathematical operations for granted.
– We assume that things work just like we learned in elementary school.
– We take integer arithmetic as something basic that needs no special attention.

Question: Which integers satisfy the condition (x == -x)?
– In normal math, there's only one: 0.
– In fact there are two: 0 and 0x80000000 (for a 32-bit int). Check for yourself.
– Conclusion: integers are not as simple as you may think.

In this presentation you will find:
– Many functions and code segments doing integer arithmetic.
– All are "mathematically correct": if integers were simple numbers, they would give correct results.
– All are buggy: they fail because of how integers work.


Intermediate Results

When evaluating a complex expression, there are intermediate results.
– Normally, we ignore them: we look at the big picture.

Intermediate results have their own types and data ranges.
– They are not stored with arbitrary size and precision.
– It's just as if you had declared them explicitly:
– "int x = a + b * c;" is the same as (assuming that b and c are ints):
    int temp1 = b * c;
    int x = a + temp1;

What if the intermediate result wraps, but the final result doesn't?
– With addition and subtraction, it's usually OK.
  • In "(1-2)+3" on unsigned ints, though 1-2 is 0xffffffff, we eventually get 2, which is correct.
– With multiplication and division, it usually isn't.
  • In "2GB * 3 / 10", 2GB * 3 will overflow, and the division won't fix it.
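The "2GB * 3 / 10" case can be reproduced with fixed-width types. This is a small sketch, assuming <stdint.h> is available; the function names are hypothetical and only illustrate the effect of the intermediate result's width.

```c
#include <stdint.h>

/* 32-bit intermediate: the product wraps modulo 2^32 before the division. */
static uint32_t scaled_narrow(uint32_t x)
{
    uint32_t product = x * 3u;
    return product / 10u;
}

/* 64-bit intermediate: the product fits, so the division sees the true value. */
static uint32_t scaled_wide(uint32_t x)
{
    uint64_t product = (uint64_t)x * 3u;
    return (uint32_t)(product / 10u);
}
```

For x = 2GB (2147483648), the narrow version yields 214748364 while the wide version yields 644245094, the mathematically expected result.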


Unsigned Integers – loop counter

Here's a nice loop:
    int a[SIZE];
    unsigned int pos;
    for (pos=SIZE-1; pos>=0; pos--)
        a[pos] = 555;

Any problem?
– "pos >= 0" is a meaningless condition!
– pos will go down from SIZE-1 to 0, then to 0xffffffff. This is still positive, so the loop never ends and writes far outside the array.
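A common way to count down with an unsigned index is to test before decrementing. The sketch below is one possible form, not the only one; fill_backwards and its parameters are hypothetical names.

```c
#include <stddef.h>

/* Fills a[0..size-1] from the last element down, using an unsigned index. */
static void fill_backwards(int *a, size_t size, int value)
{
    size_t pos;

    /* The test "pos-- > 0" compares the old value and then decrements,
     * so the body runs with pos = size-1 ... 0 and never with a wrapped index. */
    for (pos = size; pos-- > 0; )
        a[pos] = value;
}
```

The loop also behaves correctly when size is 0, a case where "pos = SIZE-1" would already have wrapped.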


Unsigned Integers Subtraction #1

Here's a nice macro:
    #define DIFF(x,y) (x-y)
    #define PDIFF(x,y) (DIFF(x,y) > 0 ? DIFF(x,y) : 0)

Very nice and simple: it returns the difference, if it's positive.

Now let's use it:
    unsigned int x=3, y=5;
    printf("%d\n", PDIFF(x,y));

We get -2!!!
– x,y are unsigned, so (x-y) is unsigned.
– "(x-y) > 0" is the same as "(x-y) != 0".


Unsigned integers Subtraction #2

Let's fix the macro above:
    #define PDIFF(x,y) ((int)DIFF(x,y) > 0 ? (int)DIFF(x,y) : 0)

Now, does it work?
    unsigned int x = 3U*1024*1024*1024 + 3;
    unsigned int y = 3;
    printf("%u\n", PDIFF(x,y));

We get 0!
– x is greater than y, but (x-y), when viewed as a signed integer, is negative.
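Both subtraction problems go away if the operands are compared before they are subtracted. A minimal sketch, using a hypothetical function pdiff in place of the macro:

```c
/* Positive difference of two unsigned values.
 * The comparison is done on the operands, so the subtraction
 * is only performed when it cannot wrap. */
static unsigned int pdiff(unsigned int x, unsigned int y)
{
    return (x > y) ? (x - y) : 0u;
}
```

With this form, pdiff(3, 5) is 0 and pdiff(3GB + 3, 3) is 3GB, with no cast to a signed type involved.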


Unsigned Integers Overflow

Take a look at this function:
    int f(unsigned int x, unsigned int y) {
        if (x+y > 1000) return TRUE;
        return FALSE;
    }

We expect it to return FALSE only if both x and y are pretty small.

Now how about this:
    unsigned int a = 2U * 1024 * 1024 * 1024;
    if (!f(a, a)) printf("boom!\n");

Obviously, a is very large, so a+a must also be large.
– However, a+a equals 0!
– The function will return FALSE.
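One way to write this kind of check without ever forming x+y is to rearrange the comparison. The sketch below assumes the limit is passed as a parameter; sum_exceeds is a hypothetical name.

```c
/* Returns 1 if x + y would exceed limit, 0 otherwise,
 * without computing x + y (which could wrap). */
static int sum_exceeds(unsigned int x, unsigned int y, unsigned int limit)
{
    if (x > limit)
        return 1;
    return y > limit - x;    /* limit - x cannot wrap, since x <= limit here */
}
```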


Signed Integers Boundaries #1

Boundary checking is important. As in:
    int f(int x) {
        static int y[SIZE] = { … };
        if (x >= SIZE) return ERROR;
        return y[x];
    }

But does it check the argument properly?

How about f(-1)?
– The error won't be caught.


Signed Integers Boundaries #2

Here's a lovely function:
    int dy(int x) {
        static int y[SIZE] = { ... };
        if (x<0 || x+1>=SIZE) return ERROR;
        return (y[x+1] - y[x]);
    }

This time, we carefully check the arguments, so we won't exceed the array boundaries.
– Do we?

How about "dy(0x7fffffff)"?
– The condition is now:
    if (0x7fffffff < 0 || 0x80000000 >= SIZE) return ERROR;
– 0x80000000 is negative! It's not greater than SIZE.
– The function will not catch the error!
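Both boundary problems can be avoided by checking the lower bound and by comparing x against a constant instead of computing x+1. The following is a sketch only; TABLE_SIZE, TABLE_ERROR and the table contents are hypothetical stand-ins for SIZE, ERROR and the elided initializer.

```c
#define TABLE_SIZE  8        /* hypothetical, stands in for SIZE */
#define TABLE_ERROR (-1)     /* hypothetical, stands in for ERROR */

static const int table[TABLE_SIZE] = { 1, 4, 9, 16, 25, 36, 49, 64 };

/* Returns table[x+1] - table[x], or TABLE_ERROR if x or x+1 is out of range.
 * The upper check is "x >= TABLE_SIZE - 1", so x + 1 is never computed
 * for a value that could overflow. */
static int table_delta(int x)
{
    if (x < 0 || x >= TABLE_SIZE - 1)
        return TABLE_ERROR;
    return table[x + 1] - table[x];
}
```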


Using Constants #1

Consider the following function:
    int big_enough(int size) {
        return (size > sizeof(int));
    }

Tells you whether a given size is big enough.
– Obviously, a negative size is not big enough.
– Or is it?

big_enough(-1) will return true!
– When comparing, size is converted to the unsigned type of sizeof (size_t).
– So on a 32-bit platform we're comparing 0xffffffff with 4: certainly big enough!

Constants have a type, and can be signed or unsigned:
– Signed constants, e.g. 100, 100L.
– Unsigned constants, e.g. 100U, sizeof(anything).


Using Constants #2

Here's another function:
    int big_enough2(int size) {
        return ((size - sizeof(int)) > 100);
    }

Tells you if the size is big enough for something.
– Again, negative sizes are surely not big enough.
– And again…

big_enough2(-1) will return true!
– When subtracting sizeof(int), the result is unsigned.
– So we're comparing 0xfffffffb with 100: certainly big enough!
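A simple way to keep both functions in signed arithmetic is to convert sizeof to int explicitly. This is a sketch under the assumption that the sizes involved fit comfortably in an int; the _signed names are hypothetical.

```c
/* sizeof yields an unsigned value; casting it to int first keeps
 * the comparison and the subtraction signed. */
static int big_enough_signed(int size)
{
    return size > (int)sizeof(int);
}

static int big_enough2_signed(int size)
{
    return (size - (int)sizeof(int)) > 100;
}
```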


Calculating Average

Here's a simple exercise:
– Write a function to calculate the average of two numbers.
– Actually two exercises: signed and unsigned.

Solutions:
    unsigned int u_avg(unsigned int x, unsigned int y) {
        return (x + y) / 2;
    }
    int s_avg(int x, int y) {
        return (x + y) / 2;
    }

Now let's check if it works:
– Unsigned:
    unsigned int x = 2U * 1024 * 1024 * 1024;
    printf("%u\n", u_avg(x, x));
– We get 0!
  • x+x equals 0, so (x+x)/2 does also.
– Signed:
    int x = 0x7fffffff;
    printf("%d\n", s_avg(x, x));
– We get -1!
  • x+x equals 0xfffffffe, which is -2. So (x+x)/2 is -1.
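Two possible overflow-free averages are sketched below. The unsigned version halves each operand before adding; the signed version uses a wider intermediate and therefore assumes that long long is wider than int, which holds on common platforms but is not guaranteed by the language. The _safe names are hypothetical.

```c
/* Unsigned average without computing x + y.
 * The last term restores the 1 that is lost when both operands are odd. */
static unsigned int u_avg_safe(unsigned int x, unsigned int y)
{
    return x / 2 + y / 2 + (x & y & 1u);
}

/* Signed average with a wider intermediate.
 * Assumption: long long is wider than int, so the sum cannot overflow. */
static int s_avg_safe(int x, int y)
{
    return (int)(((long long)x + y) / 2);
}
```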


Percentage #1

How much is 30% out of something?
– That's easy. Can you program it?

Sure. Let's do it nice, clean and modular:
    #define PCT(p) (p / 100)
    int f(int x) { return PCT(30) * x; }

Oops, it never works.
– "30 / 100" is 0. We always return 0.


Percentage #2

The last percentage function was stupid. Let's write a better one:
    int f(int x) { return x * 30 / 100; }

Now it works.
– Always?

Company X is worth $143,165,600. I have 30% of the shares. What's my fortune?
– Using the above function we get $7. Not very exciting.
– Why? "x * 30" is more than 4G, so it wraps around. Division can't fix it any more.


Percentage #3

Writing a percentage function can't be that hard. This time, we'll do it right:
    int f(int x) { return x / 100 * 30; }

Let's see how it does with the last example:
– f(143165600) returns 42,949,680.
– I like this one much better.

But how about something easier?
– How much is 30% of 10?
– f(10) returns 0!
  • "10 / 100" is 0.


Percentage #4

Here's a more general percentage function:
    int p(int x, unsigned int p) {
        if (x>1000 || x<-1000 || p>100)
            return OUT_OF_RANGE;
        return x * p / 100;
    }

We don't support large x, so we can't overflow. But…

How much is 50% of -30? Let's try p(-30, 50):
– We get 42949657. How come?

x is signed, p is unsigned (makes sense).
– In C, it means x*p is unsigned.
– We put -1500 in an unsigned integer: it wraps around.
– Division treats it as a large positive number, and returns a smaller positive number.


Percentage #5

Here's a harder question: what percentage is 30 out of 50?
– Or generally, what percentage is x out of y?

Here are all the simple ways to calculate it:
– (x / y * 100)
  • Returns 0 when x < y (all normal cases).
– (x * 100 / y)
  • Overflows when x is large (what's 50M out of 80M?)
– x / (y / 100)
  • Crashes when y is small (what's 5 out of 8?)
  • Inaccurate when y is not very large (what's 500 out of 599?)
– 100 / y * x
  • Inaccurate when y is less than 100.
  • 0 when y is more than 100.
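If a 64-bit integer type is available, multiplying first in the wider type sidesteps the trade-offs listed above. The sketch below assumes <stdint.h> and signed 32-bit inputs; percent_of, take_percent and the -1 error convention are hypothetical.

```c
#include <stdint.h>

/* What percentage is x out of y? Multiplies first, in 64 bits,
 * so neither truncation to 0 nor 32-bit overflow occurs.
 * Returns -1 when y is 0 (hypothetical error convention). */
static int percent_of(int32_t x, int32_t y)
{
    if (y == 0)
        return -1;
    return (int)((int64_t)x * 100 / y);
}

/* pct percent of x, using the same idea. Both operands are signed,
 * so a negative x stays negative. */
static int32_t take_percent(int32_t x, int32_t pct)
{
    return (int32_t)((int64_t)x * pct / 100);
}
```

The results are truncated toward zero, so 5 out of 8 gives 62 rather than 62.5.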


Signed/Unsigned Division

Here's a nice function:
    int cut(int x, unsigned int factor) {
        return x / factor;
    }

How much is a half of -6? Try cut(-6, 2):
– We get 2147483645. How come?

x is signed, factor is unsigned.
– So before dividing, x is converted to unsigned: we get 4G-6.
– After dividing, we get 2G-3. Converting it back to signed keeps it a large positive.
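One way to keep the division signed is to convert the unsigned operand explicitly, after checking that it fits. This is only a sketch: cut_signed is a hypothetical name, and returning 0 for an unusable factor is an assumed convention, not the original code's behavior.

```c
#include <limits.h>

/* Divides x by factor using signed arithmetic. */
static int cut_signed(int x, unsigned int factor)
{
    if (factor == 0)
        return 0;    /* assumed convention: avoid division by zero */
    if (factor > (unsigned int)INT_MAX)
        return 0;    /* quotient is 0 for every x except INT_MIN,
                        which this sketch does not handle */
    return x / (int)factor;
}
```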


Bit Fields

Bit fields are very nice: they save memory.

Here's a program that uses them:
    struct x {
        int flag:1;
        int count:31;
    };
    const char *flag_set(struct x *s) {
        const char *n[] = { "FALSE", "TRUE" };
        return n[s->flag];
    }

What happens if we set flag to 1 and call flag_set?
– It returns an invalid string!
– flag is signed, so it can hold either 0 or -1.
– So our program returns n[-1].

This example is platform dependent.
– In Solaris cc, bit fields are unsigned (unless explicitly signed).
  • This program works fine on Solaris (if compiled with cc).
– In gcc (on all platforms), bit fields are signed.
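Declaring the bit fields explicitly unsigned removes the dependency on the compiler's default. A minimal sketch, with hypothetical names struct flags and flag_name:

```c
struct flags {
    unsigned int flag  : 1;     /* explicitly unsigned: holds 0 or 1 */
    unsigned int count : 31;
};

/* Returns "TRUE" or "FALSE" according to the flag bit. */
static const char *flag_name(const struct flags *s)
{
    static const char *const names[] = { "FALSE", "TRUE" };

    return names[s->flag];      /* index is always 0 or 1 */
}
```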


Shifting

Check out this function:
    void printb(int x, char *buf) {
        buf[0] = '\0';
        for (; x != 0; x >>= 1)
            strcat(buf, (x & 1) ? "1" : "0");
    }

It creates a string with x in binary (reversed).

How about printb(0x80000000, buf)?
– We expect a lot of zeros, ending with 1.

The function will loop infinitely (until it crashes)!
– Shifting right a signed integer duplicates the high-order bit!
– 0x80000000 >> 1 == 0xc0000000. (8=1000b, c=1100b).
– Shift right is like division: it preserves the sign.
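Taking the value as unsigned makes the right shift fill with zeros, so the loop terminates. A sketch, assuming a 32-bit unsigned int for the buffer size; printb_unsigned is a hypothetical name.

```c
#include <string.h>

/* Writes x in binary, least significant bit first, into buf.
 * buf needs room for 33 bytes when unsigned int is 32 bits wide. */
static void printb_unsigned(unsigned int x, char *buf)
{
    buf[0] = '\0';
    for (; x != 0; x >>= 1)     /* unsigned shift: zeros come in from the left */
        strcat(buf, (x & 1u) ? "1" : "0");
}
```

For 0x80000000 this produces 31 zeros followed by a single 1, as the slide expected.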


Conclusions

Be aware. Remember that:
– Your code may not mean what you think it means.
– The variable's type and valid range are important.
– Intermediate data has a type and valid range.
  • Especially important with multiplication and division.

Unsigned integers are more dangerous:
– The wrap-around value (0) is closer to the expected values.
– A single unsigned makes the whole expression unsigned.

Test your arithmetic:
– Copy the arithmetic into a simple test program, and test all cases.
  • Much easier to cover all possible values this way.
– Even code that seems simple and correct may surprise you.

Use types explicitly:
– When types matter, don't let the compiler convert automatically. Cast yourself, to make things clear.
– Use variables for intermediate results, even when not needed.
  • This may remind you of the intermediate values' importance.


Part III: Miscellaneous C Pitfalls

Uri Goren, August 2005


Alignment

Consider the following function:
    char buf[SIZE];
    void write_num(int off, int num) {
        int *p = (int *)&buf[off];
        *p = num;
    }

It writes a number at a given offset within a buffer.

What if the offset isn't a multiple of 4?
– Intel-based platforms: will work a bit slower.
– Sun SPARC (Solaris): crash!

So pay attention to alignment.
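A portable way to store a value at an arbitrary byte offset is to copy its bytes with memcpy instead of dereferencing a cast pointer. The sketch below is one possible form; BUF_SIZE stands in for SIZE, and the bounds check, return codes and read_num helper are additions made for illustration.

```c
#include <string.h>

#define BUF_SIZE 64              /* hypothetical, stands in for SIZE */

static char buf[BUF_SIZE];

/* Stores num at byte offset off, whatever the alignment of off.
 * Returns 0 on success, -1 if the value would not fit in the buffer. */
static int write_num(int off, int num)
{
    if (off < 0 || (size_t)off > sizeof(buf) - sizeof(num))
        return -1;
    memcpy(&buf[off], &num, sizeof(num));
    return 0;
}

/* Reads back a number stored by write_num(); off must be a valid offset. */
static int read_num(int off)
{
    int num;

    memcpy(&num, &buf[off], sizeof(num));
    return num;
}
```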


Operator Precedence

We all know the precedence of some operators:
– Multiplication and division before addition and subtraction.
  • "a * b + c" is the same as "(a * b) + c".
– Assignment after almost everything:
  • "a = x + y" is the same as "a = (x + y)".
  • Not "(a = x) + y"!

But do we always know the precedence?
– a + b << 2
– a ^ b & c
– a > 3 ? c = d : x = y

You can find the full precedence table easily.
– Don't do it!
– When you're not 100% sure, use parentheses.


The Importance of Prototypes

Prototypes are great, but optional.
– They allow the compiler to catch more errors.
– Omitting them just causes a warning.
– The code works fine without them.
  • Most of the time…

Look at this case:
    /* char *get_name(int id); No prototype! */
    printf("%s\n", get_name(MY_ID));

Will this work?
– On 64-bit platforms, the returned value will be assumed to be "int".
– The higher 32 bits will be ignored.
– If the string is located above 4GB, it will crash.

Sometimes we get away with it.
– In the Solaris 64-bit kernel, all global and static variables are located below 4GB.
– The problem is when returning a pointer to dynamic memory.


Arrays with Offset -1

Normally, we can access only non-negative array offsets.

But look at this trick:
    int _a[SIZE+1];
    int *a = &_a[1];

Now we can access a[-1] to a[SIZE-1].

But it will fail if both of these conditions hold:
– The index is an unsigned variable.
– The index is of a type smaller than a pointer.
  • u_char or u_short on 32 bits.
  • On 64 bits, also u_int.

So this trick should be done carefully (or not at all).


Part IV: Examples from Our Code


Array With a Negative Index

Here's an array, defined in fwdrv.c:
    static struct fwiftab _fwiftab[MAXIFP+1];
    struct fwiftab *fwiftab = &_fwiftab[1];

This should allow access to fwiftab[-1].

But what if the index is unsigned?
– It will crash on 64-bit platforms.
– It will crash if the index is u_char or u_short.

In practice:
– It's always called with a signed int.
– -1 is possible only on Nokia, which isn't 64-bit.
– We're lucky.


Macro Affecting Flow Control

Here's a very useful kernel macro:
    #define FW_ASSERT(caller, cond, msg) { \
        if (fw_assert_on && !(cond)) { \
            kdprintf("FW-1: %s: %s (%s:%d)\n", \
                caller, msg, __FILE__, __LINE__); \
            fw_panic(msg); \
        } \
    }

What happens when used in an "if"?
    if (x > 0)
        FW_ASSERT(rname, y > 0, "too small");
    else
        printf("OK\n");

This will not compile!
– The semicolon after FW_ASSERT will "break" the if statement.

So use FW_ASSERT carefully.


Macro – Parameter Names

Look what I've found in fwlddist.c:
    #define FWSYNC_FCU_SET_TMOUT_TTL(timeout, ttl) \
        do { info.timeout = &(timeout); \
             info.ttl = &(ttl); \
        } while (0)

The macro was written carefully:
– "do {} while(0)" is used, so it works fine with if-else.
– Parentheses around all parameters.

One real bug:
– "timeout" and "ttl" are both parameters and structure members, so the member names are substituted too.
– FWSYNC_FCU_SET_TMOUT_TTL(3, 4) won't compile (it expands to "info.3 = &(3)"), and neither will a call with variables of any other names.

In practice:
– It's called many times, always with variables named timeout and ttl.
– This is the only case where the macro can work.


Possible Overflow

Here's a piece of code from fwatom.c:
    u_int fw_hmem_size_new, fw_hmem_maxsize_new;
    ...
    if (fw_hmem_size_new * 2 > fw_hmem_maxsize_new)
        fw_hmem_size_new = fw_hmem_maxsize_new / 2;

Makes sure that the new size doesn't exceed half the new limit.
– Both sizes are in bytes.

But what if the size is 2GB or more?
– "fw_hmem_size_new * 2" will wrap around.
– The size won't be decreased.

In practice:
– The size can't be more than 2GB minus something.
– This is because we currently can't use more than 2GB.
– The bug is just around the corner.


Wrong Parameter Checking

A function from fwdrv.c:
    char *fw_func_getname(int func_id)
    {
        if (func_id < fwfuncs.nfunc)
            return fwfuncs.funcdesc[func_id].funcname;
        return NULL;
    }

What if func_id is negative?
– It will return a bad pointer.

In practice:
– func_id isn't negative, unless there's another bug.
– The string is used only if debug is enabled.
