The pros and cons of using gcc

There are many ARM compilers that one can use to do embedded microcontroller development.

Some of them create better code than gcc, some of them worse. Some come with an IDE, libraries and tools that write half the code for you. So why would you choose to do development work using gcc? Well, I am not trying to sell gcc. I use it; you may or may not. I will give you some of the pros and cons, and then you have to decide for yourself. On this page you will also find some discussion of the problems that I have encountered using gcc, so you can't say that I'm trying to flog it. This is what you will find here:

The pros and the cons

The Good

The Bad

...and the FUD:

GCC's extensions to the C language

gcc provides many extensions, and they are very useful in many ways. This short description covers only a fraction of them; for a more comprehensive description read gcc's info document. There is no particular order or logic in what is mentioned here and what is left out, other than what came to my mind when I was putting this library together.

Anonymous structures and unions

You can define structures or unions within other structures or unions without giving a name to the inner one. Instead of a detailed explanation here is an example:

// We define an object that contains either an int or a float, as well as
// a flag that tells us what kind of object it holds.

struct {

    // The flag that is true if the data is integer

    int is_int;

    // The data, either an int or a float

    union {

        int i;
        float f;

    } data;

} variadic;

// Now we can access the members like this:

if ( variadic.is_int )

    integer_data = variadic.data.i;
else
    float_data = variadic.data.f;

// Using anonymous unions you can save some typing:

struct {

    int is_int;

    union {

        int i;
        float f;
    };              // Note that the union has no name!

} variadic;

// Now we can access the members of the union as if they were
// members of the surrounding structure:

if ( variadic.is_int )

    integer_data = variadic.i;
else
    float_data = variadic.f;

Naturally, the names of the members of the anonymous object must be unique for the surrounding object. You can nest anonymous objects within each other just like you can named ones, as long as the uniqueness in field names is observed at every nesting level.

Note that if you specify -pedantic for gcc it will complain, because those were not specified in the C99 standard. However, if you do not specify -pedantic but specify -std=gnu99 then even with -Wall -Wextra -Werror the compiler remains quiet about them.
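As a slightly fuller sketch of the same idea, the fragment below wraps the anonymous union in a named type and adds a few helper functions. The names struct value, MakeInt, MakeFloat and AsFloat are made up for this illustration; they do not come from gcc or any library.

```c
#include <assert.h>

// A tagged value: 'is_int' tells which member of the anonymous union is live.
struct value {

    int is_int;

    union {

        int   i;
        float f;
    };              // No name: the members are reached as v.i and v.f
};

// Build a value that holds an integer.
struct value MakeInt( int i )
{
struct value v;

    v.is_int = 1;
    v.i = i;
    return v;
}

// Build a value that holds a float.
struct value MakeFloat( float f )
{
struct value v;

    v.is_int = 0;
    v.f = f;
    return v;
}

// Return the stored number as a float, whatever its kind.
float AsFloat( struct value v )
{
    assert( v.is_int == 0 || v.is_int == 1 );   // The flag must be a clean boolean

    return v.is_int ? (float) v.i : v.f;
}
```

With this in place, AsFloat( MakeInt( 3 ) ) and AsFloat( MakeFloat( 1.5f ) ) can be used without the caller ever spelling out the union.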

Non-standard predefined macros

GCC has a lot of built-in macros that are an extension of the C99 standard. You will not find them by reading through the gcc man or info page. However, if you take a look at the info page for cpp, the C preprocessor that gcc uses, you will find them there. I don't list them here; there are way too many for that. Read the info page, it's worth it.
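As a small taste, the sketch below uses a handful of these macros. __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ carry the compiler version, while __arm__ and __thumb__ are defined by gcc's ARM targets. The function names GccVersionNumber and TargetIsa are made up for the example. If you want the full list for your particular compiler, running gcc -dM -E - < /dev/null prints every macro it predefines.

```c
#include <assert.h>

// Pack the compiler version into one number, e.g. 4.4.1 becomes 40401.
// Returns 0 if the compiler does not define the gcc version macros.
int GccVersionNumber( void )
{
#if defined( __GNUC__ ) && defined( __GNUC_MINOR__ ) && defined( __GNUC_PATCHLEVEL__ )

    // The packing only works while minor and patch level stay below 100
    assert( __GNUC_MINOR__ < 100 && __GNUC_PATCHLEVEL__ < 100 );

    return __GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__;
#else
    return 0;
#endif
}

// Name the instruction set this translation unit is being compiled for.
const char *TargetIsa( void )
{
#if defined( __thumb__ )
    return "THUMB";
#elif defined( __arm__ )
    return "ARM";
#else
    return "not ARM";
#endif
}
```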

Inline assembly functions

For pieces of code where you really need to use assembly, you can use the inline assembly feature of gcc. It allows you to insert small assembly fragments inside your C code and tell the compiler enough about them so that it can work out their effects on registers and the like. Quite often you put your inline assembly into an inline C function, and then for all intents and purposes it looks like a normal C function but gets expanded to your assembly instructions without any function call overhead. Naturally, the whole thing is interesting for you only if you know ARM and/or THUMB assembly. In the examples below I will use ARM code, but the same thing works equally for THUMB.

The syntax of inline assembly is a bit funky, but you can get used to it. Here we have a simple example, a function that calculates the one's complement of the sum of two numbers. Note that this is a rather useless example, but will serve our purpose here.

// Returns ~(a+b)

static inline int NegSum( int a, int b )
{
int     s;

    asm volatile
    (
        "add  %[sum],%[aop],%[bop] \n\t"
        "mvn  %[sum],%[sum]        \n\t"
        : [sum] "=r" (s)    // Output parameter
        : [aop] "r"  (a),   // Input parameter
          [bop] "r"  (b)    // Input parameter
    );

    return s;
}

You start your assembly with the asm volatile keyword sequence. After that, enclosed in parentheses, is your actual inline assembly bit. It starts with the code itself, which is a single C string (note that there is no comma at the end of the individual instruction strings, so they will be concatenated into a single string). The assembly variables are in the form of %qual[name] where name is an arbitrary name (an otherwise legal C symbol) and qual is an optional qualifier character, which in our example is not used.

The individual instruction strings are concatenated, then gcc substitutes the variables with the relevant register (or other asm operand) references. That is, if gcc decides that the operand you called [foo] will be in r3, then it will replace every occurrence of the substring '%[foo]' with 'r3' in the entire assembly instruction(s) string.

When that is done, the whole string simply gets printed into the compiler's output assembly code. That's why you need a separator at the end of every line. The \n\t at the end of the line makes the assembler happy and if you tell gcc to keep the assembly, your instructions will neatly line up with the rest.

The instruction(s) string is followed by up to three fields, each starting with a ':'. The first field describes the output(s) of the assembly, if any. The second describes the input(s), if any, and the last describes the side effects of the stub, if any. In our case there are no side effects, so that field is omitted. Note that you may have more than one output of an inline assembly routine, although in our example we only had one. A trailing empty field can be omitted; an empty field that is followed by a non-empty one must be left empty, with just its ':' present. If the side effect (or clobber) field is empty, it must be omitted.

The input and output parameter fields have the same syntax:
[name] "cons" (expr)
where name is a valid C symbol, the name under which this parameter is referenced in the actual assembly code; cons is the constraint, soon to be detailed; and expr is a C expression. Expressions are an interesting issue, so some more details will be given, but as a basic rule the expression for an output parameter must be an lvalue, while for an input it can be anything. The input expressions are evaluated and their results are loaded or bound to the inputs. The output expression is also evaluated as an lvalue and the result is transformed to match its type and then bound to assembly operand expressions. Now this sounds very complex, and indeed it is, but in our simple example all it means is that the compiler will bind the C expressions (all of which refer to a simple int C variable) to registers, and that's all.

If we now call the above function from another function, like this:

int     foo( int w, int x, int y, int z )
{
    return NegSum( w, x ) - NegSum( y, z );
}

this is what is generated by the compiler:

foo:
    add  r0,r0,r1
    mvn  r0,r0
    add  r2,r2,r3
    mvn  r2,r2
    rsb  r0,r2,r0
    bx  lr

which is pretty cool. Of course, if foo() contained the instruction

    return ~( w + x ) - ~( y + z );

we would have gotten the same result. As I told you, the example was rather useless, because what it did could have been done in C just as efficiently. Here's a more useful example:

// Returns 0 if the interrupt is enabled and !0 if it is disabled.

static inline int   IsTheIrqDisabled( void )
{
int     stat;

    asm volatile
    (
        "mrs    %[sr],cpsr          \n\t"
        "and    %[sr],%[sr],#0x80   \n\t"   // Keep the I (IRQ disable) bit, bit 7
        : [sr] "=r" (stat)
    );

    return stat;
}

You can call the above function (from ARM code) and it will cost you altogether 2 instructions (and 2 clocks), which is a lot less than the cost of a function call.

Now let's see about the constraints. A constraint formally is a sequence of characters enclosed in double quotes. The last character of the sequence tells the compiler where the operand lives. The most frequent case is that you have a 32-bit object that you want in a register. This is denoted by the letter 'r'. In our case all parameters are like that. The compiler will allocate a register for the object, at its discretion. You cannot specify a particular register (well, you can, in a roundabout way, but not through the constraints). The compiler can allocate the same register for different input operands if they are bound to the same expression. For example,

foo()
{
int     x;

    asm volatile
    (
        "some insns"
        :
        : [op1] "r" (x),
          [op2] "r" (x)
    );
}

will probably have op1 and op2 bound to the same register, but there's no guarantee either way.

Output operands must indicate that they are outputs; that is done by using the '=' character as the first character of their constraint. That is, an output operand in a register is denoted by "=r". All output operands are allocated to different registers but, and this is very important, they can be allocated to the same register as an input operand. That is because gcc assumes that the whole assembly you wrote is one single atomic operation that consumes all of its inputs before generating the output. It also assumes that unless it puts an output in the same register as an input you will not modify the registers containing input variables. Indeed that is the case with single-instruction code and even in our two-instruction example, but more involved code does not work like that. So, you have to use additional qualifiers to tell the compiler about restrictions in register allocation. If you use the '&' character between the '=' and the 'r', like this "=&r", then the compiler will guarantee that that particular output operand will not coincide with any input operand. On the other hand, how about guaranteeing that a particular output is in the same register as a particular input? Well, you can't do that, but you can do the opposite: force an input operand to coincide with an output. If you want to do that, instead of specifying "r" for the input operand constraint, you put the bracketed name of the output into the constraint string:

// Increment a variable
asm volatile
(
    "add %[outreg],%[inreg],#1  \n\t"
    : [outreg] "=r" (some_var)
    : [inreg]  "[outreg]" (some_var)
);

Note that the fact that both [inreg] and [outreg] are bound to the same C expression does not guarantee that they will be in the same register. Only specifying the "[outreg]" constraint for [inreg] forces them to occupy the same register.
Forcing an input to be the same as an output is the method you can use to warn the compiler that you will modify an input operand. What you do is create an output operand for your code and bind it to an otherwise unused local variable in the surrounding inline function. Then you force the input argument that your code modifies to be in the same register as that output. That way the compiler will know that you change the input (since it coincides with an output), but since you do not use that output, it will do nothing else with that register, except saving it before invoking your asm if it needs its value any further:

static inline int DoSomething( int x )
{
int     retval;
int     dummy;

    asm volatile
    (
        "insn   %[in],%[in],blah blah   \n\t"   // Changes the input
        "insn   %[out],%[in],etc        \n\t"   // Sets the output

        // Declare the real output and bind it to retval.

        : [out] "=&r" (retval),

        // Declare another output and bind it to 'dummy'.

          [in] "=r" (dummy)

        // Force the input argument to be in the same register as the
        // dummy output. That allows us to change the input reg.

        :      "[in]" (x)
    );

    return retval;
}

Before we go into any more details about constraints, let's take a look at the so-far neglected third specification field after the actual code, the side effect, or more precisely, clobber field. In this field you have to declare the registers that you changed (except, of course, the output registers). You simply list the registers you change, enclosed in double quotes and separated by commas. If, for example, you use r10 and r11, you would write:

asm volatile
(
    "instructions"
    : outputs
    : inputs
    : "r10", "r11"
);

The compiler guarantees that no input or output operand will be allocated into a clobbered register (if you force an operand to occupy a clobbered register, the compiler will issue an error). If you modify the status register, you should put "cc" onto your clobber list. If you modify memory (other than a memory operand that you listed as an output, explained later) then you should also put "memory" onto the clobber list. That would tell gcc that registers containing cached copies of memory variables should be deemed invalid.
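A common use of the "memory" clobber, sketched here purely as an illustration, is an asm with an empty instruction string. It generates no instruction at all, but it stops gcc from keeping memory values cached in registers across it and from moving memory accesses over it. Because the instruction string is empty, the fragment is not ARM specific. The names CompilerBarrier, Publish, shared_data and shared_ready are made up for the example.

```c
#include <assert.h>

int shared_data;    // The payload
int shared_ready;   // Set to 1 once the payload is valid

// Emits no instruction. The "memory" clobber tells gcc that memory may have
// been read or written here, so register copies of memory variables are
// invalid and memory accesses must not be moved across this point.
// The __asm__ spelling also works when compiling with -std=c99.
static inline void CompilerBarrier( void )
{
    __asm__ __volatile__ ( "" : : : "memory" );
}

// Store the data first, then raise the flag. Note that the barrier only
// restrains the compiler; it is not a hardware memory barrier.
void Publish( int value )
{
    assert( !shared_ready );    // Do not overwrite data that was not consumed

    shared_data = value;
    CompilerBarrier();          // Keep the two stores in program order
    shared_ready = 1;
}
```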

It was mentioned earlier that there is a way to force a particular variable to be in a particular register. Indeed there is. Let's assume that we have some spiffy assembly routine named 'weird' that for some unknown reason takes an argument in r7, returns a value in r9 but otherwise does not modify any registers. Of course with an interface like that you can't call it from C. So, you would add a little assembly stub to call it:

// This function is a wrapper to call 'weird' from C code
// As per C calling conventions, the (first) input is in r0 and the
// output is expected in r0; we can freely use r0-r3 and r12.
call_weird:
    mov     r1,r7           // Save r7
    mov     r2,r9           // Save r9
    mov     r3,lr           // Save the return address
    mov     r7,r0           // Copy the input to r7
    bl      weird           // Call the weird function
    mov     r0,r9           // Copy the result to r0
    mov     r7,r1           // Restore r7
    mov     r9,r2           // Restore r9
    bx      r3              // Return (to the saved lr)

That's a solution, but we can do better. The general syntax of forcing a variable into a register is to put asm( "reg" ) at the end of the declaration, like this:

register int x asm( "r3" );

If the variable is a local var of a function, then the function will simply use the particular register for that variable. If the variable is a global variable, then things get more complex (but can be extremely useful under some circumstances). That complexity is not explained here but you are encouraged to read about it in the "Explicit Reg Vars" chapter under the "C Extensions" of the gcc info page.

With the ability of forcing variables to specific registers, we can have a better way of calling that weird assembly function:

static inline int CallWeird( int x )
{
register int ireg asm( "r7" );  // ireg is r7
register int oreg asm( "r9" );  // oreg is r9

    ireg = x;                   // Get the input into ireg

    asm volatile
    (
        "bl weird \n\t"         // Call the assembly - that's all
        : "=r" (oreg)           // Bind oreg, i.e. r9 to the function output
        : "r" (ireg)            // Bind ireg, i.e. r7 to the function input
        : "lr"                  // Tell gcc that we clobbered lr
    );

    return oreg;
}

The above has two advantages over the assembly stub. First, it is only one call, not two. Second, in the inline assembly case the compiler knows that it has to put the input into r7, that the output is in r9 and that the only register modified is lr. With the stub it has to assume that r0-r3,r12 and lr get modified (which, indeed they do) plus it first moves the input from wherever it was to r0, then we move it to r7, call, move the output to r0 and then the compiler moves it wherever it wants to keep it.

What we have covered so far is enough as long as every operand fits in a 32-bit register and we only clobber registers. That's not always the case. So, here's the rest, or at least the most useful subset of it.

If you have 64-bit arguments, then they will always be in consecutive registers. In your inlined code, %Q[operand] refers to the least significant word and %R[operand] to the most significant word (regardless of the endianness of the chip you use). In case of double operands the situation is a bit more complex. If you use EABI, then everything is as described above: the IEEE-754 double is loaded as a 64-bit word and can be accessed as above. If, however, you use ELF, then the double is always in big-endian order, which is the opposite of the little-endian processor's natural order (note that although the ARM7TDMI can be configured to big-endian, I am not aware of any such actual chips). Unfortunately gcc is a bit confused about that, and in that case the Q and R qualifiers don't work. You should then use %[operand] to access the most significant word and %H[operand] when you need the least significant word. See Inline assembly qualifiers for more on qualifiers. If you want to specify a 64-bit object to be in specific registers, you use the same syntax as with 32-bit objects. The register you specify is the lowest numbered register:

register long long x asm( "r4" );

will allocate r4 and r5 for x. If you use EABI, you should always put 64-bit objects into an even register (and the following odd one), if you use ELF, any register and the next one is fine. For example, to add two long longs you would do this:

static inline long long BigSum( long long a, long long b )
{
long long r;

    asm volatile
    (
        "adds   %Q[res],%Q[aop],%Q[bop]     \n\t"
        "adc    %R[res],%R[aop],%R[bop]     \n\t"
        : [res] "=&r" (r)   // '&': the low word is written before all inputs are read
        : [aop] "r" (a),
          [bop] "r" (b)
        : "cc"
    );

    return r;
}
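To make the roles of the two words explicit, here is a plain C sketch of what an adds/adc pair computes. It is only an illustration of the arithmetic, not what gcc generates, and the name Add64ByWords is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

// Adds two 64-bit numbers the way an adds/adc pair does: one 32-bit word
// at a time, with the carry of the low words fed into the high words.
uint64_t Add64ByWords( uint64_t a, uint64_t b )
{
uint32_t a_lo = (uint32_t) a;               // What %Q[aop] refers to
uint32_t a_hi = (uint32_t) ( a >> 32 );     // What %R[aop] refers to
uint32_t b_lo = (uint32_t) b;               // What %Q[bop] refers to
uint32_t b_hi = (uint32_t) ( b >> 32 );     // What %R[bop] refers to
uint32_t r_lo, r_hi, carry;
uint64_t r;

    r_lo  = a_lo + b_lo;                    // adds: low words, wraps at 2^32
    carry = r_lo < a_lo;                    // 1 if the low addition wrapped
    r_hi  = a_hi + b_hi + carry;            // adc: high words plus the carry

    r = ( (uint64_t) r_hi << 32 ) | r_lo;

    assert( r == a + b );                   // Must agree with the native add

    return r;
}
```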

You can specify an operand to be in memory by using the constraint 'm'. In that case when you refer to the operand by %[name] it will be replaced not with 'rn' but with '[rn,#offs]'. Naturally, that is only suitable for the ldr and str instructions. The register and offset are generated by the compiler and you do not have to worry about them; the register will be valid and it will not clash with any of your output, input or clobber registers:

extern int some_var_in_memory;

static inline void StoreInMemVar( int x )
{
    asm volatile
    (
        "str    %[input],%[memvar]  \n\t"
        : [memvar] "=m" (some_var_in_memory)
        : [input] "r" (x)
    );
}

foo()
{
    StoreInMemVar( 20 );
}

If you compile that, this is what you get:

foo:
    mov     r2, #20
    ldr     r3,.L11
    str     r2,[r3, #0]
    bx      lr
.L11:
    .word   some_var_in_memory

You can also specify constants as parameters (for inputs, of course). Constants come in a few varieties. If you use the constraint letter 'M' for an argument, then the compiler will check that it is indeed a constant expression and that it is between 0 and 32. That is, the argument can be used as a shift amount:

// Rotates x to the right by s bits. The shift amount MUST be constant.

static inline int Shift( int x, int s )
{
    asm volatile
    (
        "mov    %[op],%[op],ror %[sh]   \n\t"
        : [op] "=r" (x)
        :      "[op]" (x),
          [sh] "M" (s)
    );

    return x;
}

When you invoke Shift(), the compiler will check if its second argument is indeed a constant and that it is indeed between 0 and 32. Then when it processes the mov ... line, it will replace the %[sh] operand with a # symbol followed by the actual value of the constant.

The constraint 'J' denotes a constant between -4095 and +4095, suitable as an offset for ldr and str instructions.

If you use 'I' then it is a constant that can be used as an operand for an arithmetic or logic instruction. The 'K' constraint can be used if the 1's complement of the constant is such that it can be used in such an instruction. Now that raises an interesting issue. The assembler is smart enough to know that when you write

    mov     r0,#-3

then while -3 is not a valid constant value for the mov instruction (it is not an 8-bit constant between 0 and 255 rotated by an even number of bits), the

    mvn     r0,#2

instruction will indeed load -3 into r0 and of course 2 is a valid constant. Similarly, the assembler will exchange add and sub, adc and sbc, and and bic with each other if doing so makes the constant valid. For mov/mvn, and/bic and adc/sbc the constant is inverted (1's complement), which is what 'K' describes; for add/sub it is negated (2's complement), for which gcc has the 'L' constraint. Alas, there is no single constraint that says to accept a constant that is either the 'I' type or the 'K' (or 'L') type. Fortunately, there's a solution. It is the multiple alternative mechanism of the constraints. There is much more to it than what you find here, so for further details see the gcc info page, C Extensions, Constraints, Multi-Alternative. The gist of the thing is that in the constraint specifications you can have multiple constraints. For each operand you must have the same number of possibilities, separated by commas. The compiler will then go through them and see if a particular combination satisfies the operands. Note that it goes through the constraints on all the operands in lock-step, that is, it checks if the first constraint of all operands is satisfactory, if not then whether the second constraint of all operands is satisfactory and so on. You can't have combinations like the third constraint on the first operand but the second constraint on the second operand. Using that knowledge we can write a little stub that will add any valid constant to a value:

static inline int AddConst( int x, int c )
{
    asm volatile
    (
        "add %[oper],%[oper],%[cons]    \n\t"
        : [oper] "=r,r"             (x)
        :        "[oper],[oper]"    (x),
          [cons] "I,L"              (c)     // 'L': valid once negated (add <-> sub)
    );

    return x;
}

Then, if you invoke with the 'c' argument being say -30, the compiler will accept it and the assembler will replace the add with a sub and change the constant from -30 to +30. You can extend that scheme to create more intelligent instructions. Assume that we want to write a function that rotates an integer to the right by a given amount. The ARM can rotate a register either by a constant amount or by an amount stored in a register. The latter solution of course needs a register and it is one clock cycle longer to execute. So one prefers shifting by a constant, if possible. Here's the solution:

static inline int Ror( int data, int rot )
{
    asm volatile
    (
        "mov    %[o],%[i],ror %[s]  \n\t"
        : [o] "=r,r" (data)
        : [i] "r,r"  (data),
          [s] "M,r"  (rot)
    );

    return data;
}

If you use this, then if the 'rot' argument is a constant between 0 and 32, then constant shift will be used, otherwise a register will be loaded with the shift amount and register specified shifting will be used.
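To make the constant constraints discussed above more tangible, the sketch below checks in plain C whether a value can be encoded as an ARM (not THUMB) data processing immediate, which is what 'I' accepts, and whether it becomes encodable once inverted ('K') or negated (gcc's ARM constraint list also has 'L' for that case). The function names are made up for the example; gcc and the assembler do this check internally and expose no such functions.

```c
#include <assert.h>
#include <stdint.h>

// Rotate a 32-bit word left by n bits, n = 0..31.
static uint32_t Rol32( uint32_t x, unsigned n )
{
    assert( n < 32 );

    return n ? ( x << n ) | ( x >> ( 32 - n ) ) : x;
}

// 'I': true if c is an 8-bit value rotated right by an even number of bits.
int IsArmImmediate( uint32_t c )
{
unsigned rot;

    for ( rot = 0; rot < 32; rot += 2 )
        if ( Rol32( c, rot ) <= 0xff )  // Undo a rotate right by 'rot'
            return 1;

    return 0;
}

// 'K': valid once inverted, as used by mov <-> mvn and and <-> bic.
int IsArmInvertedImmediate( uint32_t c )
{
    return IsArmImmediate( ~c );
}

// 'L': valid once negated, as used by add <-> sub.
int IsArmNegatedImmediate( uint32_t c )
{
    return IsArmImmediate( 0u - c );
}
```

For example, -3 fails the plain test but passes the inverted one (its 1's complement is 2), and -30 passes the negated one. A value such as -257 passes the inverted test (256) but not the negated one (257), which is why the two cases need different constraints.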

I believe that is enough to introduce you to the inline assembly facility in gcc. If you want more (and there is a lot more!), read the gcc info page and experiment. If you specify the -S switch to the compiler, it will generate assembly output, so you can see what it created from your C source.

It's nice an' all, but ... it has quirks.

It does, no doubt. In the following I list some of the issues that I've come across. I hope that you've guessed that I like gcc. In fact, I've been using it as my main compiler for some 18 years. However, sometimes I feel that gcc is on a wrong path: the increase in its complexity is not reflected in the quality of its output, at least not when you compile C code. It is possible that it generates brilliant output from C++, Java, Fortran, Ada and whatever else it compiles, but I am using C so that's where my focus is. After all, back in the good old days gcc used to be the "GNU C compiler", it was just later that it became the "GNU compiler collection". Furthermore, gcc's interpretation of the standard or the choice it makes when the standard is ambiguous or grants freedom to the implementer is not always aligned with common sense or the users' interest.

The rest of this page is a mixture of information about gcc peculiarities, unusual or counter-intuitive behaviour and sheer rants over gcc features about which I have a personal grudge. As with anything else on this page, there is no particular order or logic in the selection of topics.

Initialised variables

If you define an initialised constant data object then gcc will put it into the .rodata or .cdata segment. A static or global variable that is initialised will go to the .data segment. If you define a variable that is not initialised, it then goes into the .bss segment. That is:

const  int x = 3; // Goes to .rodata
static int y = 3; // Goes to .data
int z;            // Goes to .bss

So far so good, that's what you expect. However, consider this:

static int x = 1; // Goes to .data
static int y = 0; // Goes to .bss

It is because gcc assumes that the .bss segment is initialised to 0, therefore, to save initialiser space, it puts every object that is initialised to 0 into that segment. That is a neat feature, if you indeed set the .bss to 0. However, sometimes one sets it to some other value, for example 0x55 for debug reasons. You may not initialise it at all, after all it is called the uninitialised data segment for a reason. Or maybe you just fancy all your initialised data being in the .data segment, regardless of the initialiser value.
If that is the case, use the -fno-zero-initialized-in-bss command line switch.
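To see why this matters, consider a startup routine that fills a segment with a debug pattern. The sketch below is a made-up helper, not part of gcc or of any particular startup code; real startup code would call something like it with the .bss limits exported by its linker script.

```c
#include <assert.h>

// Fill the memory range [start, end) with 'pattern', one byte at a time.
void FillSegment( unsigned char *start, unsigned char *end, unsigned char pattern )
{
    assert( start <= end );     // An inverted range would be a linker script bug

    while ( start < end )
        *start++ = pattern;
}
```

If the startup code calls this on the .bss limits with 0x55 as the pattern, then a variable defined as static int y = 0; starts life as 0x55555555 on a 32-bit target, unless the switch above has moved it to .data.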

Volatiles

The standard says the following:

	An assignment operator stores a value in the object designated by the left
	operand. An assignment expression has the value of the left operand after
	the assignment, but is not an lvalue.

According to a member of the C standard committee with whom I corresponded regarding this issue, the above is, intentionally, slightly ambiguous, to give some freedom to the implementers.
In gcc's interpretation it means that if you have an assignment expression that assigns a value to a volatile object, then the value of the expression is whatever you can read back from the volatile. For example, when looking at this code fragment

volatile int a, b;

    a = b = 0;

most people would think that the statement sets both a and b to 0. Not so. What gcc actually does is this:

    b = 0;
    a = b;

and of course if b is indeed a volatile object, or, say, a non-readable hardware register, then that might have very hard to debug consequences.
You can create very hairy examples, volatile pointers to volatile pointers to volatile objects where a few neatly placed side-effect operators make the whole operation completely unpredictable in gcc's interpretation. Here's a somewhat artificial (but very simple) example that shows you a possible pitfall. Let's assume that you have a single HW register that is the access point to a FIFO, both in the read and write direction. You want to write a string into the transmit FIFO (and you know that it will fit). If you naively do this:

#define FIFO    (*((volatile char *) 0x12345678))


void    LoadString( const char *string )
{
    while ( FIFO = *string++ );
}

believing that the compiler would take it as

    loop:
        temp = *string++;
        FIFO = temp;
        if ( temp != 0 ) goto loop;

then you'll be surprised to discover that what the compiler actually does is this:

    loop:
        temp = *string++;
        FIFO = temp;
        temp = FIFO;
        if ( temp != 0 ) goto loop;

A small change of semantics would make everything simple. If one considered that the value of an assignment expression is not what you can read back from the LHS after the assignment but what you wrote to it, then everything would become very clear and predictable, including the exact number of volatile reads and writes.

I went to the gcc developers and asked if they would consider a command line switch that would make an assignment expression's value to be the value that gets assigned to the left hand side, but they, to put it very mildly, refused it.

If you use volatiles, restrict yourself to one volatile per expression and never use volatiles with operators that can both read and write an object.
If you break that rule, then gcc can, and will, generate an unpredictable number of spurious reads and use the values read by them in an unpredictable order.
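Applied to the FIFO example, the rule could look like the sketch below. The function name is made up, and the register is passed as a pointer parameter only so that the fragment can also be tried on a host machine; on the target you would pass the address of the FIFO register.

```c
#include <assert.h>

// Writes 'string', including its terminating 0, to the FIFO register.
// Each statement touches the volatile object at most once and never reads it.
void LoadStringSafely( volatile char *fifo, const char *string )
{
char    temp;

    assert( fifo != 0 && string != 0 );

    do {
        temp  = *string++;      // Plain read from ordinary memory
        *fifo = temp;           // Exactly one volatile write, no read-back
    } while ( temp != 0 );      // Test the copy, not the register
}
```

The same approach works for the a = b = 0 case: write b = 0; and a = 0; as two separate statements.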

Bitfields (especially volatile bitfields)

Sometimes it is very tempting to define HW registers as bitfields. With gcc there is a small caveat, though: Don't you even think about it!

Consider the following:

struct s_hwreg {

    unsigned int    fld1 : 3,       // Some field
                    fld2 : 5,       // An other field
                    fld3 : 1,       // Just a bit
                    fld4 : 20;      // A wide field
};

static volatile struct s_hwreg * const myreg = (void *) 0x8000e000;

Now you expect that you can write code like

    myreg->fld2 = 13;

and that would be equivalent to

    tmp  = *((unsigned *) 0x8000e000);
    tmp &= ~0xf8;
    tmp |= 13 << 3;
    *((unsigned *) 0x8000e000) = tmp;

And, indeed, that is the case. So, you think that you are winning and start writing code like this:

void    foo( void )
{
struct s_hwreg  init;

    // Build the register value in a memory word

    init.fld1 = 4;
    init.fld2 = 13;
    init.fld3 = 0;
    init.fld4 = 0;

    // Store the word in the register in one hit

    *myreg = init;
}

You compile that with gcc 4.4.1, for example, and you get the correct result:

foo:
        ldr     r3, .L11        // Get the address of myreg
        mov     r2, #108        // The final value
        str     r2, [r3, #0]    // Store it
        bx      lr              // and return

Alas, if you compile it with 4.0.1, you get a very different result:

foo:
        ldr     r1, .L11        // Address of myreg
        ldr     r3, .L11+4      // A mask
        ldr     r2, [r1, #0]    // Fetch myreg
        and     r3, r2, r3      // mask it
        str     r3, [r1, #0]    // store it
        ldrb    r3, [r1, #1]    // Fetch second lowest byte of myreg
        bic     r3, r3, #1      // Clear a bit in it
        strb    r3, [r1, #1]    // Store the byte back
        ldr     r3, [r1, #0]    // Fetch myreg as a word
        bic     r3, r3, #144    // Clear some bits
        orr     r3, r3, #104    // Set some bits
        str     r3, [r1, #0]    // Store it back
        ldr     r3, [r1, #0]    // Load it again
        bic     r3, r3, #3      // Clear some bits
        orr     r3, r3, #4      // Set some bits
        str     r3, [r1, #0]    // Store it back
        bx      lr              // and return

As you can see, even though you would think that you told the compiler to access your volatile register only once, writing a value to it, it actually read and wrote the register four times, sometimes as a word, sometimes as a byte. Which, I am sure, it did legally, if you read the fine print in the C99 standard and interpret it in a certain way. If you interpret it in another way, you get the gcc 4.4.x version. The possibilities are endless.
Nevertheless, you may decide that since you use gcc 4.4.x, you will give in to the temptation of using bitfields. However, you don't like the fact that you need to introduce an interim variable when you set every field at the same time, so you decide to use a compound literal, that is, a constructed composite value, like this:

void    foo( void )
{
    *myreg = (struct s_hwreg) {
                .fld1 = 4,
                .fld2 = 13,
                .fld3 = 0,
                .fld4 = 0
            };
}

Alas, this is what you get with 4.4.1:

foo:
        ldr     r3, .L15
        ldr     r2, [r3, #0]
        bic     r2, r2, #3
        orr     r2, r2, #4
        str     r2, [r3, #0]
        ldr     r2, [r3, #0]
        bic     r2, r2, #144
        orr     r2, r2, #104
        str     r2, [r3, #0]
        ldr     r2, [r3, #0]
        bic     r2, r2, #256
        str     r2, [r3, #0]
        ldr     r2, .L15+4
        ldr     r1, [r3, #0]
        and     r2, r1, r2
        str     r2, [r3, #0]
        bx      lr

This is basically setting the bitfields one at a time, which for some reason is based on yet another interpretation of the C standard. This plurality of interpretations is great and all, although it does not help you in practice. Of course, that's your personal problem, and as such, not a concern of the gcc people.

I went to the gcc developers with a suggestion. Namely, I suggested that if and only if the user explicitly enables it with a command line switch, then the comma operator is to get a special meaning with regard to bitfield operations. In particular, if you write this:

    myreg->fld1 = val1;
    myreg->fld2 = val2;
    myreg->fld3 = val3;

then the compiler must read and write *myreg three times because it is volatile. What I suggested was that if you instead write

    myreg->fld1 = val1, // Note the comma instead of semicolon
    myreg->fld2 = val2, // Still comma, not semicolon!
    myreg->fld3 = val3;

then if all expressions separated by the commas are bitfield assignments and they refer to bitfields within the same word (or whatever memory unit you declared them to be a bitfield of), then the compiler would collect them and would produce a single read and single write (or, if you set every field in the structure, just a single write). In effect, it would be guaranteed to be the same as

<type_of_myreg> tmp;

    tmp = *myreg;
    tmp &= ~( FLD1_MASK | FLD2_MASK | FLD3_MASK );
    tmp |= (val1 << FLD1_SHIFT) | (val2 << FLD2_SHIFT) | (val3 << FLD3_SHIFT);
    *myreg = tmp;

but of course you wouldn't need to define all the masks and shifts and you would not need to explicitly involve a temporary variable.

This way you could use bitfields and very neatly map HW to C constructs and still have exact control over the reads and writes of your HW registers.
Naturally, I have been told to buzz off in short order, because what I suggested was not in the C99 standard.

Now while the abominations shown above might be standard compliant, they are totally useless in practice. Hence the advice: Do not try to use C bitfields to describe hardware bitfields, regardless of how elegant, useful and appealing that would be. On the other hand, having a printed copy of the C99 standard (well, you have to purchase it) and bowing three times a day towards it will probably significantly speed up your time-critical, HW register intensive embedded application.
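
If you follow that advice, the explicit mask-and-shift approach is what remains. Below is a minimal sketch of it, assuming a hypothetical register layout (a 3-bit fld1 at bit 0, a 5-bit fld2 at bit 3 and a 1-bit fld3 at bit 8). The macro names, the layout and the helper hwreg_set_fields() are illustrative assumptions, not taken from any real device. The point is that the volatile register is read exactly once and written exactly once, whichever gcc version you use.

```c
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define FLD1_SHIFT  0u
#define FLD1_MASK   (0x07u << FLD1_SHIFT)   /* 3 bits */
#define FLD2_SHIFT  3u
#define FLD2_MASK   (0x1Fu << FLD2_SHIFT)   /* 5 bits */
#define FLD3_SHIFT  8u
#define FLD3_MASK   (0x01u << FLD3_SHIFT)   /* 1 bit  */

/* Update three fields with one read and one write of the register. */
static inline void hwreg_set_fields( volatile uint32_t *reg,
                                     uint32_t val1,
                                     uint32_t val2,
                                     uint32_t val3 )
{
    uint32_t tmp;

    tmp  = *reg;                                    /* the only read  */
    tmp &= ~( FLD1_MASK | FLD2_MASK | FLD3_MASK );  /* clear the fields */
    tmp |= ( ( val1 << FLD1_SHIFT ) & FLD1_MASK )   /* insert new values */
         | ( ( val2 << FLD2_SHIFT ) & FLD2_MASK )
         | ( ( val3 << FLD3_SHIFT ) & FLD3_MASK );
    *reg = tmp;                                     /* the only write */
}
```

It is more typing than the bitfield version, but the number and width of the register accesses is under your control rather than the compiler's.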

NULL pointer

The standard says:

	An integer constant expression with the value 0, or such an expression cast
	to type void *, is called a null pointer constant.
	If a null pointer constant is converted to a pointer type, the resulting
	pointer, called a null pointer, is guaranteed to compare unequal to a
	pointer to any object or function.
	The unary * operator denotes indirection. If the operand points to a
	function, the result is a function designator; if it points to an object,
	the result is an lvalue designating the object. If the operand has type
	‘‘pointer to type’’, the result has type ‘‘type’’. If an invalid value has
	been assigned to the pointer, the behavior of the unary * operator is
	undefined.
	[...]
	Among the invalid values for dereferencing a pointer by the unary *
	operator are a null pointer, an address inappropriately aligned for the
	type of object pointed to, and the address of an object after the end of
	its lifetime.

It sounds reasonable, except that

That's what gcc does. Consider the following code fragment:

void print_content( unsigned int *ptr )
{
unsigned int value;
char *name;

    value = *ptr;

    if ( ptr == NULL )

        name = "ARM reset vector";
    else
        name = "some other location";

    printf( "%s is 0x%08x\n", name, value );
}

Now when gcc sees the value = *ptr; statement it concludes that the pointer is not NULL, since you dereferenced it. Therefore, it simply omits the whole if statement and sets name to the "some other location" string. If the pointer was actually NULL, then you broke the rules and thus the compiler had every right to do whatever it wanted, including simply not generating code for your if.
The situation is more serious if you go into a loop that should halt after you fetched the value from 0. Something like this:

    do {
        ptr = something_generating_the_next_ptr();
        value = *ptr;
        do_something_with_value( value );
    } while ( ptr );

because then gcc will simply remove the terminating condition and generate code for an infinite loop.

I went to the gcc developers, but they would not consider contemplating the idea of 0x0 being a valid address or that any processor could possibly utilise its entire address range. In fact, I was told that if you want to do that, you should program in assembly rather than C. To me it seems to be in clear conflict with the naive and uneducated urban legend that the C language was created to eliminate the need of using assembly for low level, close to hardware code. Unlike the volatile issue, however, it has a solution:

If you ever think of dereferencing a NULL pointer (a perfectly safe and, in my opinion, legal operation on the ARM and on many other processors) before testing it for being NULL, use the -fno-delete-null-pointer-checks command line switch for the compiler.

Code generator issues

The fact that the latest gcc installation needs a polyhedra library, that in turn relies on the infinite precision math library and they can only be compiled by the almost latest GNU C++ compiler themselves would make you believe that gcc has some extra heavy-duty code analysis and optimisation going on and it would beat you in translating C to assembly any day.
Well, that's not the case. All that high-level math is apparently used for something other than compiling C to assembly. As a simple C compiler gcc is a good but not exceptional compiler, with its own set of problems. Its code generator is anything but spectacular. In addition, sometimes it is not compatible with itself; you have to be aware of gcc's behavioural changes if you want to write code that compiles correctly and without a warning on several compiler revisions. Furthermore, increased gcc revision numbers are no guarantee of better output; a newer gcc version might generate a bigger and/or slower executable from the same source than its predecessor.

Sometimes the optimiser in gcc does absolute wonders. Other times, alas, it does horrible things. When you compile for the THUMB, it generates reasonably good code, considering the restrictions of the THUMB instruction set. When you compile for the ARM instruction set, however, if you have a routine that is particularly speed critical, you should check the assembly generated by the compiler. Sometimes the code it spews out is terrible. You can then try to experiment by writing the same function slightly differently, to help the compiler end up with at least half-decent code. But beware, gcc is smart. If it recognises that you want to trick it, it will generate the most inefficient assembly as punishment.

Automatic inlining of small functions

When you have a small static function, gcc may inline it because that way it can get rid of the function call overhead. This is what the manpage says:

[...] integrate functions into their callers when their body is smaller
than expected function call code (so overall size of program gets
smaller).  The compiler heuristically decides which functions are
simple enough to be worth integrating in this way.

It's quite neat. Except that the heuristics don't always get it right. It seems that gcc considers that if inlining a function increases code size only by a small amount, then it can inline it even when it optimises for size. Now consider this:

#define UART_DATA_REG   (*((volatile int *) 0x12345674))
#define UART_STAT_REG   (*((volatile int *) 0x12345678))
#define UART_STAT_BSY   1


static void char_out( int data )
{
    while ( UART_STAT_REG & UART_STAT_BSY );
    UART_DATA_REG = data;
}

static void byte_out( int byte )
{
static const char *t = "0123456789abcdef";

    char_out( t[ ( byte >> 4 ) & 15 ] );
    char_out( t[ byte & 15 ] );
}

void    WriteHex( int word )
{
    byte_out( word >> 24 );
    byte_out( word >> 16 );
    byte_out( word >> 8  );
    byte_out( word );
}

Inlining char_out() increases the size of its caller just a bit, let's call it n. Thus it gets inlined twice in byte_out(). Inlining byte_out() increases the size of its caller by a small amount, call it m. So gcc inlines that too. The net effect is that after the inlining the entire file will contain a single function, namely WriteHex(), which will be 4m+8n larger than without inlining. Using gcc 4.4.3 and -Os, the above code compiles to 124 bytes if you disable inlining, but if you let the compiler do what it thinks is best for you, it will generate code that is 280 bytes long. If you generate code for the THUMB, the sizes are 90 and 180 bytes, respectively. We are talking about blowing up code size by a factor of 2, while optimising for size!

There is a solution, though. If you compile for size, you should also use the -fno-inline-small-functions switch. In that case gcc will inline a small function only if you declare it inline, but not of its own accord.
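
If you cannot or do not want to change the compiler switches, gcc's noinline function attribute gives you the same control one function at a time. The sketch below is a variant of the example above. To make it runnable on a host machine, the UART registers are replaced by a RAM buffer; out_buf and out_len are hypothetical stand-ins of mine, not part of the original code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the UART, so that the sketch runs on a host. */
static char   out_buf[ 16 ];
static size_t out_len;

/* noinline is a gcc function attribute: gcc will not inline the function. */
__attribute__((noinline))
static void char_out( int data )
{
    if ( out_len < sizeof out_buf )
        out_buf[ out_len++ ] = (char) data;
}

__attribute__((noinline))
static void byte_out( int byte )
{
static const char t[] = "0123456789abcdef";

    char_out( t[ ( byte >> 4 ) & 15 ] );    /* high nibble */
    char_out( t[ byte & 15 ] );             /* low nibble  */
}

void    WriteHex( int word )
{
    byte_out( word >> 24 );
    byte_out( word >> 16 );
    byte_out( word >> 8  );
    byte_out( word );
}
```

The attribute is a gcc extension, so other compilers may reject or ignore it. Whether it actually saves space in your case is something only the size of the generated object file can tell you.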

Constants

When you use an expression that gcc can work out to be a constant, it will use that constant instead. Unfortunately, it is not always the shortest or fastest way of doing things. Consider this:

unsigned int a, b;

void foo( void )
{
    a = 12345678;
    b = a >> 1;
}

Shifting takes 1 instruction (and one clock) in both ARM and THUMB mode. Unfortunately, gcc will generate a second constant load for the second statement, costing an extra 32-bit word and at least 2 extra clock cycles. If you compile for the ARM and use -O2, then it's even worse, because then gcc builds constants from 8-bit immediates, using shifts and or-s. Sometimes that saves you space and/or time, but gcc uses that technique even if it is slower and/or longer than a simple load. According to the gcc developers, this issue comes up every now and then, but there aren't enough people complaining, so it remains unfixed.

Conditional instructions

On the ARM every instruction is conditional. That feature can come in very handy, because jumps are expensive. A very simple example is a function that returns 0 or 1 depending on some condition:

int is_equal( int a, int b )
{
    if ( a == b )
        return 1;
    else
        return 0;
}

With every instruction being conditional, the function can be compiled to the following sequence, where each line corresponds to a single assembly instruction (arguments a and b arrive in R0 and R1, the return value is in R0):

    cmp     r0,r1   // Z-flag = ( R0 == R1 );
    moveq   r0,#1   // if ( Z-flag == true ) R0 = 1;
    movne   r0,#0   // if ( Z-flag == false ) R0 = 0;
    bx      lr      // return R0;

That means that the whole function up until the return costs you 3 instructions and 3 clock cycles, while a jump costs at least 3 clocks on its own. So you will save runtime and possibly code space with the conditional execution. Indeed, that's what gcc will compile. The problem, however, is that when gcc decides whether to use jumps or conditional instructions, sometimes it makes the wrong choice and generates lengthy conditional sequences that cost a lot more than a jump would. Another problem is that if the two conditional sequences have common elements (that get executed in both sequences), gcc will generate those into both sequences, wasting space and time. For example, if the function above returned long long instead of int, then the upper 32 bits of the return value would have to be set to 0. In that case gcc will generate an instruction that loads 0 to R1 if the Z bit is set and another one that loads 0 to R1 if the Z bit is clear. When these two problems combine, you can end up with code fragments that are twice as long as they should be (and execute in twice the time). There is nothing you can do about it, apart from hoping that one day it will be fixed in gcc.
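
For reference, the long long variant of the function would look like the sketch below (the name is_equal_ll is my own). Both branches produce a high word of 0, which is the common element that should need only one unconditional instruction.

```c
long long is_equal_ll( int a, int b )
{
    if ( a == b )
        return 1;   /* low word 1, high word 0 */
    else
        return 0;   /* low word 0, high word 0 */
}
```

Compiling it with -S and looking at how R1 gets loaded shows whether your gcc version has the problem.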

64-bit integers

In theory handling 64-bit integers on the ARM is pretty simple. After all, they are just 2 32-bit ones concatenated. While this is mostly true with gcc, sometimes gcc's peephole optimiser misses some very basic things. If you want to set a bit in a long long and you OR your variable with this single bit like this:

long long foo( long long x )
{
    return x | 128;
}

then since long longs are passed and returned in r1:r0, you would expect the function to compile to something along these lines; the comments explain what happens:

foo:                        // At entry x is in R1:R0
    orr     r0,r0,#128      // R0 = R0 | 128
    bx      lr              // Return, the result is in R1:R0

but gcc 4.4.1 with -Os will generate the following code (note that even though r2, r3 and r12 can be freely used as workspace, it chooses r3 and r4 which makes saving and restoring r4 necessary). Again, if you don't know ARM assembly, there's a C explanation in the comments:

foo:                        // At entry, 'x' is in R1:R0
                            // Registers R2, R3 and R12 can be used freely.
    str     r4, [sp, #-4]!  // Save R4 on the stack
    mov     r3, #128        // R3 = 128
    mov     r4, #0          // R4 = 0
    orr     r3, r3, r0      // R3 = R3 | R0
    orr     r4, r4, r1      // R4 = R4 | R1
    mov     r1, r4          // R1 = R4
    mov     r0, r3          // R0 = R3
    ldmfd   sp!, {r4}       // Restore R4 from the stack
    bx      lr              // Return, the result is in R1:R0

To blow up a function from 2 instructions to 9 and its run-time from 4 clocks to 14 can hardly be called "optimising". Quite interestingly, if you compile for the THUMB, then gcc suddenly becomes smart enough to realise that it is in fact a very simple function and generates the code you'd expect:

foo:
    mov     r3, #128
    orr     r0, r3
    bx      lr

Inline assembly qualifiers

This section will probably be meaningless if you are not familiar with gcc's inline assembly facility. On the other hand, if you know and use inline assembly, you might find it quite useful.

When you write inline assembly code, sometimes you have to pass or receive some data that occupies more than a single register, such as a long long or a double. The inline assembly of gcc associates only one register name with that parameter, so how do you access the other register that holds your data? Well, there are qualifiers for that. Unfortunately, they are different for different target architectures and, even worse, are not documented at all. As was confirmed on the gcc developer mailing list, the 'documentation' is to find a certain target-specific function in the gcc sources, follow the code and work out what it does with the various qualifiers (there are some useful comments there). If you are interested, the file in question, at least for gcc 4.4.3 and ARM target, is gcc-4.4.3/gcc/config/arm/arm.c. You are looking for the function called arm_print_operand(), around line 13,500.

For your convenience, here are some modifiers operating on double-word data (and another one that doesn't) that can be useful:

Unfortunately, there's a small problem on ELF targets. On ELF doubles are stored in big-endian word order regardless of the endianness of your machine. Alas, the Q and R qualifiers do not take that into account, thus on ELF they select the wrong register. On EABI it works, because EABI stores doubles according to the processor's endianness. According to the gcc people it will not be fixed, because it is not important. For the gcc developers, that is.
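
To illustrate how the Q and R qualifiers are used, here is a minimal sketch of a 64-bit addition in inline assembly. It assumes that %Q names the register holding the least significant word of a double-word operand and %R the one holding the most significant word; check this against the comments in arm_print_operand() for your gcc version. The function name add64() is hypothetical, and the plain C fallback is there only so that the fragment also builds on non-ARM hosts.

```c
#include <stdint.h>

static inline uint64_t add64( uint64_t a, uint64_t b )
{
#if defined(__arm__) && !defined(__thumb__)
    uint64_t r;

    /* Add the low words and set the carry, then add the high words
       with carry. The early-clobber '&' keeps the result from sharing
       registers with the inputs, because the low word of the result is
       written before the high words of the inputs are read. */
    __asm__ ( "adds %Q0, %Q1, %Q2\n\t"
              "adc  %R0, %R1, %R2"
              : "=&r" ( r )
              : "r" ( a ), "r" ( b )
              : "cc" );
    return r;
#else
    return a + b;       /* Fallback for non-ARM targets */
#endif
}
```

Given the ELF problem described above, inspect the generated assembly on your target before trusting that the qualifiers picked the registers you expected.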

Generated on Fri Aug 13 12:02:24 2010 by  doxygen 1.6.3