Porting and hacking the library

Porting the library to a different toolset

This might seem easy, but it is not necessarily so. Some parts of the library do not really depend on the compiler you use (the crypto library, for example), but in many places the library uses gcc extensions. Furthermore, the assembly code in the library relies on the GNU assembler and on gcc's ability to run the code through the C preprocessor before passing it to the assembler.

In addition, all assembly functions are aware of, and rely on, the calling conventions of gcc. Since gcc supports both the ELF and the EABI conventions, the routines have to know which one they are dealing with, so they use gcc-specific predefined symbols to decide. (Fortunately, there are only a handful of functions where the EABI and ELF differences matter.)
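
As a rough illustration of that kind of decision, the sketch below tests __ARM_EABI__, a symbol that gcc predefines when it targets the ARM EABI. Whether the library's assembly files test this particular symbol is an assumption, and AbiName is a made-up function that exists only to make the result visible. Since the assembly sources are run through the C preprocessor, the same #if test could sit in an assembly file.

```c
/*
 * Reports which calling convention the file is being compiled for.
 * __ARM_EABI__ is predefined by gcc for ARM EABI targets; treating
 * everything else as "non-EABI" is a simplification of this sketch.
 */
const char *AbiName( void )
{
#if defined( __ARM_EABI__ )
    return "EABI";
#else
    return "non-EABI";
#endif
}
```

Compiled with a native (non-ARM) gcc the function takes the second branch, which makes it easy to try the idea on a desktop machine before touching the cross compiler.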

That's about all one can say about the toolset switching in general. I am sure that it's doable, but it might take considerable effort.

Porting the library to a different processor

Architectural issues

The library is targeted to the ARM7TDMI core. Consequently it has a few assumptions about the processor:

If your processor is a 32-bit, unified address space chip, then you have a good chance that all the C bits will be portable without much headache. If your chip is a 16-bit one but you can tell gcc to treat the basic data types according to the above, then you can port the C bits, but it will not be too efficient, because the library assumes a 32-bit processor and tends to use that to its advantage. If your chip has separate address spaces, then probably porting this library is simply not worth the hassle.

Assembly routines

Obviously, you have to rewrite pretty much every assembly routine from scratch. The assembly routines in the C library are there for speed only and they have C equivalents, so as long as you don't mind losing the speed advantage, you can live without them. There are a handful of very short assembly functions in the misc section of the C library without a C equivalent, but these functions can be written in C in a matter of minutes. They are in assembly because they exploit the instruction set peculiarities of the ARM core and thus are much faster than what you could achieve in C.
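
To give a feel for how small such a C replacement can be, here is a sketch of a 32-bit byte reversal. It is not taken from the library: SwapBytes32 and Ror32 are hypothetical names, and the routine was picked only because it is a well-known case where the ARM barrel shifter pays off. On the ARM each of the four steps is roughly one instruction with a rotated or shifted operand, while a compiler for another chip may need several instructions per step.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by n bits, n must be 1..31. */
static uint32_t Ror32( uint32_t x, unsigned n )
{
    return ( x >> n ) | ( x << ( 32 - n ) );
}

/*
 * Reverse the byte order of a 32-bit word using the classic
 * rotate-and-mask sequence (hypothetical example, see the text).
 */
uint32_t SwapBytes32( uint32_t x )
{
    uint32_t t;

    t  = x ^ Ror32( x, 16 );    /* XOR of the bytes two positions apart */
    t &= ~0x00ff0000u;          /* drop the byte that must not be mixed */
    x  = Ror32( x, 8 );         /* rotate the original word by one byte */

    return x ^ ( t >> 8 );      /* fix up the bytes that landed wrongly */
}
```

For example, SwapBytes32( 0x11223344 ) yields 0x44332211, and applying the function twice gives back the original word.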

In the math library the square root calculations and the Payne-Hanek range reduction are the only assembly routines. Everything else is C. It should not be too hard to port those routines, although the Payne-Hanek routine relies heavily on the 32x32->64 multiply-accumulate instruction, which you will have to synthesise if your chip doesn't have one.
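
One possible way to synthesise such a multiply-accumulate is sketched below, assuming the target can at least do an unsigned 16x16->32 multiply and has no usable 64-bit type. The operands are split into 16-bit halves, the four partial products are summed column by column, and the result is added to a 64-bit accumulator held as two 32-bit words. ACC64 and MulAcc32x32 are hypothetical names, and the unsigned semantics are an assumption; this is not the library's code.

```c
#include <stdint.h>

/* 64-bit accumulator kept as two 32-bit halves. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} ACC64;

/* acc += a * b, unsigned, built from 16x16->32 partial products. */
void MulAcc32x32( ACC64 *acc, uint32_t a, uint32_t b )
{
    uint32_t al = a & 0xffffu, ah = a >> 16;
    uint32_t bl = b & 0xffffu, bh = b >> 16;
    uint32_t ll = al * bl;      /* contributes to bits  0..31 */
    uint32_t lh = al * bh;      /* contributes to bits 16..47 */
    uint32_t hl = ah * bl;      /* contributes to bits 16..47 */
    uint32_t hh = ah * bh;      /* contributes to bits 32..63 */
    uint32_t mid, lo, hi;

    /* Middle column: at most 3 * 0xffff, so it cannot overflow. */
    mid = ( ll >> 16 ) + ( lh & 0xffffu ) + ( hl & 0xffffu );
    lo  = ( ll & 0xffffu ) | ( mid << 16 );
    hi  = hh + ( lh >> 16 ) + ( hl >> 16 ) + ( mid >> 16 );

    /* 64-bit add; the carry out of the low word is ( acc->lo < lo ). */
    acc->lo += lo;
    acc->hi += hi + ( acc->lo < lo );
}
```

Starting from a zero accumulator, MulAcc32x32( &acc, 0xffffffff, 0xffffffff ) leaves acc.hi at 0xfffffffe and acc.lo at 0x00000001. Four multiplies and a handful of adds per step is noticeably more work than a single instruction, which is why a ported Payne-Hanek routine can be expected to run slower.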

The crypto library is C, except for some low level routines that work on very large numbers. However, all of those assembly routines are also provided in C form, so the crypto library will work after a simple recompile.

The fixpoint library is pure assembly. You will have to rewrite everything from scratch, although there is enough commenting in the assembly source to allow you to create operational, albeit significantly slower, C equivalents.

The kernel is mostly C (only the context switch and interrupt handling are written in assembly) but even the C routines are aware of the ARM modes and states, which are rather chip-specific features. You can port it, but you have to be careful about it.

The gcc support library functions are all assembly, but they are specific to the ARM core and to gcc. You do not need to port that library at all.

Hacking the sources

Coding style

Bless me FSF, for I've sinned: I use a coding style which is the antithesis of what the GNU people recommend. I use hard tabs (set to 4 chars wide). I use K&R style curly brace positioning, and lots of whitespace both horizontally and vertically (absolution should be granted on this latter point as the liberal use of whitespace stuff was spelled out in the first K&R book, which, being the C Bible, the word of Ritchie, should be sacred for the members of the Church of C). As a personal blasphemy, after the opening curly I do not indent the block local variable declarations. Deliver us from compiler bugs, __ramend.

I also like assignments to be aligned, so when I have a block of them, it's obvious at first glance which side is the LHS and which is the RHS for all of them.
I am a bit haphazard in naming local variables, no real convention there. Except one. I am Hungarian and I have had the privilege to be taught by Charles Simonyi's father, the late prof. Simonyi Károly at uni (a truly great educator, he was) but I would never encode the type of a variable in its name.
In function names I tend to use CamelCase (a remnant from my Amiga days). Variables and structure fields are all lowercase; typedefs and #defines are all capitals.
In addition, the sole purpose of having rules is to know what to break. That includes rules of my own. I'm not zealous about any of the above, it's more of a long-time habit or personal tradition, nothing more. If it makes sense in the context, I use upper case variable names or lower case macro names without feeling utterly bad about it.
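
The conventions above are easier to see in code than in prose. The fragment below is a made-up illustration, not an excerpt from the library: RING, RING_SIZE, RingInit and RingPut are hypothetical names. Indentation is shown with four spaces here, whereas the sources use hard tabs.

```c
#include <stddef.h>

#define RING_SIZE   8                   /* #defines are all capitals */

typedef struct {                        /* typedefs are all capitals */
    unsigned char   data[ RING_SIZE ];  /* fields are all lowercase  */
    size_t          head;
    size_t          count;
} RING;

/* Function names use CamelCase, braces follow K&R. */
void RingInit( RING *ring )
{
    ring->head  = 0;                    /* aligned assignments */
    ring->count = 0;
}

/* Appends one byte; returns 1 on success, 0 if the ring is full. */
int RingPut( RING *ring, unsigned char byte )
{
size_t  tail;                           /* block locals are not indented */

    if ( ring->count == RING_SIZE ) {

        return 0;
    }

    tail               = ( ring->head + ring->count ) % RING_SIZE;
    ring->data[ tail ] = byte;
    ring->count       += 1;

    return 1;
}
```

The aligned block at the end of RingPut is the point of the alignment habit: the targets line up on the left, the values on the right, and a mismatched pair stands out immediately.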

I tend to be pretty verbose in the comments, as I've learned that the "it was hard to write so it should be hard to read" mentality can bite you on the backside when you have to work with your own code that you wrote a long time ago. I find that the extra time spent on documenting what you do, and more importantly why you do it or do it that way is paid back with interest when five years down the track you have to dust down the drive and work on that code again.
If I use some cruft, it's all explained in the comments. A library is not the obfuscated C contest and writing code that others can't follow is not a virtue. Where more in-depth understanding of a particular subject is needed, I try to give pointers to literature. Tricky algorithms are explained before the actual implementation, although tricky but well-known algorithms are not.

So, chances are, you will find my coding style ugly, an abomination and all that. Well, beauty is in the eye of the beholder. It works for me.

Organisation of the source

Here's some ASCII art that shows you the way things are organised:

Top                Makefile, config file, license, changelog
 |
 +- include        Include files that get installed
 |  +- exec        Includes for the kernels
 |  +- crypto      Includes for the crypto library
 |  +- misc        Includes for the miscellaneous functions in the C library
 |
 +- internals      Include files used internally by the library
 |
 +- libc           The C library
 |  +- ctype       The character classes functions
 |  +- stdio       The formatted I/O core functions
 |  +- stdlib      The numeric conversion functions and other goodies
 |  +- setjmp      The non-local jump functions
 |  +- string      The string and memory manipulation functions
 |  +- time        The time conversion functions
 |  +- wchar       The wide character conversion functions
 |  +- misc        Miscellaneous useful but not standard functions
 |
 +- libm           The math library
 |
 +- fixp           The fixpoint library
 |
 +- exec           The pre-emptive kernel (implemented as a library)
 |
 +- coop           The co-operative kernel (implemented as a library)
 |
 +- cryp           The cryptographic library
 |  +- nanu        The natural numbers module
 |
 +- gccs           The gcc support library replacement functions
 |
 +- docs
 |  +- html        The HTML output from Doxygen
 |  +- doxygen     The documentation sources
 |     +- examples The example code used in the documentation
 |
 +- tool           The native tools that come with the library
 |
 +- objs           The temporary working directory (object files and the like)

You will find that often a single logical module is cut into two or three separate files. This is because it is a library, and the smallest object the linker can pull from it is the object code for a whole file. If it is a reasonable assumption that the user will use only one of the functions that are logically in the same group, then they go into separate files. If they use common subroutines or data tables, then those go into yet another file.

Symbols that are visible because they are called or referenced from different files in the library, but should never be used by user code, all have a _<PFX>_object_name format, where <PFX> is a prefix starting with the letter 'b' (for Bendor) followed by some shorthand referring to the part of the library the symbol belongs to. This avoids name clashes with symbols defined by the code that uses the library.
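
A small sketch of the naming pattern follows. The prefix bstr and both function names are invented for the illustration, and the two functions are shown in one block only for brevity; in the library they would live in separate source files, which is exactly why the helper needs external linkage and a name that is unlikely to clash.

```c
#include <stddef.h>

/*
 * Library-internal helper: visible to the linker because other files
 * of the library call it, but not meant for user code. The name follows
 * the _<PFX>_object_name pattern with a made-up prefix "bstr".
 */
size_t _bstr_span( const char *s, char stop )
{
    size_t n = 0;

    while ( s[ n ] != '\0' && s[ n ] != stop ) {
        n++;
    }

    return n;
}

/* Public function (hypothetical) built on the internal helper. */
size_t FieldLength( const char *line )
{
    return _bstr_span( line, ',' );
}
```

User code would call only FieldLength; for instance FieldLength( "abc,def" ) returns 3. File-scope identifiers that begin with an underscore are reserved for the implementation in C, so well-behaved application code should not define such names itself.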

Makefile organisation

The main makefile is Makefile in the top directory. It contains everything: all the rules, functions and so on. Each library is built by starting a submake in the relevant directory. The Makefile in that directory contains a single definition, FILES, which lists all the source files of the library. Then it includes the top-level Makefile to get all the rest, such as the rules. When the top-level make starts the submake, it also passes the target and a few definitions on the command line, as well as a bunch of exported variables. One of them, SUBMAKE, tells the top-level makefile whether it is included in a submake (in which case it should supply the subtarget rules) or is controlling the top-level make (in which case it should start the submakes).
A notable exception is the Makefile in the tool directory. The code there should build native applications, which is very different from crosscompiling a library. Therefore, the tool building make is a self-contained unit which does not include the main makefile. Rather, it defines all rules, targets and so on on its own.