Standard input/output formatting functions
[C library]

Core functions for formatted input and output.

Defines

#define EOF   (-1)
 End of file marker.

Typedefs

typedef int(* BLC_FMT_PUT )(void **, const char *, size_t)
 Prototype for the output function of print_xxx().
typedef int(* BLC_FMT_GET )(void **)
 Prototype for the input function of scan_int() and scan_flt().
typedef int(* BLC_FMT_UNG )(int, void **)
 Prototype for the pushback function of scan_int() and scan_flt().

Functions

int print_flt (const char *fmt, va_list arg, size_t len, BLC_FMT_PUT fun, void **par, int(*w2m)(char *, wchar_t **))
 Formatting core for all printf() type functions.
int print_int (const char *fmt, va_list arg, size_t len, BLC_FMT_PUT fun, void **par, int(*w2m)(char *, wchar_t **))
 Formatted print core without floating point conversions.
int scan_flt (const char *fmt, va_list arg, BLC_FMT_GET get, BLC_FMT_UNG ung, void **par, int(*m2w)(wchar_t **, const char *, size_t))
 Formatted input core.
int scan_int (const char *fmt, va_list arg, BLC_FMT_GET get, BLC_FMT_UNG ung, void **par, int(*m2w)(wchar_t **, const char *, size_t))
 Processing formatted input core without floating point conversions.

Detailed Description

Core functions for formatted input and output.

The C standard defines a bunch of input/output functions. However, those functions depend on the notion of files and byte streams, defined and associated with physical devices by the underlying operating system. Since this library is targeted to embedded systems, where the concept of files and byte streams may or may not exist at all and device abstractions vary from system to system, this library contains only the formatting core of the input and output functions. These cores can be used to realise all the standard designated printf() and scanf() family functions. You have to write a little wrapper around the core, to provide the standard interface. Furthermore, you have to supply the lowest level byte I/O functions that will be called by the core functions. The details are explained in the function descriptions.

To use these functions you have to include stdio_format.h.


Typedef Documentation

typedef int(* BLC_FMT_GET)(void **)

Prototype for the input function of scan_int() and scan_flt().

This function is called whenever scan_int() or scan_flt() needs a new character from the input. The interpretation of the argument is up to the function; scan_int() and scan_flt() just pass it to the function but never modify it. The function should return a positive character value if the read from the input was successful, EOF if the input was exhausted and a negative error code (other than EOF) in case there was a read error. If an error code is returned, scan_int() or scan_flt() will abort and return the same error code.
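
A minimal sketch of an input function with this prototype, assuming the input is a 0-terminated string and the caller passes the address of a cursor pointer as the parameter. The name str_get() is hypothetical, and the EOF and typedef lines are stand-ins so that the sketch compiles on its own; in real code include stdio_format.h instead.

```c
#include <stddef.h>

/* Stand-ins for stdio_format.h (assumption: same definitions as above). */
#ifndef EOF
#define EOF (-1)
#endif
typedef int (*BLC_FMT_GET)(void **);

/* Hypothetical reader: *par is a cursor into a 0-terminated string. */
static int str_get(void **par)
{
    const unsigned char *p = *par;

    if (*p == '\0')
        return EOF;             /* input exhausted */

    *par = (void *)(p + 1);     /* advance the caller's cursor */
    return *p;                  /* character value, never negative */
}

/* Assigning to the typedef checks the signature at compile time. */
static BLC_FMT_GET const str_get_check = str_get;
```

Reading through unsigned char keeps bytes above 0x7F from being returned as negative values, which the scan functions would otherwise take for error codes.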

typedef int(* BLC_FMT_PUT)(void **, const char *, size_t)

Prototype for the output function of print_xxx().

This function is called whenever print_int() or print_flt() wants to write a partial result. The interpretation of the first argument is up to the function; print_int() and print_flt() never modify that parameter. The second argument is a pointer to the string to be written and the third is the number of characters to write.
The function should return a non-negative number in case of success or a negative error code on failure. If the function returns an error, print_int() or print_flt() will abort and return the same error that this function returned.
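
A minimal sketch of an output function that reports failure through a negative return value. It assumes the parameter is really the address of a small bookkeeping structure cast to void **; the structure, the function name and the error code are all hypothetical and not part of the library.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for stdio_format.h so the sketch compiles on its own. */
typedef int (*BLC_FMT_PUT)(void **, const char *, size_t);

#define SINK_ERR_FULL (-2)      /* hypothetical error code, distinct from EOF */

/* Hypothetical fixed-size memory sink. */
struct mem_sink {
    char  *data;                /* start of the buffer      */
    size_t cap;                 /* total capacity in bytes  */
    size_t used;                /* bytes written so far     */
};

static int mem_sink_put(void **par, const char *str, size_t len)
{
    struct mem_sink *s = (struct mem_sink *) par;

    if (len > s->cap - s->used)
        return SINK_ERR_FULL;   /* the print core aborts with this code */

    memcpy(s->data + s->used, str, len);
    s->used += len;
    return 0;                   /* any non-negative value means success */
}
```

The sink would be handed to the print core as (void **) &sink, in the same spirit as the opaque parameter casts used in the usage example of print_flt().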

typedef int(* BLC_FMT_UNG)(int, void **)

Prototype for the pushback function of scan_int() and scan_flt().

This function is called whenever scan_int() or scan_flt() wants to push a character back to the input. At most one character is pushed back before consuming the input again. After calling this function it is expected that the next read from the input will return the same character as was pushed back. Note that scan_int() and scan_flt() guarantee that the character pushed back will be the last character obtained from the input. The first argument for the function is the character to push back, the second is a parameter which can be interpreted by the function in any way. This is the same parameter that is passed to the character read function and it is never modified by scan_int() or scan_flt(). The function should return a non-negative value if the pushback was successful and a negative error code (other than EOF) if the pushback failed.
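
Because at most one character is pushed back before the next read, a single pending slot is enough. The sketch below pairs a read function with a pushback function under that assumption; the structure, the function names and the error code are hypothetical, and the parameter is assumed to be the address of the structure cast to void **.

```c
#include <stddef.h>

#ifndef EOF
#define EOF (-1)                /* stand-in for stdio_format.h */
#endif

#define SRC_ERR_PUSHBACK (-2)   /* hypothetical error code, distinct from EOF */

/* Hypothetical character source with a one-character pushback slot. */
struct char_src {
    const unsigned char *next;  /* next unread byte of a 0-terminated string */
    int pending;                /* the pushed-back character                 */
    int has_pending;            /* non-zero while 'pending' is valid         */
};

static int src_get(void **par)
{
    struct char_src *s = (struct char_src *) par;

    if (s->has_pending) {       /* deliver the pushed-back character first */
        s->has_pending = 0;
        return s->pending;
    }
    if (*s->next == '\0')
        return EOF;
    return *s->next++;
}

static int src_unget(int c, void **par)
{
    struct char_src *s = (struct char_src *) par;

    if (s->has_pending)
        return SRC_ERR_PUSHBACK; /* a second pushback is not supported */

    s->pending = c;
    s->has_pending = 1;
    return 0;
}
```

This layout also suits inputs that cannot be rewound, such as a UART receive register, since the pushed-back character is kept on the side rather than returned to the device.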


Function Documentation

int print_flt ( const char *  fmt,
va_list  arg,
size_t  len,
BLC_FMT_PUT  fun,
void **  par,
int(*)(char *, wchar_t **)  w2m 
)

Formatting core for all printf() type functions.

The function implements the formatted print function, as specified in the C99 standard, with some exceptions (see Details). This is an internal function, called by the actual API functions from the printf() family. It is needed when you want to write your printf(), fprintf() etc. functions that this library can not provide, since they require knowledge of your system's handling of output devices.

Parameters:
fmt The format string.
arg The argument list (after the format string). Before you call this function, you have to call va_start() and pass the resulting va_list object in this argument.
len The maximum number of characters sent to the output. If you want no limit, pass -1, which converts to the largest size_t value.
fun The function that outputs a number of characters.
par This is the first parameter that is passed to (*fun)().
w2m The address of the wcitomb() function. Since the actual library function that is invoked when wcitomb() is called depends on whether you used the -fshort-wchar compiler switch and whether you have defined the CONFIG_RFC3629_CONFORMANCE macro, print_flt() can not know the actual function when the library is built. If you pass NULL for this parameter, then the 'l' size modifier for 's' and 'c' conversions will be ignored.
Returns:
The number of bytes written to the output or the number of bytes that would have been written if the maximum number of output characters were not limited. If the return value is 0, then no output was generated. If the return value is negative, it indicates an error code. If the number of output characters was limited, then a return value greater than the limit indicates that the output was truncated.
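
The return value rules above can be wrapped in two small predicates. This is only a sketch; the helper names are hypothetical and not part of the library.

```c
#include <stddef.h>

/* ret:   the value returned by print_flt() or print_int()
 * limit: the 'len' argument that was passed to that call   */
static int print_was_truncated(int ret, size_t limit)
{
    /* Negative values are error codes, not lengths. */
    return ret > 0 && (size_t) ret > limit;
}

static int print_failed(int ret)
{
    return ret < 0;
}
```

With the "no limit" value of -1 the limit is the largest size_t, so print_was_truncated() can never report truncation in that case.
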
Warning:
The print_int() and print_flt() functions both use a lot of stack. You can expect a stack requirement of about 200 bytes.
Format string details:
The function reads the format string and copies it into the output until a conversion specifier is found. The conversion specifier begins with a '%' character. If the next character is also a '%' then a single '%' is sent to the output and no argument from the variable argument list is consumed. Otherwise, the specifier has the following form:

%[flags][width][.[precision]][length]conversion


Apart from the conversion specifier all other fields are optional. The details of the various fields are as follows:

Flag characters
The flag field is one or more characters from the following set:
'+'
For signed numeric conversions, a '+' symbol will be placed in front of a non-negative result. Without this flag only a negative sign is printed. For non-numeric conversions the flag is ignored.

' '
For signed numeric conversions, non-negative numbers will have a space in front of them, as a placeholder for the signum. If both ' ' and '+' are given, '+' is used. For non-numeric conversions the flag is ignored.

'0'
For numeric conversions, the number will be zero padded on the left (as opposed to space padded) to fill the specified field width. If the '-' flag is also specified, the '0' flag is ignored. For non-numeric conversions the flag is ignored. For integer conversions if precision is also specified, the flag is ignored.

'-'
The converted object should be left-aligned (the default is right aligned). The padding character is always space. Only the 'n' conversion ignores this flag.

'#'
Use alternate output formatting. For 'x' or 'X' conversions of a non-zero number the "0x" or "0X" strings will be prepended to the result. For 'o' conversion a non-zero result will be prefixed with a '0' character. For float conversions the decimal point is always written and for g and G conversions trailing zeroes are retained. In all other cases the flag is ignored.
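
The flags can be tried out on a host machine. The sketch below uses the host C library's snprintf() as a stand-in for a wrapper built on print_flt(); the flags shown are the ones C99 defines identically, so the same results are expected from this library, but that has not been verified here. The helper name fmt_int() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: formats one int with the host C library.
 * 'fmt' must contain exactly one integer conversion.             */
static const char *fmt_int(char *buf, size_t size, const char *fmt, int value)
{
    snprintf(buf, size, fmt, value);
    return buf;
}
```

With a 16 byte buffer, "%+d" applied to 5 gives "+5", "% d" gives " 5", "%05d" applied to 42 gives "00042", "%-5d" gives "42" followed by three spaces, "%#x" applied to 255 gives "0xff" and "%#o" applied to 8 gives "010".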



Field width
It is either a sequence of decimal digits or the single character of '*'. A decimal digit sequence is interpreted as a decimal number. The '*' character will fetch the next int argument from the argument list and use that as the width. The value of the field width is the minimum number of output characters generated by the conversion. Depending on the flags and the conversion, the padding can be left or right, using spaces or '0'-s (only on the left). If the field width is specified by the '*' character and the value of the next argument is negative, then the left align flag is set and the absolute value of the field width will be used.
Note that the field width is the minimum number. If the result of the conversion is longer than the specified field width, then the field width will be ignored. Specifying a field width will never cause the output being truncated.
The C99 standard also specifies the form of *m$ where m is a sequence of decimal digits, meaning that the m-th argument on the variable argument list should be used. This is not implemented and is not recognised. In addition, the decimal character sequence is not checked for overflow.
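
A short host-side illustration of the '*' field width, including the negative case. It uses the host C library's snprintf() as a stand-in, on the assumption that a wrapper built on print_flt() or print_int() follows the same C99 rule; the helper name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: prints 'value' between brackets, taking the
 * field width from an int argument through '*'.                    */
static const char *fmt_star_width(char *buf, size_t size, int width, int value)
{
    snprintf(buf, size, "[%*d]", width, value);
    return buf;
}
```

A width of 5 with the value 42 gives "[   42]", a width of -5 gives "[42   ]" (left aligned, absolute value used) and a width of 2 with the value 12345 gives "[12345]", since the width never truncates.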

Precision
The precision starts with a period, optionally followed by a sequence of decimal digits or the single character of '*'. A decimal digit sequence is interpreted as a decimal number. The '*' character will fetch the next int argument from the argument list and use that value for the precision. If negative value is obtained through '*' or the period is not followed by a decimal digit or a '*', then the precision is set to 0.

The interpretation of the precision depends on the actual conversion. For integral conversions it is the minimum number of digits that should be generated; if the number has fewer digits, then 0-s will be prepended to it. For float conversions it is the minimum number of digits that follow the decimal point. For non-numeric conversion the precision is ignored.
The C99 standard also specifies the form of *m$ where m is a sequence of decimal digits, meaning that the m-th argument on the variable argument list should be used. This is not implemented and is not recognised. In addition, the decimal character sequence is not checked for overflow.
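
The effect of the precision on integer conversions, sketched with the host C library's snprintf() as a stand-in for a wrapper built on the print cores. Only non-negative precisions are shown: the text above sets a negative '*' precision to 0, whereas C99 treats it as if the precision were omitted, so a host library is not a reliable model for that case. The helper name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: prints 'value' between brackets, taking the
 * precision from an int argument through '*'.                      */
static const char *fmt_star_prec(char *buf, size_t size, int prec, int value)
{
    snprintf(buf, size, "[%.*d]", prec, value);
    return buf;
}
```

A precision of 5 with the value 42 gives "[00042]", a precision of 3 with the value -7 gives "[-007]" (the sign is not counted as a digit), and a precision of 0 with the value 0 gives "[]".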

Length modifier
The interpretation of the next argument can be affected by the length modifier. The exact interpretation of the length modifier depends on the conversion and is detailed for each conversion specifier separately. The following length modifiers are defined:
"hh"
Indicates char or unsigned char argument.

"h"
Indicates short or unsigned short argument.

"l"
Indicates long, unsigned long or wchar_t argument.

"ll"
Indicates long long or unsigned long long or long double argument.

"L"
Indicates long double argument.
Recognised and a long double argument is fetched but it is actually converted to a double and processed as such.

"j"
Indicates intmax_t argument.
This implementation treats it as if it were an "ll".

"t"
Indicates ptrdiff_t argument.
This implementation treats it as if it were an "l".

"z"
Indicates size_t argument.
This implementation treats it as if it were an "l".



Conversion specifiers
This single character field is compulsory. If it is missing, then the function will skip the '%' that started the conversion and print the characters following it.
'd' 'i'
Fetches the next argument and converts it as a signed decimal number. If the "l" or "ll" (or equivalent) length modifiers were indicated, then the appropriate sized integer is fetched from the argument list. The "h" and "hh" length modifiers are ignored due to C's default integral promotion rules. If the precision is 0 and the number fetched is zero, then no output is generated, even in the presence of a '+' or '-' flag. The '+', '-', '0' and ' ' flags are obeyed, the '#' flag has no effect on the conversion. If precision is not specified, the default value is 1.

'u'
Fetches the next argument and converts it as an unsigned decimal number. If the "l" or "ll" (or equivalent) length modifier was indicated, then the appropriate sized integer is fetched from the argument list. If the "h" or "hh" length modifiers were used, then the fetched number is masked to the range of unsigned short or unsigned char, respectively. If the precision is 0 and the number fetched is zero, then no output is generated. The '-' and '0' flags are obeyed, the '#', '+' and ' ' flags are ignored. If precision is not specified, the default value is 1.

'o'
Fetches the next argument and converts it as an unsigned octal number. If the "l" or "ll" (or equivalent) length modifier was indicated, then the appropriate sized integer is fetched from the argument list. If a "h" or "hh" length modifier was used, then the fetched number is masked to the range of unsigned short or unsigned char, respectively. If the precision is 0 and the number fetched is zero, then no output is generated. If precision is not specified, the default value is 1. The '-' and '0' flags are obeyed, the '+' and ' ' flags are ignored. If the '#' flag is set and the number is not zero, then a '0' will be prepended to it to indicate an octal number.

'x' 'X'
Fetches the next argument and converts it as an unsigned hexadecimal number. If the "l" or "ll" (or equivalent) length modifier was indicated, then the appropriate sized integer is fetched from the argument list. If one of the "h" or "hh" length modifiers was used, then the fetched number is masked to the range of unsigned short or unsigned char, respectively. When the number is converted, for digits above 9 the letters abcdef will be used for x conversion and ABCDEF for X conversion. If the precision is 0 and the number fetched is zero, then no output is generated. If precision is not specified, the default value is 1. The '-' and '0' flags are obeyed, the '+' and ' ' flags are ignored. If the '#' flag is set and the number is not zero, then a "0x" or "0X" (for x and X conversions, respectively) will be prepended to the result.

'p'
Fetches the next void * argument and prints it as a hexadecimal number. The length modifiers are ignored. If the pointer is NULL then the string (nil) is printed, otherwise the behaviour is the same as that of the %#...x conversion.

'n'
Fetches the next void * argument. Depending on the length modifier, the pointer will be interpreted as char *, short *, int *, long * or long long * (the default being int *) and the number of output characters generated (but not necessarily emitted) so far will be stored at the pointed location. As a safety measure, the data is not stored if the pointer is NULL. Flags, field width and precision are all ignored and no output is generated.

'E' 'e'
Fetches the next double argument and prints it in the form of [-]d.dddE-dd. If the conversion character is E, then the exponent separator is 'E', for e format 'e' is used. The signum is written if it is negative, or if the ' ' or '+' flags were given. If the precision is specified, then it is the number of digits written after the decimal point. If precision is not specified, the default is 6. If the precision is 0, then the decimal point is omitted, unless the '#' flag was specified. The decimal digit in front of the decimal point is zero if and only if the number is zero. Note that even in that case the result can be negative as IEEE-754 floats distinguish between positive and negative zero. The signum of the exponent is always printed and the exponent is at least two digits. If the '0' flag was specified, then if the field width is larger than the space needed for the number, the result will be zero padded between the signum and the integer decimal digit. If the number to convert is NaN, then the string "nan" or "NAN" is printed (for e and E conversions, respectively), all flags but the '-' as well as the precision are ignored but the field width is respected. If the number to print is infinite, then the strings inf or INF are printed. The zero padding flag and the precision are ignored but the signum related and alignment flags and the field width will still be used.

'F' 'f'
Fetches the next double argument and prints it in the format of [-]dddddd.dddddd, printing as many digits as needed to represent the number (note that this may lead to several hundred digits). At least one integer digit is printed. The decimal point is printed only if at least one fractional digit is present or if the '#' flag was specified. The precision is the number of digits to be printed after the decimal point; if not specified, 6 is assumed. The treatment of the field width, flags and the handling of NaNs and infinities is the same as with the E and e conversions.

'G' 'g'
Fetches the next double argument. If the decimal exponent is less than -4 or larger than or equal to the precision, then the number will be converted according to the E or e conversion, otherwise according to f or F conversion. There is one difference, however: After the number is rounded to the given precision, the trailing 0-s from the fractional part will be removed, unless the '#' flag was specified. That is, if the number was 1.999E-9 and the precision was set to 2, the number is first rounded to 2.00E-9, then the trailing zeroes are removed, then since no fractional digits remain the decimal point is removed and thus 2E-09 is printed.
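
The switch between the two styles and the removal of trailing zeroes can be observed on a host machine. The sketch uses the host C library's snprintf(), which implements the same C99 rule, as a stand-in for a wrapper built on print_flt(); the helper name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: formats one double with the host C library.
 * 'fmt' must contain exactly one floating point conversion.        */
static const char *fmt_dbl(char *buf, size_t size, const char *fmt, double value)
{
    snprintf(buf, size, fmt, value);
    return buf;
}
```

With the default precision of 6, "%g" prints 0.0001 as "0.0001" but 0.00001 as "1e-05", and 100000.0 as "100000" but 1000000.0 as "1e+06". The example from the text, "%.2G" applied to 1.999E-9, gives "2E-09", while "%#g" applied to 1.0 keeps the zeroes and gives "1.00000".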

'A' 'a'
Prints the double argument in hexadecimal form. The format is [-]0Xh.hhhhP-d where h is a hex digit. After the signum first 0X or 0x is printed (for A and a conversions, respectively), then a single hexadecimal digit (actually, 1 or 0), the radix point and then a sequence of hexadecimal digits. It is followed by the exponent separator P or p (for A and a conversions, respectively), then the binary exponent, in decimal form. The signum of the binary exponent is always printed. The exponent is 1 to 4 decimal digits. If the precision is not given, then it will be taken as 13, since that allows an IEEE-754 double to be printed in full precision. The integer digit is 1 if the number was normalised, 0 otherwise. The hex digits after the decimal point represent the IEEE-754 mantissa (which is 52 bits). The mantissa is printed as it is, no normalisation or any other transformation is used on it. In the hex digits A-F is used for A conversion and a-f for a conversion. The exponent is the binary exponent, with the IEEE-754 bias removed. The only exception is the case of 0.0, where the exponent is printed as 0 (even though it is actually set to the maximum negative value in the double). If the number is NaN or infinite, the same rules as with E and e conversion apply. Note that if the precision is less than 13, the printed result will not be rounded, it will simply be truncated.
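
The fields described above can be picked out of a double with a few bit operations. The following is a sketch of that decomposition only, not the library's implementation. It assumes that double is an IEEE-754 binary64 value (bias 1023), does not handle NaN or infinity, and uses hypothetical names throughout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the pieces of an 'a' / 'A' conversion. */
struct hexfloat_parts {
    int      negative;  /* 1 if the sign bit is set                    */
    int      lead;      /* integer digit: 1 if normalised, 0 otherwise */
    uint64_t mantissa;  /* 52 mantissa bits, i.e. 13 hex digits        */
    int      exponent;  /* binary exponent with the bias removed       */
};

static struct hexfloat_parts hexfloat_split(double value)
{
    struct hexfloat_parts p;
    uint64_t bits;
    unsigned biased;

    memcpy(&bits, &value, sizeof bits);     /* reinterpret without aliasing */
    biased = (unsigned) ((bits >> 52) & 0x7FF);

    p.negative = (int) (bits >> 63);
    p.mantissa = bits & UINT64_C(0xFFFFFFFFFFFFF);
    p.lead     = biased != 0;
    p.exponent = (int) biased - 1023;

    if (biased == 0 && p.mantissa == 0)
        p.exponent = 0;                     /* 0.0 is shown with exponent 0 */

    return p;
}
```

For 3.0 this yields a lead digit of 1, a mantissa of 0x8000000000000 and an exponent of 1, which corresponds to the printed form 0x1.8000000000000p+1 at the default precision of 13.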

'c'
Fetches the next int argument and prints it as a single character. Alternatively, if the "l" length modifier was used, then a wchar_t argument is fetched and converted to a UTF-8 sequence. The sequence is then sent to the output. If the wchar_t argument is not a valid Unicode character, then nothing is printed.
The precision, the '#', '+', ' ' and '0' flags as well as any length modifier apart from "l" are ignored.

's'
If no "l" length modifier was specified, then the next argument is const char *. It is interpreted as a pointer to a 0-terminated C string. The string is copied to the output, not including the terminating 0 character. If precision was specified, then at most precision characters will be used from the string.
If the "l" length modifier was present, then the argument is a const wchar_t * and should point to an array of wchar_t that is terminated by a wide NUL character. The elements of the array are converted to UTF-8 and the result is sent to the output. If an array element is not a valid Unicode character, then it is simply ignored and skipped during the conversion. If precision is specified, then it is taken as a limit of bytes generated by the wchar_t to UTF-8 conversion. Note that the precision applies to the output and not to the wide character array itself. It is guaranteed that no partial UTF-8 sequence is sent out, therefore it is possible that the actual number of output bytes is less than the precision, even if the wide character array is not exhausted.
The '#', '+', ' ' and '0' flags as well as any length modifier apart from "l" are ignored.

Usage example
/*
*   Example implementation of printf() type functions.
*
*   We assume that you have the following functions defined:
*
*   void UartPutc( char data );
*
*       Writes a character to the UART.
*
*   int write( int file, const void *data, size_t dlen );
*
*       Writes 'dlen' bytes from 'data' to a file, whatever that is on
*       your system. The file is identified by the 'file' integer. This
*       is pretty much the same as the write() function on a POSIX system.
*
*   We then create the following functions:
*
*       printf()        - with full float and wide character support
*       fprintf()       - with float but not wide character support
*       vfprintf()      - with float but not wide character support
*       snprintf()      - integer only
*
*   as an example. That should be enough for you to write pretty much
*   any function in the printf() family.
*
*   Please realise that while the example functions seem long, that's
*   due to the comments. In fact, the longest of them contains only
*   four C statements, and simple ones of that.
*/

#include <stdio.h>          // It's your header for the functions in this file
#include <files.h>          // Your header, declaring write()
#include <uart.h>           // Your header, declaring UartPutc()

#include <stdio_format.h>   // Library header, declaring print_XXX()
#include <wchar.h>          // Library header, defining wide character support
#include <string.h>         // Library header, defining memcpy()

#include <stdarg.h>         // GCC header, declaring the vararg macros

/*
*   Declaration of our local output functions
*/

static int uart_out( void **unused, const char *data, size_t len );
static int string_write( void **par, const char *str, size_t len );

/****************************/
/*          printf()        */
/****************************/

/*
*   Print to the UART, support everything
*/

int printf( const char *fmt, ... )
{
va_list arg;        // Variable argument list object
int     ret;        // Return value

    // Get the variable argument list first

    va_start( arg, fmt );

    // Call the library's formatting function, passing it everything and
    // returning with whatever that returned to us.

    ret = print_flt(    // We need to call _flt() for float support
        fmt,            // First argument is the format string
        arg,            // Next the variable argument list
        -1,             // No limit on output amount
        uart_out,       // The wrapper function around UartPutc()
        NULL,           // No parameter for the wrapper
        wcitomb );      // Function to transform a wide char to UTF-8

    // We have to finish with the vararg.

    va_end( arg );
    return ret;
}

/*
*   This is the UART output wrapper
*/

static int uart_out( void **unused, const char *data, size_t len )
{
    // We do not use the first parameter

    (void) unused;

    // We have to output 'len' bytes from 'data' on the UART

    while ( len-- ) UartPutc( *data++ );

    // We have to return a positive number to indicate success

    return 1;
}

/****************************/
/*          fprintf()       */
/****************************/

/*
*   Print to file, support everything but wide char
*/

int fprintf( int file, const char *format, ... )
{
va_list args;
int     ret;

    // Get the argument list

    va_start( args, format );

    // This is an extremely simple case, as everything fits
    // the format function's requirements. The only thing is
    // that we have to cast write() to the relevant function
    // type and the file number to void **. The last parameter
    // is NULL, disabling wide character processing.

    ret = print_flt( format,                // Format string
                       args,                // Argument list
                       -1,                  // No limit on the amount of output
                       (BLC_FMT_PUT) write, // The output function
                       (void **) file,      // Parameter to the output func
                       NULL                 // Wide chars are not supported
                    );
    // Done

    va_end( args );
    return ret;
}

/****************************/
/*          vfprintf()      */
/****************************/

/*
*   Like above but with variable argument list passed as argument
*/

int vfprintf( int file, const char *format, va_list args )
{
    // This is even simpler than fprintf() because we already
    // have the va_list prepared.

    return( print_flt( format, args, -1, (BLC_FMT_PUT) write,
                       (void **) file, NULL ) );
}

/****************************/
/*          snprintf()      */
/****************************/

/*
*   Print to a string, with size limit. Integers only.
*/

int snprintf( char *res, size_t len, const char *fmt, ... )
{
va_list arg;
int     ret;

    // Get the variable argument list

    va_start( arg, fmt );

    // Call print_int() to do the work. The string_write() function
    // will be called with the address of the pointer to the result
    // string as its first argument.

    ret = print_int( fmt, arg, len, string_write, (void **) &res, NULL );

    // We have to put a closing '\0' at the end of the string, but only
    // if we have not yet exhausted the buffer and there was no error.
    // Note that string_write() advanced the 'res' pointer to point to
    // the byte after the last byte written by print_int().

    if ( ret >= 0 && (size_t) ret < len ) *res = 0;

    // Our return value is whatever print_int() returned.

    va_end( arg );
    return( ret );
}

/*
*   Helper function to write into a string.
*
*   The first parameter is a pointer to a pointer to the next character
*   in the output buffer. This function does not need to check if it
*   exhausted the available buffer space or not, the format function
*   itself takes care of that.
*/

static int string_write( void **par, const char *str, size_t len )
{
char    *p;

    // Get the pointer to the next character of the string

    p = *par;

    // Copy the data into the output buffer. We use the fast version
    // of memcpy(), memcpy_forw() that is also defined in this library.

    memcpy_forw( p, str, len );

    // Store the new end of the string so that at the next
    // invocation we know where to write.

    *par = p + len;

    // Report success. Any positive number would do, but we
    // return the length, because that's what a normal write
    // function would do.

    return( len );
}

int print_int ( const char *  fmt,
va_list  arg,
size_t  len,
BLC_FMT_PUT  fun,
void **  par,
int(*)(char *, wchar_t **)  w2m 
)

Formatted print core without floating point conversions.

This function has the exact same semantics and behaviour as the print_flt() function, with the following exceptions:

  • The 'a', 'A', 'e', 'E', 'f', 'F', 'g', 'G' specifiers are not recognised. If such specifiers occur in the format string, they will be ignored and simply printed as they occur in the format string.
  • The l size qualifier for 'c' and 's' conversions is ignored and the passed wide character or wide character string is treated as a normal char or char pointer.

Due to the omission of the floating point, this function is much smaller than using print_flt(). For details on the format string and usage examples see print_flt().

Parameters:
fmt The format string.
arg The argument list (after the format string). Before you call this function, you have to call va_start() and pass the resulting va_list object in this argument.
len The maximum number of characters sent to the output. If you want no limit, pass -1, which converts to the largest size_t value.
fun The function that outputs a number of characters.
par This is the first parameter that is passed to fun.
w2m An unused parameter. It is only here to make the prototypes of print_flt() and print_int() identical. It's best if you specify NULL here so that you don't have to care about type mismatches.
Returns:
The number of bytes written to the output or the number of bytes that would have been written if the maximum number of output characters were not limited. If the return value is 0, then no output was generated. If the return value is negative, it indicates an error code. If the number of output characters was limited, then a return value greater than the limit indicates that the output was truncated.
Warning:
The print_int() and print_flt() functions both use a lot of stack. You can expect a stack requirement of about 200 bytes.

int scan_flt ( const char *  fmt,
va_list  arg,
BLC_FMT_GET  get,
BLC_FMT_UNG  ung,
void **  par,
int(*)(wchar_t **, const char *, size_t)  m2w 
)

Formatted input core.

This function implements the core of the formatted input standard C library function set (the scanf() family functions). The library itself does not provide those functions as such, since your input/output configuration is unknown to the library. However, using this function you can implement those functions in only a few lines of C code, as shown by the examples below.

Parameters:
fmt The format string.
arg The argument list (after the format string). Before you call this function, you have to call va_start() and pass the resulting va_list object in this argument.
get The function that reads the next character from the input.
ung The function to call when a character must be pushed back. This function is called at most once.
par This is the parameter that is passed to get and ung.
m2w Pointer to the mbtowci() function. Since the size of the wchar_t type used by the application can not be known when the library is compiled, functions that manipulate memory storing wchar_t objects must be passed in invocation time, hence this parameter. If this argument is NULL, then the 'l' size modifier in the 'c', 's' and '[' conversions will be ignored.
Returns:
The number of specifiers successfully matched. The %n specifier is not included in the count. EOF is returned if the input was exhausted or there was a character mismatch before the first conversion, 0 is returned if the first conversion fails and a negative error code, other than EOF, is returned if the get function reported a read error from the input.
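
The counting rules can be illustrated with the host C library's sscanf(), used here as a stand-in for a wrapper built on scan_flt() or scan_int(). Only cases where the text above and C99 agree are shown; a literal character mismatch before the first conversion is left out, because C99 returns 0 there while the text above specifies EOF. The helper name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: tries to read two decimal integers from 'input'
 * and returns the number of conversions that succeeded, or EOF.        */
static int scan_two_ints(const char *input, int *a, int *b)
{
    return sscanf(input, "%d %d", a, b);
}
```

The input "12 34" returns 2, "12 abc" returns 1 with only the first value stored, "abc" returns 0 because the first conversion fails, and an empty string returns EOF because the input is exhausted before the first conversion.
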
Format string details:
The function reads the format string and tries to match it with the input. A whitespace character in the format string matches any number of whitespace characters on the input, including none. Any character other than whitespace or the '%' character in the format string must match the input exactly; if a mismatch occurs, the function returns. If the format string contains a '%' character, it is then taken as a conversion specifier. If its format is incorrect, then it is treated as a mismatch, otherwise the input is converted according to the specifier and the result is stored at the location specified by the next argument of the variable argument list. The format of the conversion specifier is the following:

%[sup][width][size]specifier


Apart from the conversion specifier all other fields are optional. The details of the various fields are as follows:

Suppression
If the first character after the '%' is a '*', then the conversion will be performed, but the result will be discarded. No pointer will be consumed from the variable argument list (but if dynamic field width is specified, then the corresponding parameter will be consumed) and the conversion count will not be incremented.
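
A short illustration of suppression, again using the host C library's sscanf() as a stand-in for a wrapper built on the scan cores, on the assumption that both follow the same C99 rule. The helper name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: skips the first integer in 'input' and stores
 * the second one. Returns the conversion count reported by sscanf(). */
static int scan_second_int(const char *input, int *out)
{
    return sscanf(input, "%*d %d", out);
}
```

For the input "10 20" the function returns 1, not 2, and stores 20: the suppressed conversion consumes "10" but is neither stored nor counted.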

Field width
It is either a sequence of decimal digits or the single character '$'. Note that the '$' character is a non-standard extension. A decimal digit sequence is interpreted as a decimal number. The '$' character will fetch the next int argument from the argument list and use that as the width; if the value is negative, the width will be taken as unset.

Size modifier
The interpretation of the next argument can be affected by the size modifier. The exact interpretation of the size modifier depends on the conversion and is detailed for each conversion specifier separately. The following size modifiers are defined:
"hh"
Indicates char or unsigned char argument.

"h"
Indicates short or unsigned short argument.

"l"
Indicates double, long, unsigned long or wchar_t argument.

"ll"
Indicates long double, long long or unsigned long long argument.


"L"
Identical to ll.

"j"
Indicates intmax_t argument.
This implementation treats it as if it were an "ll".

"t"
Indicates ptrdiff_t argument.
This implementation treats it as if it were an "l".

"z"
Indicates size_t argument.
This implementation treats it as if it were an "l".



Conversion specifiers
This single character field is compulsory. If it is missing, then the function will return as if a conversion mismatch happened.
'd'
Converts a possibly signed decimal integer. The conversion is according to the semantics of the strtol() or strtoll() functions with radix 10. Before the conversion, leading whitespace is discarded. If field width is specified, then not more than that many characters are consumed from the input. The leading whitespace is not counted in the field width but the sign is. If the field width is set to 0, then the conversion will fail. The next argument must be a pointer to an int, or to a char if the 'hh' size modifier was present, or to a short if the 'h' size modifier was present, or to a long if the 'l' or equivalent size modifier was present, or to a long long if the 'll' or equivalent modifier was present.
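The field width rules above can be modelled roughly with strtol() on a bounded copy of the input. This is a sketch of the described behaviour, not the library's code: scan_dec_width() is a hypothetical helper, it reads from a string rather than through a get function, and it treats a negative width as unset.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical model of the 'd' conversion with a field width.
 * Returns 1 and advances *in on success, 0 on failure. */
int scan_dec_width(const char **in, int width, long *out)
{
    char        buf[32];
    const char *s = *in;
    char       *end;
    long        v;
    int         n = 0;
    int         max = (int) sizeof buf - 1;

    /* Leading whitespace is skipped and not counted in the width */
    while (isspace((unsigned char) *s))
        s++;

    /* Unset width: bounded only by the buffer in this sketch */
    if (width < 0 || width > max)
        width = max;

    /* The sign, if any, counts towards the field width */
    if (n < width && (*s == '+' || *s == '-'))
        buf[n++] = *s++;
    while (n < width && isdigit((unsigned char) *s))
        buf[n++] = *s++;
    buf[n] = '\0';

    v = strtol(buf, &end, 10);
    if (end == buf)
        return 0;       /* nothing converted, e.g. width of 0 */

    *out = v;
    *in = s;
    return 1;
}
```

With the input "  -12345" and a width of 3, this model stores -12 and leaves "345" unread; with a width of 0 it fails, as the description above requires.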

'i'
Converts a possibly signed integer. The conversion is according to the semantics of the strtol() or strtoll() functions with radix 0, that is, if the input starts with 0x or 0X, then hexadecimal conversion takes place, otherwise if it starts with 0 then octal conversion is performed, otherwise decimal conversion is performed. With regards to the field width and leading whitespace, the behaviour of this conversion is identical to the 'd' conversion. The next argument must be a pointer to an (appropriately sized) int.
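Since the 'i' conversion is specified in terms of strtol() with radix 0, the prefix handling can be tried out with the standard function directly. The wrapper name parse_like_i() is made up for this illustration.

```c
#include <stdlib.h>

/* Radix 0 lets strtol() choose the base from the prefix:
 * "0x"/"0X" means hexadecimal, a leading "0" means octal,
 * anything else is decimal. */
long parse_like_i(const char *text)
{
    return strtol(text, NULL, 0);
}
```

For instance, "0x1F" yields 31, "017" yields 15 and "42" yields 42.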

'u'
Converts an unsigned decimal integer. The conversion is according to the semantics of the strtoul() or strtoull() functions with radix 10. With regards to the field width and leading whitespace, the behaviour of this conversion is identical to the 'd' conversion. The next argument must be a pointer to an (appropriately sized) unsigned int.

'o'
Converts an unsigned octal integer. The conversion is according to the semantics of the strtoul() or strtoull() functions with radix 8. With regards to the field width and leading whitespace, the behaviour of this conversion is identical to the 'd' conversion. The next argument must be a pointer to an (appropriately sized) unsigned int.

'x' 'X'
Converts an unsigned hexadecimal integer. The conversion is according to the semantics of the strtoul() or strtoull() functions with radix 16. With regards to the field width and leading whitespace, the behaviour of this conversion is identical to the 'd' conversion. The next argument must be a pointer to an (appropriately sized) unsigned int.

'p'
This conversion is identical to the 'x' conversion, with two exceptions. First, the size qualifiers are ignored and the next argument must be a pointer to a pointer. Second, if after discarding the leading whitespace the input contains the string (nil), then it is accepted and is interpreted as 0 (i.e. NULL pointer).

'n'
Fetches the next int * argument and stores the number of characters consumed from the input at the given location. The "ll", "l", "h", "hh" or equivalent size modifiers, if present, are respected and the pointer is typecast to a pointer to, respectively, long long, long, short or char when storing the number. This conversion does not consume any input, does not increase the conversion count and cannot fail. If assignment suppression is also specified, then this conversion is a null operation, consuming neither the input nor the arguments.

'a' 'e' 'f' 'g'
Converts a floating-point number. The conversion semantics are the same as those of the strtod() function. Leading whitespace and field width are treated as in the case of the 'd' conversion. The next argument must be a pointer to a float, or if the "l" (or equivalent) size modifier was present, then to a double, or if the "ll" (or equivalent) size modifier was present, to a long double. Note that the implementation always performs a string to double conversion and the result is then typecast to float or long double.
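The note about the double intermediate can be pictured as follows. This is an illustration with standard functions only; scan_as_float() and scan_as_long_double() are hypothetical names, not library entry points.

```c
#include <stdlib.h>

/* The text is always converted to double first ... */
float scan_as_float(const char *text)
{
    return (float) strtod(text, NULL);       /* ... then narrowed */
}

long double scan_as_long_double(const char *text)
{
    return (long double) strtod(text, NULL); /* ... or widened */
}
```

A practical consequence is that a long double obtained this way carries no more precision than a double.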

'c'
Leading whitespace is not skipped. If the field width is not specified, then it is set to 1. If the field width is explicitly set to 0, then the conversion will succeed without consuming input or producing output. If no 'l' size modifier was specified, then field width number of characters are consumed from the input and stored at the location pointed to by the next argument, which must be a pointer to char. No terminating NUL is added to the string. If the input is exhausted before the given number of characters are processed, then the conversion will stop but otherwise succeed, except if not even one character could be processed.
If the "l" (or equivalent) size modifier was specified, then the input is processed as UTF-8 encoded wide characters and the result is stored at the location pointed to by the next argument, which must be a pointer to wchar_t. The conversion of the input is according to the semantics of the mbtowc() function. Note that the field width is the number of wide characters and not the number of bytes used from the input. If at any point the UTF-8 to wchar_t conversion signals an error, a conversion mismatch is declared and the function returns.
Note that encountering a 0 on the input does not stop the conversion. Also note that the resulting character or wide character string is not terminated with a 0. If the input gets exhausted at a multibyte sequence boundary, then it is treated the same as without the 'l' modifier; if the input gets exhausted inside a multibyte sequence, then it is considered a conversion error.
Any length modifier other than "l" (or equivalent) is ignored.

's'
Any leading whitespace on the input is skipped. If no "l" size modifier is specified, then the next argument must be a pointer to char; if 'l' was specified, it must be a pointer to wchar_t. Characters or multibyte character sequences are read from the input and the result is stored at the location pointed to by the argument pointer. The conversion stops if either field width number of characters or wide characters have been processed or if a whitespace character is read from the input. The whitespace will not be stored and will not be consumed. The stored string is terminated with a 0 character (or wide character). The terminator character is not counted in the field width and it is omitted if the conversion fails. A conversion with a field width of 0 matches without consuming input, but it still consumes the argument and stores the terminating 0 in the result. The exhaustion of the input is treated exactly as with the 'c' conversion. An illegal UTF-8 sequence will cause a conversion failure.
Note that if the 'l' modifier is present, the testing for whitespace is done before the UTF-8 to wide char conversion is done. This means that an overlong multibyte sequence encoding a blank character will not be recognised as whitespace. Also note that this is only an issue if the UTF-8 input violates the rules set forth by RFC3629.
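To see why an overlong sequence slips past a byte-level whitespace test, consider the two-byte sequence 0xC0 0xA0, which a decoder that does not reject overlong forms would turn into U+0020 (space). The helpers below are hypothetical and only model the situation; the sketch assumes that the byte-level test looks for the ASCII whitespace characters.

```c
/* Byte-level whitespace test: 1 for ASCII whitespace, 0 otherwise */
int is_ascii_space(unsigned char b)
{
    return b == ' '  || b == '\t' || b == '\n' ||
           b == '\v' || b == '\f' || b == '\r';
}

/* Naive decode of a two-byte UTF-8 sequence; it deliberately does
 * NOT reject overlong encodings, which RFC 3629 forbids. */
unsigned decode2_unchecked(unsigned char b0, unsigned char b1)
{
    return ((unsigned) (b0 & 0x1F) << 6) | (unsigned) (b1 & 0x3F);
}
```

Neither 0xC0 nor 0xA0 passes is_ascii_space(), yet decode2_unchecked(0xC0, 0xA0) gives 0x20. A test applied to the raw bytes therefore never sees the blank.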

'['
Leading whitespace is not skipped. If the field width is not specified, it is set to 1. A field width of 0 matches without consuming input or producing output, but it consumes an argument. Input is read and treated as described with the 'c' conversion. However, the conversion stops if the next input character does not match the set, as described below. The set consists of the characters between the '[' and a closing ']'. Thus, [abc] is the set of a, b and c. If the ']' character itself must be in the set, it must be the first character of the set (otherwise it is interpreted as the end of set marker). Thus, [][] is a set of '[' and ']'. The set can contain ranges. A range is two characters with '-' between them: [0-9] describes the decimal digits. Therefore, if you want '-' to be in your set, it must be the last character within the set. [0-9.eE+-] is the set of characters that a decimal floating-point number can possibly contain. The interpretation of the set is inverted if the first character after the '[' is a '^': the input will then match any character not in the set. If you need the ']' in the excluded set, then it must be the character following the '^'. That is, [^]^[-] matches anything but [, ], ^ and -. When the set is processed, characters between the '[' and ']' are treated as bytes, therefore any value between 1 and 255 can be used.
The interpretation of the set is slightly changed if you use the 'l' size modifier. The following restrictions apply:
  • if you use a non-ASCII character in the set, that is, a byte with a value over 127, it will be accepted, but will otherwise be ignored.
  • any multibyte character sequence longer than 1 byte is considered as *not* being in the set, even if the sequence is an overlong one that describes an ASCII character. If you specified the set with inverted interpretation, then multibyte characters longer than 1 byte are accepted without any further checking. This means that if you specify [\x01-\x7f] it will accept any single-byte sequence but reject any longer ones, while specifying [^\x01-\x7f] will reject any single-byte sequence but accept all longer ones. If you use [\x80-\xff] then no character whatsoever will be accepted, because the set is composed of character codes above 127, thus will never match a single-byte UTF-8 sequence and longer UTF-8 sequences are rejected automatically, since the set is a positive one.
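The byte-oriented set rules of the '[' conversion (that is, without the 'l' modifier) can be sketched as a small membership test. This is one reading of the description, not the library's implementation: in_scanset() is a hypothetical helper that receives a pointer to the text just after the '[' and a single input byte.

```c
/* Hypothetical helper: returns 1 if byte c matches the set whose
 * text starts right after the '[' and ends at the closing ']'. */
int in_scanset(const char *set, unsigned char c)
{
    const unsigned char *p = (const unsigned char *) set;
    int invert = 0;
    int found = 0;

    /* A leading '^' inverts the interpretation of the set */
    if (*p == '^') {
        invert = 1;
        p++;
    }

    /* A ']' in first position is a member, not the terminator */
    if (*p == ']') {
        if (c == ']')
            found = 1;
        p++;
    }

    while (*p != '\0' && *p != ']') {
        if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
            /* Range such as 0-9 */
            if (c >= p[0] && c <= p[2])
                found = 1;
            p += 3;
        } else {
            /* Single member; a '-' just before ']' lands here */
            if (c == *p)
                found = 1;
            p++;
        }
    }

    return invert ? !found : found;
}
```

With the set text "0-9.eE+-]" the bytes '5' and '-' match while 'x' does not, and with "^]^[-]" the byte 'a' matches while '^' does not.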

Usage example
/*
*   Example implementation of scanf() type functions.
*
*   We assume that you have the following functions defined:
*
*   int UartGetc( void );
*
*       Reads a character from the UART. It returns the character itself.
*       It waits for the character. If an error occurs it returns a
*       negative error code (other than EOF).
*
*   int UartUngetc( int lastchar );
*
*       Returns the character 'lastchar' to the UART's input buffer.
*       The next call to UartGetc() should return that character.
*       This function is called at most once, just before scanf()
*       returns. It should return a non-negative number on success
*       and a negative error code (but not EOF) on failure.
*
*   int getc( FILE *file );
*   int ungetc( int c, FILE *file );
*
*       These are the standard C library functions to get the next
*       character (or EOF) from a file or to push back the last
*       character to a file. In this example we assume that you defined
*       FILE to some type. The functions here do not care what that
*       type actually is.
*
*   We then create the following functions:
*
*       scanf()         - with full float and wide character support
*       fscanf()        - with float but not wide character support
*       vsscanf()       - integer only
*
*   as an example. That should be enough for you to write pretty much
*   any function in the scanf() family.
*
*   Please realise that while the example functions seem long, that's
*   due to the comments. In fact, the longest of them contains only
*   four C statements, and simple ones at that.
*/

#include <stdio.h>          // It's your header, declaring the functions in
                            // this file as well as FILE, getc() and ungetc()

#include <uart.h>           // Your header, declaring UartGetc() and UartUngetc()

#include <stdio_format.h>   // Library header, declaring scan_???()
#include <wchar.h>          // Library header, wide character support

#include <stdarg.h>         // GCC header, declaring the vararg macros

/*
*   Declaration of our local functions
*/

static int uart_get( void **unused );
static int uart_ung( int c, void **unused );
static int str_get( void **p );
static int str_ung( int c, void **p );

/*
*   Scan from the UART, support everything
*/

int scanf( const char *fmt, ... )
{
va_list arg;        // Variable argument list object
int     ret;        // Return value

    // Get the variable argument list first

    va_start( arg, fmt );

    // Call the library's scanning function, passing it everything and
    // returning with whatever that returned to us.

    ret = scan_flt(     // We need to call _flt() for float support
            fmt,        // First argument is the format string
            arg,        // Next the variable argument list
            uart_get,   // Get a character
            uart_ung,   // Push-back a character
            NULL,       // No parameter for the wrapper
            mbtowci );  // Function to transform UTF-8 to a wide char

    // Finish the vararg list and return the result

    va_end( arg );
    return ret;
}

/*
*   Wrappers around the UART functions
*/

static int uart_get( void **unused )
{
    (void) unused;
    return UartGetc();
}

static int uart_ung( int c, void **unused )
{
    (void) unused;
    return UartUngetc( c );
}

/*
*   Scan from a file, support everything but wide char
*/

int fscanf( FILE *file, const char *format, ... )
{
va_list args;       // Variable argument list object
int     ret;        // Return value

    // Get the argument list

    va_start( args, format );

    // This is an extremely simple case, as everything fits
    // the format function's requirements. The only thing is
    // that we have to cast getc() and ungetc() to the
    // relevant function type and the file pointer to void **.
    // The last parameter is NULL, disabling wide character
    // processing.

    ret = scan_flt( format,
                    args,
                    (BLC_FMT_GET) getc,
                    (BLC_FMT_UNG) ungetc,
                    (void **) file,
                    NULL );

    // Finish the vararg list and return the result

    va_end( args );
    return ret;
}

/*
*   Scan from a string, integers only. The vararg list is given to us
*/

int vsscanf( char *str, const char *fmt, va_list arg )
{
    // Call the scan core. The get and unget functions are locally
    // defined little functions. Since we don't have to close the
    // vararg list, we can return straight from the scan function.

    return scan_int( fmt, arg, str_get, str_ung, (void **) &str, NULL );
}

/*
*   Read the next character from a string
*/

static int str_get( void **p )
{
char    *s;
int     c;

    // We received a pointer to the string pointer. Fetch it.

    s = * (char **) p;

    // Get the next character. We cast the pointer to unsigned to
    // make sure that we never get a negative number.

    c = * (unsigned char *) s++;

    // If the character is 0, that is, the end of the string, we return
    // EOF and do *not* update the string pointer. Otherwise, we update
    // the string pointer (that is, we consume the character) and return
    // whatever we fetched.

    if ( c )
        * (char **) p = s;
    else
        c = EOF;

    return c;
}

static int str_ung( int c, void **p )
{
    // If the character to return is EOF, we do nothing.
    // Otherwise we simply decrement the current string
    // pointer. In either case we report success.

    if ( c != EOF ) (* (char **) p)--;
    return 0;
}

int scan_int (const char *fmt, va_list arg, BLC_FMT_GET get, BLC_FMT_UNG ung, void **par, int(*m2w)(wchar_t **, const char *, size_t))

Processing formatted input core without floating point conversions.

This function has exactly the same semantics and behaviour as the scan_flt() function, with the following exceptions:

  • The 'a', 'e', 'f', and 'g' specifiers are not recognised. If such specifiers occur in the format string, they will trigger a format error and the function will stop.
  • The 'l' size modifier for 'c', 's' and '[' conversions is ignored and the passed wide character pointer is treated as a normal char pointer.

Due to the omission of the floating point conversions, this function is much smaller than scan_flt(). For details on the format string and usage examples see scan_flt().

Parameters:
fmt The format string.
arg The argument list (after the format string). Before you call this function, you have to call va_start() and pass the resulting va_list object in this argument.
get The function that reads the next character from the input.
ung The function to call when a character must be pushed back.
par This is the parameter that is passed to get and ung.
m2w An unused parameter. It is only here to make the prototypes of scan_flt() and scan_int() identical. It's best if you specify NULL here so that you don't have to care about type mismatches.
Returns:
The number of specifiers successfully matched. The %n specifier is not included in the count. EOF is returned if the input was exhausted or there was a character mismatch before the first conversion; 0 is returned if the first conversion fails; and a negative error code, other than EOF, is returned if the get function reported a read error from the input.
Generated on Tue Jul 13 16:51:45 2010 by  doxygen 1.6.3