www.icosaedro.it

Modular programming in C

Last updated: 2017-09-13.
2017-09-13 Added examples from the ACM Flight Sim program.
2017-03-25 Removed useless "struct module1_Type;" declaration in opaque type.
2017-03-12 Opaque data types revised. Added references to some useful script: make-makefile, check-included, create-c-module. Added references to some concrete modules of a real examples.

This paper explains how C programs can be structured by modules.

What modules are
Module interface

Header file: Constant declarations
Header file: Type declarations
Header file: Global variables
Header file: Function prototypes

Module implementation
Main program
Makefile
Modules dependent from other modules
Final suggestions
Tools and examples

What modules are

Modularization is a method to organize large programs in smaller parts, i.e. the modules. Every module has a well defined interface toward client modules that specifies how the "services" provided by this module are made available. Moreover, every module has an implementation part that hides the code and any other private implementation detail the client modules should not care about.

Layout of the source tree. Dotted boxes are files generated by the compiler, while arrows indicate files involved in their generation.

Modularization has several benefits, especially on large and complex programs:

modules can be re-used in several projects;
changing the implementation details of a module does not require modifying the clients using it, as long as the interface does not change;
faster re-compilation, as only the modules that have been modified are actually re-compiled;
self-documenting, as the interface specifies all that we need to know to use the module;
easier debugging, as modules dependencies are clearly specified and every module can be tested separately.

Programming by modules using the C language means splitting the source code of every module into a header file module1.h that specifies how that module talks to the clients, and a corresponding implementation source file module1.c where all the code and the details are hidden. The header contains only the declarations of constants, types, global variables and function prototypes that client programs are allowed to see and to use. Every other private item internal to the module must stay inside the code file, as these are implementation details clients do not need to know. We will now describe in detail the general structure of the interface and the implementation files.

Module interface

Every interface file should start with a brief description of its purpose, author, copyright statement, version and how to check for further updates. All this information is simply written as C comments or Doxygen DocBlocks.

Proper C declarations must be enclosed between C preprocessor directives that prevent the same declarations from being parsed twice in the same compilation run. Here is the skeleton of our module1.h interface file using Doxygen DocBlocks:

/**
 * Skeleton example of a C module. Illustrates the general structure of a
 * module's interface.
 * @copyright 2008 by icosaedro.it di Umberto Salsi
 * @license as you wish
 * @author Umberto Salsi <salsi@icosaedro.it>
 * @version 2008-04-23
 * @file
 */

#ifndef module1_H
#define module1_H

/*
 * System headers required by the following declarations
 * (the implementation will import its specific dependencies):
 */
#include <stdlib.h>
#include <math.h>

/*
 * Application specific headers required by the following declarations
 * (the implementation will import its specific dependencies):
 */
#include "module2.h"
#include "module3.h"

/* Set EXTERN macro: */
#ifdef module1_IMPORT
    #define EXTERN
#else
    #define EXTERN extern
#endif

/* Constants declarations here. */

/* Types declarations here. */

/* Global variables declarations here. */

/* Function prototypes here. */

#undef module1_IMPORT
#undef EXTERN
#endif

As a general rule, to prevent collisions in the global space of names, every public identifier must start with the name of the module, then an underscore, and then the actual name of the item.

The purpose of the module1_IMPORT and the EXTERN macros is to allow the same definition file to be included by client modules AND the implementation of the module, so that global public variables can be declared only once, and the compiler can check if the function prototypes do really match their implementation.

And here is the trick. The implementation file module1.c defines the macro module1_IMPORT just before including its own header file; in this way the EXTERN macro is left empty and all the public variables and public functions are properly defined: variables are allocated by the compiler in the data section of the generated object file; function prototypes are checked against their implementation.

Client modules, instead, do not define the module1_IMPORT macro, so the compiler will see only external variables and external functions that the linker will have to resolve.
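The effect of the EXTERN macro can be tried in a single file. The following is only a sketch under stated assumptions: the two "views" are written out by hand instead of coming from a real module1.h, and the function module1_next() is a hypothetical name introduced for illustration.

```c
/* --- View of a client: module1_IMPORT is NOT defined --- */
#define EXTERN extern
EXTERN int module1_counter;        /* declaration only, no storage */
EXTERN int module1_next(void);     /* hypothetical public function */
#undef EXTERN

/* --- View of module1.c: module1_IMPORT IS defined --- */
#define EXTERN
EXTERN int module1_counter = -1;   /* definition: storage is allocated here */
EXTERN int module1_next(void);     /* prototype checked against the code below */
#undef EXTERN

/* Implementation: a mismatch with the prototype above would be
 * reported by the compiler as an error. */
int module1_next(void)
{
    return ++module1_counter;
}
```

Both views are compatible with each other, which is why the same header can serve the clients and the implementation.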

Header file: Constant declarations

Constants can be either simple macros or enumerative values. Enumeratives are better suited when a new type must also be defined, and are discussed below along with the type declarations. Usually constants are simple int or double numbers, but float and literal strings are allowed too.

/* module1.h -- Constants declarations */

#define module1_MAX_BUF_LEN (4*1024)

#define module1_RED_MASK    0xff0000
#define module1_GREEN_MASK  0x00ff00
#define module1_BLUE_MASK   0x0000ff

#define module1_ERROR_FLAG    (1<<0)
#define module1_WARNING_FLAG  (1<<1)
#define module1_NOTICE_FLAG   (1<<2)
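As an illustration of how a client might use such constants, here is a minimal sketch. The two helper functions and their names are hypothetical; the color layout 0xRRGGBB is an assumption suggested by the mask values.

```c
#define module1_RED_MASK    0xff0000
#define module1_GREEN_MASK  0x00ff00
#define module1_BLUE_MASK   0x0000ff

#define module1_ERROR_FLAG    (1<<0)
#define module1_WARNING_FLAG  (1<<1)
#define module1_NOTICE_FLAG   (1<<2)

/* Hypothetical helper: extracts the green component (0..255)
 * of a color assumed to be encoded as 0xRRGGBB. */
unsigned client_green_of(unsigned long rgb)
{
    return (unsigned)((rgb & module1_GREEN_MASK) >> 8);
}

/* Hypothetical helper: tells whether a status word reports
 * an error or a warning; notices alone are ignored. */
int client_needs_attention(int status)
{
    return (status & (module1_ERROR_FLAG | module1_WARNING_FLAG)) != 0;
}
```

Flags defined as distinct bits can be combined with the | operator and tested with the & operator, as the second helper does.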

Header file: Type declarations

This section of the header file contains enumerative declarations, data structure declarations, explicit type declarations and opaque type declarations. Enumeratives are suitable to declare several constants. struct declarations are suitable to declare data structures whose internal details are exposed to client modules.

To enforce the encapsulation of the implementation details, an opaque data type can be declared instead of an explicit data type. Opaque data types are types whose internal details are hidden to the client modules; their actual internal structure is fully declared only in the implementation module, so that client modules cannot access their internal details. This opaque declaration follows this general pattern for the .h and the .c files respectively:

Opaque data type

module.h:

typedef struct module_Type module_Type;

module.c:

struct module_Type { int field1; int field2; ... };

Note that the identifier module_Type is defined twice: once as the tag of an opaque struct, and once as the name of a type derived from that struct. There is no conflict between the two because struct tags and type names belong to two different name spaces inside the C compiler.

The drawback of opaque types is that client modules cannot dynamically allocate opaque data structures, nor can they declare arrays or struct fields of such types, because their size is known only inside their own implementation; only pointers to such opaque types are allowed:

Clients can't do this:

module1_Type elems[100];  /* ERR */

struct AnotherType {
	int           field1;
	int           field2;
	module1_Type  field3;  /* ERR */
};

...but can use pointers:

module1_Type *elems[100];  /* ok */

struct AnotherType {
	int           field1;
	int           field2;
	module1_Type *field3;  /* ok */
};

Since client modules can deal only with pointers to opaque types, the implementation must then provide every allocation and initialization routine that may be required, whose typical name follows the scheme module_type_alloc() and module_type_free() respectively.
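The pattern can be sketched in a single file as follows. This is only an illustration: in a real project the first part would live in module1.h and the second in module1.c, and the accessor module1_type_sum() is a hypothetical function added to show how clients reach the hidden fields.

```c
#include <stdlib.h>

/* ---- module1.h (interface): clients see only this part ---- */
typedef struct module1_Type module1_Type;

module1_Type *module1_type_alloc(int field1, int field2);
int module1_type_sum(const module1_Type *t);   /* hypothetical accessor */
void module1_type_free(module1_Type *t);

/* ---- module1.c (implementation): the layout is known only here ---- */
struct module1_Type {
    int field1;
    int field2;
};

module1_Type *module1_type_alloc(int field1, int field2)
{
    module1_Type *t = malloc(sizeof *t);
    if (t != NULL) {
        t->field1 = field1;
        t->field2 = field2;
    }
    return t;   /* NULL if memory is exhausted */
}

int module1_type_sum(const module1_Type *t)
{
    return t->field1 + t->field2;
}

void module1_type_free(module1_Type *t)
{
    free(t);
}
```

A client holds only a module1_Type pointer obtained from module1_type_alloc() and passes it back to the module for every operation, so the struct layout can change without touching client code.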

/* module1.h -- Types declarations */

enum module1_Direction {
    module1_NORTH,
    module1_EAST,
    module1_SOUTH,
    module1_WEST
};

/**
 * Explicit type declaration example.
 */
typedef struct module1_Node
{
    struct module1_Node *left, *right;
    char * key;
} module1_Node;

/**
 * Alternative opaque declaration of the node above.
 */
typedef struct module1_Node module1_Node;

Header file: Global variables

It is a good rule to avoid public global variables altogether. But if you really need them, here is the recipe to deal with their declaration and initialization. The module1_IMPORT macro is required in order to allocate the variable in the data section of the code module only. Without this macro every client module would define its own copy of the variable, which is not what we expect.

/* module1.h -- Global variables declarations */

EXTERN int module1_counter
#ifdef module1_IMPORT
    = -1
#endif
;

EXTERN module1_Node *module1_root;

The preprocessor code hides the initial value from client modules, so that the variables are allocated and initialized only in the code module. Client modules will only see an external variable of some type.

Note that global variables without an explicit initializer are always initialized to zero and pointers are set to NULL, which typically is just the safe initial value programs expect, so assigning an initial value is often not needed.
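A minimal sketch of this rule, showing the definitions as the code module would see them after the EXTERN macro has been expanded to nothing. The variable module1_errors is a hypothetical addition.

```c
#include <stddef.h>

typedef struct module1_Node module1_Node;   /* opaque node type */

int module1_counter = -1;     /* explicit initial value */
module1_Node *module1_root;   /* no initializer: starts as NULL */
int module1_errors;           /* hypothetical; no initializer: starts as 0 */
```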

Header file: Function prototypes

All the functions that need to be accessible from client modules must be declared with a prototype. Remember that functions without arguments must have a dummy void formal argument: before C23 an empty parameter list does not declare a prototype, so the compiler would complain with a rather misleading message telling that the prototype is missing when it looks to be right there!

/* module1.h -- Function prototypes */

/**
 * Initializes this module. Should be called in main() once for all.
 */
EXTERN void module1_initialization(void);

/**
 * Releases internal data structures. Should be called in main()
 * before ending the program.
 */
EXTERN void module1_termination(void);

/**
 * Add a node to the root tree.
 * @param key Value to add to the tree.
 * @return Allocated node.
 */
EXTERN module1_Node * module1_add(char * key);

/**
 * Releases node from memory.
 * @param n Node to release.
 */
EXTERN void module1_free(module1_Node * n);

Module implementation

The implementation module module1.c should include the required headers, then it should define the module1_IMPORT macro before including its own header file. By including its own header the compiler grabs all the constants, types and variables it requires. Moreover, by including its own header file the code file allocates and initializes the global variables declared in the header. Another useful side effect of including the header is that prototypes are checked against the actual functions, so that for example if you forgot some argument in the prototype, or if you changed the code and forgot to update the header, the compiler will detect the mismatch with a proper error message.

Macros, constants and types declared inside a code file cannot be exported, as they are implicitly always "private".

Global variables for internal use must have the static keyword in order to make them "private".

Remember also to declare as static all the functions that are private to the code module. The static keyword tells the compiler that these functions are not available for linking, so they will not be visible anymore once the code file has been compiled into its own module1.o object file.

Since all the private items are not exported, there is no need to prepend the module name module1_ to their name, as they cannot collide with external items. Private items are still available to the debugger, anyway.

/* module1.c -- See module1.h for copyright and info */

/* Import system headers and application specific headers: */
#include <stdlib.h>
#include <string.h>
#include "module4.h"
#include "module5.h"

/* Including my own header for checking by compiler: */
#define module1_IMPORT
#include "module1.h"

/* Private macros and constants: */

/* Private types: */

/* Actual declaration of the private opaque struct: */
struct module1_Node
{
    struct module1_Node *left, *right;
    char * key;
};

/* Private global variables: */
static module1_Node * spare_nodes = NULL;
static int allocated_total_size = 0;

/* Private functions: */
static module1_Node * alloc_Node(void){ ... }
static void free_Node(module1_Node * p){ ... }

/* Implementation of the public functions: */
void module1_initialization(void){ ... }
void module1_termination(void){ ... }
module1_Node * module1_add(char * key){ ... }
void module1_free(module1_Node * n){ ... }

Note that public functions are left for last, since they usually need some private function; moreover, since public functions already have their prototype, they can be called anywhere in the code above them.
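To make the layout concrete, here is one possible way to fill in part of the skeleton, written as a single self-contained file. It is a sketch under several assumptions that the skeleton does not state: the tree is an unbalanced binary search tree ordered by strcmp(), each node stores its own copy of the key, adding an existing key returns the existing node, and module1_free() releases a node together with its sub-trees. The private function alloc_Node() here takes the key as an argument, unlike the one in the skeleton.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Explicit node type, as in the header example. */
typedef struct module1_Node {
    struct module1_Node *left, *right;
    char *key;
} module1_Node;

module1_Node *module1_root = NULL;

/* Private: allocates a leaf node holding a copy of key (assumption). */
static module1_Node *alloc_Node(const char *key)
{
    module1_Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->key = malloc(strlen(key) + 1);
    if (n->key == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->key, key);
    n->left = n->right = NULL;
    return n;
}

/* Public: adds key to the tree rooted at module1_root. Returns the
 * node holding key, or NULL if memory is exhausted. */
module1_Node *module1_add(char *key)
{
    module1_Node **link = &module1_root;
    assert(key != NULL);
    while (*link != NULL) {
        int cmp = strcmp(key, (*link)->key);
        if (cmp == 0)
            return *link;   /* key already present */
        link = cmp < 0 ? &(*link)->left : &(*link)->right;
    }
    *link = alloc_Node(key);
    return *link;
}

/* Public: releases a node and, in this sketch, its sub-trees too. */
void module1_free(module1_Node *n)
{
    if (n == NULL)
        return;
    module1_free(n->left);
    module1_free(n->right);
    free(n->key);
    free(n);
}
```

The private helper comes first and is static, while the two public functions follow and match the prototypes of the header, so the file needs no prototypes of its own.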

The code file should never need to declare function prototypes of its own, the only exception being mutually recursive private functions.
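The case where a private prototype is really needed is mutual recursion, where each of two functions calls the other and so one of them must be declared before it is defined. A minimal sketch, with hypothetical function names:

```c
/* Forward prototype: is_even() below calls is_odd() before
 * the compiler has seen its definition. */
static int is_odd(unsigned n);

static int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

static int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}

/* Public function (hypothetical name), placed last as usual. */
int module1_is_even(unsigned n)
{
    return is_even(n);
}
```

A function that simply calls itself does not need a prototype, since its name is already known inside its own body.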

Main program

The name of our project will be program_name and its source file is program_name.c. This source file is the only one that does not require a header file, as it contains just the public function main() which does not need a prototype. The main source includes and initializes all the required modules, and finally terminates them once the program is finished. The general structure of the main program source file is as follows:

/**
 * Our sample program.
 * @copyright 2008 by icosaedro.it di Umberto Salsi
 * @license as you wish
 * @author Umberto Salsi <salsi@icosaedro.it>
 * @version 2008-04-23
 * @file
 */

/* Include standard headers: */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Include modules header we directly invoke here: */
#include "module1.h"
#include "module2.h"

int main(int argc, char **argv)
{
    /* Initialize modules: */
    module1_initialization();
    module2_initialization();

    /* Perform our job. */

    /* Properly terminate the modules, if required: */
    module2_termination();
    module1_termination();

    return 0;
}

Makefile

Compiling, linking and other common tasks are usually delegated to a Makefile, the configuration file for the make command. make already has default rules that tell how to build object files *.o from their source files *.c, but unfortunately it is not aware of the modular structure of our source. To deal with our modules we have to tell make that the *.h header files must also be added to its dependency rules. This requires an explicit rule, as we can't rely on the default one. Moreover the main program, the only one that does not have a header file, must be compiled with another rule, which may also need to specify some external libraries to link with. Finally, this is the resulting Makefile skeleton:

# Makefile

# Compiler flags: all warnings + debugger meta-data
CFLAGS = -Wall -g

# External libraries: only math in this example
LIBS = -lm

# Pre-defined macros for conditional compilation
DEFS = -DDEBUG_FLAG -DEXPERIMENTAL=0

# The final executable program file, i.e. name of our program
BIN = program_name

# Object files the $BIN depends on
OBJS = module1.o module2.o module3.o

# This default rule compiles the executable program; libraries go
# after the object files so that the linker can resolve their symbols
$(BIN): $(OBJS) $(BIN).c
	$(CC) $(CFLAGS) $(DEFS) $(OBJS) $(BIN).c $(LIBS) -o $(BIN)

# This rule compiles each module into its object file
%.o: %.c %.h
	$(CC) -c $(CFLAGS) $(DEFS) $< -o $@

clean:
	rm -f *~ *.o $(BIN)

depend:
	makedepend -Y -- $(CFLAGS) $(DEFS) -- *.c

With this Makefile, compiling the source becomes as simple as issuing the make command alone; no arguments are required. Other targets may also be present, for example make clean, make dist and so on. The last target, make depend, is the subject of the next section.

Modules dependent from other modules

The source tree we considered till now is very simple, with a main program that depends on several independent modules. The %.o rule takes care of updating every *.o file if any module source gets modified, while the $(BIN) rule re-compiles and re-links the main program if its source or any of the modules gets modified.

But what if some module depends on some other sub-module, including it either in its header or in its code file? And what if modules, besides contributing to the main program, are also mutually dependent? The figure below schematically illustrates a situation in which module1.h/.c requires a sub-module module4.h, and module2 requires module3.

A more complex source layout, where module 1 (either in its .h or .c file) requires module 4, and module 2 (either in its .h or .c) requires module 3. If not properly directed, our Makefile in its basic form fails to detect these dependencies, and sources are not re-compiled as required.

The make command does not parse the content of the files and it is not aware of these new dependencies. So if we modify module4 it will omit to re-compile module1, and if we modify module3 it will omit to re-compile module2. We can fix this simply by adding specific rules that handle these dependencies, but we also have to remember to update these rules after any change in our source tree layout.

The special target make depend can do all that boring work for us, as it automatically builds all the dependencies between the source files and appends them to the Makefile itself. After issuing make depend, in fact, the Makefile contains these new lines:

---- The Makefile as above, but remember to add ----
---- module4.o to the list of the object files. ----

# DO NOT DELETE

module1.o: module4.h
module2.o: module3.h
program_name.o: module1.h module2.h module3.h module4.h

These rules complete the %.o rule we wrote by hand. The last rule refers to the file program_name.o, which we do not generate, so it is ignored in the context of our Makefile. So, for example, after modifying module4.h and issuing the make command, the %.o rule causes the re-compilation of module4.c; the module1.o rule added by makedepend, combined with the %.o rule, causes the re-compilation of module1.c; and finally the $(BIN) rule produces the updated executable program program_name.

Summarizing, after every change to the layout of the source tree it is advisable to update the Makefile by issuing the command "make depend", and then we can use the command "make" as usual to generate the executable program.

Final suggestions

The GNU GCC compiler has a -Wall flag that enables most of the commonly useful warning messages (the -Wextra flag adds some more). I always use this flag because it helps to write clean code, and it saves from many obscure mistakes that would be difficult to detect otherwise.

You may use the nm command to check if some internal item (variable or function) escaped from our modularization. This command displays all the symbols available in the object file, either to the linker or to the debugger. For every symbol this command also prints a letter that marks its status and its availability. Public items (i.e. those that the object file makes available to the client modules) are marked by an uppercase letter B D T etc. while local symbols have lowercase letters b d t etc.:

$ nm module1.o
00000000 t alloc_Node
0000000c b allocated_total_size
00000014 T module1_add
00000000 D module1_counter
0000001e T module1_free
0000000f T module1_init
00000004 B module1_root
0000000a t free_Node
00000008 b spare_nodes
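For reference, a source fragment like the following would typically produce the kind of letters shown in the listing above. It is only a sketch: the function module1_usage() is hypothetical, the letters in the comments assume GCC on an ELF platform compiling without optimization, and older compilers may report an uninitialized public variable with the letter C rather than B.

```c
#include <stddef.h>

typedef struct module1_Node module1_Node;   /* opaque node type */

/* Private items: static, lowercase letter in the nm output. */
static module1_Node *spare_nodes = NULL;    /* b: zero-initialized data */
static int allocated_total_size = 0;        /* b: zero-initialized data */

/* Public items: uppercase letter in the nm output. */
int module1_counter = -1;                   /* D: initialized data */
module1_Node *module1_root;                 /* B: zero-initialized data */

static int count_spare(void)                /* t: private code */
{
    return spare_nodes == NULL ? 0 : 1;
}

int module1_usage(void)                     /* T: public code (hypothetical) */
{
    return allocated_total_size + count_spare();
}
```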

A simple grep immediately shows the variables and the functions actually exported by a module:

$ nm module1.o | grep " [A-Z] "
00000014 T module1_add
00000000 D module1_counter
0000001e T module1_free
0000000f T module1_init
00000004 B module1_root

We can improve this shell command by writing a useful tool that displays all the private identifiers erroneously exported by each code module:

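#!/bin/sh
# Usage: c-detect-private-exported *.o
echo "Detecting private items exported by object files:"
while [ $# -gt 0 ]; do
    base=$(basename "$1" .o)
    # Keep only defined symbols (address, letter, name) whose letter is
    # uppercase; undefined symbols have no address and are skipped.
    nm "$1" | awk 'NF == 3 && $2 ~ /^[A-Z]$/ { print $3 }' |
    while read -r id; do
        grep -q -w -- "$id" "$base.h" || echo "    $id"
    done
    shift
done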

This script accepts a list of .o files and displays all the identifiers exported that are not declared in the corresponding .h file: these symbols can then be readily added to their proper include file.

Tools and examples

check-included is a Bash shell script that checks that all the included modules are actually used in the source. Type check-included -h for help.
create-c-module is a Bash shell script that generates the skeleton of a new module as described here. Type create-c-module -h for help.
make-makefile, for when 'make depend' is not flexible enough, is a Bash shell script that generates the Makefile for all the C source programs in the current directory by recursively scanning header files and C sources. Type make-makefile -h for help.
Concrete examples of modules. The Makefile has been created by the make-makefile script above. The names of the modules are mostly self-explanatory, but the most notable ones for the purposes of our discussion are:
- array: dynamic array of pointers; several specific implementations are also built over this module: bufarray, pairarray, ustringarray.
- buf: dynamic buffer of bytes.
- encoder: stream text translator from an encoding to another.
- hashtable: a collection of key/value pairs, also known as "dictionary" or "map".
- memory: memory allocator with leaks detector, dispose function with modules' specific destructor call, and modules' cleanup function.
- mime: email MIME format encoder and decoder.
- sparsearray: sparse array of pointers.
- terminal: basic terminal management with support for locale encoding.
- ustring: Unicode strings as dynamic buffers of characters.
Even more concrete examples of modules. The Makefile has been created by the make-makefile script above; a specific Makefile-include.txt has also been added to support compilation of the sources under Linux and under Windows with the MinGW development kit, so the following modules are for both Linux and Windows:
- audio: allows loading a WAV audio file and controlling its playback.
- gui: simple abstraction of the underlying windowing system that allows managing keyboard, mouse and window events; the only graphical primitive provided draws a line because this was the only requirement of the program this module belongs to, but adding more features is easy.
- prng: pseudo-random number generator with easier and safer interface to prevent common mistakes.
- timer: simple real-time timer you can start, stop, restart and reset.
- varray: implements a collection of objects that can be allocated and retrieved by unique handles (or "versioned entries") rather than by pointers, to detect or prevent access to already released objects.
- wav: WAV PCM file reader.
- zulu: "zulu" UTC time and date conversion routines.

Umberto Salsi
