Department of Electrical and Computer Engineering
Dalhousie University

C Programming Primer


From: Introduction to Data Communications: A Practical Approach by Larry Hughes (Jones and Bartlett, 1997)

Introduction

This document offers a brief overview of C for those readers unfamiliar with the language. The description of C covers only those language constructs used by Commkit; additional details can be found in any number of books written on C or in the Borland C manuals.

Comments

A comment begins with /* and ends with */. Everything within the comment is ignored by the compiler, including any code or data structures. Comments cannot be nested but can span multiple lines.

C++ comments are denoted by a pair of slashes, //. Everything from the slashes to the end-of-line is taken as a comment.

It is very poor programming practice to comment out sections of code. If it is necessary to avoid compiling sections of code, use #ifdef (see below, Compiler Directives).

Base Types

C supports three base types from which all other structures can be derived: integers, characters, and floating point. Only integers and characters are considered in this document, since floating point is seldom used in writing kernels and device drivers. Any character can be used as either a character or an integer, depending upon the context.

Unnamed Constants

Commkit uses five different unnamed constants:

Identifiers and Variables

An identifier is any collection of alpha-numeric characters that starts with an alphabetic character. Spaces, tabs, and carriage returns are not allowed within an identifier; however, underscores '_' are allowed.

Variables

A variable is simply an identifier declared to be of a specific type. 'Traditionally' all C variables are written in lower-case. A declaration is written as the type followed by one or more variable names (separated by commas). The declaration is terminated by a semicolon.

Integer variables are declared as either a short, int, or long. The number of bits associated with each type depends upon the underlying architecture and the compiler; for example, a short is typically 16 bits, while an int is often 16 or 32 bits, and a long is 32 or 64 bits.

By default, all variables are signed; however, the prefix unsigned allows the declaration of unsigned variables:

int alpha;
unsigned long beta;
short gamma, delta;

Character variables hold one byte (8 bits) and can be used for either characters or as 8-bit integers. Characters are declared as type char; in Borland C, characters are signed by default (other compilers may differ, since the language leaves this choice to the compiler), although they can be explicitly declared unsigned:

char ch, data;
unsigned char subscript;
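
As a minimal sketch of a char doubling as a small integer, the two helpers below do arithmetic directly on character values. The function names are hypothetical and are not part of Commkit. The first assumes an 8-bit unsigned char; the second assumes a character set, such as ASCII, in which the upper-case letters are contiguous.

```c
/* Hypothetical helper: add one to an 8-bit unsigned value.
   With an 8-bit unsigned char, 255 + 1 wraps around to 0. */
unsigned char wrap_increment(unsigned char value)
{
    return (unsigned char) (value + 1);
}

/* Hypothetical helper: distance of an upper-case letter from 'A'.
   The character is simply treated as an integer (assumes ASCII). */
int letter_offset(char ch)
{
    return ch - 'A';
}
```

For instance, wrap_increment(255) yields 0 and letter_offset('C') yields 2.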

Initialization

Variables can be initialized when they are declared, for example:

char ch = 'X';
int data = 3;

Local variables that are not initialized have undefined values until an assignment takes place (see below); global and static variables that are not initialized start with the value zero.

Reserved Words

The following reserved words cannot be used as identifiers (and hence variables):

auto      break     case      char      continue  default   do
double    else      extern    float     for       goto      if
int       long      register  return    short     sizeof    static
struct    switch    typedef   union     unsigned  void      while

The Borland C User's Guide lists an additional set of reserved words used by Borland C.

Expressions

C supports a number of different expressions and operators:

The Assignment Statement

The assignment statement is defined as a left-value (lvalue) being assigned the result of a right-value (rvalue). It is written as lvalue = rvalue. The lvalue is always a memory location and the rvalue an expression. Unless otherwise indicated, the statement is terminated with a semicolon (;). C supports little or no checking when dealing with variables of the base type, for example:

int a;
char b;
  
a = 'X';       /* Assigning a character to an integer */
b = a + 1;     /* Storing an integer into a character variable */

Multiple assignments are allowed:

a = b = c = 10;

Beware of seemingly innocent typos such as:

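a = b == 10;

In which b is compared with 10 and the result of the comparison (0 or 1) is assigned to a; b itself is left unchanged. (The longer form a = b == c = 10; does not compile, because == binds more tightly than = and the result of a comparison is not an lvalue.)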
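
The difference between assignment and comparison can be checked with a short sketch. The function names are illustrative only.

```c
/* Chained assignment: c, then b, then a all receive 10. */
int chained_assign(void)
{
    int a, b, c;

    a = b = c = 10;
    return a + b + c;       /* 10 + 10 + 10 */
}

/* One '=' mistyped as '==': b is only compared, never changed. */
int compare_then_assign(int b)
{
    int a;

    a = b == 10;            /* a becomes 1 if b equals 10, else 0 */
    return a;
}
```

Here chained_assign() returns 30, while compare_then_assign() returns 1 only when it is called with 10.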

Variations

C offers a number of shorthand notations for the assignment statement:
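
A minimal sketch of the shorthand forms that appear elsewhere in this document (+=, -=, *=, ++, and --). The function name is hypothetical; the comments give the value of a after each statement.

```c
/* Hypothetical demonstration of the shorthand assignment forms. */
int shorthand_demo(void)
{
    int a = 10;

    a += 5;     /* same as a = a + 5;  a is now 15 */
    a -= 3;     /* same as a = a - 3;  a is now 12 */
    a *= 2;     /* same as a = a * 2;  a is now 24 */
    a++;        /* same as a = a + 1;  a is now 25 */
    a--;        /* same as a = a - 1;  a is now 24 */
    return a;
}
```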

Selection

C supports two selection statements, one conditional and the other a multi-way branch.

Compound Statements

Compound statements are groups of zero or more statements enclosed in braces (i.e., {...}); note that all statements must be ended with a semicolon:

{
     Statement1;
     /* More statements */
     StatementN;
}

The compound statement is not ended with a semicolon.

The if Statement

The if statement is written as:

if (Expression)
     Statement1;
else
     Statement2;

The Expression (see above) is evaluated; a non-zero result causes Statement1 to be executed, otherwise Statement2 is executed. If the else Statement2 construct is omitted, the result is an if-then statement. Note that Statement1 and Statement2 can both be compound statements (remember that compound statements cannot be followed by a semicolon).

The following code fragment illustrates an if statement: should a equal 'X' or c be less than 2, data is assigned the value five; otherwise data is cleared and a is assigned 'Z'.

if (a == 'X' || c < 2)
     data = 5;
else
{
     /* a != 'X' and c >= 2 */
     data = 0;
     a = 'Z';
}

The switch Statement

The multiway branch is known as the switch statement; it is normally written in the following form:

switch(Expression)
{
case Constant:
     Statement(s);
     break;
 
case Constant:
     Statement(s);
     break;
 
/* Other statements */
 
default:
     Statement(s);
}

The Expression is evaluated to an integer value; control is passed to the case label (a constant) which matches the value of the Expression. The Statement(s) following the label are then executed. If a section of code is to be associated with a number of different values of the Expression, each Constant must be associated with its own case label; for example:

switch (ch)
{
case 'A':
case 'a':
     /* Statements */
     break;

case 'B':
case 'b':
     /* Statements */
     break;
 
/* Other 'case' labels and statements */
}

Once the set of statements associated with the Expression has been evaluated, control can be passed outside of the switch statement using the break statement. It is possible to branch into the middle of a series of statements simply by placing the case label above the first statement associated with the case label:

switch(ch)
{
case 'A':
     ch = 'a';
case 'a':
     /* Statements */
     break;
 
     /* Other 'case' labels and statements */
}

If the value of the Expression does not match any of the case labels, control passes to the statements that follow the label default:. If there is no default, control passes to the first statement following the closing brace of the switch.
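
The pieces of the switch statement (shared case labels, falling from one case into the next, and default) can be combined in one sketch. The function command_code() and the codes it returns are assumptions made for illustration.

```c
/* Hypothetical helper: map a command letter to a small integer code. */
int command_code(char ch)
{
    int code;

    switch (ch)
    {
    case 'A':
        ch = 'a';       /* no break: control falls into the next case */
    case 'a':
        code = 1;
        break;

    case 'B':           /* two labels sharing one set of statements */
    case 'b':
        code = 2;
        break;

    default:            /* no case label matched */
        code = 0;
    }
    return code;
}
```

Both 'A' and 'a' produce 1, both 'B' and 'b' produce 2, and any other character produces 0.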

Iteration

C supports three structured iteration statements as well as a goto statement.

The while Statement

The while statement is a pre-test, non-deterministic loop structure, written in the form:

while (Expression)
     Statement;

The Expression is evaluated; if it is non-zero, the Statement is executed. The cycle is repeated as long as the result of the expression is non-zero. The Statement can be a compound statement. For example,

count = 0;
while (count < 10)
{
   /* Other statements */
   count++;
}

Oftentimes the loop can proceed backwards, producing some interesting software:

count = 10;
while (count--)
{
     /* Statements */
}

The loop body is executed ten times, with count taking the values 9 down to 0 inside the loop. The next test finds that count has a zero value and the loop terminates (the final decrement leaves count at -1).

An infinite loop can be written by setting the Expression to 1, that is: while(1).
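
The behaviour of the backwards loop can be confirmed with a small sketch. Both function names are hypothetical, and both assume that they are called with a non-negative count.

```c
/* Number of times the body of 'while (count--)' is executed. */
int countdown_passes(int count)
{
    int passes = 0;

    while (count--)
        passes++;
    return passes;
}

/* Value left in 'count' once the loop has terminated. */
int countdown_final(int count)
{
    while (count--)
    {
        /* Statements */
    }
    return count;
}
```

Starting from 10, the body runs ten times and count is left at -1, because the decrement still takes place on the test that ends the loop.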

The do..while Statement

The do..while statement is a post-test, non-deterministic loop, written in the form:

do
     Statement;
while (Expression);

The Statement (which can be a compound statement) is executed before the Expression is evaluated. The cycle continues as long as the Expression produces a non-zero result. Multiple statements must be written as a compound statement.
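
A post-test loop is a natural fit when the body must run at least once. As a sketch, the hypothetical function below counts the decimal digits in a number; even the value 0 has one digit, so the test is made after the first pass.

```c
/* Hypothetical helper: number of decimal digits in 'value'. */
int digit_count(unsigned int value)
{
    int digits = 0;

    do
    {
        digits++;           /* executed at least once, even for 0 */
        value /= 10;
    }
    while (value != 0);

    return digits;
}
```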

The for Statement

The for statement allows the construction of deterministic loops (i.e., loops with a known initial condition, final condition, and increment). The format of the for statement is as follows:

for (Expression1; Expression2; Expression3)
     Statement;

where Expression1 is the initial condition (typically an assignment), Expression2 is the termination condition, and Expression3 is the increment. For example, to count from 0 to 10, a for loop could be written as:

for (i=0; i<=10; i++)
{
     /* Statements */
}

Note that the for loop is equivalent to:

Expression1;
while (Expression2)
{
     /* Statements */
     Expression3;
}

Finally, any or all of the expressions may be omitted. For example, the following set of statements is performed 'forever':

for(;;)
{
     /* Statements */
}
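
The equivalence between for and while can be illustrated by writing the same loop both ways. The two function names below are hypothetical; each adds the integers from 0 to limit.

```c
/* Sum of 0..limit written as a for loop. */
int sum_for(int limit)
{
    int i, sum = 0;

    for (i = 0; i <= limit; i++)
        sum += i;
    return sum;
}

/* The same loop written as initial condition, while, and increment. */
int sum_while(int limit)
{
    int i, sum = 0;

    i = 0;                  /* Expression1 */
    while (i <= limit)      /* Expression2 */
    {
        sum += i;
        i++;                /* Expression3 */
    }
    return sum;
}
```

Counting from 0 to 10, both versions produce 55.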

The goto Statement

An unconditional transfer of control can be achieved using the goto statement. The goto statement is written with an identifier (a label), for example:

goto done;

The label must be within the same function (see below) as the goto, is terminated with a colon (not a semicolon) and can branch forward or backward over any number of nested loops:

while(1)
{
     /* Statements */
     if (data == 'X') goto done;
     /* Statements */
}
done:
/* Statements */

Note that goto is different from break in that goto can branch to anywhere within a function. However, break is more structured since control passes to the first statement beyond the end of the loop (or switch) in which the break is written. The continue statement passes control to the end of the body of the loop in which the continue is written, so that the next iteration (if any) begins.

For example, if the statement goto done were replaced by break, execution would resume with the first statement outside the while loop. However, if continue replaced goto done, the statements between the continue and the closing brace would be ignored, with execution resuming at the start of the loop (i.e., the while).
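
A sketch of break and continue used in place of goto follows. The function sum_skipping() and its parameters are hypothetical: it adds the integers 1, 2, 3, and so on up to limit, leaving out the value skip and giving up entirely when the value stop is reached.

```c
/* Hypothetical helper illustrating break and continue. */
int sum_skipping(int limit, int skip, int stop)
{
    int i = 0, sum = 0;

    while (1)
    {
        i++;
        if (i == stop || i > limit)
            break;          /* leave the loop altogether */
        if (i == skip)
            continue;       /* ignore the rest of the body, test again */
        sum += i;
    }
    return sum;
}
```

For example, sum_skipping(10, 3, 6) adds 1, 2, 4, and 5, giving 12.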

Aggregate Data Types

C allows complex data types (notably arrays, structures, and unions) to be constructed out of the three base types. Additionally, pointers to the base types or aggregate types can be constructed.

Arrays

An array is a data structure consisting of one or more elements sharing a common type and name (an identifier). Arrays are declared by specifying the array's type, its name, and dimension. For example, to declare an array of 10 integers, the following declaration could be used:

int data[10];

An individual element in the array is accessed using a subscript enclosed in square brackets. Subscripts are integers (or characters) and must be in the range 0 through N-1 (where N is the size of the array). For example, the array data could be set to zero using a for loop:

for (i=0; i<10; i++)
     data[i] = 0;

Arrays can also be initialized when they are declared:

int data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

There is no pre-defined limit to the number of dimensions that an array can have (however, each compiler will have its own limitations). A multi-dimensioned array is written as follows:

type name[dim1][dim2]...[dimN]

where 'type' is the type of the individual elements in the array 'name' and 'dim1' through 'dimN' are the sizes (i.e., number of elements) in each dimension.

For example, to define a 5-by-10 array of integers, one could write:

int data[5][10];

A string is simply an array of chars. For example:

char name[10];

Text strings cannot be assigned directly to string variables in an assignment statement, although individual characters may be assigned to each array element. However, C has many string manipulation routines that can be used to access, compare, and manipulate strings.

Strings can be initialized at compile time in much the same way integer arrays are handled (note that the braces are omitted):

char name[10] = "Your name";

In the above example, the array name is assigned the nine characters of the string "Your name". A tenth character (the null character) is added at the end of the string. To avoid counting each character in a string, C allows a shorthand notation for character string initialization:

char name[] = "Your name";
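
The storage taken by an initialized string, and the use of a library routine in place of direct assignment, can be sketched as follows. The function names are hypothetical; strcpy() and strlen() are standard routines declared in string.h.

```c
#include <string.h>

/* Bytes reserved for the string: nine characters plus the null. */
int name_storage(void)
{
    char name[] = "Your name";

    return (int) sizeof(name);
}

/* A string cannot be assigned with '=', but it can be copied. */
int name_length(void)
{
    char name[10];

    strcpy(name, "Your name");  /* copies the nine characters and the null */
    return (int) strlen(name);  /* the length excludes the null */
}
```

Here name_storage() returns 10 and name_length() returns 9.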

Structures

Separate data structures that are related can be placed in a single, larger data structure known as a struct. The basic format of a structure is:

struct
{
field(s);
}

The structure consists of one or more field(s); where a field is a data structure declaration. For example, a person's birthday consisting of a day, month, and year are all related items that can be grouped into a struct:

struct
{
int day;
int month;
int year;
}

A structure can be used to declare a new data structure or a new data type (or both); the above example is incorrect in that the structure has not declared a new data structure or a new data type.

Data structures are declared with the name of the data structure following the closing brace. For example, a data structure, my_birthday, with the fields day, month, and year, could be declared as follows (note that the structure ends with a semicolon):

struct
{
int day;
int month;
int year;
} my_birthday;

To declare a new data type, the name of the new type is entered after the word struct and before the {, for example (note that the structure must be terminated with a semicolon after the closing brace):

struct birthday
{
int day;
int month;
int year;
};

Structures can be declared within other structures.

The rules for declaring a data structure of type struct are the same as any other declaration: the name of the type (for example, struct birthday) must be followed by one or more identifiers, separated by commas and terminated with a semicolon:

struct birthday bills_birthday;
struct birthday the_cats_birthday;

The individual fields within the structure are accessed by specifying the name of the structure (i.e., the identifier), followed by a '.', followed by the name of the field (note, this can be recursive if structures within structures are declared). Structures can also be initialized at compile time. For example:

struct birthday bills_birthday = {18, 5, 1978};
struct birthday the_cats_birthday;
  
the_cats_birthday . day   = 1;
the_cats_birthday . month = 4;
the_cats_birthday . year  = 1990;

The individual fields within the structure can be manipulated based upon their type.
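
A structure declared within another structure is accessed by repeating the '.' notation. In the sketch below, struct pet and the function pet_birth_year() are hypothetical names introduced for illustration.

```c
#include <string.h>

struct birthday
{
    int day;
    int month;
    int year;
};

/* Hypothetical structure containing another structure. */
struct pet
{
    char name[10];
    struct birthday born;
};

int pet_birth_year(void)
{
    struct pet cat;

    strcpy(cat.name, "Dusty");
    cat.born.day   = 1;         /* field of a structure within a structure */
    cat.born.month = 4;
    cat.born.year  = 1990;

    return cat.born.year;
}
```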

Structures can be declared as arrays and accessed using subscripts:

struct birthday cat_family[5];
int i;

for (i=0; i<5; i++)
{
     cat_family[i] . day   = 0;
     cat_family[i] . month = 0;
     cat_family[i] . year  = 0;
}

Unions

Data structures can share the same memory locations using a union. A union is declared and accessed in the same way as a structure, with the difference being that each field entry in a union refers to the same memory location.

For example, the following union declaration allows a 32-bit location to be accessed as four bytes, two words, or one long word (assuming, as with Borland C, a 16-bit int and a 32-bit long):

union memloc
{
char byte[4];
int word[2];
long double_word;
};
  
union memloc x;

The variable x refers to a single 32-bit location, and can be visualized as follows:

x.byte[0]   x.byte[1]   x.byte[2]   x.byte[3]
x.word[0]               x.word[1]
x.double_word
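
The sharing of memory can be observed by writing through one field and reading through another. The sketch below is an assumption-laden variation on memloc: it uses unsigned fields and a short (assumed to be 16 bits) for the words so that it behaves the same on compilers with a 32-bit int. The order in which the two bytes appear within the word depends on the processor.

```c
union memloc
{
    unsigned char  byte[4];
    unsigned short word[2];         /* assumed to be 16 bits each */
    unsigned long  double_word;     /* at least 32 bits */
};

/* Hypothetical helper: write two bytes, read them back as one word. */
unsigned int first_word_after_byte_writes(void)
{
    union memloc x;

    x.double_word = 0;
    x.byte[0] = 0x01;
    x.byte[1] = 0x02;

    /* 0x0201 on a low-byte-first machine such as the PC,
       0x0102 on a high-byte-first machine */
    return x.word[0];
}
```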

Pointers

All data structures are associated with an address. C allows the program to access a data structure through the name of the data structure or its address. The address of a data structure is obtained by placing an '&' before the name of the data structure. For example, the address of an integer x can be obtained by writing & before the x.

A pointer is declared as a pointer to a specific type. For example, a pointer to an integer is declared as:

int *ptr;

Pointers are assigned values (usually addresses, although not a necessity due to C's lax type checking) using an assignment statement. To refer to the location indicated by the pointer requires placing an '*' in front of the pointer's name.

A typical, contrived, example of how a pointer functions is as follows:

int *ptr;       /* A pointer to an integer */
int data, ans;  /* Two integers */
  
ptr = &data;    /* 'ptr' now contains the address of 'data' */
*ptr = 7;       /* 'data' now has a value of 7 */
ans = *ptr;     /* 'ans' takes the value of the location */
                /*  pointed to by 'ptr' (i.e. 7)         */

Pointers can point to array elements, as long as the types agree, for example:

char *cptr;        /* A pointer to a character */
char array[10];    /* A string of 10 characters */
  
cptr = &array[2];  /* 'cptr' points to the 3rd element in 'array' */
*cptr = 'S';       /* 'array[2]' now contains 'S' */

Pointers can be incremented and decremented. For example, to initialize array to '?', one could write:

cptr = &array[0];  /* or simply 'cptr = array' */
i = 0;
while (i < 10)
{
     *cptr++ = '?'; /* Assign '?' then increment 'cptr' */
     i++;
}
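
The same technique can be packaged as a function that is handed the address of the first element. This is only a sketch; fill() and count_filled() are hypothetical names.

```c
/* Hypothetical helper: store 'value' in 'n' consecutive characters. */
void fill(char *cptr, int n, char value)
{
    while (n-- > 0)
        *cptr++ = value;        /* assign, then advance the pointer */
}

/* Fill a 10-character array with '?' and count the '?' found. */
int count_filled(void)
{
    char array[10];
    int i, count = 0;

    fill(array, 10, '?');       /* 'array' stands for &array[0] */
    for (i = 0; i < 10; i++)
        if (array[i] == '?')
            count++;
    return count;
}
```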

Pointers can also point to structures and unions. A structure (or union) pointer is declared to be a pointer to the specific structure. When referring to a field within the structure, the pointer name is followed by ->, and finally the field name:

struct birthday bills;  
struct birthday *guess; /* Pointer to struct 'birthday' */

bills . day = 18;
bills . month = 5;
bills . year = 1978;

guess = &bills;             /* Address of struct 'bills' */
guess -> day -= 5;          /* Decrement 'day' by 5 */
guess -> month = 2;         /* Change 'month' to 2 */
 
if (guess -> year > 1970)
     guess -> year = 1963;  /* Change year to 1963 */
 
/* 'bills' now contains 13 (day), 2 (month), 1963 (year) */

Functions

A C program consists of one or more functions. All functions have the same format, notably:

Result-Type Function-Name(Parameter-Declarations)
{
     Function-Body
}

The Result-Type can be any type: a base type (int, char, long, unsigned, etc.); or an aggregate type. However, if an aggregate type is being returned, it should be returned as an address since the function returns at most a 16 or 32-bit integer value. A Result-Type of void indicates that nothing is to be returned, meaning that the function is essentially a procedure. The Function-Name is a valid identifier name, while the Function-Body is enclosed in braces: {...}. The Function-Body consists of (local) variable declarations as well as executable statements.

If the Result-Type is omitted, the function is assumed to return an integer (a rule of older C dialects; C99 and later compilers require the type to be written). The Parameter-Declarations are optional: if they are omitted, the parentheses must still follow the Function-Name. For example, the function ex1() is an integer function:

ex1()
{
/* Statements */
}

All parameters are considered local to the function and when listed, must be separated by commas. For example, the following function is of type int, with three parameters (arg1 is an integer, arg2 is a character, and arg3 is a pointer to an integer):

int example(int arg1, char arg2, int *arg3)
{
/* Statements */
}

A value can be returned from a function using the return statement. For example, the following function returns the larger of two integers:

int largest(int data1, int data2)
{
return (data1 > data2) ? data1 : data2;
}

A function is called by writing the Function-Name, followed by the arguments associated with the function. For example, to find the largest of two numbers num1 and num2, one could write:

answer = largest(num1, num2);

It is possible to ignore the return value by casting the result of the call to void:

(void) largest(num1, num2);

Unless otherwise specified, all parameters are call by value; meaning that whatever changes take place to the parameter in the function, the corresponding argument remains unchanged. Should it be necessary to have the function change the value of the argument, C allows the arguments to be passed by reference.

A call by reference parameter requires the address of the data structure to be the argument; the corresponding parameter in the Parameter-Declarations must be a pointer to the specified type. Structures are normally passed by reference, which avoids copying the entire structure each time the function is called. Consider the following example:

void ex2(struct birthday *bptr, int *iptr)
{
bptr -> day = 26;
bptr -> month = 8;
bptr -> year = 1964;
*iptr = 123;
}
 
void call_ex()
{
struct birthday jaws;
int dusty;
 
ex2(&jaws, &dusty);

/* jaws: 26 (day), 8 (month), 1964 (year), and dusty: 123 */
}
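
The contrast between call by value and call by reference can be sketched with the familiar exchange of two integers. The names no_swap(), swap(), and swap_demo() are hypothetical.

```c
/* Call by value: only the local copies are exchanged. */
void no_swap(int first, int second)
{
    int temp = first;

    first = second;
    second = temp;
}

/* Call by reference: the caller's variables are exchanged. */
void swap(int *first, int *second)
{
    int temp = *first;

    *first = *second;
    *second = temp;
}

int swap_demo(void)
{
    int a = 1, b = 2;

    no_swap(a, b);          /* a is still 1, b is still 2 */
    swap(&a, &b);           /* a is now 2, b is now 1 */
    return a * 10 + b;
}
```

The value returned by swap_demo() is 21, showing that only the second call changed a and b.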

All variables declared within a function are local to the function. Global variables are those variables declared outside of functions; global variables are global to all functions. Aggregate types can be declared globally as well. Since C programs can be developed in a number of different files, global data structures (common to a number of separately compiled functions) can be declared as externals using the extern type. For example, assuming a number of separately compiled functions shared a common data structure cookie of type struct birthday, one file would require the declaration: struct birthday cookie (to reserve the memory location); while the other files would contain the declaration extern struct birthday cookie. The linker resolves any addressing problems.

An alternative to declaring a global variable that is used by a single function is to declare a local static variable. The static variable retains its value between calls of the function, whereas all other local variables are automatic in that they are created on the stack for the duration of the function's call. A static variable can be initialized at its declaration:

int example()
{
static char data = 'X';
/* Statements */
}
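
A minimal sketch of a static variable retaining its value between calls; the function next_ticket() is hypothetical.

```c
/* Hypothetical helper: returns 1 on the first call, 2 on the second, ... */
int next_ticket(void)
{
    static int ticket = 0;  /* initialized once, not on every call */

    ticket++;
    return ticket;
}
```

Had ticket been declared without static, it would be set to 0 on every call and the function would always return 1.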

The entry point from the operating system into the program must be a function with the name main(). This function can have two parameters, the first indicating the number of items entered on the command line when the program is loaded, and the second, an array of pointers pointing to each word (assumed to be a character string) entered on the command line. These two parameters are given the names argc and argv respectively:

main(int argc, char *argv[])
{
/* Statements */
}

For example, if an executable program example has three arguments entered on the command line as follows:

C:\> example cricket dusty 1200

Then the value of argc is 4 (there are four 'words' entered on the command line), and argv is an array of string pointers; the structure is shown in the following figure:

Any of the strings can be accessed; for example, to access dusty, one would refer to the third element of argv, notably argv[2].
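
As a sketch, argc and argv can be passed on to other functions in the same form in which main() receives them. The function find_argument() is hypothetical, and find_demo() builds a stand-in array for the command line above, since the exact text supplied for the program name can vary with the operating system.

```c
#include <string.h>

/* Hypothetical helper: subscript of 'word' in argv, or -1 if absent. */
int find_argument(int argc, char *argv[], char *word)
{
    int i;

    for (i = 0; i < argc; i++)
        if (strcmp(argv[i], word) == 0)
            return i;
    return -1;
}

int find_demo(void)
{
    /* Stand-in for the command line: example cricket dusty 1200 */
    char *argv[] = { "example", "cricket", "dusty", "1200" };

    return find_argument(4, argv, "dusty");
}
```

With this array, find_demo() returns 2, the subscript of dusty.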

Some general points about functions:

Compiler Directives

C supports a number of compiler directives that instruct the compiler to perform an action that need not result in the generation of code. Two compiler directives that are used by Commkit are #define and #ifdef.

The #define compiler directive instructs the compiler to store a symbol and a value in the symbol table. A common use of #define is to declare named constants, for example:

#define TRUE     1
#define FALSE    0
#define LIMIT    25
#define MASK     0xff
#define VALUE    'w'
  
main()
{
char data[LIMIT];  /* 'data' is an array of size LIMIT */
 
if ((data[3] & MASK) == VALUE)  /* parentheses needed: == binds tighter than & */
     data[3] = 0;
}

It is common to write all defined symbols in UPPER case to distinguish them from other data structures. Defined symbols are not variables: they cannot be an lvalue, nor are they associated with an address.

The #define directive can be used for more than simply defining named constants -- it can define entire expressions or statements, for example:

#define FOREVER       for(;;)
#define DOUBLE_X      x *= 2;

Whenever a defined symbol is written, the compiler expands the symbol into whatever the symbol is defined as. For example, whenever FOREVER is encountered, the compiler actually compiles for(;;). Arguments can be passed to compiler directives as well. For example, to allow any value to be doubled (instead of x as in the previous example):

#define DOUBLE(value) value *= 2;

main()
{
int x, count;
 
DOUBLE(x)      /* Compiler produces x *= 2;     */
DOUBLE(count)  /* Compiler produces count *= 2; */
}
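
Because a defined symbol is expanded as text, a symbol that is meant to be used inside larger expressions is usually written with parentheses around each argument and around the whole definition. The sketch below uses two hypothetical symbols, BAD_SQUARE and SQUARE, to show the difference.

```c
#define BAD_SQUARE(value)  value * value
#define SQUARE(value)      ((value) * (value))

/* Expands to a + b * a + b, which is not the square of the sum. */
int bad_square_of_sum(int a, int b)
{
    return BAD_SQUARE(a + b);
}

/* Expands to ((a + b) * (a + b)). */
int square_of_sum(int a, int b)
{
    return SQUARE(a + b);
}
```

With a equal to 1 and b equal to 2, the first function returns 5 while the second returns 9.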

Multiple arguments are allowed, although the exact number depends upon the compiler.

Conditional compilation is possible using the #ifdef compiler directive in conjunction with the #define directive. Conditional compilation permits the programmer to instruct the compiler to generate code under certain conditions (for example, when searching for an error).

For example, to track down an error, it is possible to plant diagnostic statements throughout a program. Once the error is found, all the diagnostic statements may be removed (although existing code may potentially be damaged). An alternative is to leave the diagnostic software in the program, but to associate the diagnostic statements with directives that instruct the compiler when to include the diagnostics:

void a_procedure()
{
     /* Statements */
#ifdef DEBUG
     /* Diagnostic statements */
#endif
     /* Statements */
}

The compiler will include the diagnostic statements between the #ifdef and #endif if DEBUG has been defined; otherwise the diagnostic statements are left out of the compilation. DEBUG can be defined simply by writing #define DEBUG (there is no need to associate a value with DEBUG, the compiler simply marks it as defined).
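
A sketch of a diagnostic statement left in place under the control of DEBUG follows. The function checksum() is hypothetical; printf() is the standard output routine declared in stdio.h.

```c
#include <stdio.h>

/* Hypothetical helper: add up the character codes in a string. */
int checksum(char *text)
{
    int sum = 0;

    while (*text != '\0')
    {
        sum += *text;
#ifdef DEBUG
        /* Compiled only when DEBUG has been defined */
        printf("char %c, running sum %d\n", *text, sum);
#endif
        text++;
    }
    return sum;
}
```

The value returned is the same whether or not DEBUG is defined; only the diagnostic output differs.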

Software Management

There are two types of source code file: source files (with a .c extension), that is, C programs that can be compiled; and header files (indicated by the .h extension), containing definitions and data structures. Each source file has an equivalent object version (with an .obj extension), consisting of compiled code that must be linked (with other object files) to create an executable file (with an .exe extension).

To minimize the amount of compiling required each time a change is made to a source file, a software management tool known as make is supplied with Borland C. The make utility controls the recompilation of files by reading the commands specified in the makefile. The makefile contains a list of dependencies which specify the source files that must be recompiled and relinked after a change is made. For example, consider the steps in the creation of ipc.exe:

ipc.exe: ipc.obj commkit.obj srcalls.obj
    tlink \tc\lib\c0s ipc commkit srcalls, ipc, , \
    \tc\lib\emu \tc\lib\maths \tc\lib\cs
  
ipc.obj: ipc.c general.h ascii.h devices.h
    bcc -ms -c ipc

The file ipc.obj is dependent upon ipc.c: should a change occur to ipc.c (i.e., if the time and date of ipc.obj is earlier than that of ipc.c), ipc.c is recompiled using bcc, the Borland C (and C++) compiler, producing a new copy of ipc.obj. The executable version of ipc.c (ipc.exe) is also specified as a dependency; this time the Turbo linker tlink is called to link the object files ipc.obj, commkit.obj, and srcalls.obj to create ipc.exe.

The instructions in the makefile are processed by the make utility by typing:

C:\> make

When the make utility finds a file that must be recompiled or relinked, the specific line in the makefile is displayed. If all files are found to be up-to-date, make returns to the MS-DOS prompt.

A specific file can be processed by the make utility by typing the file name after make. For example, to check whether ipc.exe is up-to-date, one could type:

C:\> make ipc.exe

In some cases it is necessary to remake an executable module without having modified the original source module. For example, all object and executable modules supplied with Commkit were created using Turbo C++; to recompile all the source modules with Turbo C would mean having to change the date on each module using a tool such as an editor. Fortunately, the touch utility (supplied with Turbo C and C++) can change a file's time and date to the present; if touch is followed by a make, the associated files will be recompiled. As an example, to force the recompilation of ipc.c, one could type:

C:\> touch ipc.c
C:\> make ipc.exe

Wildcards work with touch; a filename of *.c will change the time and date of all .c modules to the present.


© 2002 -- Whale Lake Press