The ANSI Standard: A Summary for the C Programmer
By Eric Giguere
December 18, 1987
The following summary of ANSI C was originally written for inclusion
with the Waterloo C compiler for VM/CMS, a compiler developed
by the Computer Systems Group (CSG) at the University of Waterloo.
At the time, I was a co-op student at CSG, working
on Waterloo C. It was also published as an article in the
Transactor for the Amiga, Volume 1, Number 3. It is
presented here primarily for historical interest; I doubt
it will be that useful to anyone! Although
it predates the final standard, it presents the major changes
quite nicely. You can also check out the second edition
of K&R for more information.
Introduction
Programming languages are constantly evolving and diversifying.
The C language is no exception, especially due to its increased
popularity in recent years. The original specification document
for C, The C Programming Language
by Brian Kernighan and Dennis Ritchie, commonly referred to
as K&R, is now almost ten years old.
K&R has served as the C programmer's "bible", the de facto
standard for C. But as the language has evolved,
the need for a formal language standard has become apparent.
Under the auspices of the International Standards Organization (ISO),
the American National Standards Institute (ANSI) began the preparation
of such a standard through the leadership of the X3J11 Technical
Committee. The proposed standard is now in final draft form and is
expected to be approved by ANSI and ISO in 1988.
This document is a summary of the proposed standard's major changes
to the language as it pertains to the C programmer. All information
is drawn from official X3J11 documents: the
Draft Proposed American National Standard for Information
Systems - Programming Language C
and the accompanying
Rationale for Draft Proposed National Standard for Information
Systems - Programming Language C.
(These publications will be referred to as the Standard and the
Rationale, respectively.)
Purpose of the Summary
Few programmers have either the time or the interest
to wade through the actual text of the draft Standard
in its entirety. What interests them are the
major points of the Standard, the changes
it makes to what was defined in K&R,
and how those changes affect their current programs.
By summarizing these changes, this document is intended
to provide a
quick reference that the average C programmer can read
and understand in one session. In keeping with this goal,
the Standard Library is only briefly mentioned. Readers
interested in the specifics of the Library should consult
the Standard itself or the documentation accompanying
any ANSI-conforming compiler.
This document is not
a criticism or a justification
of the Standard, only a commentary.
Nor is it a tutorial on the C programming language.
Readers should
also be aware that changes may occur to the Standard before its
final acceptance by ANSI.
What to expect from the Standard
The Standard is not
creating a new language definition. Its purpose, to quote
the Rationale, is to "codify common existing practice". This
means that the fundamental structure and syntax of the language
as described in K&R has been left unchanged. The Standard has
instead tried to unify the diverse extensions and dialects
that have grown over the years
(the existing practice)
into a single cohesive language definition. Existing
practice is often inconsistent, however, so many compromises
have had to be made.
Perhaps the most important thing to remember about the Standard
is that it is not intended to invalidate existing C code.
Existing programs should compile with only minor changes
when using an ANSI-conformant compiler.
Common Terms
Throughout this document the word implementation
will be used to refer to any particular implementation
of an ANSI-conformant C language interpreter or compiler.
1. Reserved Identifiers
The following keywords have been added to the language:
- const
- enum
- signed
- void
- volatile
Explanations for each keyword follow in Sections 2 and 4.
In addition, the identifier "entry"
has been deleted from the list of reserved identifiers as it
was never implemented by K&R or subsequent versions of C.
Each keyword is a reserved identifier; programs that currently
use these keywords as variable names must be changed to compile
under an ANSI-conformant implementation.
2. Data Types
Major language changes occur with respect to data types.
The trend in the Standard has been to provide the language
with stronger typing facilities.
Integers
The list of available integer types has been
expanded to include the new signed keyword
and the following variations:
- signed char
- signed int
- signed long
- signed short
- unsigned int
- unsigned long
- unsigned short
long int and short int may also be used as variations of
long and short, respectively. As would be expected,
the declarations:
signed x;
unsigned y;
may be used as shorthand for signed int and unsigned int.
All logical combinations for integer types are now allowed.
Whether or not a simple char is considered to be signed or unsigned
is left up to the implementation.
While int is still the default type for variables and functions,
at least one storage class (auto, register,
static, extern)
or type specifier must be present when declaring a variable.
A declaration of the form:
x;
is no longer allowed and must be replaced with:
int x;
to compile.
Floats
The new type long double has been added for more
precision. But like any
long type, an object of this type is only guaranteed to be
at least as large as a double.
The type long float (a previous synonym for double)
is now invalid. The only acceptable floating-point types are
float, double and long double.
Structures and Unions
Member name spaces are now unique within structures and unions.
That is, two different structures or unions may contain members with
the same name without fear of conflict.
Structures and unions may now:
- be assigned to another of the same type
- be initialized when declared with the auto storage class
- be passed as function parameters and return values
Enumerations
Already available in most compilers, enumerations have been
added to the language. An enumeration is a way
of declaring a set of integer constants. The declaration:
enum colours { RED, BLUE, GREEN };
would declare colours
as an enumeration tag representing the integer constants
RED, BLUE and GREEN.
These enumeration constants are given integer values starting
at 0 and increasing by 1 with each identifier.
An enumeration
constant may be used wherever an integer is expected.
The following is equivalent to the above enumerated type:
#define RED 0
#define BLUE 1
#define GREEN 2
Enumeration constants are not restricted to upper case, but
upper case is a widely recognized convention for constants.
Variables may be declared to have enumeration type.
The declaration:
enum colours x, y;
declares x and y to be integer variables capable of holding an enumeration
constant of type colours.
In practice, little or no checking is done to make sure enumeration
constants are used, so the following assignments are equivalent:
x = BLUE;
x = 1; /* defeats purpose of enum */
Constant values may be directly assigned within an enumeration
as well:
enum relation { EQUAL = 1, LESS_THAN = 2,
GREATER_THAN = 4 };
If no value is specified for a given identifier, the constant
is taken to have the value of the previous constant plus one.
The size of an enumeration type has been left unspecified;
the implementation is free to store it in the most efficient
fashion, provided that it always behaves like an int.
Void Type
The void data type has been added to indicate that an expression has
no value. No variables can be declared with such a type, but
expressions may be cast to void.
For example, the following statement:
(void) printf( "hello world" );
specifically indicates to the compiler that the return value
from printf (an integer) is to be ignored. As such,
the following statement is illegal:
a = (void) func(); /* illegal! */
since the assignment operator expects a value to be returned for
assignment.
Void pointers and void functions are discussed below and in Section 4.
Pointers and Arrays
Pointers are no longer synonymous with the int
type. Pointers may only be compared with or assigned:
- the integer value 0 (used to define a null pointer)
- a pointer of the same type
- a pointer of generic type (a void pointer)
Any other use of a pointer will generate a warning message
upon compilation. Many assignment statements will require
explicit casting of the right-hand values to avoid generating
these messages.
A void pointer is a pointer that has no base type; that is, it points
to a type of unknown specification. It is declared using the syntax:
void *ptr;
Indirection through a void pointer is not allowed; it must be
cast to an appropriate pointer type first.
Its main use is as a generic pointer.
Arrays with storage class auto may now be initialized. If specified,
the size of an array must be an integral expression greater than zero.
Special Modifiers
The Standard makes available the two attributes
const and volatile for use as type modifiers.
An object declared to be const cannot be modified (assigned to,
incremented or decremented) by a program. Thus the following code
is invalid:
const int x;
/* ..... */
x = 2; /* illegal! */
Initialization, however, is allowed:
const unsigned char masks[] = { 0x00, 0xff };
A const object (if it is of static storage duration)
may be data that is put into read-only memory. Declaring such
data with the const
attribute allows the compiler to diagnose any attempts at
modifying the data. Function parameters may also be declared as
const to indicate that they are not
modified by the function. This provides both extra documentation
and, when function prototypes (described in Section 4) are used
properly, consistent error-checking.
A volatile object is one that may be modified outside of program control.
Memory-mapped I/O ports are a typical example. Declaring an object as
volatile indicates that the compiler should always generate code to
fetch the object's value from its actual memory location, since
it may have changed since the last access by the program.
(This disallows optimizations which could load the value into
a register and possibly return erroneous results.)
volatile char *port1 = (volatile char *) 0x00f3; /* ptr to I/O port */
while( *port1 & DATA_FLAG ) /* needs to be volatile */
clear_io();
The const and volatile modifiers may be used (singly or together) in
combination with any other valid type specifiers.
Pointers may also be declared to be const or volatile
through the use of special syntax:
int const *a;
int *const b;
int *const *c;
In the example, a is declared to be a pointer to a const integer,
whereas b is declared to be a const pointer to an integer.
The distinction lies in the placement of the const attribute. The
declaration for c is even more confusing: it declares a
pointer to a const pointer to an integer.
Consider the following statements:
a = NULL; /* ok */
*a = 0; /* error */
b = NULL; /* error */
*b = 0; /* ok */
c = &b; /* ok */
*c = NULL; /* error */
**c = 0; /* ok */
Because a is a pointer to a const int,
the value it points to may not be changed. Similarly,
because b is a const pointer to an int,
it may not be modified, though the value it points to may.
The pointer c may be modified, but not the pointer *c,
though **c (an integer) is modifiable itself.
The volatile modifier may also be used with pointers
in conjunction with or separate from const.
Bit-fields
Bit-fields may now be of type int, unsigned int or
signed int.
Whether or not the high-order bit of an int
bit-field is to be considered a sign bit is
implementation-defined.
Vacuous Definitions
A vacuous definition consisting only of a struct or union
specifier with a tag name is now allowed. Its purpose is to
hide any outer declaration of the same name in the current
block, as the definitions for struct a
demonstrate in this example:
struct a {
int x;
};
int func(){
struct a st1; /* struct defined above */
struct a; /* vacuous definition: it "clears"
the current defn of struct a to
make way for a new one */
/* references to struct a now refer to the new
definition within this block */
struct b {
struct a *y; /* refers to NEXT struct */
} st2;
struct a {
struct b *z;
} st3;
st1.x = 1;
st2.y = &st3; /* &st1 would give warning */
st3.z = &st2;
}
/* old struct defn now back in scope */
Here the member y of st2 is a pointer to the
second struct a,
which is defined below it. If the vacuous definition
struct a;
had not been present, y would instead have been a pointer to the
struct a defined in the previous (enclosing) scope level outside
the function.
Conversions and Promotions
The Standard defines the integral promotions as follows: the
char, short or bit-field types (with or without the
signed or unsigned
modifiers) may be used wherever an int is expected. The values will be
converted to int if possible; otherwise they will be converted to
unsigned int.
The usual arithmetic conversions
used with most binary operators have been modified to reflect
the new types available to the programmer.
Of particular note: expressions of type float are no longer automatically
converted to double for arithmetic purposes; such arithmetic may now be
performed less accurately.
The Standard also specifies other rules regarding conversions.
Where signed and unsigned integer values are concerned, the
Standard now advocates value preserving as opposed to
unsigned preserving rules: values of the shorter unsigned types
(unsigned char and unsigned short) are promoted to
signed int if it can represent all of their values; otherwise
they are promoted to unsigned int.
Floating-point values must now truncate towards zero when
converted to integral types. No rounding need occur when a
double is demoted to float.
Otherwise, the rules in K&R are unchanged.
Any program comparing or performing arithmetic on
values of different types should be closely screened
for possible changes in behaviour.
Minimum Type Limits
Any compiler conforming to the Standard must also respect the
following limits with respect to the range of values any
particular type may accept. Note that these are lower limits:
an implementation is free to exceed any or all of these.
Note also that the minimum range for a char is dependent
on whether or not a char is considered to be signed or unsigned.
Type | Minimum Range |
signed char | -127 to +127 |
unsigned char | 0 to 255 |
short int | -32767 to +32767 |
unsigned short int | 0 to 65535 |
int | -32767 to +32767 |
unsigned int | 0 to 65535 |
long int | -2147483647 to +2147483647 |
unsigned long int | 0 to 4294967295 |
Type | Minimum Precision |
float | 6 digits |
double | 10 digits |
long double | 10 digits |
The Standard also specifies that these limits should be present
as preprocessor macros in the header files <limits.h> (integral types)
and <float.h> (floating-point types).
3. Data Objects
Changes in this area have occurred mainly with respect to
variable (object) linkage and initialization.
Initialization
Objects declared as either static or auto
may be initialized by following the declaration with an equals
sign, '=', and an initialization expression. External (inter-module)
objects are discussed below.
If no initialization is given for a static
object, all arithmetic types in the object are assigned 0 and all
the pointers are set to NULL. If no initialization is given to an
auto object, its initial value is undefined.
These rules are unchanged from K&R.
All initializers for either static objects or auto
arrays, unions and structures must be constant expressions.
Unions may now be initialized: the initialization value is assigned
to the first member of the union.
The initialization expression for a scalar (integral, floating-point
or pointer) object may optionally be enclosed in braces. Braces
must enclose the initialization expressions for arrays, structures
and unions. There can be no more initializers in an initialization
list than there are objects to be initialized (there may be fewer,
though, and any remaining uninitialized objects are handled as
described above).
An array of char or pointer to char
may be initialized with a string constant.
Linkage
In this section object
refers to an object declared outside of any function.
The linkage
of an object determines its scope within the program. An object
with external linkage is known to all files in a program. An object with
internal linkage is known only to the file in which it is declared.
Current C compilers
often differentiate between the two in incompatible ways, an issue
which the Standard resolves.
An object is said to be defined
if its declaration includes an initializer. A defined object has internal
linkage if the storage class static
is specified; otherwise it has external linkage. An object can
only be defined once.
Any object declaration without the extern
modifier and without an initializer
constitutes what is known as a tentative definition
of the object. If an actual definition for the object is
encountered in the same file, all tentative definitions are
considered to be simple declarations referring to that object.
Otherwise the first tentative definition is considered to be
an actual definition with initializer equal to 0.
/* example drawn from the Standard */
int i1 = 1; /* definition, external linkage */
static int i2 = 2; /* definition, internal linkage */
extern int i3 = 3; /* definition, external linkage */
int i4; /* tentative definition */
static int i5; /* tentative definition */
int i1; /* tentative def., refers to previous */
int i2; /* invalid -- linkage disagreement */
int i3; /* tentative def., refers to previous */
int i4; /* tentative def., refers to previous */
int i5; /* invalid -- linkage disagreement */
extern int i1; /* these are all valid references */
extern int i2;
extern int i3;
extern int i4;
extern int i5;
These complex rules provide the most flexibility and allow
the majority of current C code to be compatible with the
Standard.
The simplest way to declare an externally-linked object is to
define it in one file (with or without initializer) and reference
it in all others through the use of an appropriate
extern declaration.
4. Functions
Additions to the language definition
occur in the area of function declarations, function definitions
and variable parameter lists.
Function Definitions
The Standard now allows the types of formal parameters to be
specified within
the actual function declaration at the start of the function
definition. This new-style definition form more closely
resembles languages such as Pascal and Modula-2:
int main( int argc, char *argv[] ){
/* ... */
}
If this style is used, a type must be specified separately
for each formal parameter in the argument list. Mixing the
new-style with the K&R-style in the same definition is not allowed.
This style is intended by ANSI to become the favored style,
and a future Standard may disallow the K&R-style definition.
For the moment, however, both styles may be used interchangeably.
Functions may also be defined as explicitly having no return
values. Such functions are called void functions
and are defined using the type void:
void func( int a ){
/* ... */
return;
}
Though the use of the return statement is allowed, such functions must not
return an expression. If no type is explicitly specified, the
function return type still defaults to int to retain compatibility with K&R.
Function Declarations and Prototypes
Function type declarations are also consistent with
the new-style function definitions
and may include a list of formal parameters.
These parameters consist of type declarations with or without
identifiers. Identifiers are cosmetic only and need only be
included for readability. Some examples are:
int main( int, char *[] );
extern char *strcpy( char *dst, const char *src );
Note that declarations must be consistent as they will
be checked by the compiler. Each declaration
of a function should agree with all previous declarations
in both the number and types of parameters.
The following declarations illustrate two special cases:
extern int func1( void );
extern int func2();
The first case explicitly declares that the function func1
does not take any parameters; that the parameter list is
empty or void.
The second declaration states that no information is known
on the number and types of any formal parameter.
This is to provide compatibility with K&R.
A function declaration that provides the number and types
of parameters is called a function prototype.
The addition of prototypes to C allows for stricter type-checking
by the compiler. When a prototype for a function has been declared,
each subsequent call to that function is checked to make sure that
the correct number of arguments has been supplied. As well, the
type of each argument is compared with what was declared
in the prototype. If different, the argument is converted to
the required type as if it had been assigned to an object of that
type. The default argument promotions
(char and short to int, float to
double)
are not performed when a prototype has been declared.
(Note: The default argument promotions are separate from the
usual arithmetic conversions.)
If a function prototype occurs in the same file as the
definition of that function, both the prototype and the definition
must agree exactly if the definition is of the new style.
In K&R-style definitions, the formal parameters are first widened by the
default argument promotions and then compared to the prototype(s).
If no prototype occurs in the file, the function definition itself
serves as a prototype for the code following it.
Variable Parameter Lists
Certain C functions are designed to take a variable number of
parameters. Unfortunately, some compilers use different
schemes for handling such situations and what works in one
implementation may not work elsewhere. The Standard therefore
provides for the explicit declaration of such functions and
portable facilities for handling them. A function that takes a
variable number of parameters is defined by ending the
parameter list (new-style only) with an ellipsis:
int printf( const char *format, ... ){
/* ... */
}
Thus the only thing known about printf is that it takes
at least one parameter, the type of which is a pointer to const char.
Prototypes may also be declared in this fashion:
extern int sprintf( char *dest, const char *format, ... );
The compiler will then make sure that each call to
sprintf has at least two arguments, both of which are pointers to char.
The arguments themselves are accessed through the use of
special macro facilities defined in the header file
<stdarg.h>, part of the ANSI Standard Library.
5. The Preprocessor
The C preprocessor, long since recognized as an integral part
of the language, has benefitted from a number of additions
and clarifications in the Standard.
New Directives
The #elif directive has been added as a shorthand form of the
#else #if preprocessor sequence.
The identifier defined is reserved during an #if or
#elif
so that:
#if defined( NULL )
#if !defined( TRUE )
are equivalent to:
#ifdef NULL
#ifndef TRUE
Also new on the list are the directives
#error and #pragma.
The former produces an error message at compile-time; the latter
is implementation-defined in its use and effects.
File Inclusion
Besides the two allowable forms:
#include <fname1>
#include "fname2"
a third form:
#include fname3
is acceptable, provided that fname3
is a macro which expands into one of the other two forms.
Macro Operators
Two new operators have been added for use within
a macro replacement string. The ## (concat)
operator concatenates two adjacent preprocessor tokens (a
preprocessor token is any consecutive series of non-blank
characters). The # (stringize)
operator places the parameter following it in string form.
For example, consider the following definition:
#define debug( s ) printf( "x" # s " = %d\n", x ## s )
The following macro call:
debug( 1 );
expands to:
printf( "x" "1" " = %d\n", x1 );
which after string concatenation (see Section 7) gives the
final result:
printf( "x1 = %d\n", x1 );
Program debugging through the use of macros has thus been
made simpler.
Predefined Macros
Five new macros are predefined in the Standard, all of which are
expanded to their appropriate values upon file compilation.
Macro | Expands To |
__DATE__ | current date |
__TIME__ | current time |
__FILE__ | current file name |
__LINE__ | current line number |
__STDC__ | non-zero value |
The definition of the __STDC__ macro indicates an ANSI-conformant
compiler.
None of these macros may be redefined by a program.
6. The C Library
A standardized library of routines aids the programmer and
enhances portability. The Standard defines such a library,
which is too large to describe here in any detail. The
Standard Library is based on the library compiled by
/usr/group, a UNIX users' group, with all the UNIX dependencies deleted.
The Standard Library also provides a set of standard library
headers. These headers provide function prototypes for the
set of routines that make up the library and define
commonly-used macros. As well, the functions
and their prototypes have
been changed so as to be invariant to the default promotions:
all are declared using promoted types (such as int and
double) for parameters.
Thus parameters passed to a library function will always be
of the same type, regardless of whether a prototype is in
scope or not.
Macros may also be defined in a header file to take the place
of actual calls to library routines. However, the library
routines themselves must exist as the macros may be subjected
to an #undef directive by the user at any time.
Among the most notable additions to the library are variable
argument handling, numeric limits information, and
locale (the current environment) information.
K&R library functions have also been converted to the new
style and syntax, so that malloc, for example, now returns a
void * as opposed to a char *.
7. Miscellaneous
Numerous other minor changes and additions have occurred throughout the
language:
- The escape characters '\a' and '\v' have been added for alarm (bell)
and vertical tab.
- A series of special trigraph characters
has been added as equivalents to ASCII characters which may
not appear in the character sets of some countries; a trigraph
is a three-character sequence starting with "??":
Character | Trigraph |
# | ??= |
[ | ??( |
\ | ??/ |
] | ??) |
^ | ??' |
{ | ??< |
| | ??! |
} | ??> |
~ | ??- |
- The suffixes u and l may be used with integer constants to specify
unsigned and long values; both may be used together to specify
unsigned long.
- The suffixes f and l may be used with floating-point constants to
specify float and long double values.
- If the high bit of an octal or hex constant is set, it is
considered to be unsigned.
- Adjacent strings separated only by white space are concatenated.
- The unary plus ('+') has been added to force the evaluation
of an arithmetic expression to occur before any others.
- A function can be called through a pointer using either the
K&R-style syntax (*fp)() or the new-style fp().
- External identifier length significance is still 6 characters
with no case sensitivity (for compatibility with existing
linkers).
- Internal identifier length significance is a minimum of 31
characters (case sensitive).
- Each macro function call is expanded only once,
which prevents the definition of recursive macros.
- External identifiers beginning with an underscore are
reserved for library usage.
- Identifiers beginning with an underscore followed
either by a capital letter or another underscore
are reserved for use as predefined macro names.
- Multi-byte character constants are allowed, though their
values are implementation-defined.
- Hexadecimal character constants may be specified using
\x followed by a series of hexadecimal digits (e.g. '\xff').
Identifiers in current programs that are now reserved by
the Standard will have to be altered to be portable
across compilers.
References
- American National Standards Institute, Inc., Draft Proposed American National Standard for Information Systems - Programming Language C, ANSI X3J11/87-221 (November 9, 1987).
- American National Standards Institute, Inc., Rationale for Draft Proposed American National Standard for Information Systems - Programming Language C, ANSI X3J11/87-219 (November 6, 1987).
- Kernighan, Brian W., and Dennis M. Ritchie, The C Programming Language, Prentice-Hall, Englewood Cliffs, NJ (1978).
- Plum, Thomas, Notes on The Draft C Standard, Plum Hall Inc., Cardiff, NJ (1987).
User groups have permission to reprint this article for free
as described on the copyrights page.