Chapter 2: Tokens
What's in Chapter 2?
ASCII characters
Literals include numbers, characters, and strings
Keywords are predefined
Names are user-defined
Punctuation marks
Operators
This chapter defines the basic building blocks of a C program. Understanding the concepts in this chapter will help eliminate the syntax bugs that confuse even the veteran C programmer. A simple syntax error can generate hundreds of obscure compiler errors. In this chapter we will introduce some of the syntax of the language.
To understand the syntax of a C program, we divide it into tokens separated by white space and punctuation. Remember that white space includes spaces, tabs, carriage returns, and line feeds. A token may be a single character or a sequence of characters that form a single item. The first step of a compiler is to process the program into a list of tokens and punctuation marks. The compiler then checks for proper syntax and, finally, creates object code that performs the intended operations. The following example includes the punctuation marks ( ) { } and ; as shown:
void main(void){ short z;
			   z=0; 
			   while(1){ 
			       z=z+1; 
			   }} 
The following sequence shows the tokens and punctuation marks from the above listing:
void main ( void ) { short z ; z = 0 ; while ( 1 ) { z = z + 1 ; } }
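The tokenizing step described above can be sketched in a few lines of C. This is a simplified illustration, not how any particular compiler is implemented: the function name CountTokens is hypothetical, a run of letters, digits, and underscores counts as one token, and every other non-white-space character counts as its own token (so multiple-character operators such as ++ are not recognized).

```c
#include <ctype.h>

/* Hypothetical helper: counts the tokens and punctuation marks in a
   C source fragment. Simplification: each operator or punctuation
   character is counted separately. */
int CountTokens(const char *src){ int count = 0;
  while(*src){
    if(isspace((unsigned char)*src)){
      src++;                          /* white space only separates tokens */
    } else if(isalnum((unsigned char)*src) || (*src == '_')){
      count++;                        /* start of a name, keyword, or number */
      while(isalnum((unsigned char)*src) || (*src == '_')) src++;
    } else {
      count++;                        /* single punctuation mark or operator */
      src++;
    }
  }
  return count;
}
```

Applied to the listing above, this sketch reports 26 items, which matches the token sequence shown.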
Since tokens are the building blocks of programs, we begin our study of C language by defining its tokens.
Like most programming languages, C uses the standard ASCII character set. The following table shows the 128 standard ASCII codes. One or more white space characters can be used to separate tokens and punctuation marks. The white space characters in C include the horizontal tab (9=$09), the carriage return (13=$0D), the line feed (10=$0A), and the space (32=$20).
BITS 4 to 6 (columns) versus BITS 0 to 3 (rows)
  | 0   | 1   | 2  | 3 | 4 | 5 | 6 | 7
0 | NUL | DLE | SP | 0 | @ | P | ` | p
1 | SOH | DC1 | !  | 1 | A | Q | a | q
2 | STX | DC2 | "  | 2 | B | R | b | r
3 | ETX | DC3 | #  | 3 | C | S | c | s
4 | EOT | DC4 | $  | 4 | D | T | d | t
5 | ENQ | NAK | %  | 5 | E | U | e | u
6 | ACK | SYN | &  | 6 | F | V | f | v
7 | BEL | ETB | '  | 7 | G | W | g | w
8 | BS  | CAN | (  | 8 | H | X | h | x
9 | HT  | EM  | )  | 9 | I | Y | i | y
A | LF  | SUB | *  | : | J | Z | j | z
B | VT  | ESC | +  | ; | K | [ | k | {
C | FF  | FS  | ,  | < | L | \ | l | |
D | CR  | GS  | -  | = | M | ] | m | }
E | SO  | RS  | .  | > | N | ^ | n | ~
F | SI  | US  | /  | ? | O | _ | o | DEL
The first 32 codes (values 0 to 31 or $00 to $1F) and the last one (127=$7F) are classified as control characters. Codes 32 to 126 (or $20 to $7E) include the "normal" characters. Normal characters are divided into
the space character (32=$20),
the numeric digits 0 to 9 (48 to 57 or $30 to $39),
the uppercase alphabet A to Z (65 to 90 or $41 to $5A),
the lowercase alphabet a to z (97 to 122 or $61 to $7A), and
the special characters (all the rest).
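The classification above maps directly onto range checks. The following is a minimal sketch, assuming the code ranges quoted in the text; the names CharClass and Classify are hypothetical, and the NOT_ASCII class is an addition for codes above 127, which lie outside the standard table.

```c
/* Character classes described in the text, plus one for codes above 127 */
enum CharClass { CONTROL, SPACE, DIGIT, UPPER, LOWER, SPECIAL, NOT_ASCII };

/* Hypothetical helper: returns the class of one character code */
enum CharClass Classify(unsigned char code){
  if(code > 127) return NOT_ASCII;                  /* outside standard ASCII */
  if((code < 32) || (code == 127)) return CONTROL;  /* $00-$1F and $7F */
  if(code == 32) return SPACE;                      /* $20 */
  if((code >= 48) && (code <= 57)) return DIGIT;    /* $30-$39 */
  if((code >= 65) && (code <= 90)) return UPPER;    /* $41-$5A */
  if((code >= 97) && (code <= 122)) return LOWER;   /* $61-$7A */
  return SPECIAL;                                   /* all the rest */
}
```

The order of the tests matters only for the first two lines; once the control codes are excluded, the remaining ranges do not overlap.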
			
Numeric literals consist of an uninterrupted sequence of digits delimited by white space or special characters (operators or punctuation). Although ICC12 and Hiware do support floating point, this document will not cover it. The use of floating point requires a substantial amount of program memory and execution time; therefore most applications should be implemented using integer math. Consequently the period will not appear in numbers as described in this document. For more information about numbers see the sections on decimals, octals, or hexadecimals in Chapter 3.
Character literals are written by enclosing an ASCII character in apostrophes (single
		quotes). We would write 'a' for a character with the ASCII value of the lowercase a (97).
		The control characters can also be defined as constants. For example
		'\t' is the tab character. For more information about character literals
		see the section on characters in Chapter 3.
String literals are written as a sequence of ASCII characters bounded by quotation marks (double quotes). Thus, "ABC" describes a string of characters containing the first three letters of the alphabet in uppercase. For more information about string literals see the section on strings in Chapter 3.
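The two kinds of literal can be compared side by side. This is a small sketch, assuming an ASCII execution character set; the variable names LowerA, Tab, and FirstThree and the function FirstThreeSize are hypothetical.

```c
unsigned char LowerA = 'a';      /* character literal, ASCII code 97 */
unsigned char Tab = '\t';        /* control character, ASCII code 9 */
const char FirstThree[] = "ABC"; /* string literal: 'A','B','C', then a 0 terminator */

/* Number of bytes the string occupies, including the terminator */
unsigned short FirstThreeSize(void){
  return (unsigned short)sizeof(FirstThree);
}
```

A character literal is a single integer value, whereas the string "ABC" occupies four bytes because the compiler appends a terminating zero.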
There are some predefined tokens, called keywords, that have specific meaning in C programs. The reserved words we will cover in this document are:
keyword | meaning
asm | Insert assembly code
auto | Specifies a variable as automatic (created on the stack)
break | Causes the program control structure to finish
case | One possibility within a switch statement
char | 8 bit integer
const | Defines parameter as constant in ROM
continue | Causes the program to skip to the next iteration of the loop
default | Used in switch statement for all other cases
do | Used for creating program loops
double | Specifies variable as double precision floating point
else | Alternative part of a conditional
extern | Defined in another module
float | Specifies variable as single precision floating point
for | Used for creating program loops
goto | Causes program to jump to specified location
if | Conditional control structure
int | 16 bit integer (same as short on the 6811 and 6812)
long | 32 bit integer
register | Specifies how to implement a local
return | Leave function
short | 16 bit integer
signed | Specifies variable as signed (default)
sizeof | Built-in operator that returns the size of an object
static | Stored permanently in memory, accessed locally
struct | Used for creating data structures
switch | Complex conditional control structure
typedef | Used to create new data types
unsigned | Always greater than or equal to zero
void | Used in parameter list to mean no parameter
volatile | Can change implicitly
while | Used for creating program loops
Did you notice that all of the keywords in C are lowercase? Notice also that, as a matter of style, I used a mixture of upper and lowercase for the names I created, and all uppercase for the I/O ports. Because keywords are reserved, they cannot be used as variable or function names.
We use names to identify our variables, functions, and macros. ICC11/ICC12 names may be up to 31 characters long. Hiware names may be up to xxx characters long. Names must begin with a letter or underscore, and the remaining characters must be letters, digits, or underscores. We can use a mixture of upper and lower case or the underscore character to create self-explaining symbols. E.g.,
time_of_day    go_left_then_stop
TimeOfDay      GoLeftThenStop
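The naming rules can be expressed as a short check. The sketch below is illustrative only: IsValidName and MAX_NAME_LENGTH are hypothetical names, the 31-character ICC11/ICC12 figure is treated as a hard limit (an assumption), and the check does not reject keywords.

```c
#include <ctype.h>
#include <string.h>

#define MAX_NAME_LENGTH 31  /* ICC11/ICC12 figure quoted in the text */

/* Hypothetical helper: returns 1 if name follows the naming rules, 0 otherwise.
   Assumption: keywords are not checked here. */
int IsValidName(const char *name){ size_t i, length;
  length = strlen(name);
  if((length == 0) || (length > MAX_NAME_LENGTH)) return 0;
  if(!isalpha((unsigned char)name[0]) && (name[0] != '_')) return 0; /* first character */
  for(i = 1; i < length; i++){
    if(!isalnum((unsigned char)name[i]) && (name[i] != '_')) return 0; /* remaining characters */
  }
  return 1;
}
```

With this check, time_of_day and GoLeftThenStop are accepted, while a name such as 2fast is rejected because it begins with a digit.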
The careful selection of names goes a long way to making our programs more readable. Names may be written with both upper and lowercase letters. The names are case sensitive. Therefore the following names are different:
thetemperature
			THETEMPERATURE
			TheTemperature
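Because names are case sensitive, the three spellings can coexist as separate variables in one file. A minimal sketch (the initial values and the function SumTemperatures are hypothetical, chosen only to show that the variables are distinct):

```c
short thetemperature = 1;   /* three different variables, */
short THETEMPERATURE = 2;   /* distinguished only by case */
short TheTemperature = 3;

/* Adds the three distinct variables */
short SumTemperatures(void){
  return thetemperature + THETEMPERATURE + TheTemperature;
}
```

Although this compiles, names that differ only in case are easy to confuse, so it is best avoided in practice.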
The practice of naming macros in uppercase calls attention to the fact that they are not variable names but defined symbols. Remember the I/O port names are implemented as macros in the header files HC11.h and HC12.h.
Every global name defined with the ICC11/ICC12 compiler generates an assembly language label of the same name, but preceded by an underscore. The purpose of the underscore is to avoid clashes with the assembler's reserved words. So, as a matter of practice, we should not ordinarily name globals with leading underscores. Hiware labels will not include the underscore. For examples of this naming convention, observe the assembly generated by the compiler (either the assembly itself in the *.s file or the listing in the *.lst file). These assembly names are important during the debugging stages. We can use the map file to get the absolute addresses for these labels, then use the debugger to observe and modify their contents.
Since the Imagecraft compiler adds its own underscore, names written with a leading underscore appear in the assembly file with two leading underscores.
Developing a naming convention will avoid confusion. Possible ideas to consider include:
1. Start every variable name with its type. E.g.,
b means boolean true/false
n means 8 bit signed integer
u means 8 bit unsigned integer
m means 16 bit signed integer
v means 16 bit unsigned integer
c means 8 bit ASCII character
s means null terminated ASCII string
			
2. Start every local variable with "the" or "my"
3. Start every global variable and function with the associated file or module name. In the following example the names all begin with Bit_. Notice how closely this naming convention recreates the look and feel of the modularity achieved by classes in C++. E.g.,
/* **********file=Bit.c*************
			   Pointer implementation of a Bit_Fifo
			   These routines can be used to save (Bit_Put) and
			   recall (Bit_Get) binary data 1 bit at a time (bit streams)
			   Information is saved/recalled in a first in first out manner
			   Bit_FifoSize is the number of 16 bit words in the Bit_Fifo
			   The Bit_Fifo is full when it has 16*Bit_FifoSize-1 bits */
			#define Bit_FifoSize 4
			// 16*4-1=63 bits of storage
			unsigned short Bit_Fifo[Bit_FifoSize]; // storage for Bit Stream
			struct Bit_Pointer{
			   unsigned short Mask; // 0x8000, 0x4000,...,2,1
			   unsigned short *WPt;}; // Pointer to word containing bit
			typedef struct Bit_Pointer Bit_PointerType;
			Bit_PointerType Bit_PutPt; // Pointer of where to put next
			Bit_PointerType Bit_GetPt; // Pointer of where to get next
			/* Bit_FIFO is empty if Bit_PutPt==Bit_GetPt */
			/* Bit_FIFO is full if Bit_PutPt+1==Bit_GetPt */
			short Bit_Same(Bit_PointerType p1, Bit_PointerType p2){
			   if((p1.WPt==p2.WPt)&&(p1.Mask==p2.Mask))
			      return(1); //yes
			   return(0);} // no
			void Bit_Init(void) {
			   Bit_PutPt.Mask=Bit_GetPt.Mask=0x8000;
			   Bit_PutPt.WPt=Bit_GetPt.WPt=&Bit_Fifo[0]; /* Empty */
			}
			// returns TRUE=1 if successful,
			// FALSE=0 if full and data not saved
			// input is boolean FALSE if data==0
			short Bit_Put (short data) { Bit_PointerType myPutPt;
			   myPutPt=Bit_PutPt;
			   myPutPt.Mask=myPutPt.Mask>>1;
			   if(myPutPt.Mask==0) {
			      myPutPt.Mask=0x8000;
			      if((++myPutPt.WPt)==&Bit_Fifo[Bit_FifoSize])
			         myPutPt.WPt=&Bit_Fifo[0]; // wrap
			   }
			   if (Bit_Same(myPutPt,Bit_GetPt))
			      return(0); /* Failed, Bit_Fifo was full */
			   else { 
			      if(data)
			         (*Bit_PutPt.WPt) |= Bit_PutPt.Mask; // set bit
			      else
			         (*Bit_PutPt.WPt) &= ~Bit_PutPt.Mask; // clear bit
			      Bit_PutPt=myPutPt;
			      return(1);
			   }
			}
			// returns TRUE=1 if successful,
			// FALSE=0 if empty and data not removed
			// output is boolean 0 means FALSE, nonzero is true
			short Bit_Get (unsigned short *datapt) {
			   if (Bit_Same(Bit_PutPt,Bit_GetPt))
			      return(0); /* Failed, Bit_Fifo was empty */
			   else { 
			      *datapt=(*Bit_GetPt.WPt)&Bit_GetPt.Mask;
			      Bit_GetPt.Mask=Bit_GetPt.Mask>>1;
			      if(Bit_GetPt.Mask==0) {
			         Bit_GetPt.Mask=0x8000;
			         if((++Bit_GetPt.WPt)==&Bit_Fifo[Bit_FifoSize])
			            Bit_GetPt.WPt=&Bit_Fifo[0]; // wrap
			      }
			      return(1); 
			   }
			}
Punctuation marks (semicolons, colons, commas, apostrophes, quotation marks, braces, brackets, and parentheses) are very important in C. They are among the most frequent sources of errors for both beginning and experienced programmers.
Semicolons are used as statement terminators. Strange and confusing syntax errors may be generated when you forget a semicolon, so this is one of the first things to check when trying to remove syntax errors. Notice that one semicolon is placed at the end of every simple statement in the following example:
#define PORTB *(unsigned char volatile *)(0x1004)
			void Step(void){
			   PORTB = 10;
			   PORTB = 9;
			   PORTB = 5;
			   PORTB = 6;}
Preprocessor directives do not end with a semicolon since they are not actually part of the C language proper. Preprocessor directives begin in the first column with the # and conclude at the end of the line. Semicolons are also used in the for loop statement (see also Chapter 6), as illustrated by the following example, which fills the array DataBuffer with data read from the input port (PORTC). We assume in this example that Port C has been initialized as an input.
void Fill(void){ short j;
			   for(j=0;j<100;j++){
			      DataBuffer[j]=PORTC;}
			}
We can define a label using the colon. Although C has a goto statement, I discourage its use. I believe the software is easier to understand using the block-structured control statements (if, if else, for, while, do while, and switch case). The following example will return after the Port C input reads the same value 100 times in a row. Again we assume Port C has been initialized as an input. Notice that every time the current value on Port C is different from the previous value the counter is reinitialized.
char Debounce(void){ short Cnt; unsigned char LastData;
Start:    Cnt=0;          /* number of times Port C is the same */
          LastData=PORTC;
Loop:     if(++Cnt==100) goto Done;       /* same thing 100 times */
          if(LastData!=PORTC) goto Start; /* changed */
          goto Loop;
Done:     return(LastData);}
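For comparison, the same debounce logic can be written with block-structured statements instead of goto. This is one possible sketch, not the only way to restructure it. The function name DebounceLoop is hypothetical, and SimulatedPortC is a stand-in variable (an assumption) so that the code can run on a desktop machine; on the real hardware PORTC would come from the header file.

```c
/* Hypothetical stand-in for the input port so the sketch runs without hardware */
static unsigned char SimulatedPortC = 0x55;
#define PORTC SimulatedPortC

/* Returns once Port C has read the same value 100 times in a row */
unsigned char DebounceLoop(void){ short cnt; unsigned char lastData;
  do{
    lastData = PORTC;                   /* first reading, restart the count */
    for(cnt = 1; (cnt < 100) && (PORTC == lastData); cnt++){
      /* keep counting while the port still matches */
    }
  } while(cnt < 100);                   /* port changed early, start over */
  return lastData;
}
```

The outer do while plays the role of the Start label, and the for loop replaces the Loop label and its two goto statements.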
Colons also terminate the case and default prefixes that appear in switch statements. For more information see the section on switch in Chapter 6. In the following example, the next stepper motor output is found (the proper sequence is 10,9,5,6). The default case is used to restart the pattern.
unsigned char NextStep(unsigned char step){ unsigned char theNext;
			   switch(step){
			      case 10: theNext=9; break;
			      case 9: theNext=5; break;
			      case 5: theNext=6; break;
			      case 6: theNext=10; break;
			      default: theNext=10; 
			   } 
			return(theNext);}
For both applications of the colon (goto and switch), we see that a label is created that is a potential target for
		a transfer of control.
Commas separate items that appear in lists. We can create multiple variables of the same type. E.g.,
unsigned short beginTime,endTime,elapsedTime;
Lists are also used with functions having multiple parameters (both when the function is defined and called):
short add(short x, short y){ short z;
    z=x+y;
    if((x>0)&&(y>0)&&(z<0))z=32767;    /* positive overflow, saturate */
    if((x<0)&&(y<0)&&(z>=0))z=-32768;  /* negative overflow, saturate */
    return(z);}
void main(void){ short a,b;
    a=add(2000,2000);
    b=0;
    while(1){
      b=add(b,1);
    }
}
Listing 2-6: Commas separate the parameters of a function
Lists can also be used in general expressions. Sometimes it adds
		clarity to a program if related variables are modified at the
		same place. The value of a list of expressions is always the value
		of the last expression in the list. In the following example,
		first thetime is incremented, thedate is decremented, then x is set to k+2.
x=(thetime++,--thedate,k+2);
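The effect of the comma-separated list can be checked with concrete numbers. In this sketch the initial values and the function name CommaDemo are hypothetical; the expression itself is the one shown above.

```c
short thetime = 5, thedate = 10, k = 3, x;

/* Evaluates the list left to right; x receives the value of the last expression */
void CommaDemo(void){
  x = (thetime++, --thedate, k+2);
}
```

After one call, thetime is 6, thedate is 9, and x is 5, because only the last expression in the list supplies the value.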
Apostrophes are used to specify character literals. For more information about character literals see the section
		on characters in Chapter 3. Assuming the function OutChar will print a single ASCII character, the following example will
		print the lower case alphabet:
void Alphabet(void){ unsigned char mych;
			   for(mych='a';mych<='z';mych++){
			      OutChar(mych);}     /* Print next letter */
			}
Quotation marks are used to specify string literals. For more information about string literals see the section on strings in Chapter 3. Example
#include <string.h>
unsigned char Name[12]; /* Place for 11 characters and termination */
void InitName(void){
   strcpy((char *)Name,"Hello World"); /* an array cannot be assigned with = */
}
The command Letter='A'; places the ASCII code (65) into the variable Letter. The command pt="A"; creates an ASCII string and places a pointer to it into the variable
		pt. 
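The difference between the two assignments shows up in the variable types. A minimal sketch, assuming Letter is a char and pt is a pointer to constant characters (the declarations and the function name LiteralDemo are not given in the text and are assumptions here):

```c
char Letter;       /* holds one ASCII code */
const char *pt;    /* holds the address of a string */

void LiteralDemo(void){
  Letter = 'A';    /* stores the value 65 */
  pt = "A";        /* stores a pointer to the two bytes 'A', 0 */
}
```

Note that the string "A" still contains two bytes, the letter and the terminating zero, even though it looks like a single character.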
Braces {} are used throughout C programs. The most common application is for creating a compound statement. Each open brace { must be matched with a closing brace }. One approach that helps to match up braces is to use indenting. Each time an open brace is used, the source code is tabbed over. In this way, it is easy to see at a glance the brace pairs. Examples of this approach to tabbing are the Bit_Put function within Listing 2-2 and the median function in Listing 1-4.
Square brackets enclose array dimensions (in declarations) and subscripts (in expressions). Thus,
short Fifo[100];
declares an integer array named Fifo consisting of 100 words numbered from 0 through 99, and
PutPt = &Fifo[0];
assigns the address of the first entry of the array to the variable PutPt.
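Both uses of square brackets can be seen together in a short sketch. The declaration of PutPt as a pointer to short and the function name FifoInit are assumptions, since the text does not show them.

```c
short Fifo[100];   /* brackets in a declaration: 100 entries, Fifo[0] to Fifo[99] */
short *PutPt;      /* assumed type for the pointer used in the text */

void FifoInit(void){
  PutPt = &Fifo[0];  /* brackets in an expression: subscript 0 selects the first entry */
}
```

The number of entries can be recovered at compile time with sizeof(Fifo)/sizeof(Fifo[0]), which is 100 for this declaration.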
Parentheses enclose argument lists that are associated with function declarations and calls. They are required even if there are no arguments.
As with all programming languages, C uses parentheses to control the order in which expressions are evaluated. Thus, (11+3)/2 yields 7, whereas 11+3/2 yields 12. Parentheses are very important when writing expressions.
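The two results follow from integer division, which discards the remainder. A small sketch (the function names WithParentheses and WithoutParentheses are hypothetical):

```c
/* Addition first: 14/2 = 7 */
short WithParentheses(void){
  return (11+3)/2;
}

/* Division first: 3/2 = 1 in integer math, then 11+1 = 12 */
short WithoutParentheses(void){
  return 11+3/2;
}
```

When in doubt about the order of evaluation, adding parentheses costs nothing in the generated code and makes the intent explicit.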
The special characters used as expression operators are covered in the operator section in Chapter 5. There are many operators, some of which are single characters
~  !  @  %  ^  &  *  -  +  =  |  /  :  ?  <  > ,
while others require two characters
++  --  <<  >>  <=  >=  +=  -=  *=  /=  ==  |=  %=  &=  ^=  ||  &&  !=
and some even require three characters
<<=  >>=
The multiple-character operators cannot have white space or comments between the characters.
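White space changes how the compiler groups operator characters into tokens. The sketch below uses two hypothetical functions to illustrate: in the first, <<= is a single token; in the second, the space makes - - two separate tokens (a subtraction followed by a negation) rather than the single decrement operator.

```c
/* <<= is one token: shift left by two bits and assign */
short ShiftLeftTwo(short value){
  value <<= 2;
  return value;
}

/* With a space, - - is two tokens: b minus (negative c), which equals b+c */
short MinusMinus(short b, short c){
  return b - -c;
}
```

Writing value < <= 2 or b--c instead would not compile, because the characters would no longer form the intended tokens.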
The C syntax can be confusing to the beginning programmer. For example
z=x+y;   /* sets z equal to the sum of x and y */
			z=x_y;   /* sets z equal to the value of x_y */