Lexical Analysis

Lexical analysis is the first step carried out during compilation. It involves breaking code into tokens, identifying each token's type, removing white-spaces and comments, and flagging any errors. The tokens are subsequently passed to a syntax analyser before heading to the pre-processor. This page provides a sample lexical analyser that identifies four types of tokens (numbers, text, operators and syntax) and ignores white-spaces.
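For illustration, a hypothetical input such as count = 12 + 3 might be broken into the tokens (count, text), (=, operator), (12, number), (+, operator) and (3, number). The exact output format produced by this page may differ.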

Input



Output



Description

While scanning a string, the lexical analyser keeps track of changes in state (number, text, operator or syntax). It keeps building up a character stack for as long as the state remains the same. When the state changes, the existing character stack is pushed into the value stack and its type is pushed into the type stack; a new character stack is then built up under the new state. This process continues until the entire string has been scanned. A token consists of a (value, type) pair, and the lexical analyser outputs a list of these tokens. It is highly suggested that you view lexical_analysis.js (the code that runs lexical analysis on this page).
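A minimal sketch of this state machine in JavaScript is given below. The names classify and tokenize, the character classes and the token format are assumptions made for this illustration and are not taken from lexical_analysis.js, which remains the authoritative implementation.

// Sketch of the state-machine tokenizer described above (assumed names and
// character classes; not the actual code from lexical_analysis.js).

function classify(ch) {
  if (/\s/.test(ch)) return "whitespace";      // ignored by the tokenizer
  if (/[0-9]/.test(ch)) return "number";
  if (/[A-Za-z_]/.test(ch)) return "text";
  if (/[+\-*\/=<>!]/.test(ch)) return "operator";
  return "syntax";                             // brackets, commas, semicolons, ...
}

function tokenize(source) {
  const tokens = [];      // output: list of (value, type) pairs
  let charStack = [];     // characters of the token currently being built
  let state = null;       // current state: number, text, operator or syntax

  // Push the character stack into the token list as a single token.
  const flush = () => {
    if (charStack.length > 0) {
      tokens.push({ value: charStack.join(""), type: state });
      charStack = [];
    }
  };

  for (const ch of source) {
    const newState = classify(ch);
    if (newState !== state) {    // state change: emit the pending token
      flush();
      state = newState;
    }
    if (newState !== "whitespace") charStack.push(ch);  // white-space is ignored
  }
  flush();                       // emit the final token
  return tokens;
}

console.log(tokenize("count = 12 + 3"));
// e.g. [ { value: 'count', type: 'text' }, { value: '=', type: 'operator' },
//        { value: '12', type: 'number' }, { value: '+', type: 'operator' },
//        { value: '3', type: 'number' } ]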




Note:
  1. The string data type can be scanned by triggering a string state when double-quotes (") are encountered. The character stack of the string is then built up until the closing double-quotes (") are encountered (a sketch of this appears after these notes).
  2. The syntax analyser looks for unpaired containers, adjacent tokens that make no sense together (such as two number tokens next to each other) and operations that are not defined on the operands (such as division of a string).
  3. The pre-processor inspects the text tokens and tries to associate pre-defined variables or functions with them. If none can be found, it throws an error. If an unknown variable appears on the left side of an assignment operator (=), a new variable is created and assigned the value on the right.
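
The sketch below shows one way the string state from note 1 could be layered on the same state-machine idea. As before, classify, tokenizeWithStrings and the token format are assumptions for illustration, not the code used on this page.

// Sketch of the string state from note 1 (assumed structure).

function classify(ch) {                        // same assumed character classes as above
  if (/\s/.test(ch)) return "whitespace";
  if (/[0-9]/.test(ch)) return "number";
  if (/[A-Za-z_]/.test(ch)) return "text";
  if (/[+\-*\/=<>!]/.test(ch)) return "operator";
  return "syntax";
}

function tokenizeWithStrings(source) {
  const tokens = [];
  let charStack = [];
  let state = null;

  const flush = () => {
    if (charStack.length > 0) {
      tokens.push({ value: charStack.join(""), type: state });
      charStack = [];
    }
  };

  for (const ch of source) {
    if (state === "string") {
      if (ch === '"') { flush(); state = null; }   // closing quote ends the string
      else charStack.push(ch);                     // everything else belongs to the string
      continue;
    }
    if (ch === '"') {                              // opening quote triggers the string state
      flush();
      state = "string";
      continue;
    }
    const newState = classify(ch);
    if (newState !== state) { flush(); state = newState; }
    if (newState !== "whitespace") charStack.push(ch);
  }
  flush();
  return tokens;
}

console.log(tokenizeWithStrings('name = "lexer"'));
// e.g. [ { value: 'name', type: 'text' }, { value: '=', type: 'operator' },
//        { value: 'lexer', type: 'string' } ]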


Developed by ChanRT | Fork me at GitHub