Skip to content



The compiler must preserve the meaning of the program being compiled.

The compiler must improve the input program in some discernible way.

  • Compiler: translates a program written in one programming language into a program written in another programming language
  • Has front and back end

  • Frontend deals with with source language, while the backend deals with the target language
  • Both are connected using (often more than one) intermediate representation (IR), which is independent from either language
  • Between those two, there is often one or many optimization phases
  • An optimizer can have different goals:
    • Make the program faster
    • Make the program smaller
    • Make the program use less energy
  • Since front and back end are decoupled, it's easy to write different back ends for different target machines
  • A compiler that translates between two programming languages, instead of machine code, is called source-to-source translator
  • An interpreter produces as output the result of executing the program, instead of machine code
  • Some languages use both a compiler and an interpreter: Java
    • Java is compiled into bytecode, which is a compact description of the program
    • The bytecode is then interpreted by the JVM (Java Virtual Machine)
    • But the JVM also includes a just-in-time (JIT) compiler, which is a runtime compiler

Why study compiler construction?

  • A compiler applies many concepts of computer science and software engineering
    • Greedy algorithms (register allocation)
    • Heuristic search techniques (list scheduling)
    • Graph algorithms (dead-code elimination)
    • Dynamic programming (instruction selection)
    • Finite automata and push-down automata (scanning and parsing)
    • Fixed-point algortihms (data-flow analysis)
  • It deals with problems such as dynamic allocation, synchronization, naming, locality, memory hierarchy management and pipeline scheduling


  • Scanner: Scan a source program and break it up into small, meaningful units, called tokens
  • Scanner and Lexer are used interchangeable
position := initial + rate * 60;

// Tokens
id position
assign :=
id initial
op +
id rate
op *
int 60
  • Tokens: identifiers, constants, operators, \(\dots\)
  • Also: Removal of comments/whitespace, preperation of output listing (line numbers, correspondance of error messages and line numbers), communication with symbol table

Tokens, Lexemes, Patterns

  • Token: classification of entities of a program
  • Lexeme: specific instance of a token
    • Example: position and instance are both identifiers, but different lexeme
  • Scanner has to keep track of "attributes" for each token:
    • Identifiers: string
    • Numbers: value
  • These attributes are used during semantic analysis and code generation
  • Patterns: Rules describing how tokens look like
    • Are needed because the source language may contain infinite possible strings
  • These patterns are specified usinga meta-language called regular expressions


Last update: January 2, 2021