Compilers
Overview
The compiler must preserve the meaning of the program being compiled.
The compiler must improve the input program in some discernible way.
- Compiler: translates a program written in one programming language into a program written in another programming language
- Has front and back end
- Frontend deals with with source language, while the backend deals with the target language
- Both are connected using (often more than one) intermediate representation (IR), which is independent from either language
- Between those two, there is often one or many optimization phases
- An optimizer can have different goals:
- Make the program faster
- Make the program smaller
- Make the program use less energy
- Since front and back end are decoupled, it's easy to write different back ends for different target machines
- A compiler that translates between two programming languages, instead of machine code, is called source-to-source translator
- An interpreter produces as output the result of executing the program, instead of machine code
- Some languages use both a compiler and an interpreter: Java
- Java is compiled into bytecode, which is a compact description of the program
- The bytecode is then interpreted by the JVM (Java Virtual Machine)
- But the JVM also includes a just-in-time (JIT) compiler, which is a runtime compiler
Why study compiler construction?
- A compiler applies many concepts of computer science and software engineering
- Greedy algorithms (register allocation)
- Heuristic search techniques (list scheduling)
- Graph algorithms (dead-code elimination)
- Dynamic programming (instruction selection)
- Finite automata and push-down automata (scanning and parsing)
- Fixed-point algortihms (data-flow analysis)
- It deals with problems such as dynamic allocation, synchronization, naming, locality, memory hierarchy management and pipeline scheduling
Scanner
- Scanner: Scan a source program and break it up into small, meaningful units, called tokens
- Scanner and Lexer are used interchangeable
position := initial + rate * 60;
// Tokens
id position
assign :=
id initial
op +
id rate
op *
int 60
semicolon
- Tokens: identifiers, constants, operators, \(\dots\)
- Also: Removal of comments/whitespace, preperation of output listing (line numbers, correspondance of error messages and line numbers), communication with symbol table
Tokens, Lexemes, Patterns
- Token: classification of entities of a program
- Lexeme: specific instance of a token
- Example:
position
andinstance
are both identifiers, but different lexeme
- Example:
- Scanner has to keep track of "attributes" for each token:
- Identifiers: string
- Numbers: value
- These attributes are used during semantic analysis and code generation
- Patterns: Rules describing how tokens look like
- Are needed because the source language may contain infinite possible strings
- These patterns are specified usinga meta-language called regular expressions
References
- Engineering a Compiler (2nd edition) by K. Cooper and L. Torczon
- Crafting A Compiler by Fischer et al.
- Compilers timeline
Last update: January 2, 2021