# Compilers

## Overview

The compiler must preserve the meaning of the program being compiled.

The compiler must improve the input program in some discernible way.

• Compiler: translates a program written in one programming language into a program written in another programming language
• Has front and back end

• Frontend deals with with source language, while the backend deals with the target language
• Both are connected using (often more than one) intermediate representation (IR), which is independent from either language
• Between those two, there is often one or many optimization phases
• An optimizer can have different goals:
• Make the program faster
• Make the program smaller
• Make the program use less energy
• Since front and back end are decoupled, it's easy to write different back ends for different target machines
• A compiler that translates between two programming languages, instead of machine code, is called source-to-source translator
• An interpreter produces as output the result of executing the program, instead of machine code
• Some languages use both a compiler and an interpreter: Java
• Java is compiled into bytecode, which is a compact description of the program
• The bytecode is then interpreted by the JVM (Java Virtual Machine)
• But the JVM also includes a just-in-time (JIT) compiler, which is a runtime compiler

### Why study compiler construction?

• A compiler applies many concepts of computer science and software engineering
• Greedy algorithms (register allocation)
• Heuristic search techniques (list scheduling)
• Dynamic programming (instruction selection)
• Finite automata and push-down automata (scanning and parsing)
• Fixed-point algortihms (data-flow analysis)
• It deals with problems such as dynamic allocation, synchronization, naming, locality, memory hierarchy management and pipeline scheduling

## Scanner

• Scanner: Scan a source program and break it up into small, meaningful units, called tokens
• Scanner and Lexer are used interchangeable
position := initial + rate * 60;

// Tokens
id position
assign :=
id initial
op +
id rate
op *
int 60
semicolon

• Tokens: identifiers, constants, operators, $$\dots$$
• Also: Removal of comments/whitespace, preperation of output listing (line numbers, correspondance of error messages and line numbers), communication with symbol table

### Tokens, Lexemes, Patterns

• Token: classification of entities of a program
• Lexeme: specific instance of a token
• Example: position and instance are both identifiers, but different lexeme
• Scanner has to keep track of "attributes" for each token:
• Identifiers: string
• Numbers: value
• These attributes are used during semantic analysis and code generation
• Patterns: Rules describing how tokens look like
• Are needed because the source language may contain infinite possible strings
• These patterns are specified usinga meta-language called regular expressions

## References

Last update: January 2, 2021