(i) Input to the Code Generator
The input to the code generator is the intermediate representation (IR) of the source program, produced by the front end of the compiler, together with the symbol table. The symbol table information is used to determine the run-time memory addresses of the variables and constants named in the IR.
What does the input contain?
- Intermediate Code (IR):
- Examples:
- Three-address code (TAC), e.g., t1 = a + b
- Quadruples, Triples, Indirect Triples
- Postfix notation
- Syntax trees or DAGs
- Bytecode or stack machine code
- Symbol Table Info:
- Contains mappings of variable names to memory locations, data types, and scope.
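As a rough illustration of what this input might look like in memory, the sketch below encodes x = (a + b) * c as quadruples and as triples and pairs it with a small symbol table. The names `Quad`, `SymbolInfo`, and `format_quad`, the field layout, and the concrete offsets are assumptions made for this example, not a fixed format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quad:
    """One three-address instruction: result = arg1 op arg2."""
    op: str
    arg1: str
    arg2: Optional[str]
    result: str


@dataclass(frozen=True)
class SymbolInfo:
    """Symbol table entry (fields chosen for illustration)."""
    dtype: str
    offset: int   # hypothetical byte offset within the data area
    scope: str


# x = (a + b) * c as quadruples; temporaries t1 and t2 are named explicitly.
quads = [
    Quad("+", "a", "b", "t1"),
    Quad("*", "t1", "c", "t2"),
    Quad("=", "t2", None, "x"),
]

# The same code as triples: results are implicit, and an integer
# operand refers to the position of an earlier triple.
triples = [
    ("+", "a", "b"),
    ("*", 0, "c"),
    ("=", "x", 1),
]

symbol_table = {
    "a": SymbolInfo("int", 0, "global"),
    "b": SymbolInfo("int", 4, "global"),
    "c": SymbolInfo("int", 8, "global"),
    "x": SymbolInfo("int", 12, "global"),
}


def format_quad(q: Quad) -> str:
    """Render a quadruple in textual three-address form."""
    if q.arg2 is None:
        return f"{q.result} = {q.arg1}"
    return f"{q.result} = {q.arg1} {q.op} {q.arg2}"


for q in quads:
    print(format_quad(q))
```

Quadruples name every temporary, which makes instructions easy to reorder; triples are more compact but refer to earlier results by position.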
Assumptions:
- The input program is already lexically, syntactically, and semantically correct.
- Type checking is done.
- Any necessary type conversions are inserted.
- The IR operates only on values that the target machine can manipulate directly (e.g., integers and floating-point numbers).
Purpose:
These assumptions let the code generator concentrate on producing machine code or assembly, without having to check high-level language rules or handle source errors.
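A minimal sketch of how a code generator might consume this input: it walks a list of three-address instructions and uses the symbol table to turn names into addresses. The target here is a hypothetical load/store machine with `LD`, `ST`, `ADD`, `SUB`, `MUL`, registers `R0` and `R1`, and a frame pointer `FP`; the tuple layout and the offsets are likewise assumptions. It does no register allocation and trusts its input, as the assumptions above permit.

```python
# Hypothetical mapping from IR operators to target mnemonics.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def address(name, symtab):
    """Turn a variable name into a frame-relative address via the symbol table."""
    return f"{symtab[name]}(FP)"


def generate(tac, symtab):
    """Naively translate each 'result = arg1 op arg2' into load/op/store."""
    code = []
    for result, arg1, op, arg2 in tac:
        code.append(f"LD R0, {address(arg1, symtab)}")
        code.append(f"LD R1, {address(arg2, symtab)}")
        code.append(f"{OPCODES[op]} R0, R0, R1")
        code.append(f"ST {address(result, symtab)}, R0")
    return code


# Symbol table reduced to name -> byte offset for this example.
symtab = {"a": 0, "b": 4, "t1": 8}
tac = [("t1", "a", "+", "b")]   # t1 = a + b

asm = generate(tac, symtab)
print("\n".join(asm))
```

Because the IR is assumed to be type-checked and error-free, the generator contains no validation logic at all; a production code generator would add instruction selection and register allocation on top of this loop.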
(ii) The Target Program
The target program is the final output of the code generator, written in the machine or assembly language of the target architecture. Depending on its form, it can be executed directly or only after further steps such as assembling, linking, and loading.
Common Target Architectures:
- RISC (Reduced Instruction Set Computer):
- Features: Many registers, 3-address instructions, simple and fast execution.
- Easy for compilers to optimize.
- CISC (Complex Instruction Set Computer):
- Features: Fewer registers, 2-address instructions, complex addressing modes.
- Harder for compilers to generate optimal code.
- Stack-Based Machines:
- Operands are pushed onto a stack, and operations work on the values at the top of the stack.
- JVM (Java Virtual Machine) is a modern example.
- Stack-based bytecode is popular because it is platform independent.
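To make the differences between these architectures concrete, the sketch below emits code for x = a + b in each of the three styles. All mnemonics and register names are hypothetical and chosen only for illustration; they do not belong to any real instruction set.

```python
def risc_code(dst, a, b):
    """Load/store machine with 3-address register instructions."""
    return [
        f"LD R1, {a}",
        f"LD R2, {b}",
        "ADD R3, R1, R2",   # R3 = R1 + R2
        f"ST {dst}, R3",
    ]


def cisc_code(dst, a, b):
    """2-address machine whose ADD may take a memory operand."""
    return [
        f"MOV R0, {a}",
        f"ADD R0, {b}",     # R0 = R0 + memory[b]
        f"MOV {dst}, R0",
    ]


def stack_code(dst, a, b):
    """Stack machine: operands are implicit, taken from the top of the stack."""
    return [
        f"push {a}",
        f"push {b}",
        "add",              # pops two values, pushes their sum
        f"pop {dst}",
    ]


for name, gen in [("RISC", risc_code), ("CISC", cisc_code), ("stack", stack_code)]:
    print(f"{name}:")
    for line in gen("x", "a", "b"):
        print("  " + line)
```

The stack version needs no register names at all, which is part of what makes such code easy to generate and portable, at the cost of extra pushes and pops.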
Forms of the Target Program:
- Absolute Machine Code
- Can be placed at a fixed memory location and executed immediately.
- Fast compilation and execution.
- Less flexible: the code cannot be moved to another address or linked with separately compiled modules.
- Relocatable Object Code
- Each module is compiled separately and linked with the others later.
- Lets an object module call other, previously compiled modules.
- Requires a linker and loader.
- Assembly Code
- Human-readable symbolic code.
- Easier for debugging and understanding.
- Needs an assembler to convert to machine code.
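The difference between absolute and relocatable code can be sketched as follows. A relocatable module stores addresses relative to its own start, plus a list of the positions that hold such addresses; the loader adds the actual load address to each of them. The word encoding, the `relocate` helper, and the base address `0x1000` are assumptions made for this example and do not correspond to a real object-file format.

```python
def relocate(words, reloc_positions, base):
    """Return a copy of the module with each listed address word shifted by base."""
    loaded = list(words)
    for i in reloc_positions:
        loaded[i] += base
    return loaded


# Hypothetical module: (opcode, address) pairs stored as flat words.
# Addresses 4 and 8 are relative to the start of the module.
module_words = [0x10, 4, 0x20, 8]
reloc_positions = [1, 3]          # indices of the words that hold addresses

absolute_image = relocate(module_words, reloc_positions, 0x1000)
print([hex(w) for w in absolute_image])
```

Absolute machine code corresponds to the output of this step with the base fixed at compile time, which is why it needs no linker or loader but cannot be moved afterwards.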
Performance Considerations:
- JIT (Just-In-Time) compilers translate Java bytecode to native code at runtime to speed up execution.
- Native code generators skip bytecode generation and compile Java source straight to machine code for performance.