Skip to content

danaie/DanaCompiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

235 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DanaCompiler

DanaCompiler is a compiler for the Dana programming language. It compiles a Dana source file into an intermediate representation (.imm), generates final assembly (.asm), and produces an executable binary.

This project was developed as part of the ECE NTUA "Compilers" course.

Authors

Requirements

To build and run the Dana compiler, the following dependencies must be installed:

  • clang (built with LLVM 18)
  • llvm-config-18 (version 18.1.3)
  • make (GNU Make for building the project)
  • flex and bison (for lexical and syntax analysis)
  • Standard Unix utilities (rm, mv, chmod, etc.)

Project Structure

DanaCompiler/
├── ast/                    ← Semantic analysis, type checking, and AST generation
├── compiler/               ← Build directory, testing scripts, and final compiler executable
├── danaprograms/           ← Collection of Dana test programs
│   ├── benchmark/              ← Computationally heavy programs for testing optimization and speed
│   ├── dana-programs/          ← Valid Dana programs to verify correct compilation
│   └── programs-erroneous/     ← Intentionally invalid programs to test compiler error handling
├── documentation/          ← Language specifications and Dana documentation as provided
├── lexer/                  ← Lexical analysis rules (Flex)
├── parser/                 ← Syntax analysis rules (Bison)
└── README.md

Building

First, clone the repository and then, navigate to the compiler directory to build the project:

cd compiler
make
  • make (run in compiler/) builds the final compiler executable (default target).
  • make distclean removes generated files and the compiler executable.
  • make debug for debugging purposes, prints logs during compiler construction.

Runtime Library

To compile Dana programs successfully, Runtime Library is necessary and it is provided, ready for use, as libdana.a in compiler directory.

You could also build it again using the file stdlib_runtime.cpp like this:

cd compiler
clang -c stdlib_runtime.cpp -o stdlib.o
ar rcs libdana.a stdlib.o

Usage

After building, run the compiler:

cd compiler
./danac [flags] source.dana

Compiler Usage

Running ./danac without arguments or with the help flag will display the basic usage instructions:

./danac <input-file.dana> [-o output-file] [-O0|-O1|-O2]
./danac -i [-O0|-O1|-O2]      # interactive stdin, print imm to stdout
./danac -f [-O0|-O1|-O2]      # interactive stdin, print asm to stdout
./danac -h | --help           # show this help message

Flags Desciption

Flag Description
None (Default) Reads source from the given file (e.g., hello.dana). Produces an intermediate file (.imm) and a final assembly file (.asm) in the same directory as the source. Example: /tmp/hello.dana -> /tmp/hello.imm and /tmp/hello.asm.
-o <file> Writes the produced executable to <file>. If omitted, the executable is written to ./a.out in the current working directory.
-O0, -O1, -O2 Optimization flags to control the level of performance optimization (optional).
-f Interactive mode: reads source code directly from stdin and outputs the final compiled code to stdout.
-i Interactive mode: reads source code directly from stdin and outputs the intermediate representation (IR) code to stdout.
-h, --help Displays the help message and usage instructions.

The compiler exits with status 0 on success and non‑zero on error.

Expected Errors

The compiler is designed to stop compilation immediately (exit(1)) upon encountering invalid code. It provides detailed, color-coded error messages that include the exact line and column numbers.

  • Lexical Errors: Caught during the Lexer phase and include errors like encountering unrecognized characters, invalid symbols, etc. The output specifies the exact line and column, along with the offending token.
  • Syntax Errors: Caught by the parser (missing semicolons, mismatched parentheses, unexpected tokens, etc). The output provides the starting and ending location range of the erroneous expression and details the unexpected token.
  • Semantic Errors: Caught during sematic analysis, type checking and Abstract Syntax Tree (AST) generation, including errors like type mismatches, using undeclared variables, invalid function calls, duplicate names, etc. The output provides the location range and a message explaining the rule violation.

Optimization

The compiler implements two levels of optimization, focusing on intermediate code optimization (using data flow and control flow analysis) and final code optimization (including register allocation). You can control these passes using the -O1 and -O2 flags.

Level 1: Intermediate Code Optimization (-O1)

This level performs optimization on the intermediate code by analyzing data and control flow. It runs on a per-function basis using the FunctionPassManager. The following LLVM passes are applied:

  • llvm::createPromoteMemoryToRegisterPass(): Critical pass that promotes memory allocations to CPU registers.
  • llvm::createEarlyCSEPass(): Performs early Common Subexpression Elimination.
  • llvm::createSROAPass(): Scalar Replacement of Aggregates. Takes variables allocated in memory and promotes them to live in fast CPU registers instead.
  • llvm::createInstructionCombiningPass(): A "peephole" optimizer that looks for simple patterns it can combine or reduce.
  • llvm::createReassociatePass(): Reorders associative mathematical expressions to allow for better constant propagation.
  • llvm::createGVNPass(): Global Value Numbering. It analyzes the code to find redundant calculations or repeated memory loads and eliminates them.
  • llvm::createCFGSimplificationPass(): Simplifies the Control Flow Graph by removing dead blocks and merging basic blocks.
  • llvm::createTailCallEliminationPass(): Optimizes recursive function calls and control flow.

Note: Tail Call Elimination Pass did not procuce the expected results in experiments. We included for further research.

Level 2: Final Code Optimization & Register Allocation (-O2)

This level includes all -O1 passes and adds a second layer of module-wide optimizations using the legacy::PassManager. It handles final code optimization and register allocation during object file emission.

  • Re-runs several core passes at the module level for further reduction:
    • llvm::createSROAPass()
    • llvm::createInstructionCombiningPass() (algebraic simplifications and bit-twiddling)
    • llvm::createReassociatePass() (shifts constants together for compile-time evaluation)
    • llvm::createGVNPass() (eliminates redundant calculations and memory loads)
    • llvm::createCFGSimplificationPass() (cleans up the CFG)
  • Register Allocation: Handled automatically during the object file emission phase via TargetMachine::addPassesToEmitFile.

Note on LLVM 18 Compatibility: The llvm::createAggressiveDCEPass() (Aggressive Dead Code Elimination) is NOT supported by LLVM 18 and has been intentionally omitted from the Level 2 optimization pipeline.

Testing

Functional Testing

To verify that the compiler correctly handles standard, valid code, we run the testing script. This script automatically compiles and executes all the Dana programs located inside the danaprograms/dana-programs/ directory with their given input, and also compares the result to the correct.

cd compiler
./testing

Error Handling Testing

To ensure the compiler catches invalid syntax and semantic errors properly, we run the testing-err script. This executes the intentionally broken programs in the danaprograms/programs-erroneous/ directory and verifies that the compiler fails gracefully with a non-zero exit code.

cd compiler
./testing-err

Benchmark and Optimization Testing

To evaluate the compiler's performance when using the optimization flags -O1 and -O2, we provide the benchmark script. This script runs the computationally intensive programs located in the danaprograms/benchmark/ directory and logs the compilation and execution times. Each benchmark target a specific llvm optimization.

cd compiler
./testing_opt_time

Expected Optimization Benchmark Output

When running the ./test_opt_time script, you should expect output demonstrating the performance gains across different optimization levels, similar to the following.

As demonstrated in the results down below, enabling Level 2 optimization (-O2) produces significant performance improvements. Especially in computationally heavy and redundant scenarios—like CombinedHeavy and Redundancy, the execution time is reduced drastically, running up to 8.5x faster compared to the unoptimized -O0 baseline. While -O1 focuses primarily on essential register allocation, the aggressive, module-level analytical passes in -O2 successfully strip away dead code, eliminate repeated instructions, and dramatically minimize total runtime.

======================================
Testing Algebraic
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 399484705 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 352999860 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 141395618 ns
  🚀 -O1 faster than -O0 (1.13x)
  🚀 -O2 faster than -O0 (2.83x)
  🚀 -O2 faster than -O1 (2.50x)

======================================
Testing ArithmeticHeavyLoop
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 1089183990 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 1032447681 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 298507511 ns
  🚀 -O1 faster than -O0 (1.05x)
  🚀 -O2 faster than -O0 (3.65x)
  🚀 -O2 faster than -O1 (3.46x)

======================================
Testing CombinedHeavy
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 1793638755 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 1658566736 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 209557643 ns
  🚀 -O1 faster than -O0 (1.08x)
  🚀 -O2 faster than -O0 (8.56x)
  🚀 -O2 faster than -O1 (7.91x)

======================================
Testing CommonSubElim
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 528556340 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 535314256 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 97831165 ns
  ⚠️ -O1 slower than -O0 (1.01x)
  🚀 -O2 faster than -O0 (5.40x)
  🚀 -O2 faster than -O1 (5.47x)

======================================
Testing DeadCodeElim
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 1119590349 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 1167908437 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 215565992 ns
  ⚠️ -O1 slower than -O0 (1.04x)
  🚀 -O2 faster than -O0 (5.19x)
  🚀 -O2 faster than -O1 (5.42x)

======================================
Testing InstrComb
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 1073919257 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 1075144276 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 178578798 ns
  ⚠️ -O1 slower than -O0 (1.00x)
  🚀 -O2 faster than -O0 (6.01x)
  🚀 -O2 faster than -O1 (6.02x)

======================================
Testing LargeIntSum
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 2456712006 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 2506200759 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 506449252 ns
  ⚠️ -O1 slower than -O0 (1.02x)
  🚀 -O2 faster than -O0 (4.85x)
  🚀 -O2 faster than -O1 (4.95x)

======================================
Testing Reassociate
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 1150398057 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 1204957542 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 220551538 ns
  ⚠️ -O1 slower than -O0 (1.05x)
  🚀 -O2 faster than -O0 (5.22x)
  🚀 -O2 faster than -O1 (5.46x)

======================================
Testing Redundancy
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 405572145 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 261516911 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 47207253 ns
  🚀 -O1 faster than -O0 (1.55x)
  🚀 -O2 faster than -O0 (8.59x)
  🚀 -O2 faster than -O1 (5.54x)

======================================
Testing SROA
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 1375834407 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 1273668433 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 222846641 ns
  🚀 -O1 faster than -O0 (1.08x)
  🚀 -O2 faster than -O0 (6.17x)
  🚀 -O2 faster than -O1 (5.72x)

======================================
Testing TailCallElim
======================================
  -> Compiling with -O0
    ⏱ Average time (-O0): 1404915119 ns
  -> Compiling with -O1
    ⏱ Average time (-O1): 1419327424 ns
  -> Compiling with -O2
    ⏱ Average time (-O2): 1339895799 ns
  ⚠️ -O1 slower than -O0 (1.01x)
  🚀 -O2 faster than -O0 (1.05x)
  🚀 -O2 faster than -O1 (1.06x)

About

DanaCompiler is a compiler for the Dana programming language. This project was developed as part of the ECE NTUA "Compilers" course.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors