raid-ppcoe · ElfredSeow · Mar 3, 2026
diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md
@@ -0,0 +1,126 @@
+# AeroScribe: Comprehensive System Documentation
+
+## 1. Project Overview
+AeroScribe (formerly ATC AI Assist System) is a real-time, offline, and CPU-compatible decision-support layer for Air Traffic Control. This system converts radio speech into structured ATC events, maintains an active operational state of aircraft and ground vehicles, and automatically detects conflicts (e.g., runway incursions) and emergency escalations.
+
+It is designed to run entirely on local hardware, with no dependence on paid cloud APIs or GPUs.
+
+---
+
+## 2. Core Components & Architecture
+
+### 2.1 Audio Processing (`audio/`)
+This module handles the ingestion and transcription of audio.
+- **`speech_listener.py`**: Handles microphone input or `.wav` file ingestion. It relies on the `sounddevice` library for cross-platform audio capture, streaming audio in configurable chunks.
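+- **`stt_engine.py`**: Uses `faster-whisper` for fast, CPU-only local transcription of the audio chunks sent by the listener. It passes an ATC-specific `initial_prompt` (e.g. "MAYDAY, PAN-PAN, runway, taxiway, clearance") to bias recognition toward aviation vocabulary rather than conversational words.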
+- **`atc_parser.py`**: Provides initial regex and keyword-based parsing logic as a fallback or heuristic extractor for specific entities, though the heavy lifting is handled by the LLM.
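+
+To illustrate the kind of fallback heuristic `atc_parser.py` can provide, the sketch below pulls a callsign and runway designator out of a transcript with regular expressions. The function name `extract_entities`, the callsign pattern, and the returned dict shape are assumptions for illustration, not the module's actual API; only the runway designators come from the layout in `config.py`.
+
+```python
+import re
+from typing import Optional
+
+# Runway designators taken from the airport layout in config.py.
+RUNWAYS = ["02L", "20R", "02C", "20C", "02R", "20L"]
+
+# Hypothetical callsign pattern: a 2-3 letter airline code followed by digits ("SQ 321").
+# Spoken callsigns ("Singapore 321") do not match, which is why the LLM handles the general case.
+CALLSIGN_RE = re.compile(r"\b([A-Z]{2,3})\s?(\d{1,4})\b")
+RUNWAY_RE = re.compile(r"\brunway\s+(\d{2}\s?[LCR])\b", re.IGNORECASE)
+
+
+def extract_entities(transcript: str) -> dict:
+    """Hypothetical regex fallback: return the first callsign and runway found, or None."""
+    callsign: Optional[str] = None
+    runway: Optional[str] = None
+
+    m = CALLSIGN_RE.search(transcript.upper())
+    if m:
+        callsign = m.group(1) + m.group(2)
+
+    r = RUNWAY_RE.search(transcript)
+    if r:
+        candidate = r.group(1).replace(" ", "").upper()
+        # Only accept designators that exist in the configured layout.
+        if candidate in RUNWAYS:
+            runway = candidate
+
+    return {"entity_id": callsign, "runway": runway}
+```
+
+For example, `extract_entities("SQ 321 cleared for takeoff runway 02L")` returns `{"entity_id": "SQ321", "runway": "02L"}`, while an unknown designator such as "runway 09L" yields `None` for the runway.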
+
+### 2.2 LLM Processing Agent (`agent/`)
+- **`llm_processor.py`**: Uses a small, CPU-friendly Hugging Face model (`Qwen/Qwen2.5-0.5B-Instruct`) to interpret transcripts.
+  - Takes the raw STT text and outputs JSON describing the entity state (e.g., aircraft ID, destination runway, intent).
+  - Handles the complex logic of mapping fuzzy, phonetically inaccurate transcription text to strict operational schemas based on the current airport layout state.
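+
+Because a 0.5B-parameter model does not always emit clean JSON, it helps to validate its output before it reaches the state engines. The sketch below extracts the first JSON object from the raw generation and checks it against a minimal schema. Only `parsed_event` and `entity_id` appear in the prompt schema in `llm_processor.py`; the `intent` key, the top-level placement of `emergency_flag`, and the function `parse_llm_output` are assumptions.
+
+```python
+import json
+from typing import Any, Dict, Optional
+
+# Hypothetical minimal schema: "parsed_event" and "entity_id" mirror the prompt in
+# llm_processor.py; "intent" and a top-level "emergency_flag" are assumed for illustration.
+REQUIRED_EVENT_KEYS = {"entity_id", "intent"}
+
+
+def parse_llm_output(raw: str) -> Optional[Dict[str, Any]]:
+    """Return the validated JSON object from a model generation, or None if unusable."""
+    start = raw.find("{")
+    if start == -1:
+        return None
+    try:
+        # raw_decode stops at the end of the first complete JSON value, so chatter
+        # the model adds after the object is ignored.
+        obj, _ = json.JSONDecoder().raw_decode(raw[start:])
+    except json.JSONDecodeError:
+        return None
+
+    event = obj.get("parsed_event") if isinstance(obj, dict) else None
+    if not isinstance(event, dict) or not REQUIRED_EVENT_KEYS.issubset(event):
+        return None
+
+    # Default the emergency flag to False rather than trusting a missing key.
+    obj["emergency_flag"] = bool(obj.get("emergency_flag", False))
+    return obj
+```
+
+Rejecting malformed output here keeps a single bad generation from corrupting the operational state; the caller can log the raw text to the event store and move on.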
+
+### 2.3 Operational State Management (`state/`)
+This module acts as the source of truth for all entities on the airfield.
+- **`aircraft_state.py`**: Tracks the dynamic state of airborne or taxiing aircraft, updating properties like current segment, destination, and clearance state based on processed LLM events.
+- **`ground_state.py`**: Manages ground vehicles (e.g., tugs, fire tenders, ambulances) on platforms and taxiways.
+- **`event_store.py`**: Provides event-sourced logging functionality. All transcripts, state snapshots, parsed events, and alerts are durably appended to JSONL log files (`logs/events.jsonl`, `logs/alerts.jsonl`) for replayability and audit trails.
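+
+The append-and-replay pattern behind `event_store.py` can be sketched with the standard library alone. The record fields (`ts`, `type`, `payload`) and the helper names `append_event` and `replay` are assumptions for illustration; the real log locations come from `config.py` (for example `ALERTS_LOG_PATH`).
+
+```python
+import json
+import os
+import time
+from pathlib import Path
+from typing import Any, Dict, Iterator
+
+
+def append_event(path: Path, event_type: str, payload: Dict[str, Any]) -> None:
+    """Append one event as a single JSON line (hypothetical record shape)."""
+    path.parent.mkdir(parents=True, exist_ok=True)
+    record = {"ts": time.time(), "type": event_type, "payload": payload}
+    with path.open("a", encoding="utf-8") as f:
+        f.write(json.dumps(record) + "\n")
+        f.flush()             # move Python's buffer to the OS
+        os.fsync(f.fileno())  # ask the OS to commit it to disk (durability at some latency cost)
+
+
+def replay(path: Path) -> Iterator[Dict[str, Any]]:
+    """Yield events in the order they were written, skipping a torn final line."""
+    if not path.exists():
+        return
+    with path.open(encoding="utf-8") as f:
+        for line in f:
+            line = line.strip()
+            if not line:
+                continue
+            try:
+                yield json.loads(line)
+            except json.JSONDecodeError:
+                # A partially written line after a crash should not break replay.
+                continue
+```
+
+Because each line is an independent JSON document, replaying a session is just iterating the file and feeding each record back through the state engines.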
+
+### 2.4 Detection & Alerting (`detection/`)
+- **`conflict_detection.py`**: Continuously monitors the state engines to detect unsafe conditions using deterministic rules:
+  - Unauthorized runway incursions
+  - Multiple entities cleared on the same runway
+  - Taxiway segment overlaps
+  - Unapproved movements or clearance violations
+- **`emergency_detection.py`**: Monitors for emergency flags raised by the LLM (based on phrases like "mayday", "fire", or "engine failure") and immediately escalates alerts.
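+
+Two of the deterministic rules above (multiple entities cleared on the same runway, and an entity occupying a runway without clearance) can be expressed as a pure function over a state snapshot. The `Entity` record, its fields, and `detect_runway_conflicts` are hypothetical simplifications of what `aircraft_state.py` and `ground_state.py` track. Opposite designators such as 02L and 20R are folded onto one physical runway, matching the `"02L/20R"` grouping in the dashboard layout.
+
+```python
+from collections import defaultdict
+from dataclasses import dataclass
+from typing import Dict, List, Optional
+
+# Opposite ends of one physical runway, grouped as in the dashboard layout ("02L/20R").
+PHYSICAL_RUNWAY: Dict[str, str] = {
+    "02L": "02L/20R", "20R": "02L/20R",
+    "02C": "02C/20C", "20C": "02C/20C",
+    "02R": "02R/20L", "20L": "02R/20L",
+}
+
+
+@dataclass
+class Entity:
+    # Hypothetical, simplified view of an aircraft or ground vehicle.
+    entity_id: str
+    segment: str                   # current location, e.g. "Alpha" or "02L"
+    cleared_runway: Optional[str]  # runway designator it is cleared onto, if any
+
+
+def detect_runway_conflicts(entities: List[Entity]) -> List[str]:
+    alerts: List[str] = []
+    cleared: Dict[str, List[str]] = defaultdict(list)
+
+    for e in entities:
+        if e.cleared_runway:
+            runway = PHYSICAL_RUNWAY.get(e.cleared_runway, e.cleared_runway)
+            cleared[runway].append(e.entity_id)
+
+        # Rule: occupying a runway without a clearance for that runway is an incursion.
+        on_runway = PHYSICAL_RUNWAY.get(e.segment)
+        if on_runway and PHYSICAL_RUNWAY.get(e.cleared_runway or "") != on_runway:
+            alerts.append(f"INCURSION: {e.entity_id} on {e.segment} without clearance")
+
+    # Rule: more than one entity cleared onto the same physical runway.
+    for runway, ids in sorted(cleared.items()):
+        if len(ids) > 1:
+            alerts.append(f"CONFLICT: {', '.join(sorted(ids))} cleared on {runway}")
+
+    return alerts
+```
+
+Keeping the rules as pure functions of a snapshot makes them easy to unit-test with `pytest` and keeps them independent of the LLM.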
+
+### 2.5 Live Dashboard (`dashboard/`)
+- **`server.py`**: A FastAPI application that provides the WebSocket server.
+- **`templates/dashboard.html`**: The vanilla HTML/JS/CSS frontend. It connects to the WebSocket to consume and display `transcript`, `state`, and `alert` events in real-time. It features a responsive, glassmorphic UI.
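+
+Since the frontend dispatches on the three event kinds (`transcript`, `state`, `alert`), it helps to give every WebSocket frame one envelope shape. The sketch below assumes a `{"type": ..., "data": ...}` envelope and the helpers `make_message`/`parse_message`; these are illustrative, not the actual wire format of `server.py`.
+
+```python
+import json
+from typing import Any, Tuple
+
+# The three event kinds the dashboard renders. The envelope shape is an assumption.
+MESSAGE_TYPES = {"transcript", "state", "alert"}
+
+
+def make_message(kind: str, data: Any) -> str:
+    """Serialize one dashboard update as a compact JSON text frame."""
+    if kind not in MESSAGE_TYPES:
+        raise ValueError(f"unknown message type: {kind!r}")
+    return json.dumps({"type": kind, "data": data}, separators=(",", ":"))
+
+
+def parse_message(frame: str) -> Tuple[str, Any]:
+    """Inverse of make_message, e.g. for a Python test client."""
+    msg = json.loads(frame)
+    if msg.get("type") not in MESSAGE_TYPES:
+        raise ValueError(f"unknown message type: {msg.get('type')!r}")
+    return msg["type"], msg.get("data")
+```
+
+Rejecting unknown types on both ends means a typo in the backend surfaces as an error instead of a silently ignored update in the browser.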
+
+### 2.6 Simulation Engine (`simulation/`)
+- **`radio_simulator.py`**: A testing harness that injects pre-scripted radio calls into the pipeline without needing a live microphone. It supports:
+  - **Normal Mode**: Standard arrival and departure flows.
+  - **Emergency Mode**: Tests extreme edge cases, simulating a rejected takeoff, engine fire, MAYDAY calls, and the dispatch of emergency response vehicles.
+
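+The core of a scripted simulator is a background thread that feeds canned transcripts to the same callback the live pipeline uses. `main.py` constructs `RadioSimulator(process_text_transcript, mode=sim_mode, delay_between_calls=1.5)` and calls `start()` and `stop()`; the hypothetical re-implementation below mirrors that constructor and lifecycle, but the script contents and internals are assumptions, not the project's code.
+
+```python
+import threading
+from typing import Callable, Dict, List
+
+# Hypothetical scripts; the real scenarios live in simulation/radio_simulator.py.
+SCRIPTS: Dict[str, List[str]] = {
+    "normal": [
+        "Changi Tower, SQ321 ready for departure runway 02L",
+        "SQ321, runway 02L cleared for takeoff",
+    ],
+    "emergency": [
+        "SQ321 rejecting takeoff, engine fire",
+        "MAYDAY MAYDAY MAYDAY, SQ321 stopped on runway 02L",
+    ],
+}
+
+
+class RadioSimulator:
+    def __init__(self, callback: Callable[[str], None], mode: str = "normal",
+                 delay_between_calls: float = 1.5) -> None:
+        self.callback = callback
+        self.script = SCRIPTS[mode]
+        self.delay = delay_between_calls
+        self._stop = threading.Event()
+        self._thread = threading.Thread(target=self._run, daemon=True)
+
+    def _run(self) -> None:
+        for line in self.script:
+            # Event.wait doubles as an interruptible sleep: it returns True once stop() is called.
+            if self._stop.wait(self.delay):
+                return
+            self.callback(line)
+
+    def start(self) -> None:
+        self._thread.start()
+
+    def stop(self) -> None:
+        self._stop.set()
+        self._thread.join(timeout=5)
+```
+
+Using `threading.Event.wait` instead of `time.sleep` lets shutdown interrupt a pending delay immediately, which matters when the FastAPI lifespan handler stops the simulator.
+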
+---
+
+## 3. Technology Stack
+- **Backend Framework**: FastAPI (Uvicorn, WebSockets)
+- **Audio Capturing**: `sounddevice`
+- **Transcription**: `faster-whisper`
+- **Natural Language Understanding**: `transformers` (`Qwen/Qwen2.5-0.5B-Instruct` via pipeline)
+- **Frontend**: Vanilla HTML5, CSS3, JavaScript (WebSocket Client)
+- **Testing**: `pytest`
+
+---
+
+## 4. Setup & Installation
+
+The system is designed to run locally on Windows, macOS, or Linux. Python 3.10+ is recommended.
+
+1. **Clone the repository** and navigate to the project directory.
+2. **Create and activate a virtual environment**:
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: .\venv\Scripts\Activate
+   ```
+3. **Install Dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+---
+
+## 5. Running the Application
+
+### 5.1 Simulation Mode (Recommended for Development)
+To run the system without a microphone using the built-in scripted scenarios:
+```bash
+python main.py --simulate
+```
+For testing emergency escalation paths:
+```bash
+python main.py --simulate-emergency
+```
+*Access the dashboard at: http://127.0.0.1:8080*
+
+### 5.2 Live Microphone Mode
+To transcribe and process your own live voice commands:
+```bash
+python main.py
+```
+
+### 5.3 Offline WAV File Processing
+To run the system on a pre-recorded audio file:
+```bash
+python main.py --demo-wav path/to/audio.wav
+```
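+
+`main.py` enforces that the input modes are exclusive with explicit `if` checks after parsing. An alternative, sketched below, is argparse's `add_mutually_exclusive_group`, which rejects conflicting flags and prints a usage error on its own. The flag names match `main.py`; the helpers `build_parser` and `input_source` are hypothetical.
+
+```python
+import argparse
+from typing import List, Optional
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="ATC AI Assist System")
+    # Only one input source may be chosen; argparse exits with status 2 on a clash.
+    mode = parser.add_mutually_exclusive_group()
+    mode.add_argument("--simulate", action="store_true",
+                      help="Run with normal scripted simulator instead of audio.")
+    mode.add_argument("--simulate-emergency", action="store_true",
+                      help="Run with emergency scripted simulator.")
+    mode.add_argument("--demo-wav", type=str,
+                      help="Run offline using a specified .wav file.")
+    return parser
+
+
+def input_source(argv: Optional[List[str]] = None) -> str:
+    """Resolve which input the pipeline should use (hypothetical helper)."""
+    args = build_parser().parse_args(argv)
+    if args.simulate:
+        return "simulate:normal"
+    if args.simulate_emergency:
+        return "simulate:emergency"
+    if args.demo_wav:
+        return f"wav:{args.demo_wav}"
+    return "microphone"
+```
+
+With no flags the result is `"microphone"`, matching the live mode in 5.2; passing `--simulate --simulate-emergency` exits with a usage error before any model is loaded.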
+
+---
+
+## 6. Challenges Faced During Development
+
+Building a real-time, local, and resilient AI system introduced several distinct engineering challenges:
+
+### 6.1 Audio Capture & Native Dependencies
+**Challenge**: Initially, the project relied on `pyaudio` for microphone capture. This created significant friction during installation, especially on newer Python versions (3.12+) and Windows machines, where missing C++ build tools or the lack of pre-compiled wheels caused installations to fail completely.
+**Solution**: Migrated the audio ingestion pipeline to `sounddevice`. This library proved significantly more reliable across operating systems and avoided the `pyaudio` compilation problems, resulting in a smoother setup for developers and users.
+
+### 6.2 CPU-Bound Performance Constraints
+**Challenge**: The system needed to run Speech-to-Text (STT) and Large Language Model (LLM) inference sequentially, entirely locally, and without a GPU. Standard Whisper models and 7B+ parameter LLMs were too slow on CPU, causing severe latency between an ATC command and the UI update.
+**Solution**:
+1. Implemented `faster-whisper` for optimized CTranslate2 CPU execution.
+2. Selected `Qwen/Qwen2.5-0.5B-Instruct`, a compact model with roughly 0.5 billion parameters, for the LLM parser. This allowed the system to parse transcripts into structured JSON quickly on standard CPU threads, with latency acceptable for real-time use.
+
+### 6.3 STT Hallucinations & Phonetic Errors
+**Challenge**: Working in a noisy acoustic environment (aviation simulation) with smaller STT models leads to frequent phonetic misspellings. Crucial emergency phrases like "MAYDAY" were occasionally transcribed as "maybe", "may day", or "made a". Standard regex parsers would fail to catch these, potentially missing critical emergencies.
+**Solution**: Designed a resilient prompting pipeline for the LLM. Instead of expecting perfect text, the LLM prompt explicitly warns the model about STT phonetic spelling errors. It is instructed to use contextual clues (e.g., words related to fire, rejection, or failure) alongside phonetic similarities to correctly infer intent and raise the `emergency_flag`.
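+
+The prompt rule added to `llm_processor.py` (treat "maybe", "may day", or "made a" as MAYDAY when danger context is present) can also be mirrored as a cheap deterministic check, for example in regression tests or as a second opinion next to the LLM. This is a hypothetical sketch, not part of the project: the variant list comes from the prompt, while the danger keywords and `likely_mayday` are assumptions.
+
+```python
+import re
+
+# Misrecognitions of MAYDAY named in the llm_processor.py prompt.
+MAYDAY_VARIANTS = ("may day", "maybe", "made a")
+# Assumed danger-context keywords (fire, failure, rejected takeoff, ...).
+DANGER_WORDS = ("fire", "failure", "smoke", "rejecting", "rejected", "engine")
+
+
+def _normalize(text: str) -> str:
+    # Lowercase, replace punctuation with spaces, collapse whitespace: "May-day!" -> "may day".
+    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())
+
+
+def likely_mayday(transcript: str) -> bool:
+    t = f" {_normalize(transcript)} "
+    if " mayday " in t:
+        return True  # an exact match needs no supporting context
+    has_variant = any(f" {v} " in t for v in MAYDAY_VARIANTS)
+    has_danger = any(f" {w} " in t for w in DANGER_WORDS)
+    # Ambiguous variants ("maybe") only count when danger context is also present.
+    return has_variant and has_danger
+```
+
+Gating on context is what keeps an everyday "maybe" from raising a false emergency, the same trade-off the LLM prompt asks the model to make.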
+
+### 6.4 Real-time State Synchronization
+**Challenge**: Coordinating streaming audio transcription, LLM processing, the deterministic rules engine (conflict detection), and the FastAPI web server concurrently, while ensuring the UI reflected the current state without race conditions or missed events.
+**Solution**: Implemented a decoupled architecture around WebSockets (`broadcast_sync`). The core text-processing pipeline updates the in-memory `AircraftStateEngine` and `GroundStateEngine`. After each processing step, it synchronously pushes full state snapshots and alerts to all connected WebSocket clients, keeping the dashboard consistent with the backend state. Event-sourced logging provides a durable audit trail.
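+
+`main.py` stores the server's event loop in `dashboard.server._main_loop` at startup, which suggests how a synchronous processing thread can reach async WebSocket clients: schedule the send onto that loop with `asyncio.run_coroutine_threadsafe`. The sketch below assumes that design; it uses `asyncio.Queue` objects as stand-ins for connected WebSocket clients, and `broadcast`, `clients`, and this `broadcast_sync` signature are hypothetical.
+
+```python
+import asyncio
+import json
+from typing import Any, List, Optional
+
+# Hypothetical module state; main.py sets dashboard.server._main_loop at startup.
+_main_loop: Optional[asyncio.AbstractEventLoop] = None
+clients: List[asyncio.Queue] = []  # stand-ins for connected WebSocket clients
+
+
+async def broadcast(message: str) -> None:
+    # Runs on the event loop thread, so client objects are only touched from there.
+    for q in list(clients):
+        await q.put(message)
+
+
+def broadcast_sync(kind: str, data: Any, timeout: float = 1.0) -> None:
+    """Send from a worker thread. Never call this on the loop's own thread:
+    future.result() would block the loop and deadlock."""
+    if _main_loop is None:
+        return  # server not started yet; drop the update instead of crashing the pipeline
+    payload = json.dumps({"type": kind, "data": data})
+    future = asyncio.run_coroutine_threadsafe(broadcast(payload), _main_loop)
+    # Waiting for completion means consecutive calls from one thread arrive in order.
+    future.result(timeout=timeout)
+```
+
+The processing thread never touches WebSocket objects directly; all sends happen on the loop, which avoids cross-thread access to the server's connections.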
+
+### 6.5 The Scripted Simulator Testing Burden
+**Challenge**: Testing the LLM state modifications manually required repeatedly speaking into the microphone, which was exhausting, inconsistent, and slow for iterative development.
+**Solution**: Built the `radio_simulator.py` component to inject clean, pre-scripted transcripts at timed intervals. This drastically shortened the iteration cycle and allowed reliable, reproducible testing of edge-case scenarios such as the complex emergency towing sequence.
diff --git a/agent/llm_processor.py b/agent/llm_processor.py
@@ -72,7 +72,10 @@ def process(self, transcript: str, current_state: Dict[str, Any]) -> LLMResponse
 1. Parse the transcript to identify entity (aircraft/vehicle), intent, route, runway, etc.
 2. Compare with `current_state` to find conflicts or emergencies.
 3. Determine clearance (granted or pending).
-4. Output strict JSON matching this schema:
+4. **STT Resilience**: The transcript comes from an STT model and may contain phonetic spelling errors. 
+   - Ensure you infer the closest aviation terminology based on context. 
+   - For example, if you see words like 'maybe', 'may day', or 'made a' in the context of danger (fire, failure, rejecting takeoff), treat it as a `MAYDAY` emergency and set the `emergency_flag`.
+5. Output strict JSON matching this schema:
 {{
   "parsed_event": {{
     "entity_id": "string",

diff --git a/audio/stt_engine.py b/audio/stt_engine.py
@@ -22,7 +22,14 @@ def transcribe(self, audio_data) -> str:
             return ""
 
         try:
-            segments, info = self.model.transcribe(audio_data, beam_size=5, vad_filter=config.VAD_FILTER)
+            # Provide ATC context to bias the STT engine away from general conversational words
+            prompt = "ATC communications. MAYDAY, PAN-PAN, runway, taxiway, clearance, Changi, tower, ground, hold short."
+            segments, info = self.model.transcribe(
+                audio_data, 
+                beam_size=5, 
+                vad_filter=config.VAD_FILTER,
+                initial_prompt=prompt
+            )
             text = " ".join([segment.text for segment in segments])
             return text.strip()
         except Exception as e:

diff --git a/config.py b/config.py
@@ -13,7 +13,7 @@
 ALERTS_LOG_PATH = LOGS_DIR / "alerts.jsonl"
 
 # Audio Settings
-WHISPER_MODEL_SIZE = "tiny.en"  # "tiny.en", "base.en", "small.en" - depending on CPU power
+WHISPER_MODEL_SIZE = "base.en"  # "tiny.en", "base.en", "small.en" - depending on CPU power
 VAD_FILTER = True # Voice activity detection to ignore silence
 CPU_THREADS = 4
 
@@ -23,10 +23,10 @@
         "02L", "20R", "02C", "20C", "02R", "20L"
     ],
     "taxiways": [
-        "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Victor"
+        "Alpha", "Bravo", "Victor", "Whiskey", "North Cross", "South Cross"
     ],
     "platforms": [
-        "Platform 1", "Platform 2", "Cargo", "Stand F42", "Terminal 1", "Terminal 2", "Terminal 3"
+        "Terminal 1", "Terminal 2", "Terminal 3", "Terminal 4", "Cargo", "Changi East"
     ]
 }
 

diff --git a/dashboard/templates/dashboard.html b/dashboard/templates/dashboard.html
@@ -901,34 +901,42 @@ <h2 style="color: var(--danger); margin-top: 0; display: flex; align-items: cent
         const MAP_SIZE = 1000;
         const layout = {
             runways: [
-                { id: "02L/20R", x: 300, y1: 150, y2: 850, label: "02L" },
+                { id: "02L/20R", x: 200, y1: 150, y2: 850, label: "02L" },
                 { id: "02C/20C", x: 500, y1: 150, y2: 850, label: "02C" },
-                { id: "02R/20L", x: 700, y1: 150, y2: 850, label: "02R" }
+                { id: "02R/20L", x: 800, y1: 150, y2: 850, label: "02R" }
             ],
             taxiways: [
-                { id: "Alpha", type: "v", x: 230, y1: 150, y2: 850 },
-                { id: "Bravo", type: "v", x: 370, y1: 150, y2: 850 },
-                { id: "Charlie", type: "h", y: 700, x1: 200, x2: 750 },
-                { id: "Delta", type: "h", y: 400, x1: 100, x2: 750 },
-                { id: "Echo", type: "h", y: 250, x1: 200, x2: 750 }
+                { id: "Alpha", type: "v", x: 150, y1: 150, y2: 850 },
+                { id: "Bravo", type: "v", x: 250, y1: 150, y2: 850 },
+                { id: "Victor", type: "v", x: 450, y1: 150, y2: 850 },
+                { id: "Whiskey", type: "v", x: 550, y1: 150, y2: 850 },
+                { id: "North Cross", type: "h", y: 250, x1: 100, x2: 900 },
+                { id: "South Cross", type: "h", y: 750, x1: 100, x2: 900 }
             ],
             areas: [
-                { id: "Stand F42", x: 100, y: 200, w: 80, h: 80 },
-                { id: "Platform 1", x: 100, y: 450, w: 100, h: 100 },
-                { id: "Cargo", x: 750, y: 650, w: 150, h: 150 }
+                { id: "Terminal 1", x: 280, y: 300, w: 140, h: 80 },
+                { id: "Terminal 2", x: 280, y: 400, w: 140, h: 80 },
+                { id: "Terminal 3", x: 280, y: 500, w: 140, h: 80 },
+                { id: "Terminal 4", x: 350, y: 880, w: 150, h: 80 },
+                { id: "Cargo", x: 850, y: 450, w: 120, h: 200 },
+                { id: "Changi East", x: 850, y: 700, w: 120, h: 150 }
             ],
             nodes: {
-                "02L": { x: 300, y: 800 }, "20R": { x: 300, y: 200 },
+                "02L": { x: 200, y: 800 }, "20R": { x: 200, y: 200 },
                 "02C": { x: 500, y: 800 }, "20C": { x: 500, y: 200 },
-                "02R": { x: 700, y: 800 }, "20L": { x: 700, y: 200 },
-                "Alpha": { x: 230, y: 500 },
-                "Bravo": { x: 370, y: 500 },
-                "Charlie": { x: 500, y: 700 },
-                "Delta": { x: 230, y: 400 },
-                "Echo": { x: 300, y: 250 },
-                "Stand F42": { x: 140, y: 240 },
-                "Platform 1": { x: 150, y: 500 },
-                "Cargo": { x: 800, y: 700 },
+                "02R": { x: 800, y: 800 }, "20L": { x: 800, y: 200 },
+                "Alpha": { x: 150, y: 500 },
+                "Bravo": { x: 250, y: 500 },
+                "Victor": { x: 450, y: 500 },
+                "Whiskey": { x: 550, y: 500 },
+                "North Cross": { x: 500, y: 250 },
+                "South Cross": { x: 500, y: 750 },
+                "Terminal 1": { x: 350, y: 340 },
+                "Terminal 2": { x: 350, y: 440 },
+                "Terminal 3": { x: 350, y: 540 },
+                "Terminal 4": { x: 425, y: 860 },
+                "Cargo": { x: 830, y: 550 },
+                "Changi East": { x: 830, y: 750 },
                 "airborne": { x: 500, y: 50 }
             }
         };

diff --git a/main.py b/main.py
@@ -3,6 +3,7 @@
 import logging
 import uvicorn
 import asyncio
+from fastapi import FastAPI
 from contextlib import asynccontextmanager
 import threading
 
@@ -73,11 +74,12 @@ async def lifespan(app: FastAPI):
     logger.info("Initializing ATC AI Assist Core...")
     import dashboard.server
     dashboard.server._main_loop = asyncio.get_running_loop()
-    if app.state.simulate:
+    if app.state.simulate or app.state.simulate_emergency:
         from simulation.radio_simulator import RadioSimulator
-        app.state.simulator = RadioSimulator(process_text_transcript, delay_between_calls=1.5)
+        sim_mode = "emergency" if app.state.simulate_emergency else "normal"
+        app.state.simulator = RadioSimulator(process_text_transcript, mode=sim_mode, delay_between_calls=1.5)
         app.state.simulator.start()
-        logger.info("Running in SIMULATION mode.")
+        logger.info(f"Running in SIMULATION mode ({sim_mode}).")
     else:
         # Load heavy ML models
         from audio.stt_engine import STTEngine
@@ -103,7 +105,7 @@ def audio_callback(np_data):
 
     # Shutdown Phase
     logger.info("Shutting down ATC AI Assist Core...")
-    if app.state.simulate:
+    if app.state.simulate or app.state.simulate_emergency:
         app.state.simulator.stop()
     elif hasattr(app.state, 'listener') and app.state.listener:
         app.state.listener.stop()
@@ -112,15 +114,21 @@ def audio_callback(np_data):
 
 if __name__ == "__main__":
     arg_parser = argparse.ArgumentParser(description="ATC AI Assist System")
-    arg_parser.add_argument("--simulate", action="store_true", help="Run with scripted simulator instead of audio.")
+    arg_parser.add_argument("--simulate", action="store_true", help="Run with normal scripted simulator instead of audio.")
+    arg_parser.add_argument("--simulate-emergency", action="store_true", help="Run with emergency scripted simulator (includes fire/medical/tow response).")
     arg_parser.add_argument("--demo-wav", type=str, help="Run offline using a specified .wav file.")
     args = arg_parser.parse_args()
 
     app.state.simulate = args.simulate
+    app.state.simulate_emergency = args.simulate_emergency
     app.state.demo_wav = args.demo_wav
 
-    if args.demo_wav and args.simulate:
-         logger.error("Cannot use --simulate and --demo-wav at the same time.")
+    if args.demo_wav and (args.simulate or args.simulate_emergency):
+         logger.error("Cannot use simulation arguments and --demo-wav at the same time.")
+         sys.exit(1)
+
+    if args.simulate and args.simulate_emergency:
+         logger.error("Cannot use both --simulate and --simulate-emergency at the same time.")
          sys.exit(1)
 
     logger.info(f"Starting API Server on http://{config.HOST}:{config.PORT}")