diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md
new file mode 100644
index 0000000..c94ac56
--- /dev/null
+++ b/DOCUMENTATION.md
@@ -0,0 +1,126 @@
+# AeroScribe: Comprehensive System Documentation
+
+## 1. Project Overview
+AeroScribe (formerly ATC AI Assist System) is a real-time, offline, and CPU-compatible decision-support layer for Air Traffic Control. This system converts radio speech into structured ATC events, maintains an active operational state of aircraft and ground vehicles, and automatically detects conflicts (e.g., runway incursions) and emergency escalations.
+
+It is designed to be highly resilient, running completely locally without reliance on paid cloud APIs or GPUs.
+
+---
+
+## 2. Core Components & Architecture
+
+### 2.1 Audio Processing (`audio/`)
+This module handles the ingestion and transcription of audio.
+- **`speech_listener.py`**: Handles microphone input or `.wav` file ingestion. It relies on the `sounddevice` library for robust, cross-platform audio capture, streaming audio in configurable chunks.
+- **`stt_engine.py`**: Employs `faster-whisper` for fast, CPU-only local transcription of audio chunks sent by the listener.
+- **`atc_parser.py`**: Provides initial regex and keyword-based parsing logic as a fallback or heuristic extractor for specific entities, though the heavy lifting is handled by the LLM.
+
+### 2.2 LLM Processing Agent (`agent/`)
+- **`llm_processor.py`**: Utilizes a highly efficient, CPU-friendly Hugging Face model (`Qwen/Qwen2.5-0.5B-Instruct`) to interpret transcripts.
+  - Takes the raw STT text and outputs JSON mapping the entity state (e.g., aircraft ID, destination runway, intent).
+  - Handles the complex logic of mapping fuzzy, phonetically inaccurate transcription text to strict operational schemas based on the current airport layout state.
+
+### 2.3 Operational State Management (`state/`)
+This module acts as the source of truth for all entities on the airfield.
+- **`aircraft_state.py`**: Tracks the dynamic state of airborne or taxiing aircraft, updating properties like current segment, destination, and clearance state based on processed LLM events.
+- **`ground_state.py`**: Manages ground vehicles (e.g., tugs, fire tenders, ambulances) on platforms and taxiways.
+- **`event_store.py`**: Provides event-sourced logging functionality. All transcripts, state snapshots, parsed events, and alerts are durably appended to JSONL log files (`logs/events.jsonl`, `logs/alerts.jsonl`) for replayability and audit trails.
+
+### 2.4 Detection & Alerting (`detection/`)
+- **`conflict_detection.py`**: Continuously monitors the state engines to detect unsafe conditions using deterministic rules:
+  - Unauthorized runway incursions
+  - Multiple entities cleared on the same runway
+  - Taxiway segment overlaps
+  - Unapproved movements or clearance violations
+- **`emergency_detection.py`**: Monitors for emergency flags raised by the LLM (based on phrases like "mayday", "fire", or "engine failure") and immediately escalates alerts.
+
+### 2.5 Live Dashboard (`dashboard/`)
+- **`server.py`**: A FastAPI application that provides the WebSocket server.
+- **`templates/dashboard.html`**: The vanilla HTML/JS/CSS frontend. It connects to the WebSocket to consume and display `transcript`, `state`, and `alert` events in real time. It features a responsive, glassmorphic UI.
+
+### 2.6 Simulation Engine (`simulation/`)
+- **`radio_simulator.py`**: A testing harness that injects pre-scripted radio calls into the pipeline without needing a live microphone. It supports:
+  - **Normal Mode**: Standard arrival and departure flows.
+  - **Emergency Mode**: Tests extreme edge cases, simulating a rejected takeoff, engine fire, MAYDAY calls, and the dispatch of emergency response vehicles.
+
+---
+
+## 3. Technology Stack
+- **Backend Framework**: FastAPI (Uvicorn, WebSockets)
+- **Audio Capturing**: `sounddevice`
+- **Transcription**: `faster-whisper`
+- **Natural Language Understanding**: `transformers` (`Qwen/Qwen2.5-0.5B-Instruct` via pipeline)
+- **Frontend**: Vanilla HTML5, CSS3, JavaScript (WebSocket Client)
+- **Testing**: `pytest`
+
+---
+
+## 4. Setup & Installation
+
+The system is designed to run locally on Windows, macOS, or Linux. Python 3.10+ is recommended.
+
+1. **Clone the repository** and navigate to the project directory.
+2. **Create and activate a virtual environment**:
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # On Windows: .\venv\Scripts\Activate
+   ```
+3. **Install Dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+---
+
+## 5. Running the Application
+
+### 5.1 Simulation Mode (Recommended for Development)
+To run the system without a microphone using the built-in scripted scenarios:
+```bash
+python main.py --simulate
+```
+For testing emergency escalation paths:
+```bash
+python main.py --simulate-emergency
+```
+*Access the dashboard at: http://127.0.0.1:8080*
+
+### 5.2 Live Microphone Mode
+To transcribe and process your own live voice commands:
+```bash
+python main.py
+```
+
+### 5.3 Offline WAV File Processing
+To run the system on a pre-recorded audio file:
+```bash
+python main.py --demo-wav path/to/audio.wav
+```
+
+---
+
+## 6. Challenges Faced During Development
+
+Building a real-time, local, and resilient AI system introduced several distinct engineering challenges:
+
+### 6.1 Audio Capture & Native Dependencies
+**Challenge**: Initially, the project relied on `pyaudio` for microphone capture. This created massive friction during installation, especially on newer Python versions (3.12+) and Windows machines, where missing C++ build tools or lack of pre-compiled wheels caused installations to fail completely.
+**Solution**: Migrated the audio ingestion pipeline to use `sounddevice`. This abstraction proved significantly more reliable across OS environments and avoided the `pyaudio` compilation failures, resulting in a smoother developer and user setup experience.
+
+### 6.2 CPU-Bound Performance Constraints
+**Challenge**: The system needed to perform real-time Speech-to-Text (STT) and Large Language Model (LLM) processing sequentially, under strict local execution, without relying on a GPU. Standard Whisper models and 7B+ parameter LLMs took too long to infer, causing severe latency between an ATC command and the UI update.
+**Solution**:
+1. Implemented `faster-whisper` for optimized CTranslate2 CPU execution.
+2. Selected `Qwen/Qwen2.5-0.5B-Instruct`, a highly optimized, sub-1-billion-parameter model, for the LLM parser. This allowed the system to parse transcripts into structured JSON rapidly on standard CPU threads, achieving acceptable real-time latency.
+
+### 6.3 STT Hallucinations & Phonetic Errors
+**Challenge**: Working in a noisy acoustic environment (aviation simulation) with smaller STT models leads to frequent phonetic misspellings. Crucial emergency phrases like "MAYDAY" were occasionally transcribed as "maybe", "may day", or "made a". Standard regex parsers would fail to catch these, potentially missing critical emergencies.
+**Solution**: Designed a resilient prompting pipeline for the LLM. Instead of expecting perfect text, the LLM prompt explicitly warns the model about STT phonetic spelling errors. It is instructed to use contextual clues (e.g., words related to fire, rejection, or failure) alongside phonetic similarities to correctly infer intent and raise the `emergency_flag`.
+
+### 6.4 Real-time State Synchronization
+**Challenge**: The system had to coordinate streaming audio transcription, LLM processing, the deterministic rules engine (conflict detection), and the FastAPI web server concurrently, while ensuring the UI reflected the state without race conditions or missed events.
+**Solution**: Implemented a decoupled architecture using WebSockets (`broadcast_sync`). The core text processing pipeline modifies an in-memory `AircraftStateEngine` and `GroundStateEngine`. After processing, it synchronously broadcasts full state snapshots and alerts to all connected WebSocket clients, keeping the dashboard consistent with the backend state. Event-sourced logging was also implemented for auditability.
+
+### 6.5 The Scripted Simulator Testing Burden
+**Challenge**: Testing the LLM state modifications manually required repeatedly speaking into the microphone, which was exhausting, inconsistent, and slow for iterative development.
+**Solution**: Built the `radio_simulator.py` component to inject perfect transcripts at timed intervals. This drastically shortened the iteration cycle and allowed for reliable, reproducible testing of edge-case scenarios like the complex emergency towing sequence.
diff --git a/agent/llm_processor.py b/agent/llm_processor.py
index 397d19a..4cfbe2a 100644
--- a/agent/llm_processor.py
+++ b/agent/llm_processor.py
@@ -72,7 +72,10 @@ def process(self, transcript: str, current_state: Dict[str, Any]) -> LLMResponse
 1. Parse the transcript to identify entity (aircraft/vehicle), intent, route, runway, etc.
 2. Compare with `current_state` to find conflicts or emergencies.
 3. Determine clearance (granted or pending).
-4. Output strict JSON matching this schema:
+4. **STT Resilience**: The transcript comes from an STT model and may contain phonetic spelling errors.
+   - Ensure you infer the closest aviation terminology based on context.
+   - For example, if you see words like 'maybe', 'may day', or 'made a' in the context of danger (fire, failure, rejecting takeoff), treat it as a `MAYDAY` emergency and set the `emergency_flag`.
+5. Output strict JSON matching this schema:
 {{
   "parsed_event": {{
     "entity_id": "string",
diff --git a/audio/stt_engine.py b/audio/stt_engine.py
index 85b978e..f3b00f7 100644
--- a/audio/stt_engine.py
+++ b/audio/stt_engine.py
@@ -22,7 +22,14 @@ def transcribe(self, audio_data) -> str:
             return ""
 
         try:
-            segments, info = self.model.transcribe(audio_data, beam_size=5, vad_filter=config.VAD_FILTER)
+            # Provide ATC context to bias the STT engine away from general conversational words
+            prompt = "ATC communications. MAYDAY, PAN-PAN, runway, taxiway, clearance, Changi, tower, ground, hold short."
+            segments, info = self.model.transcribe(
+                audio_data,
+                beam_size=5,
+                vad_filter=config.VAD_FILTER,
+                initial_prompt=prompt
+            )
             text = " ".join([segment.text for segment in segments])
             return text.strip()
         except Exception as e:
diff --git a/config.py b/config.py
index a565ec7..38a1e13 100644
--- a/config.py
+++ b/config.py
@@ -13,7 +13,7 @@ ALERTS_LOG_PATH = LOGS_DIR / "alerts.jsonl"
 
 # Audio Settings
-WHISPER_MODEL_SIZE = "tiny.en"  # "tiny.en", "base.en", "small.en" - depending on CPU power
+WHISPER_MODEL_SIZE = "base.en"  # "tiny.en", "base.en", "small.en" - depending on CPU power
 VAD_FILTER = True  # Voice activity detection to ignore silence
 CPU_THREADS = 4
 
@@ -23,10 +23,10 @@
         "02L", "20R", "02C", "20C", "02R", "20L"
     ],
     "taxiways": [
-        "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Victor"
+        "Alpha", "Bravo", "Victor", "Whiskey", "North Cross", "South Cross"
     ],
     "platforms": [
-        "Platform 1", "Platform 2", "Cargo", "Stand F42", "Terminal 1", "Terminal 2", "Terminal 3"
+        "Terminal 1", "Terminal 2", "Terminal 3", "Terminal 4", "Cargo", "Changi East"
     ]
 }
 
diff --git a/dashboard/templates/dashboard.html b/dashboard/templates/dashboard.html
index 6e09de6..121aef9 100644
--- a/dashboard/templates/dashboard.html
+++ b/dashboard/templates/dashboard.html
@@ -901,34 +901,42 @@

Landing -> Taxi in
+            "Changi Tower, Jetstar 112 heavy, ILS approach Runway 02C.",
+            "Jetstar 112, Changi Tower, cleared to land Runway 02C. Wind 040 degrees at 12 knots.",
+            "Cleared to land Runway 02C, Jetstar 112.",
+            "Changi Tower, Jetstar 112, runway vacated via South Cross.",
+            "Jetstar 112, welcome to Changi. Contact Ground.",
+            "Changi Ground, Jetstar 112, clear of Runway 02C, request taxi.",
+            "Jetstar 112, Changi Ground, taxi to Terminal 1 via South Cross and Bravo.",
+            "Taxi to Terminal 1 via South Cross and Bravo, Jetstar 112.",
+            "Ground, Jetstar 112 approaching Terminal 1.",
+            "Jetstar 112, roger, dock at Gate clear.",
 
-            # --- ARRIVAL TAXI ---
-            "Scoot 421, welcome to Changi, runway vacated.",
-            "Scoot 421, taxi to Platform 1 via Delta and Bravo.",
+            # Departure: Taxi out -> Taxi to Runway -> Depart
+            "Changi Ground, Singapore 318 at Terminal 2, request pushback.",
+            "Singapore 318, Changi Ground, pushback and start approved.",
+            "Ground, Singapore 318 ready to taxi.",
+            "Singapore 318, taxi to holding point Runway 02L via Alpha and North Cross.",
+            "Taxi to holding point Runway 02L via Alpha and North Cross, Singapore 318.",
+            "Singapore 318, contact Changi Tower.",
+            "Changi Tower, Singapore 318 holding short Runway 02L.",
+            "Singapore 318, Changi Tower, line up and wait Runway 02L.",
+            "Line up and wait Runway 02L, Singapore 318.",
+            "Singapore 318, wind 050 degrees 14 knots, Runway 02L, cleared for takeoff.",
+            "Cleared for takeoff Runway 02L, Singapore 318.",
+            "Singapore 318, airborne, contact Departure. Good day."
+        ]
+
+        self.script_emergency = [
+            # --- EMERGENCY OPERATIONS ---
+            # Departure: Taxi out -> Taxi to Runway -> Emerg
+            "Changi Ground, Cargo 99 heavy at Cargo, request pushback.",
+            "Cargo 99, Changi Ground, pushback approved.",
+            "Ground, Cargo 99 ready to taxi.",
+            "Cargo 99, taxi to holding point Runway 02C via Whiskey and South Cross.",
+            "Taxi to holding point Runway 02C via Whiskey and South Cross, Cargo 99.",
+            "Cargo 99, contact Changi Tower.",
+            "Changi Tower, Cargo 99 holding short Runway 02C.",
+            "Cargo 99, Changi Tower, line up and wait Runway 02C.",
+            "Line up and wait Runway 02C, Cargo 99.",
+            "Cargo 99, wind 040 degrees 10 knots, Runway 02C, cleared for takeoff.",
+            "Cleared for takeoff Runway 02C, Cargo 99.",
 
-            # --- AFTERNOON DEPARTURE ---
-            "Changi Ground, Cathay 711, aircraft type Airbus A350, Terminal 2, request pushback.",
-            "Cathay 711, Changi Ground, pushback approved.",
-            "Cathay 711, request taxi.",
-            "Cathay 711, taxi to holding point Runway 02R via Victor.",
+            # THE EMERGENCY
+            "MAYDAY, MAYDAY, MAYDAY, Cargo 99, Engine 2 fire! Rejecting takeoff on Runway 02C. Requesting immediate fire and medical assistance!",
+            "Cargo 99, Changi Tower, roger MAYDAY. Emergency response activated. Hold position on Runway 02C.",
 
-            # --- CLEARANCE & DEPARTURE ---
-            "Cathay 711, wind 060 degrees 8 knots, Runway 02R, cleared for takeoff.",
-            "Cathay 711, cleared for takeoff Runway 02R.",
-            "Cathay 711, airborne, switching to departure control.",
+            # EMERGENCY RESPONSE DISPATCH
+            "Ground to Fire Tender 1 and Ambulance 1, proceed immediately to Runway 02C via North Cross. Aircraft Cargo 99 has engine fire.",
+            "Fire Tender 1 and Ambulance 1 proceeding to Runway 02C via North Cross.",
+            "Ground to Tug 4, proceed to Runway 02C via Victor to stand by for towing.",
+            "Tug 4 proceeding to Runway 02C via Victor, standing by for tow.",
 
-            # --- END OF ROTATION ---
-            "Ground, Sweeper 1 secured at Cargo.",
-            "Sweeper 1, roger, have a good day."
+            # RECOVERY AFTER EMERGENCY
+            "Tower, Fire Tender 1, fire extinguished. Passengers secure with Ambulance 1.",
+            "Roger Fire Tender 1. Tug 4, you are cleared to tow Cargo 99 off Runway 02C to Changi East.",
+            "Tug 4, returning to Changi East via Whiskey and South Cross with Cargo 99 in tow."
         ]
 
     def start(self):
@@ -57,7 +78,7 @@ def start(self):
         self.is_running = True
         self._thread = threading.Thread(target=self._run_sim, daemon=True)
         self._thread.start()
-        logger.info("Started scripted radio simulator.")
+        logger.info(f"Started scripted radio simulator in {self.mode} mode.")
 
     def stop(self):
         self.is_running = False
@@ -68,8 +89,10 @@ def _run_sim(self):
         # Give UI a moment to connect
         time.sleep(3.0)
 
+        script_to_run = self.script_emergency if self.mode == "emergency" else self.script_normal
+
         while self.is_running:
-            for line in self.script:
+            for line in script_to_run:
                 if not self.is_running:
                     break
                 logger.info(f"SIMULATOR 📡 : {line}")
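
The last hunk picks a script by `self.mode` and falls back to the normal script for any value other than `"emergency"`. A minimal, self-contained sketch of that selection logic follows. The class name `ScriptedRadioSimulator`, the `on_transcript` callback, the `interval_s` parameter, and the `select_script`/`run_once` helpers are assumptions for illustration and do not appear in the diff. The scripts are shortened to two lines each, and the sketch makes a single pass rather than looping while `is_running` is set.

```python
import time
from typing import Callable, List


class ScriptedRadioSimulator:
    """Hypothetical stand-in for the simulator class modified in the diff."""

    def __init__(
        self,
        on_transcript: Callable[[str], None],  # assumed callback into the pipeline
        mode: str = "normal",
        interval_s: float = 0.0,  # assumed delay between injected calls
    ) -> None:
        self.on_transcript = on_transcript
        self.mode = mode
        self.interval_s = interval_s
        # Shortened excerpts of the scripts shown in the diff.
        self.script_normal: List[str] = [
            "Changi Tower, Jetstar 112 heavy, ILS approach Runway 02C.",
            "Cleared to land Runway 02C, Jetstar 112.",
        ]
        self.script_emergency: List[str] = [
            "Cleared for takeoff Runway 02C, Cargo 99.",
            "Cargo 99, Changi Tower, roger MAYDAY. Emergency response activated. Hold position on Runway 02C.",
        ]

    def select_script(self) -> List[str]:
        # Same conditional as the diff: anything other than "emergency"
        # falls back to the normal script.
        return self.script_emergency if self.mode == "emergency" else self.script_normal

    def run_once(self) -> None:
        # Single pass over the selected script, one transcript per step.
        for line in self.select_script():
            self.on_transcript(line)
            time.sleep(self.interval_s)


if __name__ == "__main__":
    sim = ScriptedRadioSimulator(print, mode="emergency")
    sim.run_once()
```

Because unknown modes silently fall back to the normal script, a caller that mistypes the mode would get no error; validating `mode` in the constructor might be worth considering.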