Merged

d #2

126 changes: 126 additions & 0 deletions DOCUMENTATION.md
@@ -0,0 +1,126 @@
# AeroScribe: Comprehensive System Documentation

## 1. Project Overview
AeroScribe (formerly ATC AI Assist System) is a real-time, offline, and CPU-compatible decision-support layer for Air Traffic Control. This system converts radio speech into structured ATC events, maintains an active operational state of aircraft and ground vehicles, and automatically detects conflicts (e.g., runway incursions) and emergency escalations.

It is designed to be highly resilient, running completely locally without reliance on paid cloud APIs or GPUs.

---

## 2. Core Components & Architecture

### 2.1 Audio Processing (`audio/`)
This module handles the ingestion and transcription of audio.
- **`speech_listener.py`**: Handles microphone input or `.wav` file ingestion. It relies on the `sounddevice` library for robust, cross-platform audio capturing, streaming audio in configurable chunks.
- **`stt_engine.py`**: Employs `faster-whisper` for fast, CPU-only local transcription of audio chunks sent by the listener.
- **`atc_parser.py`**: Provides initial regex and keyword-based parsing logic as a fallback or heuristic extractor for specific entities, though the heavy lifting is handled by the LLM.
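As a rough illustration of the regex fallback idea, the sketch below pulls a callsign and a runway designator out of a transcript. The patterns, the `extract_entities` name, and the returned keys are assumptions made for this example; the actual contents of `atc_parser.py` are not shown in this document.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for illustration; the real parser may differ.
RUNWAY_RE = re.compile(
    r"\brunway\s+(\d{2})\s*(left|right|cent(?:er|re)|[LCR])?\b", re.IGNORECASE
)
CALLSIGN_RE = re.compile(r"\b([A-Za-z]+)\s+(\d{2,4})\b")
SPOKEN_SUFFIX = {"left": "L", "right": "R", "center": "C", "centre": "C"}


def extract_entities(transcript: str) -> Dict[str, Optional[str]]:
    """Best-effort extraction of a callsign and runway from raw STT text."""
    runway = None
    match = RUNWAY_RE.search(transcript)
    if match:
        suffix = match.group(2) or ""
        # Normalise spoken suffixes ("left") and single letters ("l") to L/C/R.
        suffix = SPOKEN_SUFFIX.get(suffix.lower(), suffix.upper())
        runway = match.group(1) + suffix

    callsign = None
    for candidate in CALLSIGN_RE.finditer(transcript):
        # "runway 02" looks like a callsign to the pattern, so skip it.
        if candidate.group(1).lower() != "runway":
            callsign = f"{candidate.group(1)} {candidate.group(2)}"
            break

    return {"callsign": callsign, "runway": runway}
```

A heuristic like this is cheap and predictable, which is why it suits a fallback role, but it has no way to recover from phonetic errors on its own.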

### 2.2 LLM Processing Agent (`agent/`)
- **`llm_processor.py`**: Utilizes a highly efficient, CPU-friendly Hugging Face model (`Qwen/Qwen2.5-0.5B-Instruct`) to interpret transcripts.
- Takes the raw STT text and outputs JSON describing the entity state (e.g., aircraft ID, destination runway, intent).
- Handles the complex logic of mapping fuzzy, phonetically inaccurate transcription text to strict operational schemas based on the current airport layout state.
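Small instruction-tuned models often wrap their JSON in extra text, so the output usually needs a defensive parse before it reaches the state engines. The sketch below shows one way to do that. The `parse_llm_output` helper is hypothetical, and placing `emergency_flag` at the top level of the response is an assumption; only `parsed_event` and `entity_id` appear in the prompt schema visible in this PR.

```python
import json
from typing import Any, Dict, Optional


def parse_llm_output(raw: str) -> Optional[Dict[str, Any]]:
    """Extract and minimally validate the JSON object in a model response.

    Returns None when no usable object is found, so the caller can fall
    back to the regex parser or discard the transcript.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return None

    event = data.get("parsed_event")
    if not isinstance(event, dict) or not isinstance(event.get("entity_id"), str):
        return None

    # Assumed top-level field; default to False when the model omits it.
    data["emergency_flag"] = bool(data.get("emergency_flag", False))
    return data
```

Returning `None` instead of raising keeps a single malformed response from interrupting the real-time pipeline.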

### 2.3 Operational State Management (`state/`)
This module acts as the source of truth for all entities on the airfield.
- **`aircraft_state.py`**: Tracks the dynamic state of airborne or taxiing aircraft, updating properties like current segment, destination, and clearance state based on processed LLM events.
- **`ground_state.py`**: Manages ground vehicles (e.g., tugs, fire tenders, ambulances) on platforms and taxiways.
- **`event_store.py`**: Provides event-sourced logging functionality. All transcripts, state snapshots, parsed events, and alerts are durably appended to JSONL log files (`logs/events.jsonl`, `logs/alerts.jsonl`) for replayability and audit trails.
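The append-only JSONL pattern described for `event_store.py` can be sketched as follows. The function names and the record fields (`ts`, `type`, `payload`) are assumptions for illustration, not the module's actual interface.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List


def append_event(path: Path, event_type: str, payload: Dict[str, Any]) -> None:
    """Append one event as a single JSON line (hypothetical record layout)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": time.time(), "type": event_type, "payload": payload}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def replay(path: Path) -> List[Dict[str, Any]]:
    """Read events back in the order they were written."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because every record is one self-contained line, a partially written final line after a crash affects only that record, and earlier history remains replayable.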

### 2.4 Detection & Alerting (`detection/`)
- **`conflict_detection.py`**: Continuously monitors the state engines to detect unsafe conditions using deterministic rules:
- Unauthorized runway incursions
- Multiple entities cleared on the same runway
- Taxiway segment overlaps
- Unapproved movements or clearance violations
- **`emergency_detection.py`**: Monitors for emergency flags raised by the LLM (based on phrases like "mayday", "fire", or "engine failure") and immediately escalates alerts.
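To make the "multiple entities cleared on the same runway" rule concrete, here is a minimal sketch. The input shape (a mapping from entity ID to cleared runway) and the `find_runway_conflicts` name are assumptions; the runway designators come from the layout in `config.py`.

```python
from collections import defaultdict
from typing import Dict, List

# Both ends of one strip of pavement map to the same physical runway.
PHYSICAL_RUNWAY = {
    "02L": "02L/20R", "20R": "02L/20R",
    "02C": "02C/20C", "20C": "02C/20C",
    "02R": "02R/20L", "20L": "02R/20L",
}


def find_runway_conflicts(cleared: Dict[str, str]) -> List[dict]:
    """Return one conflict per physical runway with more than one occupant.

    `cleared` maps entity_id -> runway designator (hypothetical input shape).
    """
    occupants = defaultdict(list)
    for entity_id, runway in cleared.items():
        occupants[PHYSICAL_RUNWAY.get(runway, runway)].append(entity_id)
    return [
        {"runway": runway, "entities": sorted(ids)}
        for runway, ids in sorted(occupants.items())
        if len(ids) > 1
    ]
```

Grouping by physical runway matters here: an aircraft cleared for 02L and a vehicle cleared onto 20R are on the same surface, even though the designators differ.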

### 2.5 Live Dashboard (`dashboard/`)
- **`server.py`**: A FastAPI application that provides the WebSocket server.
- **`templates/dashboard.html`**: The vanilla HTML/JS/CSS frontend. It connects to the WebSocket to consume and display `transcript`, `state`, and `alert` events in real-time. It features a responsive, glassmorphic UI.

### 2.6 Simulation Engine (`simulation/`)
- **`radio_simulator.py`**: A powerful testing harness that injects pre-scripted radio calls into the pipeline without needing a live microphone. It supports:
- **Normal Mode**: Standard arrival and departure flows.
- **Emergency Mode**: Tests extreme edge cases, simulating a rejected takeoff, engine fire, MAYDAY calls, and the dispatch of emergency response vehicles.
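The injection mechanism can be pictured as a background thread that feeds scripted lines to the same callback the STT path would use. The sketch below mirrors the constructor arguments seen in `main.py` (`mode`, `delay_between_calls`), but the `ScriptedRadio` class, the `SCRIPTS` table, and the sample phrases are hypothetical stand-ins rather than the real `RadioSimulator`.

```python
import threading
from typing import Callable, Dict, List

# Illustrative scripts only; the real scenarios live in radio_simulator.py.
SCRIPTS: Dict[str, List[str]] = {
    "normal": [
        "Singapore 321, cleared to land runway 02L",
        "Singapore 321, taxi to Terminal 1 via Alpha",
    ],
    "emergency": [
        "Singapore 321, cleared for takeoff runway 02C",
        "MAYDAY MAYDAY MAYDAY, Singapore 321, engine fire, rejecting takeoff",
    ],
}


class ScriptedRadio:
    """Feeds scripted transcripts to a callback from a background thread."""

    def __init__(self, on_transcript: Callable[[str], None],
                 mode: str = "normal", delay_between_calls: float = 1.5):
        self._on_transcript = on_transcript
        self._script = SCRIPTS[mode]
        self._delay = delay_between_calls
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join(timeout=2.0)

    def _run(self) -> None:
        for line in self._script:
            if self._stop.is_set():
                return
            self._on_transcript(line)
            # Event.wait doubles as an interruptible sleep between calls.
            if self._stop.wait(self._delay):
                return
```

Using `Event.wait` rather than `time.sleep` lets `stop()` take effect immediately instead of after the current delay expires.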

---

## 3. Technology Stack
- **Backend Framework**: FastAPI (Uvicorn, WebSockets)
- **Audio Capturing**: `sounddevice`
- **Transcription**: `faster-whisper`
- **Natural Language Understanding**: `transformers` (`Qwen/Qwen2.5-0.5B-Instruct` via pipeline)
- **Frontend**: Vanilla HTML5, CSS3, JavaScript (WebSocket Client)
- **Testing**: `pytest`

---

## 4. Setup & Installation

The system is designed to run locally on Windows, macOS, or Linux. Python 3.10+ is recommended.

1. **Clone the repository** and navigate to the project directory.
2. **Create and activate a virtual environment**:
```bash
python -m venv venv
source venv/bin/activate # On Windows: .\venv\Scripts\Activate
```
3. **Install Dependencies**:
```bash
pip install -r requirements.txt
```

---

## 5. Running the Application

### 5.1 Simulation Mode (Recommended for Development)
To run the system without a microphone using the built-in scripted scenarios:
```bash
python main.py --simulate
```
For testing emergency escalation paths:
```bash
python main.py --simulate-emergency
```
*Access the dashboard at: http://127.0.0.1:8080*

### 5.2 Live Microphone Mode
To transcribe and process your own live voice commands:
```bash
python main.py
```

### 5.3 Offline WAV File Processing
To run the system on a pre-recorded audio file:
```bash
python main.py --demo-wav path/to/audio.wav
```

---

## 6. Challenges Faced During Development

Building a real-time, local, and resilient AI system introduced several distinct engineering challenges:

### 6.1 Audio Capture & Native Dependencies
**Challenge**: Initially, the project relied on `pyaudio` for microphone capture. This created massive friction during installation, especially on newer Python versions (3.12+) and Windows machines, where missing C++ build tools or lack of pre-compiled wheels caused installations to fail completely.
**Solution**: Migrated the audio ingestion pipeline to use `sounddevice`. This abstraction proved significantly more reliable across OS environments and avoided the `pyaudio` compilation nightmares, resulting in a smoother developer and user setup experience.

### 6.2 CPU-Bound Performance Constraints
**Challenge**: The system needed to run real-time Speech-to-Text (STT) and Large Language Model (LLM) processing sequentially and strictly locally, without relying on a GPU. Standard Whisper models and 7B+ parameter LLMs took too long to infer, causing severe latency between an ATC command and the UI update.
**Solution**:
1. Implemented `faster-whisper` for optimized CTranslate2 CPU execution.
2. Selected `Qwen/Qwen2.5-0.5B-Instruct`—a highly optimized, sub-1-billion parameter model—for the LLM parser. This allowed the system to parse transcripts into structured JSON rapidly on standard CPU threads, achieving acceptable real-time latency.

### 6.3 STT Hallucinations & Phonetic Errors
**Challenge**: Working in a noisy acoustic environment (aviation simulation) with smaller STT models leads to frequent phonetic misspellings. Crucial emergency phrases like "MAYDAY" were occasionally transcribed as "maybe", "may day", or "made a". Standard regex parsers would fail to catch these, potentially missing critical emergencies.
**Solution**: Designed a resilient prompting pipeline for the LLM. Instead of expecting perfect text, the LLM prompt explicitly warns the model about STT phonetic spelling errors. It is instructed to use contextual clues (e.g., words related to fire, rejection, or failure) alongside phonetic similarities to correctly infer intent and raise the `emergency_flag`.
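The same "variant plus danger context" reasoning can also be expressed as a deterministic check, which may be useful as a safety net alongside the LLM prompt. The sketch below is not the project's approach, which relies on prompting; the `looks_like_mayday` helper and both word lists are assumptions built from the examples quoted above.

```python
import re

# Mis-transcriptions quoted in this document, plus the correct spelling.
MAYDAY_VARIANTS = ("mayday", "may day", "maybe", "made a")
# Hypothetical context phrases that suggest genuine danger.
DANGER_CONTEXT = ("fire", "failure", "rejecting takeoff", "rejected takeoff")


def looks_like_mayday(transcript: str) -> bool:
    """Flag a likely MAYDAY call despite phonetic STT errors."""
    text = re.sub(r"[^a-z ]+", " ", transcript.lower())
    text = re.sub(r"\s+", " ", text).strip()

    # A correctly transcribed "mayday" needs no supporting context.
    if "mayday" in text:
        return True

    has_variant = any(
        re.search(rf"\b{re.escape(variant)}\b", text) for variant in MAYDAY_VARIANTS
    )
    has_context = any(phrase in text for phrase in DANGER_CONTEXT)
    # Ambiguous variants such as "maybe" only count alongside danger words.
    return has_variant and has_context
```

Requiring context for the ambiguous variants keeps an ordinary "maybe" in routine chatter from triggering an emergency alert.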

### 6.4 Real-time State Synchronization
**Challenge**: Streaming audio transcription, LLM processing, the deterministic rules engine (conflict detection), and the FastAPI web server all run concurrently, and the UI had to reflect the current state without race conditions or missed events.
**Solution**: Implemented a decoupled architecture using WebSockets (`broadcast_sync`). The core text processing pipeline modifies the in-memory `AircraftStateEngine` and `GroundStateEngine`. After each update, it synchronously pushes full state snapshots and alerts to all connected WebSocket clients, so the dashboard stays consistent with the backend state. Event-sourced logging was also implemented for auditability.
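One common way to bridge a worker thread and an asyncio server is `asyncio.run_coroutine_threadsafe`, which schedules a coroutine on a loop owned by another thread. The sketch below shows that pattern under stated assumptions: the `Broadcaster` class is hypothetical, and `asyncio.Queue` objects stand in for WebSocket connections so the example runs without FastAPI. Only the `broadcast_sync` name and the idea of capturing the main loop come from this PR.

```python
import asyncio
import json
from typing import Any, Dict, List


class Broadcaster:
    """Lets synchronous pipeline code push messages to async clients."""

    def __init__(self, loop: asyncio.AbstractEventLoop):
        self._loop = loop
        self._clients: List[asyncio.Queue] = []  # stand-ins for WebSockets

    def add_client(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._clients.append(queue)
        return queue

    async def broadcast(self, message: Dict[str, Any]) -> None:
        payload = json.dumps(message)
        for queue in list(self._clients):
            await queue.put(payload)

    def broadcast_sync(self, message: Dict[str, Any], timeout: float = 2.0) -> None:
        """Call from a non-loop thread; blocks until delivery is scheduled and done."""
        future = asyncio.run_coroutine_threadsafe(self.broadcast(message), self._loop)
        future.result(timeout=timeout)
```

Note that `broadcast_sync` must not be called from the event loop's own thread, since waiting on `future.result` there would block the loop that has to run the coroutine.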

### 6.5 The Scripted Simulator Testing Burden
**Challenge**: Testing the LLM state modifications manually required repeatedly speaking into the microphone, which was exhausting, inconsistent, and slow for iterative development.
**Solution**: Built the `radio_simulator.py` component to inject perfect transcripts at timed intervals. This drastically improved the iteration lifecycle and allowed for reliable, reproducible testing of edge-case scenarios like the complex emergency towing sequence.
5 changes: 4 additions & 1 deletion agent/llm_processor.py
@@ -72,7 +72,10 @@ def process(self, transcript: str, current_state: Dict[str, Any]) -> LLMResponse
1. Parse the transcript to identify entity (aircraft/vehicle), intent, route, runway, etc.
2. Compare with `current_state` to find conflicts or emergencies.
3. Determine clearance (granted or pending).
4. Output strict JSON matching this schema:
4. **STT Resilience**: The transcript comes from an STT model and may contain phonetic spelling errors.
- Ensure you infer the closest aviation terminology based on context.
- For example, if you see words like 'maybe', 'may day', or 'made a' in the context of danger (fire, failure, rejecting takeoff), treat it as a `MAYDAY` emergency and set the `emergency_flag`.
5. Output strict JSON matching this schema:
{{
"parsed_event": {{
"entity_id": "string",
9 changes: 8 additions & 1 deletion audio/stt_engine.py
@@ -22,7 +22,14 @@ def transcribe(self, audio_data) -> str:
return ""

try:
segments, info = self.model.transcribe(audio_data, beam_size=5, vad_filter=config.VAD_FILTER)
# Provide ATC context to bias the STT engine away from general conversational words
prompt = "ATC communications. MAYDAY, PAN-PAN, runway, taxiway, clearance, Changi, tower, ground, hold short."
segments, info = self.model.transcribe(
audio_data,
beam_size=5,
vad_filter=config.VAD_FILTER,
initial_prompt=prompt
)
text = " ".join([segment.text for segment in segments])
return text.strip()
except Exception as e:
6 changes: 3 additions & 3 deletions config.py
@@ -13,7 +13,7 @@
ALERTS_LOG_PATH = LOGS_DIR / "alerts.jsonl"

# Audio Settings
WHISPER_MODEL_SIZE = "tiny.en" # "tiny.en", "base.en", "small.en" - depending on CPU power
WHISPER_MODEL_SIZE = "base.en" # "tiny.en", "base.en", "small.en" - depending on CPU power
VAD_FILTER = True # Voice activity detection to ignore silence
CPU_THREADS = 4

@@ -23,10 +23,10 @@
"02L", "20R", "02C", "20C", "02R", "20L"
],
"taxiways": [
"Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Victor"
"Alpha", "Bravo", "Victor", "Whiskey", "North Cross", "South Cross"
],
"platforms": [
"Platform 1", "Platform 2", "Cargo", "Stand F42", "Terminal 1", "Terminal 2", "Terminal 3"
"Terminal 1", "Terminal 2", "Terminal 3", "Terminal 4", "Cargo", "Changi East"
]
}

48 changes: 28 additions & 20 deletions dashboard/templates/dashboard.html
@@ -901,34 +901,42 @@ <h2 style="color: var(--danger); margin-top: 0; display: flex; align-items: cent
const MAP_SIZE = 1000;
const layout = {
runways: [
{ id: "02L/20R", x: 300, y1: 150, y2: 850, label: "02L" },
{ id: "02L/20R", x: 200, y1: 150, y2: 850, label: "02L" },
{ id: "02C/20C", x: 500, y1: 150, y2: 850, label: "02C" },
{ id: "02R/20L", x: 700, y1: 150, y2: 850, label: "02R" }
{ id: "02R/20L", x: 800, y1: 150, y2: 850, label: "02R" }
],
taxiways: [
{ id: "Alpha", type: "v", x: 230, y1: 150, y2: 850 },
{ id: "Bravo", type: "v", x: 370, y1: 150, y2: 850 },
{ id: "Charlie", type: "h", y: 700, x1: 200, x2: 750 },
{ id: "Delta", type: "h", y: 400, x1: 100, x2: 750 },
{ id: "Echo", type: "h", y: 250, x1: 200, x2: 750 }
{ id: "Alpha", type: "v", x: 150, y1: 150, y2: 850 },
{ id: "Bravo", type: "v", x: 250, y1: 150, y2: 850 },
{ id: "Victor", type: "v", x: 450, y1: 150, y2: 850 },
{ id: "Whiskey", type: "v", x: 550, y1: 150, y2: 850 },
{ id: "North Cross", type: "h", y: 250, x1: 100, x2: 900 },
{ id: "South Cross", type: "h", y: 750, x1: 100, x2: 900 }
],
areas: [
{ id: "Stand F42", x: 100, y: 200, w: 80, h: 80 },
{ id: "Platform 1", x: 100, y: 450, w: 100, h: 100 },
{ id: "Cargo", x: 750, y: 650, w: 150, h: 150 }
{ id: "Terminal 1", x: 280, y: 300, w: 140, h: 80 },
{ id: "Terminal 2", x: 280, y: 400, w: 140, h: 80 },
{ id: "Terminal 3", x: 280, y: 500, w: 140, h: 80 },
{ id: "Terminal 4", x: 350, y: 880, w: 150, h: 80 },
{ id: "Cargo", x: 850, y: 450, w: 120, h: 200 },
{ id: "Changi East", x: 850, y: 700, w: 120, h: 150 }
],
nodes: {
"02L": { x: 300, y: 800 }, "20R": { x: 300, y: 200 },
"02L": { x: 200, y: 800 }, "20R": { x: 200, y: 200 },
"02C": { x: 500, y: 800 }, "20C": { x: 500, y: 200 },
"02R": { x: 700, y: 800 }, "20L": { x: 700, y: 200 },
"Alpha": { x: 230, y: 500 },
"Bravo": { x: 370, y: 500 },
"Charlie": { x: 500, y: 700 },
"Delta": { x: 230, y: 400 },
"Echo": { x: 300, y: 250 },
"Stand F42": { x: 140, y: 240 },
"Platform 1": { x: 150, y: 500 },
"Cargo": { x: 800, y: 700 },
"02R": { x: 800, y: 800 }, "20L": { x: 800, y: 200 },
"Alpha": { x: 150, y: 500 },
"Bravo": { x: 250, y: 500 },
"Victor": { x: 450, y: 500 },
"Whiskey": { x: 550, y: 500 },
"North Cross": { x: 500, y: 250 },
"South Cross": { x: 500, y: 750 },
"Terminal 1": { x: 350, y: 340 },
"Terminal 2": { x: 350, y: 440 },
"Terminal 3": { x: 350, y: 540 },
"Terminal 4": { x: 425, y: 860 },
"Cargo": { x: 830, y: 550 },
"Changi East": { x: 830, y: 750 },
"airborne": { x: 500, y: 50 }
}
};
22 changes: 15 additions & 7 deletions main.py
@@ -3,6 +3,7 @@
import logging
import uvicorn
import asyncio
from fastapi import FastAPI
from contextlib import asynccontextmanager
import threading

@@ -73,11 +74,12 @@ async def lifespan(app: FastAPI):
logger.info("Initializing ATC AI Assist Core...")
import dashboard.server
dashboard.server._main_loop = asyncio.get_running_loop()
if app.state.simulate:
if app.state.simulate or app.state.simulate_emergency:
from simulation.radio_simulator import RadioSimulator
app.state.simulator = RadioSimulator(process_text_transcript, delay_between_calls=1.5)
sim_mode = "emergency" if app.state.simulate_emergency else "normal"
app.state.simulator = RadioSimulator(process_text_transcript, mode=sim_mode, delay_between_calls=1.5)
app.state.simulator.start()
logger.info("Running in SIMULATION mode.")
logger.info(f"Running in SIMULATION mode ({sim_mode}).")
else:
# Load heavy ML models
from audio.stt_engine import STTEngine
@@ -103,7 +105,7 @@ def audio_callback(np_data):

# Shutdown Phase
logger.info("Shutting down ATC AI Assist Core...")
if app.state.simulate:
if app.state.simulate or app.state.simulate_emergency:
app.state.simulator.stop()
elif hasattr(app.state, 'listener') and app.state.listener:
app.state.listener.stop()
@@ -112,15 +114,21 @@ def audio_callback(np_data):

if __name__ == "__main__":
arg_parser = argparse.ArgumentParser(description="ATC AI Assist System")
arg_parser.add_argument("--simulate", action="store_true", help="Run with scripted simulator instead of audio.")
arg_parser.add_argument("--simulate", action="store_true", help="Run with normal scripted simulator instead of audio.")
arg_parser.add_argument("--simulate-emergency", action="store_true", help="Run with emergency scripted simulator (includes fire/medical/tow response).")
arg_parser.add_argument("--demo-wav", type=str, help="Run offline using a specified .wav file.")
args = arg_parser.parse_args()

app.state.simulate = args.simulate
app.state.simulate_emergency = args.simulate_emergency
app.state.demo_wav = args.demo_wav

if args.demo_wav and args.simulate:
logger.error("Cannot use --simulate and --demo-wav at the same time.")
if args.demo_wav and (args.simulate or args.simulate_emergency):
logger.error("Cannot use simulation arguments and --demo-wav at the same time.")
sys.exit(1)

if args.simulate and args.simulate_emergency:
logger.error("Cannot use both --simulate and --simulate-emergency at the same time.")
sys.exit(1)

logger.info(f"Starting API Server on http://{config.HOST}:{config.PORT}")