M18 Engine is the Jetson-side runtime for the M18 humanoid robot prototype. It coordinates wake-word listening, local and premium STT, query filtering, EdTalkies API calls, TTS, health/error handling, and facial/body expression commands for the Raspberry Pi Zero 2 W expression node.
- Brain: NVIDIA Jetson Orin Nano Developer Kit, 8GB
- OS: Ubuntu / Jetson Linux
- Microphone: reSpeaker XVF3800 USB Mic Array
- Expression controller: Raspberry Pi Zero 2 W
Boot -> health check -> idle wake listener
Wake word or any clear sound -> active listening -> STT -> query filter
Valid query -> EdTalkies intent metadata -> ExecutionPlan route executor
Generated response -> EdTalkies persistence -> TTS -> active follow-up listening
No voice for 5 minutes -> partial idle / sleepy expression
Errors -> log -> optional email -> friendly message -> recovery
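The flow above can be read as a small state machine. The following is a minimal sketch, not the actual contents of m18_engine/runtime.py: the state names come from the expression messages described later in this README, while the event names (`wake`, `valid_query`, `tts_done`, and so on) are hypothetical labels chosen for illustration. The post-TTS target follows the flow list above (active follow-up listening).

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PARTIAL_IDLE = "partial_idle"
    WAKE_DETECTED = "wake_detected"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"
    ERROR_RECOVERY = "error_recovery"


# Hypothetical event names; only the state names appear in this README.
TRANSITIONS = {
    (State.IDLE, "wake"): State.WAKE_DETECTED,
    (State.PARTIAL_IDLE, "wake"): State.WAKE_DETECTED,
    (State.IDLE, "silence_timeout"): State.PARTIAL_IDLE,
    (State.WAKE_DETECTED, "listen"): State.LISTENING,
    (State.LISTENING, "valid_query"): State.THINKING,
    (State.LISTENING, "invalid_query"): State.IDLE,
    (State.THINKING, "response_ready"): State.SPEAKING,
    (State.SPEAKING, "tts_done"): State.LISTENING,
    (State.ERROR_RECOVERY, "recovered"): State.IDLE,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; unknown events leave the state unchanged."""
    if event == "error":
        # Any state can fall into error recovery.
        return State.ERROR_RECOVERY
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in one table makes it easy to see which events are ignored in which state, for example that a `wake` event during `speaking` changes nothing.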
After M18 accepts a valid user request, normal user speech is ignored until the response completes and the microphone is flushed. M18 then returns to wake-word listening instead of processing room conversation captured during STT/API/TTS time. During TTS, only interrupt commands such as stop, cancel, quiet, and enough are listened for.
Simple greetings and identity prompts such as hello, hi, good morning, and what is your name are answered locally and spoken with local TTS. They do not call EdTalkies or OpenAI.
By default, idle wake is not limited to fixed wake words. Any clear sound above the configured wake threshold can move M18 into the active listening window. Wake words still work, but they are no longer required.
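As an illustration of a wake threshold, the sketch below checks one chunk of signed 16-bit PCM samples against an RMS and a peak limit. The default values mirror M18_MIN_WAKE_RMS and M18_MIN_WAKE_PEAK from the wake-on-sound tuning settings in this README; the function name and the rule that both limits must be met are assumptions, not the engine's confirmed behavior.

```python
import math
from typing import Sequence


def is_wake_sound(samples: Sequence[int], min_rms: float = 80.0, min_peak: int = 500) -> bool:
    """Return True when a chunk of PCM samples is loud enough to count as a wake sound.

    Assumption: both the RMS and the peak threshold must be reached.
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return rms >= min_rms and peak >= min_peak
```

Requiring both values filters out a single sharp click (high peak, low RMS) as well as a low steady hum (moderate RMS, low peak).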
Supported wake names include M18 and JAI. JAI wake phrases include Jai, Hey Jai, Hi Jai, Hello Jai, Listen Jai, Okay Jai, Wake up Jai, Jai listen, Jai are you there, and Jai help me. The wake matcher is case-insensitive and also accepts "jay", a common STT transcription of Jai.
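A case-insensitive matcher of this kind could look like the sketch below. It is an assumption about the approach, not the engine's actual matcher: it only checks whether a whole word of the transcript is one of the wake names, and it does not handle spelled-out variants such as "M 18".

```python
import re

# Wake names from this README plus the "jay" STT variation.
WAKE_NAMES = ("m18", "jai", "jay")


def contains_wake_name(transcript: str) -> bool:
    """Return True when any whole word of the transcript is a wake name."""
    # Lowercase, replace punctuation with spaces, then split into words.
    words = re.sub(r"[^a-z0-9 ]+", " ", transcript.lower()).split()
    return any(word in WAKE_NAMES for word in words)
```

Matching whole words rather than substrings keeps words like "jailbreak" from triggering a wake.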
M18 does not send normal STT/API work while it is speaking. During TTS playback, only a lightweight stop-command listener is allowed. The engine also remembers its last spoken text and rejects likely self-echo before any API call.
Supported stop commands:
- stop
- stop speaking
- enough
- pause
- be quiet
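The stop-command listener and the self-echo check can be sketched as follows. This is a hedged illustration, not the code in m18_engine/query_filter.py: the function names, the exact-match rule for stop commands, and the 0.8 similarity threshold are all assumptions.

```python
import re
from difflib import SequenceMatcher

STOP_COMMANDS = ("stop", "stop speaking", "enough", "pause", "be quiet")


def _clean(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())


def is_stop_command(transcript: str) -> bool:
    """Assumption: the whole utterance must equal one of the stop commands."""
    return _clean(transcript) in STOP_COMMANDS


def is_self_echo(transcript: str, last_spoken: str, threshold: float = 0.8) -> bool:
    """Return True when the transcript looks like the engine's own last speech."""
    heard, spoken = _clean(transcript), _clean(last_spoken)
    if not heard or not spoken:
        return False
    # A fragment of three or more words found verbatim in the spoken text.
    if len(heard.split()) >= 3 and heard in spoken:
        return True
    # Otherwise fall back to a fuzzy character-level similarity ratio.
    return SequenceMatcher(None, heard, spoken).ratio() >= threshold
```

Running the echo check before any API call means a transcript of the robot's own voice is dropped locally and costs nothing.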
git clone https://github.com/aritworksdev/m18.engine.com.git
cd m18.engine.com
cp .env.example .env
bash scripts/install_jetson.sh
bash scripts/download_models.sh
python -m m18_engine --diagnostics
List audio devices:
python -m m18_engine --list-audio-devices
If the reSpeaker is not the default input, set it in .env:
M18_AUDIO_INPUT_DEVICE=2
M18_AUDIO_INPUT_GAIN=2.0
For Jetson/reSpeaker stability, bypass desktop default-device switching with ALSA. M18 can resolve auto: values on every startup by matching arecord -l / aplay -l device descriptions, so USB card numbers can change after reboot without breaking the service:
M18_AUDIO_CAPTURE_BACKEND=arecord
M18_ALSA_INPUT_DEVICE=auto:reSpeaker XVF3800
M18_AUDIO_PLAYBACK_BACKEND=aplay
M18_ALSA_OUTPUT_DEVICE=auto:KT USB Audio
If the exact device text is different on your Jetson, choose a stable unique part from:
arecord -l
aplay -l
Examples: auto:reSpeaker, auto:XVF3800, auto:KT USB Audio. Avoid numeric values such as plughw:0,0 unless you are testing temporarily.
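To make the auto: matching concrete, here is a minimal sketch that resolves an auto: value against `arecord -l` style output. The function name, the choice to return a `plughw:card,device` string, and the sample listing are assumptions made for illustration; the engine's real resolver may differ.

```python
import re
from typing import Optional

# Matches lines such as: "card 2: Array [Some Name], device 0: USB Audio [USB Audio]"
CARD_LINE = re.compile(r"^card (\d+): (.+?), device (\d+): (.+)$")

# Made-up sample listing in the usual `arecord -l` layout, for illustration only.
SAMPLE_LISTING = """**** List of CAPTURE Hardware Devices ****
card 0: Example [Example Onboard Audio], device 3: HDMI 0 [HDMI 0]
card 2: Array [reSpeaker XVF3800 Mic Array], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
"""


def resolve_auto_device(setting: str, listing: str) -> Optional[str]:
    """Turn "auto:<text>" into an ALSA device name by matching the listing.

    Non-auto values are returned unchanged; None means no card matched.
    """
    if not setting.startswith("auto:"):
        return setting
    needle = setting[len("auto:"):].strip().lower()
    if not needle:
        return None
    for line in listing.splitlines():
        match = CARD_LINE.match(line.strip())
        if match and needle in line.lower():
            return f"plughw:{match.group(1)},{match.group(3)}"
    return None
```

Because the lookup runs against the current listing each time, the result follows the card wherever it is enumerated after a reboot.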
Test the selected mic level:
python -m m18_engine --test-mic
Normal query capture uses frame-level voice activity detection so steady microphone noise does not invoke STT. Optional tuning values are:
M18_QUERY_VAD_NOISE_MULTIPLIER=2.0
M18_QUERY_VAD_MIN_ACTIVE_MS=180
Increase the multiplier slightly if steady room noise is still treated as speech. Lower it cautiously if quiet speakers are being missed.
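The two tuning values can be understood through a simple energy-based detector. The sketch below is an assumption about how such a detector might work, not the engine's actual VAD: it estimates the noise floor from the quietest fifth of the frames, and it treats the minimum active time as a run of consecutive loud frames.

```python
import math
from typing import Sequence


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def has_speech(
    frames: Sequence[Sequence[int]],
    frame_ms: int = 20,
    noise_multiplier: float = 2.0,
    min_active_ms: int = 180,
) -> bool:
    """Return True when enough consecutive frames rise above the noise floor."""
    levels = [frame_rms(frame) for frame in frames]
    if not levels:
        return False
    # Assumption: the quietest 20% of frames approximates steady room noise.
    quiet = sorted(levels)[: max(1, len(levels) // 5)]
    noise_floor = sum(quiet) / len(quiet)
    threshold = max(noise_floor * noise_multiplier, 1.0)
    needed = math.ceil(min_active_ms / frame_ms)
    run = 0
    for level in levels:
        run = run + 1 if level > threshold else 0
        if run >= needed:
            return True
    return False
```

In this model a higher multiplier raises the threshold relative to room noise, which is why it suppresses steady noise but can also miss quiet speakers.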
Test wake recognition:
python -m m18_engine --test-wake
M18 can optionally use a USB UVC camera for visual events such as person presence, face proximity, wave wake, and object detection. Vision is disabled by default and does not start the camera unless explicitly enabled.
Install optional vision dependencies on Jetson:
bash scripts/install_vision.sh
Enable the service in .env:
M18_VISION_ENABLED=true
M18_VISION_CAMERA_INDEX=0
M18_VISION_PERSON_DETECTION_ENABLED=true
M18_VISION_FACE_TRACKING_ENABLED=true
M18_VISION_GESTURE_DETECTION_ENABLED=true
M18_VISION_OBJECT_RECOGNITION_ENABLED=true
M18_VISION_EXPRESSION_RECOGNITION_ENABLED=false
M18_VISION_OCR_ENABLED=false
M18_VISION_TEXT_READING_ENABLED=true
Camera test:
python -m m18_engine --test-camera
Vision event test:
python -m m18_engine --test-vision --test-vision-seconds 10
The vision service runs in a background thread and publishes small internal events. The audio conversation loop remains the priority. Normal vision events are ignored while M18 is doing STT, API work, TTS, or interruption handling. A hand-wave event only acts as a visual wake when M18 is idle or wake-word listening.
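The priority rule can be expressed as a small gate in front of the event consumer. This sketch is hypothetical: the event type `hand_wave`, the returned action strings, and the mapping of STT, API work, and TTS onto the `listening`, `thinking`, and `speaking` states are assumptions for illustration.

```python
# Assumed mapping: STT -> listening, API work -> thinking, TTS -> speaking.
BUSY_STATES = frozenset({"listening", "thinking", "speaking"})
# Assumed states in which a hand wave may act as a visual wake.
WAVE_WAKE_STATES = frozenset({"idle", "partial_idle"})


def handle_vision_event(event_type: str, engine_state: str) -> str:
    """Decide what to do with a vision event given the current engine state."""
    if event_type == "hand_wave":
        return "wake" if engine_state in WAVE_WAKE_STATES else "ignore"
    if engine_state in BUSY_STATES:
        # The audio conversation loop has priority over vision.
        return "ignore"
    return "publish"
```

Dropping events at the gate, rather than queueing them, avoids a backlog of stale detections being replayed once the conversation ends.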
Object recognition requires a compatible OpenCV DNN model. Configure these only after the model files are installed:
M18_VISION_OBJECT_MODEL_PATH=/opt/m18/models/vision/object.onnx
M18_VISION_OBJECT_MODEL_CONFIG_PATH=
M18_VISION_OBJECT_CLASSES_PATH=/opt/m18/models/vision/classes.txt
For the default YOLOv8n object detector:
bash scripts/download_vision_models.sh
Then use:
M18_VISION_OBJECT_RECOGNITION_ENABLED=true
M18_VISION_OBJECT_MODEL_PATH=/opt/m18/models/vision/yolov8n.onnx
M18_VISION_OBJECT_MODEL_CONFIG_PATH=
M18_VISION_OBJECT_CLASSES_PATH=/opt/m18/models/vision/coco.names
The first supported object responses are intentionally short and low-frequency: book, phone, bottle, cup, toy, scissors, and laptop. The COCO model does not reliably include small items such as pens; those need a custom trained model later.
If the USB UVC camera is wide-angle, keep important detections near the center of the image. Faces, objects, and text near the edges can distort and fail detection.
Recommended placement ranges:
| Task | Recommended position |
|---|---|
| Person detection | Stand about 1-3 m / 3-10 ft from M18 |
| Face tracking | Keep face centered at 45-120 cm / 18-48 in |
| Close-face greeting | Face centered at 35-70 cm / 14-28 in |
| Facial expression recognition | Face centered at 45-90 cm / 18-36 in with good light |
| Hand wave | Wave beside face/chest within 50-150 cm / 20-60 in |
| Object recognition | Hold object centered at chest/face height, 35-90 cm / 14-36 in |
| Book cover / paper OCR | Hold the page flat and centered, 25-55 cm / 10-22 in; the readable text area should fill at least 35% of frame width |
Estimate camera horizontal FOV from a known visible width:
python -m m18_engine --vision-estimate-fov 170 100
This means: at 100 cm from the camera, the visible width is 170 cm.
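The geometry behind this measurement is the standard pinhole relation: half the visible width over the distance is the tangent of half the field of view. The sketch below applies that formula; whether the CLI computes it exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def estimate_horizontal_fov(visible_width_cm: float, distance_cm: float) -> float:
    """Horizontal field of view in degrees from a visible width at a known distance."""
    if visible_width_cm <= 0 or distance_cm <= 0:
        raise ValueError("width and distance must be positive")
    half_angle = math.atan((visible_width_cm / 2) / distance_cm)
    return math.degrees(2 * half_angle)
```

For the example above (170 cm wide at 100 cm), this gives roughly 81 degrees, which is typical of a wide-angle webcam.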
Run live calibration preview:
python -m m18_engine --vision-calibrate --vision-calibrate-seconds 30
For headless Jetson testing, save one annotated frame:
python -m m18_engine --vision-calibrate-save /tmp/m18-vision-calibration.jpg
Calibration config:
M18_VISION_FACE_MIN_DISTANCE_CM=45
M18_VISION_FACE_MAX_DISTANCE_CM=120
M18_VISION_OBJECT_MIN_DISTANCE_CM=35
M18_VISION_OBJECT_MAX_DISTANCE_CM=90
M18_VISION_OCR_MIN_WIDTH_PERCENT=35
M18_VISION_CENTER_BOX_PERCENT=60
M18_VISION_FOV_HORIZONTAL_DEGREES=0
M18_VISION_KNOWN_FACE_WIDTH_CM=16
Set M18_VISION_FOV_HORIZONTAL_DEGREES after measuring the camera. When it is 0, calibration still shows guide boxes and size hints, but face distance is shown as unknown.
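One way to see why the FOV is needed for face distance: with a pinhole camera model, the FOV fixes the focal length in pixels, and a face of known real width then maps to a distance from its width in pixels. The following sketch assumes that model; the function name and parameters are hypothetical, and returning `None` mirrors the "unknown" distance shown when the FOV is 0.

```python
import math
from typing import Optional


def estimate_face_distance_cm(
    face_width_px: float,
    frame_width_px: float,
    fov_horizontal_degrees: float,
    known_face_width_cm: float = 16.0,
) -> Optional[float]:
    """Estimate face distance with a pinhole model; None when the FOV is unset."""
    if fov_horizontal_degrees <= 0 or face_width_px <= 0:
        return None
    # Focal length in pixels implied by the horizontal field of view.
    focal_px = (frame_width_px / 2) / math.tan(math.radians(fov_horizontal_degrees) / 2)
    return known_face_width_cm * focal_px / face_width_px
```

The estimate ignores lens distortion, so it is least reliable near the edges of a wide-angle frame.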
To enable local text reading from book covers or paper, install OCR support and enable OCR:
bash scripts/install_vision.sh
M18_VISION_OCR_ENABLED=true
M18_VISION_TEXT_READING_ENABLED=true
M18_VISION_TEXT_MIN_CHARACTERS=6
M18_VISION_TEXT_MAX_CHARACTERS=120
M18_VISION_TEXT_SCAN_INTERVAL_SECONDS=2
M18_VISION_TEXT_COOLDOWN_SECONDS=20
Text reading scans multiple center-weighted document areas and does not require a book object detection. It is intended for short notes, business cards, labels, paper pages, and book covers. For best results, hold the text flat, centered, bright, and close enough that the letters are sharp and large. Very small handwriting, curved pages, glare, or text near the wide-angle edge of the camera can still fail.
Run once:
bash scripts/run_m18.sh --listen-once
Run continuously:
bash scripts/run_m18.sh --loop
Install as a systemd service:
sudo bash scripts/install_service.sh
sudo systemctl enable --now m18-engine
The first hardware path uses serial expression commands. HTTP mode is also available for development.
git clone https://github.com/aritworksdev/m18.engine.com.git
cd m18.engine.com
sudo bash scripts/install_pi_zero.sh
sudo systemctl enable --now m18-expression
Default Pi serial service:
/dev/ttyGS0 at 115200 baud
Jetson sends one newline-delimited JSON message to the Pi on every engine state change:
{"event":"engine_state","state":"listening","expression":"listening","intensity":70,"duration_ms":0}
Common states are idle, partial_idle, wake_detected, listening, thinking, speaking, error_recovery, and shutdown.
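A sender and receiver for this newline-delimited JSON format can be sketched in a few lines. The field names come from the example message above; the helper names, the default of reusing the state as the expression, and the buffering approach on the receiving side are assumptions, not the actual body_physics_engine.py or expression node code.

```python
import json
from typing import List, Optional, Tuple


def build_state_message(
    state: str,
    expression: Optional[str] = None,
    intensity: int = 70,
    duration_ms: int = 0,
) -> bytes:
    """Encode one engine-state event as a single newline-terminated JSON line."""
    payload = {
        "event": "engine_state",
        "state": state,
        # Assumption: the expression defaults to the state name.
        "expression": expression or state,
        "intensity": intensity,
        "duration_ms": duration_ms,
    }
    return (json.dumps(payload, separators=(",", ":")) + "\n").encode("utf-8")


def parse_messages(buffer: bytes) -> Tuple[List[dict], bytes]:
    """Split a receive buffer into complete messages and the unfinished remainder."""
    *complete, remainder = buffer.split(b"\n")
    return [json.loads(line) for line in complete if line.strip()], remainder
```

Returning the remainder lets the receiver keep a partially received line and prepend it to the next read from the serial port.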
To view live Jetson-to-Pi expression messages from the Jetson without touching the serial port:
python scripts/show_pi_messages.py
For IP/network communication, run the Pi expression node in HTTP mode:
sudo tee /etc/m18-expression.env >/dev/null <<'EOF'
M18_PI_MODE=http
M18_PI_HTTP_HOST=0.0.0.0
M18_PI_HTTP_PORT=8787
EOF
sudo systemctl restart m18-expression
Then set Jetson .env:
PI_CONNECTION_TYPE=http
M18_EXPRESSION_ENDPOINT=http://m18pi.local:8787/event
For HTTP testing:
python pi_zero/expression_node.py --mode http --http-port 8787
Secrets and feature switches live in .env.
Normal settings live in config.yaml.
Important switches:
ENABLE_EDTALKIES_API=false
ENABLE_PREMIUM_STT=false
ENABLE_PREMIUM_TTS=false
ENABLE_BODY_PHYSICS=true
ENABLE_ERROR_EMAIL=false
JAI_DEMO_MODE=false
PI_CONNECTION_TYPE=serial
PI_SERIAL_PORT=/dev/ttyACM0
EdTalkies is the authoritative routing brain. After STT, M18 calls GetResolveIntentMetadataAsync and executes AiBotIntent.ExecutionPlan.Route. M18 does not classify current affairs, education generation, RAG, or local responses using query keywords.
Configure:
ENABLE_OPENAI=true
ENABLE_PREMIUM_STT=true
M18_PREMIUM_STT_MODE=openai
OPENAI_API_KEY=
M18_EDTALKIES_INTENT_PATH=/AiBotTalkies/GetResolveIntentMetadataAsync
M18_EDTALKIES_UPDATE_RESPONSE_PATH=/AiBotTalkies/UpdateGeneratedResponseAsync
OPENAI_RESPONSES_PATH=/v1/responses
Supported plan routes are OPENAI_DIRECT, OPENAI_WEB_SEARCH, GOVERNMENT_RAG, EDTALKIES_GENERATION, and LOCAL_RESPONSE. If intent resolution or its plan is invalid, M18 uses the configured OpenAI model as a short direct fallback. M18_RESPONSE_MODE and JAI_DEMO_MODE remain readable for configuration compatibility, but they do not override an EdTalkies execution plan.
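The route selection with its fallback can be sketched as below. The route names are taken from this README; the shape of the intent payload (a dict with `ExecutionPlan` and `Route` keys) and the choice to model the fallback as `OPENAI_DIRECT` are assumptions made for illustration, not the confirmed EdTalkies response format.

```python
from typing import Optional

SUPPORTED_ROUTES = (
    "OPENAI_DIRECT",
    "OPENAI_WEB_SEARCH",
    "GOVERNMENT_RAG",
    "EDTALKIES_GENERATION",
    "LOCAL_RESPONSE",
)
# Assumption: the "short direct fallback" is modeled as the direct OpenAI route.
FALLBACK_ROUTE = "OPENAI_DIRECT"


def select_route(intent: Optional[dict]) -> str:
    """Pick the plan route from intent metadata, falling back when it is invalid."""
    try:
        route = intent["ExecutionPlan"]["Route"]
    except (TypeError, KeyError):
        # Missing intent, missing plan, or a plan that is not a mapping.
        return FALLBACK_ROUTE
    return route if route in SUPPORTED_ROUTES else FALLBACK_ROUTE
```

Note that the function never inspects the query text: the route comes only from the execution plan, in line with EdTalkies being the routing authority.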
Default local TTS uses Piper with the male en_US-hfc_male-medium voice. To switch voices, update:
M18_PIPER_MODEL=/opt/m18/models/piper/en_US-hfc_male-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-hfc_male-medium.onnx.json
M18_PIPER_LENGTH_SCALE=1.0
To set and verify the active local Piper voice on Jetson:
cd ~/aritworks/projects/m18.engine.com
bash scripts/set_piper_voice.sh hfc_male
sudo systemctl restart m18-engine
The model downloader installs these Piper voices by default:
M18_PIPER_VOICES=joe bryce norman hfc_male
To test another installed local voice, point .env at that model:
M18_PIPER_MODEL=/opt/m18/models/piper/en_US-bryce-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-bryce-medium.onnx.json
or:
M18_PIPER_MODEL=/opt/m18/models/piper/en_US-norman-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-norman-medium.onnx.json
or:
M18_PIPER_MODEL=/opt/m18/models/piper/en_US-hfc_male-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-hfc_male-medium.onnx.json
To audition installed local Piper voices on Jetson:
cd /opt/m18/app
/opt/m18/venv/bin/python scripts/audition_piper_voices.py
Default OpenAI TTS uses the male-sounding onyx voice:
OPENAI_TTS_VOICE=onyx
OPENAI_TTS_INSTRUCTIONS=Speak clearly, warmly, and naturally in a plain adult male voice.
Wake-on-sound tuning:
M18_WAKE_ON_ANY_SOUND=true
M18_MIN_WAKE_RMS=80
M18_MIN_WAKE_PEAK=500
M18_ACTIVE_SESSION_SECONDS=30
- m18_engine/runtime.py: main state machine
- m18_engine/stt_engine.py: STT provider factory
- m18_engine/tts_engine.py: TTS provider factory
- m18_engine/edtalkies_api.py: EdTalkies chat/API adapter
- m18_engine/query_filter.py: invalid query and echo rejection
- m18_engine/health_engine.py: health checks
- m18_engine/error_engine.py: logs and optional email alerts
- m18_engine/body_physics_engine.py: Pi expression command transport
- m18_engine/pi_client.py: Pi client compatibility export
- m18_engine/services/vision/vision_service.py: optional background camera service
python -m compileall m18_engine pi_zero tests
python -m unittest discover -s tests
python -m m18_engine --diagnostics
On a development machine without Jetson audio/model dependencies, diagnostics will report providers as not ready. That is expected until scripts/install_jetson.sh and scripts/download_models.sh run on the Jetson.