M18 Engine

M18 Engine is the Jetson-side runtime for the M18 humanoid robot prototype. It coordinates wake-word listening, local and premium STT, query filtering, EdTalkies API calls, TTS, health/error handling, and facial/body expression commands for the Raspberry Pi Zero 2 W expression node.

Hardware Target

Brain: NVIDIA Jetson Orin Nano Developer Kit, 8GB
OS: Ubuntu / Jetson Linux
Microphone: reSpeaker XVF3800 USB Mic Array
Expression controller: Raspberry Pi Zero 2 W

Runtime Flow

Boot -> health check -> idle wake listener
Wake word or any clear sound -> active listening -> STT -> query filter
Valid query -> EdTalkies intent metadata -> ExecutionPlan route executor
Generated response -> EdTalkies persistence -> TTS -> active follow-up listening
No voice for 5 minutes -> partial idle / sleepy expression
Errors -> log -> optional email -> friendly message -> recovery

After M18 accepts a valid user request, normal user speech is ignored until the response completes and the microphone is flushed. M18 then returns to wake-word listening instead of processing room conversation captured during STT/API/TTS time. During TTS, only interrupt commands such as stop, cancel, quiet, and enough are listened for.

Simple greetings and identity prompts such as hello, hi, good morning, and what is your name are answered locally and spoken with local TTS. They do not call EdTalkies or OpenAI.

By default, idle wake is not limited to fixed wake words. Any clear sound above the configured wake threshold can move M18 into the active listening window. Wake words still work, but they are no longer required.

Supported wake names include M18 and JAI. JAI wake phrases include Jai, Hey Jai, Hi Jai, Hello Jai, Listen Jai, Okay Jai, Wake up Jai, Jai listen, Jai are you there, and Jai help me. The wake matcher is case-insensitive and also accepts the common STT variation jay.
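
The case-insensitive matching described above could look roughly like the sketch below. The helper name `contains_wake_name` and the whole-word normalization are assumptions for illustration, not the engine's actual matcher.

```python
import re

# Wake names from this README; "jay" is the common STT variation of "Jai".
WAKE_NAMES = {"m18", "jai", "jay"}


def contains_wake_name(transcript: str) -> bool:
    """Return True if a wake name appears as a whole word (hypothetical helper)."""
    # Lowercase and split into alphanumeric words so punctuation is ignored.
    words = re.findall(r"[a-z0-9]+", transcript.lower())
    return any(word in WAKE_NAMES for word in words)


if __name__ == "__main__":
    for phrase in ("Hey JAI", "okay jay, help me", "jailbreak the phone"):
        print(phrase, "->", contains_wake_name(phrase))
```

Matching whole words rather than substrings keeps unrelated words that merely contain "jai" from waking the robot.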

Important Echo Protection

M18 does not send normal STT/API work while it is speaking. During TTS playback, only a lightweight stop-command listener is allowed. The engine also remembers its last spoken text and rejects likely self-echo before any API call.

Supported stop commands:

stop
stop speaking
enough
pause
be quiet
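
A minimal sketch of the two checks described in this section: matching a stop command during playback, and rejecting a transcript that looks like M18's own last sentence. The function names, the `difflib` similarity measure, and the 0.8 threshold are assumptions, not the engine's real implementation.

```python
import re
from difflib import SequenceMatcher

# Stop commands listed in this README.
STOP_COMMANDS = {"stop", "stop speaking", "enough", "pause", "be quiet"}


def _normalize(text: str) -> str:
    """Lowercase and drop punctuation so comparisons ignore STT formatting."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def is_stop_command(transcript: str) -> bool:
    """True when the whole transcript is one of the stop commands."""
    return _normalize(transcript) in STOP_COMMANDS


def is_likely_self_echo(transcript: str, last_spoken: str, threshold: float = 0.8) -> bool:
    """True when the transcript repeats, or closely resembles, the last spoken text."""
    heard = _normalize(transcript)
    spoken = _normalize(last_spoken)
    if not heard or not spoken:
        return False
    if heard in spoken:  # the microphone caught a fragment of the reply
        return True
    return SequenceMatcher(None, heard, spoken).ratio() >= threshold
```

The substring test is deliberately aggressive, so very short transcripts that happen to appear in the reply are also rejected; a real filter would likely add a minimum length.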

Install On Jetson

git clone https://github.com/aritworksdev/m18.engine.com.git
cd m18.engine.com
cp .env.example .env
bash scripts/install_jetson.sh
bash scripts/download_models.sh
python -m m18_engine --diagnostics

List audio devices:

python -m m18_engine --list-audio-devices

If the reSpeaker is not the default input, set it in .env:

M18_AUDIO_INPUT_DEVICE=2
M18_AUDIO_INPUT_GAIN=2.0

For Jetson/reSpeaker stability, bypass desktop default-device switching with ALSA. M18 can resolve auto: values on every startup by matching arecord -l / aplay -l device descriptions, so USB card numbers can change after reboot without breaking the service:

M18_AUDIO_CAPTURE_BACKEND=arecord
M18_ALSA_INPUT_DEVICE=auto:reSpeaker XVF3800
M18_AUDIO_PLAYBACK_BACKEND=aplay
M18_ALSA_OUTPUT_DEVICE=auto:KT USB Audio

If the exact device text is different on your Jetson, choose a stable unique part from:

arecord -l
aplay -l

Examples: auto:reSpeaker, auto:XVF3800, auto:KT USB Audio. Avoid numeric values such as plughw:0,0 unless you are testing temporarily.
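
To illustrate how an auto: value can survive changing card numbers, the sketch below matches the text after auto: against `arecord -l` / `aplay -l` output and builds a device string from the card and device numbers it finds. The function name and the choice of returning a `plughw:` string are assumptions; the engine's own resolver may differ.

```python
import re
from typing import Optional

# Matches lines such as:
# card 2: Array [reSpeaker XVF3800 4-Mic Array], device 0: USB Audio [USB Audio]
_CARD_LINE = re.compile(r"^card (\d+): (.*?), device (\d+): (.*)$")


def resolve_auto_device(setting: str, listing: str) -> Optional[str]:
    """Resolve an 'auto:<text>' setting against ALSA listing text (hypothetical helper)."""
    if not setting.startswith("auto:"):
        return setting  # explicit device string, used unchanged
    needle = setting[len("auto:"):].strip().lower()
    for line in listing.splitlines():
        line = line.strip()
        match = _CARD_LINE.match(line)
        # Case-insensitive substring match on the whole description line.
        if match and needle in line.lower():
            card, _, device, _ = match.groups()
            return f"plughw:{card},{device}"
    return None  # no matching card is currently attached
```

In practice the listing text would come from running `arecord -l` at startup, for example with `subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout`.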

Test the selected mic level:

python -m m18_engine --test-mic

Normal query capture uses frame-level voice activity detection so steady microphone noise does not invoke STT. Optional tuning values are:

M18_QUERY_VAD_NOISE_MULTIPLIER=2.0
M18_QUERY_VAD_MIN_ACTIVE_MS=180

Increase the multiplier slightly if steady room noise is still treated as speech. Lower it cautiously if quiet speakers are being missed.
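
The interaction of the two tuning values can be sketched as follows. This assumes 20 ms frames of 16-bit samples, a separately measured noise-floor RMS, and that the minimum active time must be reached in consecutive frames; all of these are illustrative assumptions rather than the engine's actual VAD.

```python
import math
from typing import Iterable, Sequence


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def has_speech(
    frames: Iterable[Sequence[int]],
    noise_floor_rms: float,
    frame_ms: int = 20,
    noise_multiplier: float = 2.0,  # mirrors M18_QUERY_VAD_NOISE_MULTIPLIER
    min_active_ms: int = 180,       # mirrors M18_QUERY_VAD_MIN_ACTIVE_MS
) -> bool:
    """True once enough consecutive frames are louder than the scaled noise floor."""
    threshold = noise_floor_rms * noise_multiplier
    active_ms = 0
    for frame in frames:
        if frame_rms(frame) > threshold:
            active_ms += frame_ms
            if active_ms >= min_active_ms:
                return True
        else:
            active_ms = 0  # a quiet frame resets the run
    return False
```

With these defaults, steady noise at the floor level never crosses the threshold, while nine consecutive loud 20 ms frames (180 ms) are enough to pass audio on to STT.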

Test wake recognition:

python -m m18_engine --test-wake

Optional Vision Service

M18 can optionally use a USB UVC camera for visual events such as person presence, face proximity, wave wake, and object detection. Vision is disabled by default and does not start the camera unless explicitly enabled.

Install optional vision dependencies on Jetson:

bash scripts/install_vision.sh

Enable the service in .env:

M18_VISION_ENABLED=true
M18_VISION_CAMERA_INDEX=0
M18_VISION_PERSON_DETECTION_ENABLED=true
M18_VISION_FACE_TRACKING_ENABLED=true
M18_VISION_GESTURE_DETECTION_ENABLED=true
M18_VISION_OBJECT_RECOGNITION_ENABLED=true
M18_VISION_EXPRESSION_RECOGNITION_ENABLED=false
M18_VISION_OCR_ENABLED=false
M18_VISION_TEXT_READING_ENABLED=true

Camera test:

python -m m18_engine --test-camera

Vision event test:

python -m m18_engine --test-vision --test-vision-seconds 10

The vision service runs in a background thread and publishes small internal events. The audio conversation loop remains the priority. Normal vision events are ignored while M18 is doing STT, API work, TTS, or interruption handling. A hand-wave event only acts as a visual wake when M18 is idle or wake-word listening.

Object recognition requires a compatible OpenCV DNN model. Configure these only after the model files are installed:

M18_VISION_OBJECT_MODEL_PATH=/opt/m18/models/vision/object.onnx
M18_VISION_OBJECT_MODEL_CONFIG_PATH=
M18_VISION_OBJECT_CLASSES_PATH=/opt/m18/models/vision/classes.txt

For the default YOLOv8n object detector:

bash scripts/download_vision_models.sh

Then use:

M18_VISION_OBJECT_RECOGNITION_ENABLED=true
M18_VISION_OBJECT_MODEL_PATH=/opt/m18/models/vision/yolov8n.onnx
M18_VISION_OBJECT_MODEL_CONFIG_PATH=
M18_VISION_OBJECT_CLASSES_PATH=/opt/m18/models/vision/coco.names

The first supported object responses are intentionally short and low-frequency: book, phone, bottle, cup, toy, scissors, and laptop. The COCO class list does not cover small items such as pens; those need a custom-trained model later.

Vision Placement And Calibration

If the USB UVC camera is wide-angle, keep important detections near the center of the image. Faces, objects, and text near the edges can distort and fail detection.

Recommended placement ranges:

Task	Recommended position
Person detection	Stand about `1-3 m` / `3-10 ft` from M18
Face tracking	Keep face centered at `45-120 cm` / `18-48 in`
Close-face greeting	Face centered at `35-70 cm` / `14-28 in`
Facial expression recognition	Face centered at `45-90 cm` / `18-36 in` with good light
Hand wave	Wave beside face/chest within `50-150 cm` / `20-60 in`
Object recognition	Hold object centered at chest/face height, `35-90 cm` / `14-36 in`
Book cover / paper OCR	Hold the page flat and centered, `25-55 cm` / `10-22 in`; the readable text area should fill at least `35%` of frame width

Estimate camera horizontal FOV from a known visible width:

python -m m18_engine --vision-estimate-fov 170 100

This means: at 100 cm from the camera, the visible width is 170 cm.
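
For reference, the geometry behind such an estimate is the standard pinhole relation FOV = 2 * atan(width / (2 * distance)). The sketch below applies it to the example values; the function name is hypothetical, and it is an assumption that the CLI option uses exactly this formula.

```python
import math


def estimate_horizontal_fov_degrees(visible_width_cm: float, distance_cm: float) -> float:
    """Horizontal field of view from a visible width measured at a known distance."""
    if visible_width_cm <= 0 or distance_cm <= 0:
        raise ValueError("width and distance must be positive")
    half_angle = math.atan(visible_width_cm / (2.0 * distance_cm))
    return math.degrees(2.0 * half_angle)


if __name__ == "__main__":
    # 170 cm visible at 100 cm from the camera
    print(round(estimate_horizontal_fov_degrees(170, 100), 1))  # prints 80.7
```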

Run live calibration preview:

python -m m18_engine --vision-calibrate --vision-calibrate-seconds 30

For headless Jetson testing, save one annotated frame:

python -m m18_engine --vision-calibrate-save /tmp/m18-vision-calibration.jpg

Calibration config:

M18_VISION_FACE_MIN_DISTANCE_CM=45
M18_VISION_FACE_MAX_DISTANCE_CM=120
M18_VISION_OBJECT_MIN_DISTANCE_CM=35
M18_VISION_OBJECT_MAX_DISTANCE_CM=90
M18_VISION_OCR_MIN_WIDTH_PERCENT=35
M18_VISION_CENTER_BOX_PERCENT=60
M18_VISION_FOV_HORIZONTAL_DEGREES=0
M18_VISION_KNOWN_FACE_WIDTH_CM=16

Set M18_VISION_FOV_HORIZONTAL_DEGREES after measuring the camera. When it is 0, calibration still shows guide boxes and size hints, but face distance is shown as unknown.
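
One plausible way the horizontal FOV and the known face width combine into a distance estimate is the pinhole-camera relation below. The function name and parameters are hypothetical, and the engine's actual calculation may differ; returning `None` for a FOV of 0 mirrors the "unknown" behaviour described above.

```python
import math
from typing import Optional


def estimate_face_distance_cm(
    face_width_px: float,
    frame_width_px: float,
    fov_horizontal_degrees: float,
    known_face_width_cm: float = 16.0,  # mirrors M18_VISION_KNOWN_FACE_WIDTH_CM
) -> Optional[float]:
    """Estimate distance to a face from its pixel width; None when FOV is unset."""
    if fov_horizontal_degrees <= 0 or face_width_px <= 0 or frame_width_px <= 0:
        return None  # uncalibrated camera or no usable detection
    # Focal length in pixels follows from the frame width and horizontal FOV.
    focal_px = (frame_width_px / 2.0) / math.tan(math.radians(fov_horizontal_degrees) / 2.0)
    return known_face_width_cm * focal_px / face_width_px
```

For example, with a 90 degree FOV and a 640 px wide frame, a 16 cm face that spans 64 px comes out at about 80 cm, inside the 45-120 cm face-tracking range.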

To enable local text reading from book covers or paper, install OCR support and enable OCR:

bash scripts/install_vision.sh

M18_VISION_OCR_ENABLED=true
M18_VISION_TEXT_READING_ENABLED=true
M18_VISION_TEXT_MIN_CHARACTERS=6
M18_VISION_TEXT_MAX_CHARACTERS=120
M18_VISION_TEXT_SCAN_INTERVAL_SECONDS=2
M18_VISION_TEXT_COOLDOWN_SECONDS=20

Text reading scans multiple center-weighted document areas and does not require a book object detection. It is intended for short notes, business cards, labels, paper pages, and book covers. For best results, hold the text flat, centered, bright, and close enough that the letters are sharp and large. Very small handwriting, curved pages, glare, or text near the wide-angle edge of the camera can still fail.

Run once:

bash scripts/run_m18.sh --listen-once

Run continuously:

bash scripts/run_m18.sh --loop

Install as a systemd service:

sudo bash scripts/install_service.sh
sudo systemctl enable --now m18-engine

Install On Raspberry Pi Zero 2 W

The first hardware path uses serial expression commands. HTTP mode is also available for development.

git clone https://github.com/aritworksdev/m18.engine.com.git
cd m18.engine.com
sudo bash scripts/install_pi_zero.sh
sudo systemctl enable --now m18-expression

Default Pi serial service:

/dev/ttyGS0 at 115200 baud

Jetson sends one newline-delimited JSON message to the Pi on every engine state change:

{"event":"engine_state","state":"listening","expression":"listening","intensity":70,"duration_ms":0}

Common states are idle, partial_idle, wake_detected, listening, thinking, speaking, error_recovery, and shutdown.
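
A small sketch of producing and parsing this message format. The helper names and the default of reusing the state name as the expression are assumptions; only the field names and the newline framing come from the example above.

```python
import json
from typing import Optional


def build_state_message(
    state: str,
    expression: Optional[str] = None,
    intensity: int = 70,
    duration_ms: int = 0,
) -> bytes:
    """Encode one engine-state event as a single newline-terminated JSON line."""
    payload = {
        "event": "engine_state",
        "state": state,
        "expression": expression or state,  # assumption: default to the state name
        "intensity": intensity,
        "duration_ms": duration_ms,
    }
    # Compact separators keep each message on one short line.
    return (json.dumps(payload, separators=(",", ":")) + "\n").encode("utf-8")


def parse_state_message(line: bytes) -> dict:
    """Decode one received line back into a dictionary (receiver-side sketch)."""
    return json.loads(line.decode("utf-8").strip())
```

On the Jetson side the returned bytes could be written directly to the serial port; on the Pi side each line read from the port can be passed to `parse_state_message`.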

To view live Jetson-to-Pi expression messages from the Jetson without touching the serial port:

python scripts/show_pi_messages.py

For IP/network communication, run the Pi expression node in HTTP mode:

sudo tee /etc/m18-expression.env >/dev/null <<'EOF'
M18_PI_MODE=http
M18_PI_HTTP_HOST=0.0.0.0
M18_PI_HTTP_PORT=8787
EOF
sudo systemctl restart m18-expression

Then set Jetson .env:

PI_CONNECTION_TYPE=http
M18_EXPRESSION_ENDPOINT=http://m18pi.local:8787/event

For HTTP testing:

python pi_zero/expression_node.py --mode http --http-port 8787

Configuration

Secrets and feature switches live in .env.

Normal settings live in config.yaml.

Important switches:

ENABLE_EDTALKIES_API=false
ENABLE_PREMIUM_STT=false
ENABLE_PREMIUM_TTS=false
ENABLE_BODY_PHYSICS=true
ENABLE_ERROR_EMAIL=false
JAI_DEMO_MODE=false
PI_CONNECTION_TYPE=serial
PI_SERIAL_PORT=/dev/ttyACM0

Execution-Plan Routing

EdTalkies is the authoritative routing brain. After STT, M18 calls GetResolveIntentMetadataAsync and executes AiBotIntent.ExecutionPlan.Route. M18 does not classify current affairs, education generation, RAG, or local responses using query keywords.

Configure:

ENABLE_OPENAI=true
ENABLE_PREMIUM_STT=true
M18_PREMIUM_STT_MODE=openai
OPENAI_API_KEY=
M18_EDTALKIES_INTENT_PATH=/AiBotTalkies/GetResolveIntentMetadataAsync
M18_EDTALKIES_UPDATE_RESPONSE_PATH=/AiBotTalkies/UpdateGeneratedResponseAsync
OPENAI_RESPONSES_PATH=/v1/responses

Supported plan routes are OPENAI_DIRECT, OPENAI_WEB_SEARCH, GOVERNMENT_RAG, EDTALKIES_GENERATION, and LOCAL_RESPONSE. If intent resolution or its plan is invalid, M18 uses the configured OpenAI model as a short direct fallback. M18_RESPONSE_MODE and JAI_DEMO_MODE remain readable for configuration compatibility, but they do not override an EdTalkies execution plan.
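
The routing rule above (execute the plan's route, fall back to a short direct OpenAI answer when the plan is invalid) can be sketched as a lookup table. The handler callables and the `select_handler` helper are hypothetical; only the route names come from this README.

```python
from typing import Callable, Dict, Optional

SUPPORTED_ROUTES = (
    "OPENAI_DIRECT",
    "OPENAI_WEB_SEARCH",
    "GOVERNMENT_RAG",
    "EDTALKIES_GENERATION",
    "LOCAL_RESPONSE",
)

Handler = Callable[[str], str]


def select_handler(
    route: Optional[str],
    handlers: Dict[str, Handler],
    fallback: Handler,
) -> Handler:
    """Pick the handler for a plan route, or the fallback when the route is unusable."""
    if isinstance(route, str) and route in SUPPORTED_ROUTES and route in handlers:
        return handlers[route]
    return fallback  # missing, unknown, or unhandled route


if __name__ == "__main__":
    handlers = {"LOCAL_RESPONSE": lambda q: f"local: {q}"}
    direct = lambda q: f"direct fallback: {q}"
    print(select_handler("LOCAL_RESPONSE", handlers, direct)("hello"))
    print(select_handler(None, handlers, direct)("hello"))
```

Note that nothing in this lookup inspects the query text, which matches the statement that M18 does not route on query keywords.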

Default local TTS uses Piper with the male en_US-hfc_male-medium voice. To switch voices, update:

M18_PIPER_MODEL=/opt/m18/models/piper/en_US-hfc_male-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-hfc_male-medium.onnx.json
M18_PIPER_LENGTH_SCALE=1.0

To set and verify the active local Piper voice on Jetson:

cd ~/aritworks/projects/m18.engine.com
bash scripts/set_piper_voice.sh hfc_male
sudo systemctl restart m18-engine

The model downloader installs these Piper voices by default:

M18_PIPER_VOICES=joe bryce norman hfc_male

To test another installed local voice, point .env at that model:

M18_PIPER_MODEL=/opt/m18/models/piper/en_US-bryce-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-bryce-medium.onnx.json

or:

M18_PIPER_MODEL=/opt/m18/models/piper/en_US-norman-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-norman-medium.onnx.json

or:

M18_PIPER_MODEL=/opt/m18/models/piper/en_US-hfc_male-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-hfc_male-medium.onnx.json

To audition installed local Piper voices on Jetson:

cd /opt/m18/app
/opt/m18/venv/bin/python scripts/audition_piper_voices.py

Default OpenAI TTS uses the male-sounding onyx voice:

OPENAI_TTS_VOICE=onyx
OPENAI_TTS_INSTRUCTIONS=Speak clearly, warmly, and naturally in a plain adult male voice.

Wake-on-sound tuning:

M18_WAKE_ON_ANY_SOUND=true
M18_MIN_WAKE_RMS=80
M18_MIN_WAKE_PEAK=500
M18_ACTIVE_SESSION_SECONDS=30
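
As a rough illustration of how the RMS and peak thresholds might gate wake-on-sound, the sketch below requires both to be met for a chunk of 16-bit samples. Combining them with a logical AND is an assumption, as is the function name; the engine may weigh the two values differently.

```python
import math
from typing import Sequence


def sound_can_wake(
    samples: Sequence[int],
    min_rms: float = 80.0,  # mirrors M18_MIN_WAKE_RMS
    min_peak: int = 500,    # mirrors M18_MIN_WAKE_PEAK
) -> bool:
    """True when a chunk is loud enough, on average and at its peak, to count as a clear sound."""
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return rms >= min_rms and peak >= min_peak
```

Raising either value makes M18 harder to wake in a noisy room; lowering them makes quiet sounds more likely to open the active listening window.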

Main Modules

m18_engine/runtime.py: main state machine
m18_engine/stt_engine.py: STT provider factory
m18_engine/tts_engine.py: TTS provider factory
m18_engine/edtalkies_api.py: EdTalkies chat/API adapter
m18_engine/query_filter.py: invalid query and echo rejection
m18_engine/health_engine.py: health checks
m18_engine/error_engine.py: logs and optional email alerts
m18_engine/body_physics_engine.py: Pi expression command transport
m18_engine/pi_client.py: Pi client compatibility export
m18_engine/services/vision/vision_service.py: optional background camera service

Verification

python -m compileall m18_engine pi_zero tests
python -m unittest discover -s tests
python -m m18_engine --diagnostics

On a development machine without Jetson audio/model dependencies, diagnostics will report providers as not ready. That is expected until scripts/install_jetson.sh and scripts/download_models.sh run on the Jetson.
