M18 Engine

M18 Engine is the Jetson-side runtime for the M18 humanoid robot prototype. It coordinates wake-word listening, local and premium STT, query filtering, EdTalkies API calls, TTS, health/error handling, and facial/body expression commands for the Raspberry Pi Zero 2 W expression node.

Hardware Target

  • Brain: NVIDIA Jetson Orin Nano Developer Kit, 8GB
  • OS: Ubuntu / Jetson Linux
  • Microphone: reSpeaker XVF3800 USB Mic Array
  • Expression controller: Raspberry Pi Zero 2 W

Runtime Flow

Boot -> health check -> idle wake listener
Wake word or any clear sound -> active listening -> STT -> query filter
Valid query -> EdTalkies intent metadata -> ExecutionPlan route executor
Generated response -> EdTalkies persistence -> TTS -> active follow-up listening
No voice for 5 minutes -> partial idle / sleepy expression
Errors -> log -> optional email -> friendly message -> recovery

After M18 accepts a valid user request, normal user speech is ignored until the response completes and the microphone is flushed. M18 then returns to wake-word listening instead of processing room conversation captured during STT/API/TTS time. During TTS, only interrupt commands such as stop, cancel, quiet, and enough are listened for.

Simple greetings and identity prompts such as hello, hi, good morning, and what is your name are answered locally and spoken with local TTS. They do not call EdTalkies or OpenAI.

By default, idle wake is not limited to fixed wake words. Any clear sound above the configured wake threshold can move M18 into the active listening window. Wake words still work, but they are no longer required.

Supported wake names include M18 and JAI. JAI wake phrases include Jai, Hey Jai, Hi Jai, Hello Jai, Listen Jai, Okay Jai, Wake up Jai, Jai listen, Jai are you there, and Jai help me. The wake matcher is case-insensitive and also accepts the common STT variation jay.
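
The matching rule above can be sketched as follows. This is a minimal illustration, not the engine's matcher: the names `WAKE_PHRASES`, `normalize`, and `is_wake_phrase` are hypothetical, only a subset of the listed phrases is included, and matching on the start of the transcript is an assumption.

```python
import re

# Hypothetical subset of the wake phrases listed above.
WAKE_PHRASES = (
    "m18", "jai", "hey jai", "hi jai", "hello jai",
    "okay jai", "wake up jai", "jai listen",
)

def normalize(text: str) -> str:
    """Lowercase, map the STT variant 'jay' to 'jai', and drop punctuation."""
    text = text.lower()
    text = re.sub(r"\bjay\b", "jai", text)
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())

def is_wake_phrase(transcript: str) -> bool:
    """Return True if the transcript is, or starts with, a known wake phrase."""
    words = normalize(transcript)
    return any(words == p or words.startswith(p + " ") for p in WAKE_PHRASES)
```

Matching on whole words keeps unrelated words that merely begin with the same letters from triggering a wake.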

Important Echo Protection

M18 does not run normal STT or API work while it is speaking. During TTS playback, only a lightweight stop-command listener is active. The engine also remembers its last spoken text and rejects likely self-echo before any API call.

Supported stop commands:

  • stop
  • stop speaking
  • enough
  • pause
  • be quiet
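
A rough sketch of the two checks described in this section, stop-command detection and self-echo rejection. The function names, the similarity threshold of 0.8, the three-word minimum for the substring check, and the use of `difflib` are all assumptions for illustration; the engine's actual comparison may differ.

```python
from difflib import SequenceMatcher

# Stop commands from the list above.
STOP_COMMANDS = ("stop", "stop speaking", "enough", "pause", "be quiet")

def _clean(text: str) -> str:
    """Lowercase, collapse whitespace, and trim trailing punctuation."""
    return " ".join(text.lower().split()).strip(".!?, ")

def is_stop_command(transcript: str) -> bool:
    """True only when the whole utterance is one of the stop commands."""
    return _clean(transcript) in STOP_COMMANDS

def is_probable_echo(transcript: str, last_spoken: str,
                     threshold: float = 0.8) -> bool:
    """Guess whether the transcript is M18 hearing its own last sentence."""
    heard, spoken = _clean(transcript), _clean(last_spoken)
    if not heard or not spoken:
        return False
    # A multi-word fragment of the last spoken text is treated as echo.
    if len(heard.split()) >= 3 and heard in spoken:
        return True
    # Otherwise fall back to a fuzzy similarity ratio (assumed threshold).
    return SequenceMatcher(None, heard, spoken).ratio() >= threshold
```

A transcript flagged by `is_probable_echo` would be dropped before any API call, which matches the ordering described above.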

Install On Jetson

git clone https://github.com/aritworksdev/m18.engine.com.git
cd m18.engine.com
cp .env.example .env
bash scripts/install_jetson.sh
bash scripts/download_models.sh
python -m m18_engine --diagnostics

List audio devices:

python -m m18_engine --list-audio-devices

If the reSpeaker is not the default input, set it in .env:

M18_AUDIO_INPUT_DEVICE=2
M18_AUDIO_INPUT_GAIN=2.0

For Jetson/reSpeaker stability, bypass desktop default-device switching with ALSA. M18 can resolve auto: values on every startup by matching arecord -l / aplay -l device descriptions, so USB card numbers can change after reboot without breaking the service:

M18_AUDIO_CAPTURE_BACKEND=arecord
M18_ALSA_INPUT_DEVICE=auto:reSpeaker XVF3800
M18_AUDIO_PLAYBACK_BACKEND=aplay
M18_ALSA_OUTPUT_DEVICE=auto:KT USB Audio

If the exact device text is different on your Jetson, choose a stable unique part from:

arecord -l
aplay -l

Examples: auto:reSpeaker, auto:XVF3800, auto:KT USB Audio. Avoid numeric values such as plughw:0,0 unless you are testing temporarily.
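
To illustrate how an auto: value could be resolved, here is a minimal sketch that searches the text printed by arecord -l or aplay -l. The function name `resolve_auto_device` and the choice to return a `plughw:CARD,DEVICE` string are assumptions, not the engine's confirmed behavior.

```python
import re
from typing import Optional

# Matches lines such as "card 2: Array [Some Name], device 0: USB Audio [...]".
CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+): ")

def resolve_auto_device(setting: str, listing: str) -> Optional[str]:
    """Resolve 'auto:<text>' against `arecord -l` / `aplay -l` output.

    Explicit values are passed through unchanged. Returns None when no
    listed device description contains the requested text.
    """
    if not setting.startswith("auto:"):
        return setting
    needle = setting[len("auto:"):].strip().lower()
    for line in listing.splitlines():
        match = CARD_LINE.match(line)
        if match and needle in line.lower():
            card, device = match.groups()
            # Assumed output format: ALSA plughw device string.
            return f"plughw:{card},{device}"
    return None
```

In practice the `listing` argument could come from `subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout`. Because the lookup runs on each startup, a changed card number after reboot resolves to the new value.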

Test the selected mic level:

python -m m18_engine --test-mic

Normal query capture uses frame-level voice activity detection so steady microphone noise does not invoke STT. Optional tuning values are:

M18_QUERY_VAD_NOISE_MULTIPLIER=2.0
M18_QUERY_VAD_MIN_ACTIVE_MS=180

Increase the multiplier slightly if steady room noise is still treated as speech. Lower it cautiously if quiet speakers are being missed.
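
The effect of the two tuning values can be sketched with a simple energy-based detector. This is an illustration under assumptions: the 20 ms frame length, the requirement that active frames be consecutive, and the function names are hypothetical, and the engine's real VAD may work differently.

```python
import math
from typing import Iterable, Sequence

def frame_rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def contains_speech(frames: Iterable[Sequence[int]],
                    noise_floor_rms: float,
                    frame_ms: int = 20,
                    noise_multiplier: float = 2.0,
                    min_active_ms: int = 180) -> bool:
    """True if a long enough run of frames exceeds the noise threshold."""
    threshold = noise_floor_rms * noise_multiplier
    run_ms = 0
    for frame in frames:
        if frame_rms(frame) > threshold:
            run_ms += frame_ms
            if run_ms >= min_active_ms:
                return True
        else:
            run_ms = 0  # a quiet frame resets the active run
    return False
```

With this model, steady noise at the floor level never crosses the threshold, and a short click above it is rejected because it does not last for `min_active_ms`.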

Test wake recognition:

python -m m18_engine --test-wake

Optional Vision Service

M18 can optionally use a USB UVC camera for visual events such as person presence, face proximity, wave wake, and object detection. Vision is disabled by default and does not start the camera unless explicitly enabled.

Install optional vision dependencies on Jetson:

bash scripts/install_vision.sh

Enable the service in .env:

M18_VISION_ENABLED=true
M18_VISION_CAMERA_INDEX=0
M18_VISION_PERSON_DETECTION_ENABLED=true
M18_VISION_FACE_TRACKING_ENABLED=true
M18_VISION_GESTURE_DETECTION_ENABLED=true
M18_VISION_OBJECT_RECOGNITION_ENABLED=true
M18_VISION_EXPRESSION_RECOGNITION_ENABLED=false
M18_VISION_OCR_ENABLED=false
M18_VISION_TEXT_READING_ENABLED=true

Camera test:

python -m m18_engine --test-camera

Vision event test:

python -m m18_engine --test-vision --test-vision-seconds 10

The vision service runs in a background thread and publishes small internal events. The audio conversation loop remains the priority. Normal vision events are ignored while M18 is doing STT, API work, TTS, or interruption handling. A hand-wave event only acts as a visual wake when M18 is idle or wake-word listening.
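
The priority rule above can be expressed as a small gate function. The state and event names used here (`wake_listening`, `interrupting`, `hand_wave`) are hypothetical placeholders, and the returned action labels are illustrative only.

```python
# Hypothetical state names; the engine's real state set may differ.
BUSY_STATES = frozenset({"listening", "thinking", "speaking", "interrupting"})
WAKE_READY_STATES = frozenset({"idle", "wake_listening"})

def vision_action(event_type: str, engine_state: str) -> str:
    """Decide what to do with a vision event: 'wake', 'handle', or 'ignore'."""
    if event_type == "hand_wave":
        # A wave only counts as a visual wake when M18 is waiting for a wake.
        return "wake" if engine_state in WAKE_READY_STATES else "ignore"
    if engine_state in BUSY_STATES:
        # The audio conversation loop has priority over normal vision events.
        return "ignore"
    return "handle"
```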

Object recognition requires a compatible OpenCV DNN model. Configure these only after the model files are installed:

M18_VISION_OBJECT_MODEL_PATH=/opt/m18/models/vision/object.onnx
M18_VISION_OBJECT_MODEL_CONFIG_PATH=
M18_VISION_OBJECT_CLASSES_PATH=/opt/m18/models/vision/classes.txt

For the default YOLOv8n object detector:

bash scripts/download_vision_models.sh

Then use:

M18_VISION_OBJECT_RECOGNITION_ENABLED=true
M18_VISION_OBJECT_MODEL_PATH=/opt/m18/models/vision/yolov8n.onnx
M18_VISION_OBJECT_MODEL_CONFIG_PATH=
M18_VISION_OBJECT_CLASSES_PATH=/opt/m18/models/vision/coco.names

The first supported object responses are intentionally short and low-frequency: book, phone, bottle, cup, toy, scissors, and laptop. The COCO model does not reliably detect small items such as pens; those need a custom-trained model later.

Vision Placement And Calibration

If the USB UVC camera is wide-angle, keep important detections near the center of the image. Faces, objects, and text near the edges can be distorted and may not be detected.

Recommended placement ranges:

| Task | Recommended position |
| --- | --- |
| Person detection | Stand about 1-3 m / 3-10 ft from M18 |
| Face tracking | Keep face centered at 45-120 cm / 18-48 in |
| Close-face greeting | Face centered at 35-70 cm / 14-28 in |
| Facial expression recognition | Face centered at 45-90 cm / 18-36 in with good light |
| Hand wave | Wave beside face/chest within 50-150 cm / 20-60 in |
| Object recognition | Hold object centered at chest/face height, 35-90 cm / 14-36 in |
| Book cover / paper OCR | Hold the page flat and centered, 25-55 cm / 10-22 in; the readable text area should fill at least 35% of frame width |

Estimate camera horizontal FOV from a known visible width:

python -m m18_engine --vision-estimate-fov 170 100

This means: at 100 cm from the camera, the visible width is 170 cm.
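
The geometry behind such an estimate is the standard relation between visible width, distance, and field of view. The sketch below assumes the command uses this relation; the function name is hypothetical.

```python
import math

def horizontal_fov_degrees(visible_width_cm: float, distance_cm: float) -> float:
    """Horizontal FOV from the width visible at a known distance.

    Uses fov = 2 * atan((width / 2) / distance).
    """
    if visible_width_cm <= 0 or distance_cm <= 0:
        raise ValueError("width and distance must be positive")
    half_angle = math.atan((visible_width_cm / 2.0) / distance_cm)
    return math.degrees(2.0 * half_angle)
```

For the example values, 170 cm visible at 100 cm, this formula gives roughly 81 degrees.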

Run live calibration preview:

python -m m18_engine --vision-calibrate --vision-calibrate-seconds 30

For headless Jetson testing, save one annotated frame:

python -m m18_engine --vision-calibrate-save /tmp/m18-vision-calibration.jpg

Calibration config:

M18_VISION_FACE_MIN_DISTANCE_CM=45
M18_VISION_FACE_MAX_DISTANCE_CM=120
M18_VISION_OBJECT_MIN_DISTANCE_CM=35
M18_VISION_OBJECT_MAX_DISTANCE_CM=90
M18_VISION_OCR_MIN_WIDTH_PERCENT=35
M18_VISION_CENTER_BOX_PERCENT=60
M18_VISION_FOV_HORIZONTAL_DEGREES=0
M18_VISION_KNOWN_FACE_WIDTH_CM=16

Set M18_VISION_FOV_HORIZONTAL_DEGREES after measuring the camera. When it is 0, calibration still shows guide boxes and size hints, but face distance is shown as unknown.
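
One way a face distance can be derived from the measured FOV and M18_VISION_KNOWN_FACE_WIDTH_CM is the pinhole camera model. This is a sketch under that assumption; the function name and parameters are hypothetical, and returning None for a FOV of 0 mirrors the "unknown" behavior described above.

```python
import math
from typing import Optional

def estimate_face_distance_cm(face_width_px: float,
                              frame_width_px: float,
                              fov_horizontal_degrees: float,
                              known_face_width_cm: float = 16.0) -> Optional[float]:
    """Estimate distance to a face from its width in pixels.

    Returns None when the FOV is unset (0) or the face box is empty.
    """
    if fov_horizontal_degrees <= 0 or face_width_px <= 0:
        return None
    # Focal length in pixels implied by the horizontal field of view.
    half_fov = math.radians(fov_horizontal_degrees) / 2.0
    focal_px = (frame_width_px / 2.0) / math.tan(half_fov)
    return known_face_width_cm * focal_px / face_width_px

def face_in_range(distance_cm: Optional[float],
                  min_cm: float = 45.0, max_cm: float = 120.0) -> bool:
    """Check a distance against the configured face distance limits."""
    return distance_cm is not None and min_cm <= distance_cm <= max_cm
```

Wide-angle lens distortion is ignored here, which is another reason to keep faces near the center of the frame.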

To enable local text reading from book covers or paper, install OCR support and enable OCR:

bash scripts/install_vision.sh

Then set these values in .env:

M18_VISION_OCR_ENABLED=true
M18_VISION_TEXT_READING_ENABLED=true
M18_VISION_TEXT_MIN_CHARACTERS=6
M18_VISION_TEXT_MAX_CHARACTERS=120
M18_VISION_TEXT_SCAN_INTERVAL_SECONDS=2
M18_VISION_TEXT_COOLDOWN_SECONDS=20

Text reading scans multiple center-weighted document areas and does not require a book to be detected first. It is intended for short notes, business cards, labels, paper pages, and book covers. For best results, hold the text flat, centered, well lit, and close enough that the letters are sharp and large. Very small handwriting, curved pages, glare, or text near the wide-angle edge of the camera can still fail.

Run once:

bash scripts/run_m18.sh --listen-once

Run continuously:

bash scripts/run_m18.sh --loop

Install as a systemd service:

sudo bash scripts/install_service.sh
sudo systemctl enable --now m18-engine

Install On Raspberry Pi Zero 2 W

The first hardware path uses serial expression commands. HTTP mode is also available for development.

git clone https://github.com/aritworksdev/m18.engine.com.git
cd m18.engine.com
sudo bash scripts/install_pi_zero.sh
sudo systemctl enable --now m18-expression

Default Pi serial service:

/dev/ttyGS0 at 115200 baud

The Jetson sends one newline-delimited JSON message to the Pi on every engine state change:

{"event":"engine_state","state":"listening","expression":"listening","intensity":70,"duration_ms":0}

Common states are idle, partial_idle, wake_detected, listening, thinking, speaking, error_recovery, and shutdown.
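
A minimal sketch of encoding and decoding this message format. The function names are hypothetical and the default values simply mirror the example message; this is not the engine's transport code.

```python
import json

def build_state_message(state: str, expression: str,
                        intensity: int = 70, duration_ms: int = 0) -> bytes:
    """Encode one engine state change as a newline-terminated JSON line."""
    payload = {
        "event": "engine_state",
        "state": state,
        "expression": expression,
        "intensity": intensity,
        "duration_ms": duration_ms,
    }
    # Compact separators keep each message on a single short line.
    return (json.dumps(payload, separators=(",", ":")) + "\n").encode("utf-8")

def parse_state_message(line: bytes) -> dict:
    """Decode one received line back into a dictionary (Pi side)."""
    return json.loads(line.decode("utf-8").strip())
```

Because each message ends with a newline, the receiver can read the serial port line by line and parse each line independently.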

To view live Jetson-to-Pi expression messages from the Jetson without touching the serial port:

python scripts/show_pi_messages.py

For IP/network communication, run the Pi expression node in HTTP mode:

sudo tee /etc/m18-expression.env >/dev/null <<'EOF'
M18_PI_MODE=http
M18_PI_HTTP_HOST=0.0.0.0
M18_PI_HTTP_PORT=8787
EOF
sudo systemctl restart m18-expression

Then set Jetson .env:

PI_CONNECTION_TYPE=http
M18_EXPRESSION_ENDPOINT=http://m18pi.local:8787/event

For HTTP testing:

python pi_zero/expression_node.py --mode http --http-port 8787

Configuration

Secrets and feature switches live in .env.

Normal settings live in config.yaml.

Important switches:

ENABLE_EDTALKIES_API=false
ENABLE_PREMIUM_STT=false
ENABLE_PREMIUM_TTS=false
ENABLE_BODY_PHYSICS=true
ENABLE_ERROR_EMAIL=false
JAI_DEMO_MODE=false
PI_CONNECTION_TYPE=serial
PI_SERIAL_PORT=/dev/ttyACM0
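
Boolean switches like these are typically read from the environment after .env is loaded. The helper below is an illustrative sketch; the name `env_flag` and the accepted truthy spellings are assumptions, not the engine's parser.

```python
import os
from typing import Mapping, Optional

def env_flag(name: str, default: bool = False,
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a true/false feature switch, falling back to a default."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    # Assumed truthy spellings; anything else counts as false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

For example, `env_flag("ENABLE_ERROR_EMAIL")` would return False with the values shown above.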

Execution-Plan Routing

EdTalkies is the authoritative routing brain. After STT, M18 calls GetResolveIntentMetadataAsync and executes AiBotIntent.ExecutionPlan.Route. M18 does not classify current affairs, education generation, RAG, or local responses using query keywords.

Configure:

ENABLE_OPENAI=true
ENABLE_PREMIUM_STT=true
M18_PREMIUM_STT_MODE=openai
OPENAI_API_KEY=
M18_EDTALKIES_INTENT_PATH=/AiBotTalkies/GetResolveIntentMetadataAsync
M18_EDTALKIES_UPDATE_RESPONSE_PATH=/AiBotTalkies/UpdateGeneratedResponseAsync
OPENAI_RESPONSES_PATH=/v1/responses

Supported plan routes are OPENAI_DIRECT, OPENAI_WEB_SEARCH, GOVERNMENT_RAG, EDTALKIES_GENERATION, and LOCAL_RESPONSE. If intent resolution or its plan is invalid, M18 uses the configured OpenAI model as a short direct fallback. M18_RESPONSE_MODE and JAI_DEMO_MODE remain readable for configuration compatibility, but they do not override an EdTalkies execution plan.
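
The route selection and fallback rule can be sketched as below. The dictionary shape with `ExecutionPlan` and `Route` keys is an assumption based on the `AiBotIntent.ExecutionPlan.Route` path named above, and `FALLBACK_DIRECT` is a hypothetical label for the short direct fallback.

```python
from typing import Any

SUPPORTED_ROUTES = frozenset({
    "OPENAI_DIRECT",
    "OPENAI_WEB_SEARCH",
    "GOVERNMENT_RAG",
    "EDTALKIES_GENERATION",
    "LOCAL_RESPONSE",
})

# Hypothetical label for the short direct OpenAI fallback.
FALLBACK_DIRECT = "FALLBACK_DIRECT"

def select_route(intent: Any) -> str:
    """Return the plan route, or the fallback when the plan is invalid."""
    if not isinstance(intent, dict):
        return FALLBACK_DIRECT
    plan = intent.get("ExecutionPlan")
    if not isinstance(plan, dict):
        return FALLBACK_DIRECT
    route = plan.get("Route")
    if isinstance(route, str) and route in SUPPORTED_ROUTES:
        return route
    return FALLBACK_DIRECT
```

Note that the function never inspects the query text: routing depends only on the plan returned by EdTalkies.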

Default local TTS uses Piper with the male en_US-hfc_male-medium voice. To switch voices, update:

M18_PIPER_MODEL=/opt/m18/models/piper/en_US-hfc_male-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-hfc_male-medium.onnx.json
M18_PIPER_LENGTH_SCALE=1.0

To set and verify the active local Piper voice on Jetson:

cd ~/aritworks/projects/m18.engine.com
bash scripts/set_piper_voice.sh hfc_male
sudo systemctl restart m18-engine

The model downloader installs these Piper voices by default:

M18_PIPER_VOICES=joe bryce norman hfc_male

To test another installed local voice, point .env at that model:

M18_PIPER_MODEL=/opt/m18/models/piper/en_US-bryce-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-bryce-medium.onnx.json

or:

M18_PIPER_MODEL=/opt/m18/models/piper/en_US-norman-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-norman-medium.onnx.json

or:

M18_PIPER_MODEL=/opt/m18/models/piper/en_US-hfc_male-medium.onnx
M18_PIPER_CONFIG=/opt/m18/models/piper/en_US-hfc_male-medium.onnx.json

To audition installed local Piper voices on Jetson:

cd /opt/m18/app
/opt/m18/venv/bin/python scripts/audition_piper_voices.py

Default OpenAI TTS uses the male-sounding onyx voice:

OPENAI_TTS_VOICE=onyx
OPENAI_TTS_INSTRUCTIONS=Speak clearly, warmly, and naturally in a plain adult male voice.

Wake-on-sound tuning:

M18_WAKE_ON_ANY_SOUND=true
M18_MIN_WAKE_RMS=80
M18_MIN_WAKE_PEAK=500
M18_ACTIVE_SESSION_SECONDS=30
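
To show how the RMS and peak thresholds might interact, here is a minimal sketch. It assumes signed 16-bit PCM samples and that both thresholds must be met; the engine may combine them differently, and the function name is hypothetical.

```python
import math
from typing import Sequence

def should_wake_on_sound(samples: Sequence[int],
                         min_rms: float = 80.0,
                         min_peak: int = 500) -> bool:
    """True when a chunk of audio is loud enough to open an active session."""
    if not samples:
        return False
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Assumption: the sound must pass both the average and the peak test.
    return rms >= min_rms and peak >= min_peak
```

Raising `min_rms` makes M18 less sensitive to sustained background noise, while raising `min_peak` filters out soft sounds that never get loud.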

Main Modules

  • m18_engine/runtime.py: main state machine
  • m18_engine/stt_engine.py: STT provider factory
  • m18_engine/tts_engine.py: TTS provider factory
  • m18_engine/edtalkies_api.py: EdTalkies chat/API adapter
  • m18_engine/query_filter.py: invalid query and echo rejection
  • m18_engine/health_engine.py: health checks
  • m18_engine/error_engine.py: logs and optional email alerts
  • m18_engine/body_physics_engine.py: Pi expression command transport
  • m18_engine/pi_client.py: Pi client compatibility export
  • m18_engine/services/vision/vision_service.py: optional background camera service

Verification

python -m compileall m18_engine pi_zero tests
python -m unittest discover -s tests
python -m m18_engine --diagnostics

On a development machine without Jetson audio/model dependencies, diagnostics will report providers as not ready. That is expected until scripts/install_jetson.sh and scripts/download_models.sh run on the Jetson.
