xtan.ai explores a new approach to precision gesture interaction for professional 3D workflows such as digital content creation (DCC), CAD and virtual production.
Instead of relying purely on AI-based depth estimation, the system is designed around metric stereo geometry and deterministic tracking pipelines. The goal is to provide stable, low-latency spatial interaction that can be reliably integrated into professional software tools.
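
To make "metric stereo geometry" concrete: for a rectified stereo pair, depth follows from disparity by triangulation, Z = f · B / d. The sketch below illustrates that standard relation and how depth uncertainty grows with distance. The function names and the numeric values in the usage note are assumptions for illustration, not parameters of the xtan.ai system.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return metric depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def depth_error_per_pixel(depth_m: float, focal_px: float, baseline_m: float) -> float:
    """Approximate depth change caused by a 1 px disparity error: Z^2 / (f * B)."""
    return depth_m ** 2 / (focal_px * baseline_m)
```

With an assumed focal length of 1000 px and a 0.1 m baseline, a disparity of 200 px corresponds to a depth of 0.5 m, and a one-pixel disparity error at that distance shifts the depth by roughly 2.5 mm. Because the error grows with the square of the depth, baseline and working distance are central design parameters for a geometry-first system.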
It is important to distinguish between two fundamentally different approaches:
- Direct AI-based recognition from camera images, where gestures are inferred directly from raw visual input. This approach is often less transparent, less deterministic, and more error-prone under difficult real-world conditions. It currently dominates many mainstream use cases.
- Geometry-first 3D reconstruction followed by structured recognition, where the system first reconstructs stable, high-quality 3D data without relying on AI for the initial perception stage. Only afterward is the resulting 3D representation passed to a higher-level model, such as a GCN, for gesture interpretation. When the 3D signal is clean and stable, this approach is typically more robust, more transparent, and often better suited for real-time use with lower error rates. It can also simplify model training, improve the usefulness of augmentation, and make it easier to scale the recognition pipeline to additional gestures or application domains.
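
As a rough illustration of the second approach, the sketch below shows how reconstructed 3D keypoints could be arranged as a skeleton graph and passed through one graph propagation step, which is the core operation of a GCN layer. The five-joint finger chain, the function names, and the omission of learned weights are all assumptions made for brevity. This is not the project's actual model.

```python
import math

# Hypothetical 5-keypoint chain for one finger: wrist -> MCP -> PIP -> DIP -> tip.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
NUM_JOINTS = 5


def normalized_adjacency(edges, n):
    """Build D^-1/2 (A + I) D^-1/2, the symmetric normalization used in standard GCN layers."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # undirected bone between two joints
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(a_hat, features):
    """One weight-free propagation step: each joint mixes its features with its neighbors'."""
    n, dim = len(features), len(features[0])
    return [
        [sum(a_hat[i][k] * features[k][c] for k in range(n)) for c in range(dim)]
        for i in range(n)
    ]
```

Here `features` would hold one row per joint, for example its metric (x, y, z) position. A full GCN layer would additionally multiply by a learned weight matrix and apply a nonlinearity.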
What this layer does: It converts poses and keypoints into high-level intents using gesture grammars, state machines, and context rules (tool modes, constraints, safety).
The layer handles:
- debounce
- disambiguation
- confidence scoring
- temporal consistency
The result is a set of deterministic, low-latency interaction events suitable for professional applications.
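
The debounce, confidence, and temporal-consistency handling described above can be sketched as a small state machine. This is a minimal illustration under assumed thresholds. The class name, the event strings, and the default values are hypothetical and do not describe MotionCoder's actual interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GestureDebouncer:
    """Emit an intent only after it is seen confidently for several consecutive frames."""
    min_confidence: float = 0.8   # assumed confidence threshold
    min_frames: int = 3           # assumed debounce window, in frames
    _candidate: Optional[str] = None
    _count: int = 0
    _active: Optional[str] = None

    def update(self, label: Optional[str], confidence: float) -> Optional[str]:
        """Feed one frame; return an event string on a state change, else None."""
        if label is None or confidence < self.min_confidence:
            label = None  # treat low-confidence frames as "no gesture"
        if label == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = label, 1
        if self._count >= self.min_frames and self._candidate != self._active:
            previous, self._active = self._active, self._candidate
            if self._active is not None:
                return f"{self._active}:start"
            return f"{previous}:end"
        return None
```

Feeding three confident "pinch" frames yields a single `pinch:start` event. One dropped or low-confidence frame afterward does not end the gesture, because the "no gesture" state must itself persist for `min_frames` frames before `pinch:end` is emitted.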
| 🧩 Module | 📝 Short Description | ⚖️ License | 🚦 Status | 🔗 Link |
|---|---|---|---|---|
| MotionCoder | Real-time gesture interpretation engine with state machine and context logic. | Apache-2.0 | 🟡 Planned | MotionCoder |
For high-precision tracking, the system can be combined with EdgeTrack stereo rigs, which provide synchronized NIR stereo capture and geometry-based depth reconstruction.
More information about the hardware layer can be found here:
What this layer does: It maps interaction intents from MotionCoder to application-native actions such as operators, hotkeys, API calls, or engine events.
Each connector module (Coder2XY) targets a specific software ecosystem and translates MotionCoder output into application commands.
| 🧩 Module | 📝 Short Description | 📲 Target System | ⚖️ License | 🗒️ Notes | 🚦 Status | 🔗 Link |
|---|---|---|---|---|---|---|
| Coder2Blender | Add-on/API bridge: gestures → operators, hotkeys, nodes. | Blender | MIT | — | 🟡 Research (API exploration) | coming soon |
| Coder2Unreal | Plugin bridge: gestures → Blueprint/C++ events. | Unreal Engine | MIT | — | 🟡 Planned | coming soon |
| Coder2Dassault | Macro/API bridge: gestures → CAD commands. | Dassault (SolidWorks/CATIA) | MIT | — | 🟠 Targeted for next year | coming soon |
- Modularity: MotionCoder (gesture detection and semantics) remains independent from any target application.
- Reuse: One interpretation engine can support many software integrations.
- Breadth: Potentially 100+ software targets (CAD, DCC, robotics tools, assistive interfaces).
- Maintainability: API changes affect only the relevant adapter, not the core engine.
- Portability: Enables fast integration into new software ecosystems.
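
The separation between the interpretation engine and its connectors can be sketched as a small adapter interface. Everything in this example is hypothetical: the `Connector` protocol, the `TableConnector` class, and the intent names are illustrative assumptions. No real Blender or Unreal API calls are shown.

```python
from typing import Callable, Dict, List, Protocol


class Connector(Protocol):
    """Interface every hypothetical Coder2XY adapter would implement."""

    def dispatch(self, intent: str) -> bool: ...


class TableConnector:
    """Maps intent names to application-native callables via a lookup table."""

    def __init__(self, name: str, actions: Dict[str, Callable[[], None]]) -> None:
        self.name = name
        self._actions = actions

    def dispatch(self, intent: str) -> bool:
        action = self._actions.get(intent)
        if action is None:
            return False  # unknown intents are ignored rather than treated as errors
        action()
        return True


def broadcast(intent: str, connectors: List[Connector]) -> List[bool]:
    """Send one engine intent to every registered connector."""
    return [c.dispatch(intent) for c in connectors]


# Usage: two stand-in adapters that only record what they would trigger.
log: List[str] = []
blender = TableConnector("blender", {"grab:start": lambda: log.append("blender: move")})
unreal = TableConnector("unreal", {"grab:start": lambda: log.append("unreal: event")})
handled = broadcast("grab:start", [blender, unreal])
```

Because each adapter owns its own lookup table, a change in one application's API would touch only that adapter, while the engine keeps emitting the same intents.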
This layer includes optional hardware devices designed to improve ergonomics and interaction precision.
Examples include:
- clutch/confirm buttons
- mode switches
- haptic feedback
These devices communicate via BLE or USB and are designed to avoid NIR interference, ensuring compatibility with camera-based tracking systems.
Note: These peripherals do not require MotionCoder and can operate as standard input devices (HID).
| 🧩 Module | 📝 Short Description | 🔌 Hardware / Dependencies | ⚖️ License | 🗒️ Notes | 🚦 Status | 🔗 Link |
|---|---|---|---|---|---|---|
| Pen3D | Tracked 3D pen with buttons and optional haptic feedback. | Optional ESP32-S3 (BLE) or mechanical design. | Apache-2.0 | BLE GATT notifications, deep-sleep wake-on-button, optional USB-CDC debug. Designed for 850 nm NIR environments. | 🟡 Planned | Pen3D |
| HMDone | Minimal VR headset concept using external marker-based tracking only. | Works with high-resolution HMDs (e.g. Pimax Crystal, Valve headsets). | Apache-2.0 | Designed for multi-view NIR tracking. Inside-out tracking intentionally ignored. | 🟠 Later | HMDone |
Coming soon.
The project is currently in the research and prototyping phase, focusing on:
- architecture definition
- stereo-based tracking experiments
- gesture interaction models
- early software integrations
More details will be published as the ecosystem evolves. 🚀