LLM inference researcher · AI infrastructure engineer · Embedded systems developer
| Domain | Technologies |
|---|---|
| LLM Inference | llama.cpp, GGUF, EXL2, speculative decoding, EAGLE, MTP, MoE routing, quantization (Q4_K_M, Q5_K_S) |
| GPU Compute | CUDA on legacy and low-VRAM GPUs (NVIDIA Tesla P40, RTX 3050), sm_61 compatibility workarounds, VRAM optimization |
| AI Infrastructure | Ollama, OpenClaw gateway, Hugging Face Hub, automated model benchmarking, EXL2 conversion pipelines |
| Automation | systemd services & timers, watchdog daemons, Telegram notification bots, bash orchestration |
| Embedded | ESP32, Arduino R4 WiFi, MQTT, Tailscale, HID devices, OLED/LCD displays, thermal sensing |
| Languages & Tools | Python, C++, C, Bash, CMake, Flutter, Flask, Node.js, OpenCV |
- auto-quantization-pipeline-gguf — Automated GGUF quantization of new LLM checkpoints with benchmark-driven calibration for P40 / RTX 3050
- automated-exl2-conversion-validation-pipeline — C++/Python toolchain for architecture-level fixes on Qwen MoE models, EXL2 conversion, mixed quantization, and validation benchmarks
- benchmark-4-agent-wrappers-on-qwen3627b-llamacpp — Comparative benchmark of four agent wrappers (Pi, OpenCode, Hermes, Qwen-Code) on Qwen3.6-27B quantized via llama.cpp
- add-video-input-support-to-llamacpp-mtmd — Video input support for llama.cpp's multimodal (mtmd) tooling, with CMake build flags and a Python client for frame capture
- nex2-mini-phase-twin-30b-lowvram-gguf-model — Low-VRAM GGUF build of a 30B model, ready for deployment on consumer hardware
- benchmark-4-agent-wrappers-latency-vram — Latency, VRAM, and output quality profiling for agent wrappers on llama.cpp
- ai-dashboard — Local monitoring dashboard (port :9190) with GPU status, AGENDA system, task management, unified security scanner, and auto-generated ideation pipeline
- sistema-di-benchmarking-automatizzato-per-nuovi-mo — Automated GGUF benchmarking system for new models running on Tesla P40 and RTX 3050 with automatic report generation
- openclaw — Node.js Ollama gateway service for model routing and orchestration
- automazione-boot-watchdog-ai-avanzato — Advanced systemd watchdog for llama-stack that monitors VRAM usage and token rate, detects anomalies, and sends Telegram alerts
- voice-dictate — GPU-accelerated local Whisper dictation for Claude Code, built as an alternative to its native voice mode and tuned on Italian benchmarks
- bias-personalizzato-per-whisper-locale — Custom bias configuration and fine-tuning for local Whisper transcription
- megatool — OSINT platform in C++ with a Flask web app (port :7788), AI photo analysis, and an offline EXIF/GPS geotagging module
- reddit-monitor — Automated subreddit scanner that feeds AI/tech content into AGENDA idea generation, with a configurable polling interval
- automazione-systemd-timer-html-minification — systemd-timer-driven HTML minifier for automated static site updates and GitHub Pages deployments
- secure-llm-context-vault — Encrypted archive system for managing and persisting LLM conversation contexts
- bot-short — Telegram bot with AI-powered SVG graphics pipeline
- ai-home-assistant-hid-dashboard — Hardware dashboard (Arduino R4 WiFi + ESP32) displaying P40/3050 VRAM, tok/s, uptime via MQTT over Tailscale
- ai-model-selector-physical-controller — ESP32-based physical model selector with rotary encoder, OLED feedback, HID interfacing, and OpenClaw gateway integration
- controller-termico-proattivo-esp32 — Proactive thermal fan controller on ESP32 with multi-sensor input
- digital-thermal-lcd — Temperature readout on a Thermalright LCD, driven over USB HID