Ph.D. student @ HKUST(GZ) · Research Intern @ Mind Lab
Efficient and reliable LLMs: inference, long context, KV cache, retrieval, and agentic workflows.

40+ stars · personal public non-fork repos
8.4k+ / 1.1k+ · contributed projects: LMFlow / kvpress
benchmark → method → artifact · how I like research to ship

Inference efficiency: KV-cache compression, token-efficient reasoning, energy-to-token evaluation, serving bottlenecks.
Long-context evaluation: generation-focused benchmarks, dense reasoning integrity, multi-turn coherence.
Agent systems: tool use, post-training, harness design, local-first agent workflow infrastructure.
Research infrastructure: reproducible artifacts, project pages, scholar tracking, figure and report tooling.

Contributed to an extensible toolkit for fine-tuning and inference of large foundation models.
Long-context generation benchmark for coherent, context-aware long-form responses.
Policy-conditioned live-market evaluation for LLM trading agents. Benchmark the policy, not just the model.
Local-first agent task hub with SQLite queueing, dependency-aware dispatch, templates, and dashboards.
Adapters between XML-like tool calls and OpenAI-style structured tool-call histories.
Project page for evaluating LLM inference as energy-to-token production.

long-context generation ──┬── LongGenBench
                          ├── semantic integrity under KV compression
                          └── multi-turn coherence / FlowKV
agent capability eval ────┬── QuantArena
                          ├── tool-use adapters
                          └── local-first agent workflow runtime
efficient inference ──────┬── ChunkKV / KV compression
                          ├── token-efficient reasoning
                          └── energy-to-token production




