I build high-performance systems for modern datacenters, focused on making AI/ML workloads efficient at scale — across networking, memory, and runtime layers.
- 🔬 Visiting researcher at the Cambridge Computer Laboratory (with Andrew Moore)
- 🎓 PhD in Computer Systems, Queen Mary University of London (advised by Gianni Antichi & Brent Stephens)
- ⚡ Deep experience with Compute Express Link (CXL) across QEMU and the Linux kernel
- 🧩 Comfortable across the stack — kernel, networking datapaths, emulation, and runtimes
- qemu-cxl — CXL device & memory experiments in QEMU (correct + performant non-interleaved path, MHSLD / dynamic capacity)
- qemu-lab — QEMU sandbox tracking upstream, for fast systems experiments
- httpd-ab — ApacheBench extended with nanosecond timing + per-request tracing
- Cook-RDMA — curated, tested RDMA examples
Backdraft: Lossless virtual switch · USENIX NSDI 2022 · First author
A software virtual switch that eliminates the slow-receiver problem in data planes via per-flow queues, dynamic buffering, and backpressure-aware scheduling. I led the design, the DPDK implementation, and the evaluation.

Morpheus: Run-time data-plane optimization · ASPLOS 2022 · Co-author
A framework that specializes software data planes to their actual workload, applying domain-specific optimizations at run time for large throughput gains. I contributed to the system design and experimental evaluation.

machnet: Low-latency cloud messaging (Microsoft) · Contributor
DPDK-based messaging for public-cloud VMs (~750K RPS, 61 µs P99.9 latency on Azure). I worked on the SACK-based reliable-transport path and hardened the packet-processing fast path for correctness and maintainability.

qemu-cxl: CXL emulation for systems research
Added a correct and performant emulation path for non-interleaved Compute Express Link (CXL) memory configurations in QEMU, used to study how datacenter memory should evolve.

CXL & memory systems · RDMA / high-performance networking · systems for AI/ML · datacenter energy efficiency




