µShell is a hardware/OS co-design for modular accelerator deployment. Inspired by the microkernel principle, individual hardware modules (FFT, RSA, AES, …) are deployed into separate vFPGAs and dynamically chained together by a host-side dataflow graph (DFG) API to compose end-to-end accelerators.
This repo contains the µShell shell, runtime, driver, example modules, and the
end-to-end applications used in the paper. The accompanying baseline (the same
applications written against an unmodified Coyote shell) lives at
TUM-DSE/microShell, branch baseline.
Figure 11 — End-to-end performance: µShell vs. Coyote baseline and a monolithic single-binary variant, across the five composed applications.
Figure 12 — Component-aware scheduling vs. Coyote's FIFO scheduler, across five metrics: (a) end-to-end latency, (b) reconfiguration count, (c) average response time, (d) tail (95%) response time, (e) deadline misses.
Figure 13 — Application-deployment overhead: µShell capability/buffer updates vs. Coyote partial-reconfiguration cost, for accelerators of 1–4 user-logic components.
This is the hardware and software environment of our servers:
- AMD EPYC 7413 CPU × 2
- Xilinx Alveo U280 FPGA × 2
- 100 GbE FPGA-attached NIC
- Bitstream generation (Vivado) and FPGA tests can run on the same host or be split across a build host and an FPGA host to keep Vivado off the test path.
- Linux 6.9.0-rc7 / NixOS 23.11
- Nix: all build dependencies are pinned via `shell.nix` (host build/run) and `xilinx-shell` (Vivado toolchain)
- Vivado 2022.x (loaded by `xilinx-shell`)
- Python ≥ 3.11 for the plotting scripts under `evaluation/scripts/`
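The host-side requirements above can be checked with a small preflight function. This is a hypothetical helper, not part of the repo, and the function names are illustrative. It only checks that `nix-shell` and `python3` are on `PATH` and that `python3` is at least 3.11; Vivado is loaded through `xilinx-shell` and is not checked.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of this repo). It checks the
# host-side software listed above; Vivado is not checked because it is
# loaded separately through xilinx-shell.

# Succeeds if the command named by $1 is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if the interpreter named by $1 reports Python >= 3.11.
python_ok() {
  "$1" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 11) else 1)' \
    >/dev/null 2>&1
}

# Prints one line per check and returns non-zero if any check fails.
preflight() {
  local status=0 cmd
  for cmd in nix-shell python3; do
    if have_cmd "$cmd"; then
      echo "ok: $cmd found"
    else
      echo "missing: $cmd"
      status=1
    fi
  done
  if have_cmd python3 && ! python_ok python3; then
    echo "too old: python3 is below 3.11"
    status=1
  fi
  return "$status"
}
```

Source the file and call `preflight` in the environment where you plan to build and plot.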
Because of the specialized hardware and software requirements, we provide SSH access to our evaluation machines, so you do not need to set up this environment yourself. Please contact the paper authors through HotCRP to obtain SSH keys. The machines have the required hardware and software installed to run the experiments. If you run into problems, you can contact us through HotCRP with further questions.
For a "hello world"-equivalent run, use `perf_local`, the smallest end-to-end test: a host loop across two vFPGAs.
The artifact uses two branches of this repo:

- `master`: the µShell shell, runtime, and modular apps
- `baseline`: the same applications written against an unmodified Coyote shell

Both branches must be cloned, because the commands in REPRODUCE.md use both.
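Once both branches are cloned with the commands in this section, the two checkouts can be sanity-checked with a helper like the following. It is a hypothetical sketch, not part of the repo; it assumes the default locations used by the clone commands (`~/microShell` on `master`, `~/microShell_base` on `baseline`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): succeeds if $1 is a git
# checkout whose current branch is $2.
check_checkout() {
  local dir="$1" want="$2" got
  if [ ! -d "$dir/.git" ]; then
    echo "missing checkout: $dir"
    return 1
  fi
  # symbolic-ref works even before the first commit; it fails on a detached HEAD.
  if ! got="$(git -C "$dir" symbolic-ref --short HEAD 2>/dev/null)"; then
    echo "$dir has a detached HEAD, expected branch $want"
    return 1
  fi
  if [ "$got" != "$want" ]; then
    echo "$dir is on $got, expected $want"
    return 1
  fi
  echo "ok: $dir is on $want"
}

# Paths and branches match the clone commands in this README.
check_both_checkouts() {
  local status=0
  check_checkout "$HOME/microShell" master || status=1
  check_checkout "$HOME/microShell_base" baseline || status=1
  return "$status"
}
```

Calling `check_both_checkouts` reports both checkouts and returns non-zero if either is missing or on the wrong branch.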
Clone the repo:

```bash
cd ~
git clone git@github.com:TUM-DSE/microShell.git
git clone -b baseline git@github.com:TUM-DSE/microShell.git microShell_base
```

`perf_local` is shipped as a pre-built bitstream under `bitstreams/perf_local/`. This step installs the pre-compiled FPGA driver on the host and programs the FPGA with the specified bitstream:
```bash
cd microShell
bash ./program_fpga.sh perf_local
```

If you see the error `rmmod: ERROR: Module coyote_drv is not currently loaded` while the script runs, it is harmless and can be ignored.
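After `program_fpga.sh` finishes, a quick check can confirm the two conditions the troubleshooting notes rely on: `coyote_drv` is loaded and `nr_hugepages` is at least 1024. This is a hypothetical sketch, not part of the repo; it reads the standard Linux files `/proc/modules` and `/proc/sys/vm/nr_hugepages`, and takes their paths as parameters so it can be tried against fixture files.

```bash
#!/usr/bin/env bash
# Hypothetical post-programming check (not part of this repo).
# Arguments (all optional): modules file, hugepages file, minimum page count.
check_fpga_host() {
  local modules="${1:-/proc/modules}"
  local hugepages="${2:-/proc/sys/vm/nr_hugepages}"
  local min_pages="${3:-1024}"
  local status=0 pages

  # Each /proc/modules line starts with the module name followed by a space.
  if grep -q '^coyote_drv ' "$modules"; then
    echo "ok: coyote_drv loaded"
  else
    echo "coyote_drv is not loaded"
    status=1
  fi

  pages="$(cat "$hugepages")"
  if [ "$pages" -ge "$min_pages" ]; then
    echo "ok: nr_hugepages = $pages"
  else
    echo "nr_hugepages = $pages, expected >= $min_pages"
    status=1
  fi
  return "$status"
}
```

On the FPGA host, run `check_fpga_host` with no arguments.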
Build the host program inside the Nix environment:

```bash
nix-shell shell.nix
mkdir build_perf_local_sw && cd build_perf_local_sw
cmake ../examples_sw/ -DEXAMPLE=perf_local
make
```

Then, from `build_perf_local_sw`, run the test:

```bash
sudo ./bin/test
```

The application reports the average throughput across the two vFPGAs.
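The build steps above can also be scripted non-interactively with `nix-shell --run`. This is a hypothetical wrapper, not part of the repo; it assumes it is run from the repo root and that other EXAMPLE targets follow the same `build_<example>_sw` directory naming as `build_perf_local_sw`. The `NIX_SHELL` variable is an illustrative override so the command can be replaced by a stub.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this repo): performs the build steps above
# non-interactively with `nix-shell --run`. Run it from the repo root.
# NIX_SHELL can name another command (for example a stub in tests).
build_example() {
  local example="$1"
  local build_dir="build_${example}_sw"   # same naming as build_perf_local_sw
  mkdir -p "$build_dir" || return 1
  "${NIX_SHELL:-nix-shell}" shell.nix --run \
    "cd '$build_dir' && cmake ../examples_sw/ -DEXAMPLE='$example' && make"
}
```

For example, `build_example perf_local && (cd build_perf_local_sw && sudo ./bin/test)` builds and runs the `perf_local` test in one line.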
All the figures and tables from the paper can be generated using the collected execution data in evaluation/data/. The detailed step-by-step flow — bitstream generation, host-side measurements, CSV outputs — lives in REPRODUCE.md.
Assuming your repo is cloned at `~/microShell`, run the following commands to set up the environment:

```bash
cd ~/microShell
nix-shell shell.nix
```
Then run the plotting and extraction scripts from `evaluation/scripts/`:

```bash
cd evaluation/scripts/
python3 e2e_6.1/plot_e2e.py
# → evaluation/plots/e2e_6.1/e2e.{pdf,png}
python3 scheduling_6.2/plot_sched.py
# → evaluation/plots/scheduling_6.2/sched.{pdf,png}
python3 deployment_6.3/plot_reconfig_overhead.py
# → evaluation/plots/deployment_6.3/reconfig_overhead.{pdf,png}
python3 complexity_6.4/extract_complexity.py \
  --baseline-csv ../data/complexity_6.4/complexity_baseline_results.csv \
  --ushell-csv ../data/complexity_6.4/complexity_ushell_results.csv
# Prints Table rows to stdout
python3 resource_usage_6.5/extract_util.py
# Prints Table 6 rows (Coyote, µShell, Inter 3/4/6/8, PCIe DMA, MMU, CEU) to stdout
python3 resource_usage_6.5/extract_modules.py
# Prints per-module utilization (Figure 5 source) to stdout
```

All figures and tables in §6 of the paper can be reproduced with the commands above. Numbers may differ from the paper run depending on system state (driver version, FPGA load, hugepage availability, transient queueing under Vivado-PR). The pre-built bitstreams under `bitstreams/` let you reproduce the host-side measurements without re-running Vivado.
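To regenerate all three figures in one go and see which script (if any) failed, the plotting commands above can be looped. This is a hypothetical convenience function, not part of the repo; it assumes you run it from `evaluation/scripts/` inside `nix-shell shell.nix`, and `PYTHON` is an illustrative override that defaults to `python3`.

```bash
#!/usr/bin/env bash
# Hypothetical convenience loop (not part of this repo): runs the three
# plotting scripts above and reports which ones fail. Run it from
# evaluation/scripts/. PYTHON defaults to python3.
run_plots() {
  local failed=0 script
  for script in e2e_6.1/plot_e2e.py \
                scheduling_6.2/plot_sched.py \
                deployment_6.3/plot_reconfig_overhead.py; do
    if "${PYTHON:-python3}" "$script"; then
      echo "ok: $script"
    else
      echo "failed: $script"
      failed=$((failed + 1))
    fi
  done
  return "$failed"   # number of failed scripts (0 means all succeeded)
}
```

The loop keeps going after a failure, so one broken plot does not hide the status of the others.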
```
microShell/
├── examples_hw/apps/      # HW pipelines (audio_processing, digital_signature, ...)
│   └── modules/           # single-module bring-ups (fft, rsa, sha256, ...)
├── examples_sw/apps/      # Host programs, mirrors examples_hw
│   ├── *_monolithic/      # single-binary versions used in §6.1
│   └── modules/           # per-module test programs
├── sw/{include,src}/      # µShell runtime: DFG API, capabilities, dataflow
├── driver/                # Linux kernel driver (Coyote-derived)
├── bitstreams/            # Pre-built .bit / .ltx, one folder per EXAMPLE target
├── evaluation/{scripts,data,plots}
├── program_fpga.sh        # Load bitstream + driver + hugepages
├── shell.nix              # Reproducible build environment
└── REPRODUCE.md           # Full reproduction instructions
```
- Driver rmmod error: `rmmod: ERROR: Module coyote_drv is not currently loaded`. This is OK if the script ends with `vm.nr_hugepages = 1024`.
- Driver won't load: run `sudo rmmod coyote_drv && sudo insmod driver/coyote_drv.ko`. If that fails, reboot and retry; a stuck driver state usually clears after a reboot.
- FPGA programming fails: verify that `bitstreams/cyt_top.bit` exists before running `program_fpga.sh`. Check `sudo dmesg | tail -50` for PCIe or programming errors.
- Hugepage shortage: `cat /proc/sys/vm/nr_hugepages` should report ≥ 1024. Re-run the `sysctl` command used by `program_fpga.sh` if the number resets after a reboot.
- Test process hangs: run `sudo pkill -9 test`, then re-program the FPGA before retrying.
- `bThread could not be obtained, vfid: 0`: this error means the driver is not installed correctly. A reboot is usually needed.
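The two error messages quoted in the list above can be matched mechanically, which is handy when scanning a long log. This is a hypothetical helper, not part of the repo; it recognizes only the messages quoted in this README and falls back to the `dmesg` hint for anything else.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): maps an error message to the
# matching troubleshooting entry above. Only the messages quoted in this
# README are recognized.
diagnose() {
  case "$1" in
    *"Module coyote_drv is not currently loaded"*)
      echo "harmless if program_fpga.sh ends with vm.nr_hugepages = 1024" ;;
    *"bThread could not be obtained"*)
      echo "driver not installed correctly; a reboot is usually needed" ;;
    *)
      echo "not listed above; check sudo dmesg | tail -50 for PCIe or programming errors" ;;
  esac
}
```

For example, `diagnose "bThread could not be obtained, vfid: 0"` prints the driver hint.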
MIT — see LICENSE.md. Portions derived from Coyote under BSD-3-Clause; original copyright headers retained per file.
```bibtex
@inproceedings{ushell,
  title  = {{µShell}: A Microkernel-based FPGA Shell Architecture},
  author = {TBD},
  year   = {TBD},
  note   = {Citation pending publication.}
}
```