µShell: a microkernel-based FPGA shell architecture

µShell is a hardware/OS co-design for modular accelerator deployment. Inspired by the microkernel principle, individual hardware modules (FFT, RSA, AES, …) are deployed into separate vFPGAs and dynamically chained together by a host-side dataflow graph (DFG) API to compose end-to-end accelerators.
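To make the chaining idea concrete, here is a minimal toy model of a dataflow graph in Python. It is only an illustration of the concept, not the µShell DFG API (which lives under sw/ and is not shown in this README). The class `ToyDFG`, its methods, and the `host_in`/`host_out` node names are all hypothetical.

```python
from collections import defaultdict, deque


class ToyDFG:
    """Toy dataflow graph: nodes are module names, edges are data dependencies."""

    def __init__(self):
        self.edges = defaultdict(list)  # producer -> list of consumers
        self.nodes = []                 # module names in insertion order

    def add_module(self, name):
        if name not in self.nodes:
            self.nodes.append(name)
        return name

    def connect(self, src, dst):
        """Add an edge src -> dst, registering both modules if needed."""
        self.add_module(src)
        self.add_module(dst)
        self.edges[src].append(dst)

    def execution_order(self):
        """Kahn's algorithm: return modules in an order that respects every edge."""
        indegree = {n: 0 for n in self.nodes}
        for src in self.nodes:
            for dst in self.edges[src]:
                indegree[dst] += 1
        ready = deque(n for n in self.nodes if indegree[n] == 0)
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for dst in self.edges[node]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
        if len(order) != len(self.nodes):
            raise ValueError("graph contains a cycle")
        return order


# Hypothetical chain: hash the input, then sign the digest.
dfg = ToyDFG()
dfg.connect("host_in", "sha256")
dfg.connect("sha256", "rsa")
dfg.connect("rsa", "host_out")
print(dfg.execution_order())  # prints ['host_in', 'sha256', 'rsa', 'host_out']
```

In this sketch each node stands for one module in its own vFPGA, and the edge list is what a host-side API would translate into data movement between them.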

This repo contains the µShell shell, runtime, driver, example modules, and the end-to-end applications used in the paper. The accompanying baseline (the same applications written against an unmodified Coyote shell) lives at TUM-DSE/microShell, branch baseline.

Main Results

Figure 11 — End-to-end performance: µShell vs. Coyote baseline and a monolithic single-binary variant, across the five composed applications.

Figure 12 — Component-aware scheduling vs. Coyote's FIFO scheduler, across five metrics: (a) end-to-end latency, (b) reconfiguration count, (c) average response time, (d) tail (95%) response time, (e) deadline misses.
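As a rough guide to how metrics (c) to (e) can be derived from per-request timestamps, the sketch below computes them in Python. The function name, the input tuple layout, and the nearest-rank percentile definition are assumptions made for illustration; the paper's scripts may compute these values differently.

```python
import math


def response_metrics(requests):
    """Summarize requests given as (arrival, completion, deadline) tuples.

    All three timestamps must use the same time unit.
    Returns (average response time, 95th-percentile response time, deadline misses).
    """
    if not requests:
        raise ValueError("need at least one request")
    responses = sorted(done - arrival for arrival, done, _ in requests)
    average = sum(responses) / len(responses)
    # Nearest-rank percentile: smallest sample with at least 95% of samples at or below it.
    rank = math.ceil(0.95 * len(responses))
    tail_95 = responses[rank - 1]
    misses = sum(1 for _, done, deadline in requests if done > deadline)
    return average, tail_95, misses


# Made-up timestamps, purely to show the call shape.
sample = [(0, 10, 15), (5, 25, 20), (10, 40, 60), (20, 30, 35)]
print(response_metrics(sample))  # prints (17.5, 30, 1)
```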

Figure 13 — Application-deployment overhead: µShell capability/buffer updates vs. Coyote partial-reconfiguration cost, for accelerators of 1–4 user-logic components.

Prerequisites

This is the hardware and software environment on our servers.

Hardware

AMD EPYC 7413 CPU × 2
Xilinx Alveo U280 FPGA × 2
100 GbE FPGA-attached NIC
Bitstream generation (Vivado) and FPGA tests can run on the same host or be split across a build host and an FPGA host to keep Vivado off the test path.

Software

Linux 6.9.0-rc7 / NixOS 23.11
Nix — all build dependencies are pinned via shell.nix (host build/run) and xilinx-shell (Vivado toolchain)
Vivado 2022.x (loaded by xilinx-shell)
Python ≥ 3.11 for the plotting scripts under evaluation/scripts/

For OSDI Evaluation Testers

Due to the special hardware and software requirements, we provide SSH access to our evaluation machines so that you do not have to set up the environment yourself. Please contact the paper authors through HotCRP to obtain SSH keys. The machines have the required hardware and software installed to run the experiments. If you run into problems, contact us through HotCRP.

Getting Started

A "hello world"-equivalent run using perf_local — the smallest end-to-end test, a host loop across two vFPGAs.

1. Clone the repository

The artifact uses two branches of this repo:

master — the µShell shell, runtime, and modular apps
baseline — the same applications written against an unmodified Coyote shell

Both branches must be cloned, because the commands in REPRODUCE.md use both:

cd ~
git clone git@github.com:TUM-DSE/microShell.git
git clone -b baseline git@github.com:TUM-DSE/microShell.git microShell_base

2. Program the FPGA

perf_local is shipped as a pre-built bitstream under bitstreams/perf_local/. This step installs the pre-compiled FPGA driver on the host and programs the FPGA with the specified bitstream.

cd microShell
bash ./program_fpga.sh perf_local

If you see the error rmmod: ERROR: Module coyote_drv is not currently loaded while the script is running, you can safely ignore it.
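If you want to confirm that the driver ended up loaded after the script finishes, one option is to look for coyote_drv in /proc/modules. The helper below is a minimal sketch and not part of the repository; the function name and its `modules_file` parameter are hypothetical.

```python
def module_loaded(name, modules_file="/proc/modules"):
    """Return True if a kernel module called `name` is listed in modules_file.

    Each line of /proc/modules starts with the module name, followed by
    whitespace-separated fields such as size and reference count.
    """
    try:
        with open(modules_file) as fh:
            return any(line.split()[0] == name for line in fh if line.strip())
    except OSError:
        # File missing or unreadable (for example on a non-Linux host).
        return False


if __name__ == "__main__":
    state = "loaded" if module_loaded("coyote_drv") else "not loaded"
    print("coyote_drv is", state)
```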

3. Compile perf_local software

nix-shell shell.nix
mkdir build_perf_local_sw && cd build_perf_local_sw
cmake ../examples_sw/ -DEXAMPLE=perf_local
make

4. Run the host application

Assuming you are still in build_perf_local_sw:

sudo ./bin/test

The application reports average throughput across two vFPGAs.

Detailed Instructions: Reproducing the paper results

All the figures and tables from the paper can be generated using the collected execution data in evaluation/data/. The detailed step-by-step flow — bitstream generation, host-side measurements, CSV outputs — lives in REPRODUCE.md.

Assuming your repo is cloned at ~/microShell, run the following commands to set up the environment:

cd ~/microShell
nix-shell shell.nix
cd evaluation/scripts/

6.1 Performance (Figure 11)

python3 e2e_6.1/plot_e2e.py
# → evaluation/plots/e2e_6.1/e2e.{pdf,png}

6.2 Scheduling Improvements (Figure 12)

python3 scheduling_6.2/plot_sched.py
# → evaluation/plots/scheduling_6.2/sched.{pdf,png}

6.3 Application Deployment Overheads (Figure 13)

python3 deployment_6.3/plot_reconfig_overhead.py
# → evaluation/plots/deployment_6.3/reconfig_overhead.{pdf,png}
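After running the three plotting commands, a quick way to confirm that every figure was written is to check for the output files named in the comments above. The following is a small sketch, not a script shipped with the artifact; the function name is hypothetical, and the relative path assumes you are still in evaluation/scripts/.

```python
from pathlib import Path

# Output stems taken from the comments next to each plotting command.
EXPECTED_PLOTS = [
    "e2e_6.1/e2e",
    "scheduling_6.2/sched",
    "deployment_6.3/reconfig_overhead",
]


def missing_plots(plots_dir):
    """Return the expected .pdf/.png files that are absent under plots_dir."""
    root = Path(plots_dir)
    missing = []
    for stem in EXPECTED_PLOTS:
        for ext in (".pdf", ".png"):
            candidate = root / (stem + ext)
            if not candidate.is_file():
                missing.append(str(candidate))
    return missing


if __name__ == "__main__":
    # From evaluation/scripts/, the plots directory is one level up.
    for path in missing_plots("../plots"):
        print("missing:", path)
```

An empty result means all six files (a PDF and a PNG per figure) are present.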

6.4 Programmability (Table 5)

python3 complexity_6.4/extract_complexity.py \
    --baseline-csv ../data/complexity_6.4/complexity_baseline_results.csv \
    --ushell-csv   ../data/complexity_6.4/complexity_ushell_results.csv
# Prints Table 5 rows to stdout

6.5 Resource Overheads (Table 6)

python3 resource_usage_6.5/extract_util.py
# Prints Table 6 rows (Coyote, µShell, Inter 3/4/6/8, PCIe DMA, MMU, CEU) to stdout

python3 resource_usage_6.5/extract_modules.py
# Prints per-module utilization (Figure 5 source) to stdout

Artifact Claims

All figures and tables in §6 of the paper are reproducible with the commands above. Numbers may differ from the paper run depending on system state (driver version, FPGA load, hugepage availability, transient queueing under Vivado-PR). Pre-built bitstreams under bitstreams/ let you reproduce the host-side measurements without re-running Vivado.

Repository Layout

microShell/
├── examples_hw/apps/         # HW pipelines (audio_processing, digital_signature, ...)
│   └── modules/              #   single-module bring-ups (fft, rsa, sha256, ...)
├── examples_sw/apps/         # Host programs, mirrors examples_hw
│   ├── *_monolithic/         #   single-binary versions used in §6.1
│   └── modules/              #   per-module test programs
├── sw/{include,src}/         # µShell runtime: DFG API, capabilities, dataflow
├── driver/                   # Linux kernel driver (Coyote-derived)
├── bitstreams/               # Pre-built .bit / .ltx, one folder per EXAMPLE target
├── evaluation/{scripts,data,plots}
├── program_fpga.sh           # Load bitstream + driver + hugepages
├── shell.nix                 # Reproducible build environment
└── REPRODUCE.md              # Full reproduction instructions

Troubleshooting

Driver rmmod error - rmmod: ERROR: Module coyote_drv is not currently loaded. This is OK if the script ends with vm.nr_hugepages = 1024.
Driver won't load — sudo rmmod coyote_drv && sudo insmod driver/coyote_drv.ko. If that fails, reboot and retry; stuck driver state usually clears.
FPGA programming fails — verify bitstreams/cyt_top.bit exists before running program_fpga.sh. Check sudo dmesg | tail -50 for PCIe / programming errors.
Hugepage shortage — cat /proc/sys/vm/nr_hugepages should be ≥ 1024. Re-run the sysctl command if the number resets after reboot.
Test process hangs — sudo pkill -9 test, then re-program the FPGA before retrying.
bThread could not be obtained, vfid: 0 — This error means the driver is not installed correctly. A reboot is usually needed.
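The hugepage check from the list above can also be scripted as a pre-flight step before launching a test. This is a minimal sketch under the assumption that the 1024-page threshold quoted above is the only requirement; the function name and its parameters are hypothetical and not part of the repository.

```python
from pathlib import Path

REQUIRED_HUGEPAGES = 1024  # threshold quoted in the troubleshooting entry


def hugepages_ok(path="/proc/sys/vm/nr_hugepages", required=REQUIRED_HUGEPAGES):
    """Return (ok, count); count is None when the file cannot be read or parsed."""
    try:
        count = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return False, None
    return count >= required, count


if __name__ == "__main__":
    ok, count = hugepages_ok()
    if ok:
        print(f"hugepages OK ({count} reserved)")
    else:
        print(f"hugepage shortage: found {count}, need at least {REQUIRED_HUGEPAGES}")
```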

License

MIT — see LICENSE.md. Portions derived from Coyote under BSD-3-Clause; original copyright headers retained per file.

Citation

@inproceedings{ushell,
  title  = {{µShell}: A Microkernel-based FPGA Shell Architecture},
  author = {TBD},
  year   = {TBD},
  note   = {Citation pending publication.}
}
