Skip to content

Numba-jit statevector backend#518

Open
matulni wants to merge 27 commits into
TeamGraphix:masterfrom
matulni:sv-backend
Open

Numba-jit statevector backend#518
matulni wants to merge 27 commits into
TeamGraphix:masterfrom
matulni:sv-backend

Conversation

@matulni
Copy link
Copy Markdown
Contributor

@matulni matulni commented Jun 1, 2026

This PR replaces the statevector backend with a new version orders of magnitude more efficient. It is largely inspired from Ref. [1].

Key features:

  • Functions modifying the quantum state (exponentially expensive) are jit compiled with numba.

  • Modifications of the quantum state are done in-place so data is not unnecessarily copied during the simulation. To that purpose, StatevectorBackend has a new class method constructor with_capacity which preallocates a given space in memory. Typically, we pass it a parameter n = Pattern.max_space() to allocate a complex-type array with size 2**n. The PatternSimualator class handles this internally, so the API for pattern or circuit simulation remains the same.

  • Adaptive parallelization: jit-compiled functions are parallelized for qubit counts larger than NUM_QUBIT_PARALLEL. This compile constant (empirically determined to be 15), is the number above which the multi-thread overhead does not compensate the speed gains.

In addition:

  • Support for symbolic simulation with the statevector backend is moved to the graphix-symbolic plugin. See accompanying PR: Add symbolic backends graphix-symbolic#9
  • Parameter names of methods in DenseState are modified to follow a consistent naming. This effectively changes the public API.
  • Test suit is extended and files test_statevec.py and test_statevectorbackend.py are unified in a single file.

Discussion

Below I show the profiling of simulating a QFT circuit on 23 qubits (from the Munich Benchmark suite) . With the current transpiler, this pattern has max_space=24 and the following command count: {'N': 2523, 'E': 3023, 'M': 2523, 'Z': 23, 'X': 23}:

image

Execution time is dominated by the measurement process (in particular, remove_qubit subcall), so further optimization efforts should focus on this part of the simulation pipeline.
Current implementation is probably as good as we can do with numba. I recommend reading this very insighful thread: https://stackoverflow.com/questions/79948374/improving-efficiency-of-numba-jit-function (credits to Jérôme Richard @zephyr111).

If we want to push this further, it may be worth to write specialized code for measurements on the XY, XZ and YZ planes. Currently, measurement needs four passes over psi (see method graphix.sim.base_backend.DenseStateBackend.measure):

  1. Compute expectation value of projector (expectation_single).
  2. Evolve the statevector with the appropriate projector (evolve_single).
  3. Compute norm of 0 and 1 branches of the resulting statevector (remove_qubit).
  4. Normalize the selected statevector (remove_qubit).

Step 1) is not necessary if we use the constant branch selector (this is legit for deterministic patterns, and btw, the pattern above runs in 78.9s with the 0-branch selector).
I suspect it is possible to merge steps 2) and 3) in a single pass which may give a ~30% speed improvement.

P.S. Comments by @thierry-martinez in the preliminary discussion have been addressed.

References

[1] McGuffin, M. J., Robert J-M., and Ikeda K. "How to Write a Simulator for Quantum Circuits from Scratch: A Tutorial.", 2025 (arXiv:2506.08142).

UPDATE (01/06/2026)

In commit 3ddc462 I reimplemented the method StatevectorBackend.measure to merge evolve_single and remove_qubit into a single call project_qubit as discussed above. Effectively, we are fusing two loops over psi which is advantageous in memory-bound programs. New implementation of qubit measurement is about a 23% faster:

image

In the new implementation of the qubit-removal (step 3), we compute the squared norm of the two subvectors in the same loop. This is not the most efficient choice, since most of the times we only need the norm of one subvector. However, I observed that the difference is negligible in the example with 24 qubits, and calculating it separately leads to much more code repetition. Note that in both cases, we have to read all elements of psi to apply the projector operator which is probably more costly than doing the arithmetic operation to compute the squared norm.

TODO

Update CHANGELOG

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 53.91061% with 165 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.10%. Comparing base (6dd835a) to head (f4bdad0).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
graphix/sim/statevec.py 47.77% 164 Missing ⚠️
graphix/simulator.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #518      +/-   ##
==========================================
- Coverage   88.85%   87.10%   -1.76%     
==========================================
  Files          49       49              
  Lines        7135     7334     +199     
==========================================
+ Hits         6340     6388      +48     
- Misses        795      946     +151     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant