Numba-jit statevector backend#518
Open
matulni wants to merge 27 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #518 +/- ##
==========================================
- Coverage 88.85% 87.10% -1.76%
==========================================
Files 49 49
Lines 7135 7334 +199
==========================================
+ Hits 6340 6388 +48
- Misses 795 946 +151
This PR replaces the statevector backend with a new implementation that is orders of magnitude more efficient. It is largely inspired by Ref. [1].
Key features:
- Functions that modify the quantum state (exponentially expensive) are JIT-compiled with numba.
- Modifications of the quantum state are done in place, so data is not unnecessarily copied during the simulation. To that end, `StatevectorBackend` has a new class-method constructor `with_capacity`, which preallocates a given amount of memory. Typically, we pass it a parameter `n = Pattern.max_space()` to allocate a complex-type array of size `2**n`. The `PatternSimulator` class handles this internally, so the API for pattern or circuit simulation remains the same.
- Adaptive parallelization: JIT-compiled functions are parallelized for qubit counts larger than `NUM_QUBIT_PARALLEL`. This compile-time constant (empirically determined to be 15) is the threshold below which the multi-threading overhead outweighs the speed gains.
In addition:
- `graphix-symbolic` plugin. See accompanying PR: Add symbolic backends graphix-symbolic#9
- `DenseState` are modified to follow a consistent naming. This effectively changes the public API.
- `test_statevec.py` and `test_statevectorbackend.py` are unified in a single file.
Discussion
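To make the preallocation and threshold ideas from the key features concrete, here is a minimal sketch. It is not the PR's code: `PreallocatedState`, `add_qubit`, and `use_parallel_kernel` are hypothetical names, the buffer is a plain Python list rather than a complex array, and the loop is ordinary Python rather than a numba-compiled kernel. Only the constant name `NUM_QUBIT_PARALLEL` and its value of 15 come from the description above.

```python
import math

NUM_QUBIT_PARALLEL = 15  # value quoted in the PR description


class PreallocatedState:
    """Hypothetical stand-in for a backend built with a capacity up front."""

    def __init__(self, capacity):
        # Allocate room for 2**capacity amplitudes once; never reallocate.
        self.capacity = capacity
        self.buffer = [0j] * (1 << capacity)
        self.buffer[0] = 1 + 0j  # zero-qubit state: the scalar 1
        self.nqubit = 0

    def add_qubit(self, a, b):
        """Tensor a|0> + b|1> onto the state as the most significant bit, in place."""
        if self.nqubit == self.capacity:
            raise ValueError("capacity exceeded")
        size = 1 << self.nqubit
        for i in range(size):
            amp = self.buffer[i]
            self.buffer[i + size] = b * amp  # new qubit in |1>
            self.buffer[i] = a * amp         # new qubit in |0>
        self.nqubit += 1


def use_parallel_kernel(nqubit):
    """Assumed dispatch rule: go multi-threaded only above the threshold."""
    return nqubit > NUM_QUBIT_PARALLEL


s = 1 / math.sqrt(2)
state = PreallocatedState(capacity=3)
buffer_id = id(state.buffer)
state.add_qubit(s, s)
state.add_qubit(s, s)
```

The active part of the state is the first `2**nqubit` entries of the buffer; the rest stays allocated but unused until more qubits are added, which is what avoids copying as the register grows and shrinks.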
Below I show the profiling of simulating a QFT circuit on 23 qubits (from the Munich Benchmark suite). With the current transpiler, this pattern has `max_space=24` and the following command count: `{'N': 2523, 'E': 3023, 'M': 2523, 'Z': 23, 'X': 23}`.
Execution time is dominated by the measurement process (in particular, the `remove_qubit` subcall), so further optimization efforts should focus on this part of the simulation pipeline.
The current implementation is probably as good as we can do with numba. I recommend reading this very insightful thread: https://stackoverflow.com/questions/79948374/improving-efficiency-of-numba-jit-function (credits to Jérôme Richard @zephyr111).
If we want to push this further, it may be worth writing specialized code for measurements on the XY, XZ and YZ planes. Currently, measurement needs four passes over `psi` (see method `graphix.sim.base_backend.DenseStateBackend.measure`), which involve the following calls:
1) `expectation_single`
2) `evolve_single`
3) `remove_qubit`
4) `remove_qubit`
Step 1) is not necessary if we use the constant branch selector (this is legitimate for deterministic patterns; incidentally, the pattern above runs in 78.9s with the 0-branch selector).
I suspect it is possible to merge steps 2) and 3) into a single pass, which may give a ~30% speed improvement.
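As a rough illustration of what merging the projection and the qubit removal into one pass could look like, here is a sketch under stated assumptions. The function name `project_and_remove` is hypothetical, the index convention (bit `q` of the amplitude index holds the state of qubit `q`) is an assumption, and the loop is plain Python over a list so that it runs without dependencies; it is not the implementation in this PR.

```python
import math


def project_and_remove(psi, n, q, m):
    """Contract qubit q of an n-qubit state with <m| in a single in-place pass.

    psi: list of 2**n amplitudes; bit q of the index is the state of qubit q.
    m:   pair (a, b), the single-qubit state a|0> + b|1> being projected onto.
    Returns the squared norm of the unnormalized result (the outcome
    probability). Afterwards psi[:2**(n-1)] holds the (n-1)-qubit state.
    """
    a_conj = complex(m[0]).conjugate()
    b_conj = complex(m[1]).conjugate()
    low_mask = (1 << q) - 1
    norm2 = 0.0
    for j in range(1 << (n - 1)):
        # Re-insert a 0 bit at position q to locate the pair of amplitudes.
        i0 = ((j >> q) << (q + 1)) | (j & low_mask)
        i1 = i0 | (1 << q)
        amp = a_conj * psi[i0] + b_conj * psi[i1]
        # Safe in place: i0 >= j, so slot j is never read again later.
        psi[j] = amp
        norm2 += amp.real * amp.real + amp.imag * amp.imag
    return norm2


s = 1 / math.sqrt(2)
bell = [s, 0, 0, s]  # (|00> + |11>) / sqrt(2)
p = project_and_remove(bell, n=2, q=0, m=(s, s))
remaining = bell[:2]  # unnormalized state of the surviving qubit
```

Each amplitude of `psi` is read exactly once and the squared norm is accumulated in the same loop, so a separate normalization step only has to touch the half-sized result.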
P.S. Comments by @thierry-martinez in the preliminary discussion have been addressed.
References
[1] McGuffin, M. J., Robert, J.-M., and Ikeda, K. "How to Write a Simulator for Quantum Circuits from Scratch: A Tutorial", 2025 (arXiv:2506.08142).
UPDATE (01/06/2026)
In commit 3ddc462 I reimplemented the method `StatevectorBackend.measure` to merge `evolve_single` and `remove_qubit` into a single call, `project_qubit`, as discussed above. Effectively, we are fusing two loops over `psi`, which is advantageous in memory-bound programs. The new implementation of qubit measurement is about 23% faster.
In the new implementation of the qubit removal (step 3), we compute the squared norm of the two subvectors in the same loop. This is not the most efficient choice, since most of the time we only need the norm of one subvector. However, I observed that the difference is negligible in the example with 24 qubits, and calculating it separately leads to much more code repetition. Note that in both cases we have to read all elements of `psi` to apply the projection operator, which is probably more costly than the arithmetic needed to compute the squared norm.
TODO
Update CHANGELOG