ENH: Replace C extension with pure Python _dbdreader#34
Open
jklymak wants to merge 5 commits into
Drop the compiled _dbdreader C extension in favour of a pure Python/NumPy implementation (dbdreader/_dbdreader.py). The public API of get() is identical to the C extension, including the (error_no, result) return convention.

The pure Python version uses a two-pass strategy: a cheap Python loop to locate cycle boundaries, followed by vectorised NumPy operations over all cycles at once, avoiding per-cycle array allocations.

Changes:
- Add dbdreader/_dbdreader.py (pure Python replacement)
- dbdreader/dbdreader.py: use relative import (from . import _dbdreader)
- setup.py: remove ext_modules and all lz4/gcc build machinery
- pyproject.toml: remove [tool.setuptools.ext-modules]; mark OS Independent
- MANIFEST.in: remove C source file entries
- INSTALL.rst: update install instructions; drop gcc/compiler requirement
- Move extension/ and lz4/ source trees to _old_extension/ and _old_lz4/

The lz4 Python package (already a dependency) handles decompression via dbdreader.decompress, so no build toolchain is required.
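The two-pass strategy can be illustrated on a toy record format. This is only a sketch under stated assumptions, not the code in dbdreader/_dbdreader.py: the layout (2-bit state fields packed four per byte, most significant bits first, code 2 meaning "new value follows", every value a little-endian float32), the helper `encode_cycle`, and the error codes 0 and 1 are all invented for illustration. Only the (error_no, result) return shape and the split into a Python boundary scan plus vectorised decoding come from the description above.

```python
import struct
import numpy as np

VALUE_SIZE = 4  # toy assumption: every sensor value is a little-endian float32


def encode_cycle(updates, n_sensors):
    """Hypothetical helper: build one toy cycle from {sensor_index: value}."""
    n_state_bytes = (n_sensors + 3) // 4
    state = bytearray(n_state_bytes)
    payload = b""
    for s in sorted(updates):
        state[s // 4] |= 2 << (6 - 2 * (s % 4))  # code 2 = new value follows
        payload += struct.pack("<f", updates[s])
    return bytes(state) + payload


def get(buf, n_sensors, sensor):
    """Return (error_no, (cycle_numbers, values)) for one sensor index."""
    data = np.frombuffer(buf, dtype=np.uint8)
    n_state_bytes = (n_sensors + 3) // 4

    # Pass 1: a plain Python loop that only locates where each cycle starts.
    offsets = []
    pos = 0
    while pos < len(buf):
        if pos + n_state_bytes > len(buf):
            return 1, None  # truncated state bytes
        n_new = 0
        for b in buf[pos:pos + n_state_bytes]:
            n_new += sum(((b >> shift) & 3) == 2 for shift in (6, 4, 2, 0))
        offsets.append(pos)
        pos += n_state_bytes + VALUE_SIZE * n_new
    if pos != len(buf):
        return 1, None  # truncated payload in the final cycle
    offsets = np.asarray(offsets, dtype=np.int64)

    # Pass 2: decode the state fields of all cycles at once.
    states = data[offsets[:, None] + np.arange(n_state_bytes)]
    shifts = np.array([6, 4, 2, 0], dtype=np.uint8)
    fields = ((states[:, :, None] >> shifts) & 3).reshape(
        len(offsets), n_state_bytes * 4)
    is_new = fields[:, :sensor + 1] == 2  # later columns are not needed
    has_value = is_new[:, sensor]
    # Values are stored in sensor order, so count the new values before ours.
    n_before = is_new[:, :sensor].sum(axis=1)
    value_pos = offsets + n_state_bytes + VALUE_SIZE * n_before
    idx = value_pos[has_value][:, None] + np.arange(VALUE_SIZE)
    values = np.ascontiguousarray(data[idx]).view("<f4").ravel()
    return 0, (np.flatnonzero(has_value), values)


cycles = [{0: 1.5, 2: 3.0}, {1: 7.0}, {2: 4.0, 4: 9.0}]
buf = b"".join(encode_cycle(c, n_sensors=5) for c in cycles)
error_no, (cycle_numbers, values) = get(buf, n_sensors=5, sensor=2)
print(error_no, cycle_numbers, values)
```

The Python loop does no decoding beyond counting how many values each cycle carries; everything that touches sensor values is a single NumPy gather over all cycles.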
For files with large sensor counts (e.g. 1696 sensors, n_state_bytes=424), the Pass 1 inner loop over state bytes dominated runtime. Replace with a numpy fancy-index + sum when n_state_bytes >= 32, cutting ~100M Python list-index ops per 400-file run to ~400K numpy calls.

Also truncate Pass 2 state-field decoding to max(requested sensor index)+1 columns instead of all n_sensors, so numpy work scales with the highest-indexed requested sensor rather than the full file sensor count.

Result (400 files, 1696 sensors/file):
- 2 sensors: 37.8s → 28.7s (vs C: 25.5s)
- 15 sensors: 43.9s → 34.1s (vs C: 46.2s)
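One way a "fancy-index + sum" can replace the per-byte loop is a precomputed lookup table, sketched below. The commit message does not show the actual code, so the table design, the function names, and the bit layout (2 bits per sensor, four sensors per state byte, most significant bits first, code 2 meaning "new value follows") are assumptions made for illustration.

```python
import numpy as np


def build_payload_table(sizes):
    """table[i, b] = payload bytes announced when state byte i has value b."""
    n_state_bytes = (len(sizes) + 3) // 4
    padded = np.zeros(n_state_bytes * 4, dtype=np.int64)
    padded[:len(sizes)] = sizes  # padding slots keep size 0
    padded = padded.reshape(n_state_bytes, 4)
    byte_values = np.arange(256)
    table = np.zeros((n_state_bytes, 256), dtype=np.int64)
    for k, shift in enumerate((6, 4, 2, 0)):
        is_new = ((byte_values >> shift) & 3) == 2  # shape (256,)
        table += padded[:, k:k + 1] * is_new        # broadcasts to (n, 256)
    return table


def payload_length_loop(state_bytes, sizes):
    """Reference version: one Python-level step per sensor."""
    total = 0
    for s, size in enumerate(sizes):
        code = (state_bytes[s // 4] >> (6 - 2 * (s % 4))) & 3
        if code == 2:
            total += int(size)
    return total


def payload_length_numpy(state_bytes, table):
    """Vectorised version: one fancy-index lookup and one sum per cycle."""
    sb = np.frombuffer(state_bytes, dtype=np.uint8)
    return int(table[np.arange(len(sb)), sb].sum())


rng = np.random.default_rng(0)
sizes = rng.choice([1, 2, 4, 8], size=1696)   # bytes per sensor value
table = build_payload_table(sizes)            # built once per file
state = rng.integers(0, 256, size=424, dtype=np.uint8).tobytes()
print(payload_length_numpy(state, table) == payload_length_loop(state, sizes))
```

The table costs n_state_bytes × 256 integers and is built once per file, after which each cycle needs a fixed number of NumPy calls regardless of sensor count. For small n_state_bytes the call overhead can exceed the cost of the plain loop, which is consistent with the commit applying the NumPy path only above a threshold.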
Latest commit d944a3a has some more optimizations in it. For a test file:

import dbdreader
import time
print(dbdreader.__file__)
search = '*.[d|e]cd'
indir = 'delayed_raw_sub'
cachedir = 'cac'
dbd = dbdreader.DBD('delayed_raw_sub/03130024.dcd', cacheDir='cac')
print("total sensors:", dbd.headerInfo['sensors_per_cycle'])
print("sci_m_present_time index:", dbd.parameterNames.index('sci_m_present_time'))
print("m_depth index:", dbd.parameterNames.index('m_depth'))
sources = [
"sci_m_present_time",
"m_gps_lat",
"m_gps_lon",
"m_heading",
"m_pitch",
"m_roll",
"c_wpt_lat",
"c_wpt_lon",
"sci_water_cond",
"sci_water_temp",
"sci_water_pressure",
"sci_flbbcd_chlor_units",
"sci_flbbcd_cdom_units",
"sci_flbbcd_bb_units",
"sci_oxy4_oxygen",
]
sourcegroups = [["sci_m_present_time", "m_depth"],
                ["sci_m_present_time", "m_depth", "sci_water_cond"],
                ["sci_m_present_time", "m_depth", "sci_water_cond", "sci_water_pressure"],
                sources]
for N in range(4):
    source = sourcegroups[N]
    print(f'source={source}')
    start = time.perf_counter()
    dbd = dbdreader.MultiDBD(pattern=f'{indir}/{search}',
                             cacheDir=cachedir)
    data = dbd.get_sync(*source)
    end = time.perf_counter()
    print(f'Elapsed time: {end-start:.2f} seconds')

Working with 400 files, we get for the C-code:
For the python-only code:
So a bit slower for few sensors, a bit faster (perhaps just noise) for more sensors.
Performance:
For most files in my dataset, C is still a bit faster, though there was some jitter in the timings. However, the Python code is often a bit faster than the C when more sensors are requested.
It seems likely that the C code could also be made faster if it were optimized. However, I think the argument here is that the Python is "fast enough".
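Because the timings jitter, a single run per configuration can be misleading. A minimal sketch of one common remedy follows: repeat each measurement and report the minimum and the median. The helper name `time_repeated` and the stand-in workload are hypothetical; in practice the callable would wrap the MultiDBD construction and get_sync call being compared.

```python
import statistics
import time


def time_repeated(func, repeats=5):
    """Run func several times; return (best, median) wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)


# Stand-in workload; replace with the read being benchmarked.
best, median = time_repeated(lambda: sum(range(100_000)), repeats=5)
print(f"best {best:.4f} s, median {median:.4f} s")
```

The minimum is the least affected by background load, while the median shows what a typical run costs. Note that the first repeat may also include file-system caching effects.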
AI use:
This was done with Claude and a few prompts telling it that the result was slower than the C code; it then found the speed-up by preallocating the arrays.
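The general effect of preallocation can be sketched as follows. This is not the code from the PR, and which arrays were preallocated there is not stated; the function names are hypothetical. The sketch only contrasts growing an array on every iteration with filling a buffer whose size is known up front.

```python
import numpy as np


def collect_by_append(chunks):
    """Grow the output on every iteration: each np.append copies everything."""
    out = np.empty(0, dtype=np.float64)
    for chunk in chunks:
        out = np.append(out, chunk)
    return out


def collect_preallocated(chunks):
    """Allocate the output once, then fill it slice by slice."""
    total = sum(len(chunk) for chunk in chunks)
    out = np.empty(total, dtype=np.float64)
    pos = 0
    for chunk in chunks:
        out[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    return out


chunks = [np.arange(3.0), np.arange(5.0), np.arange(2.0)]
print(np.array_equal(collect_by_append(chunks), collect_preallocated(chunks)))
```

Both functions return the same data. The preallocated version copies each element once, whereas the appending version recopies all earlier elements on every iteration, so its cost grows roughly quadratically with the number of chunks.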