ENH: Replace C extension with pure Python _dbdreader #34

Open
jklymak wants to merge 5 commits into smerckel:master from jklymak:enh-remove-C

jklymak commented Jun 23, 2026

Contributor

Drop the compiled _dbdreader C extension in favour of a pure Python/NumPy implementation (dbdreader/_dbdreader.py). The public API of get() is identical to the C extension, including the (error_no, result) return convention.

The pure Python version uses a two-pass strategy: a cheap Python loop to locate cycle boundaries, followed by vectorised NumPy operations over all cycles at once, avoiding per-cycle array allocations.
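
To illustrate the two-pass idea, here is a minimal sketch on a made-up record format (a length byte followed by that many payload bytes). This is not the DBD layout and not the code in this PR; the function names `locate_records` and `read_first_values` are hypothetical. It only shows the shape of the approach: a cheap Python loop that records offsets, then a single NumPy gather over all records.

```python
import numpy as np


def locate_records(buf: bytes) -> np.ndarray:
    """Pass 1: walk the buffer once and note where each record starts."""
    offsets = []
    pos = 0
    while pos < len(buf):
        offsets.append(pos)
        pos += 1 + buf[pos]  # one length byte plus the payload it announces
    return np.asarray(offsets, dtype=np.int64)


def read_first_values(buf: bytes, offsets: np.ndarray) -> np.ndarray:
    """Pass 2: fetch the first payload byte of every record in one gather."""
    data = np.frombuffer(buf, dtype=np.uint8)
    return data[offsets + 1]


# Three toy records with payload lengths 2, 1 and 3.
buf = bytes([2, 10, 11, 1, 20, 3, 30, 31, 32])
offsets = locate_records(buf)
values = read_first_values(buf, offsets)
print(offsets.tolist(), values.tolist())
```

The loop in pass 1 does very little work per record, and pass 2 allocates one output array for all records instead of one per record.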

Changes:

  • Add dbdreader/_dbdreader.py (pure Python replacement)
  • dbdreader/dbdreader.py: use relative import (from . import _dbdreader)
  • setup.py: remove ext_modules and all lz4/gcc build machinery
  • pyproject.toml: remove [tool.setuptools.ext-modules]; mark OS Independent
  • MANIFEST.in: remove C source file entries
  • INSTALL.rst: update install instructions; drop gcc/compiler requirement
  • Move extension/ and lz4/ source trees to _old_extension/ and _old_lz4/

The lz4 Python package (already a dependency) handles decompression via dbdreader.decompress, so no build toolchain is required.

Performance:

For the largest file set in my dataset (462 files), C is still a bit faster, though there was some jitter in the timings. For the smaller subsets, however, the Python code is a bit faster than the C.

Files   Python (s)   C (s)
100     11.03        16.72
200     23.77        24.02
300     79.65        109.75
462     147.69       134.36

The C code could likely be made faster if it were also optimized. However, I think the argument here is that the Python is "fast enough".
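
Given the jitter in these timings, one common way to get steadier numbers is to repeat each measurement and keep the best run. The helper below is a generic sketch, not the harness used for the table above; `best_of` is a hypothetical name, and the summed range is a stand-in workload for the real file reads.

```python
import time


def best_of(func, repeats=3):
    """Run func() several times; return (fastest wall time, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload; in practice this would be the call that reads the files.
elapsed, total = best_of(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.4f} s")
```

For file-heavy workloads the first run may also include cold disk-cache effects, so the minimum mostly reflects the warm-cache case.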

AI use:

This was done with Claude, plus a few prompts telling it that the result was slower than the C code; it then found the speedup by preallocating the arrays.

jklymak added 4 commits June 22, 2026 17:08
For files with large sensor counts (e.g. 1696 sensors, n_state_bytes=424),
the Pass 1 inner loop over state bytes dominated runtime.  Replace with a
numpy fancy-index + sum when n_state_bytes >= 32, cutting ~100M Python
list-index ops per 400-file run to ~400K numpy calls.
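
A rough sketch of what "fancy-index + sum" over state bytes can look like is given below. It rests on assumptions that are not taken from this PR: each state byte packs four 2-bit fields in MSB-first order, and the field value 2 means "a new value follows". The names `build_table`, `payload_size` and `payload_size_loop` are hypothetical. A precomputed table maps (state byte position, byte value) to the payload bytes that byte implies, so a whole cycle is summed with one indexing call.

```python
import numpy as np

NEW_VALUE = 2  # assumed 2-bit code meaning "a new value follows"


def build_table(sizes: np.ndarray) -> np.ndarray:
    """table[i, b] = payload bytes implied by state byte i having value b."""
    n_state_bytes = -(-len(sizes) // 4)            # ceiling division
    padded = np.zeros(n_state_bytes * 4, dtype=np.int64)
    padded[: len(sizes)] = sizes                   # unused fields count as 0
    padded = padded.reshape(n_state_bytes, 4)      # four sensors per byte
    shifts = np.array([6, 4, 2, 0])                # assumed MSB-first order
    fields = (np.arange(256)[:, None] >> shifts) & 3   # shape (256, 4)
    is_new = (fields == NEW_VALUE).astype(np.int64)
    return padded @ is_new.T                       # shape (n_state_bytes, 256)


def payload_size(state: np.ndarray, table: np.ndarray) -> int:
    """Vectorised: one fancy-index lookup and one sum per cycle."""
    return int(table[np.arange(len(state)), state].sum())


def payload_size_loop(state: bytes, sizes: list) -> int:
    """Reference version: one Python-level step per sensor."""
    total = 0
    for i, size in enumerate(sizes):
        field = (state[i // 4] >> (6 - 2 * (i % 4))) & 3
        if field == NEW_VALUE:
            total += size
    return total


sizes = np.array([4, 8, 1, 2, 4, 4])               # bytes per sensor value
state = np.array([0b10001001, 0b00100000], dtype=np.uint8)
table = build_table(sizes)
fast = payload_size(state, table)
slow = payload_size_loop(state.tobytes(), sizes.tolist())
print(fast, slow)
```

Each NumPy call carries a fixed overhead, which may be why the commit applies this path only when n_state_bytes >= 32.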

Also truncate Pass 2 state-field decoding to max(requested sensor index)+1
columns instead of all n_sensors, so numpy work scales with the highest-
indexed requested sensor rather than the full file sensor count.
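
The truncation can be sketched as follows, again assuming four MSB-first 2-bit fields per state byte (an assumption, not taken from this PR). `decode_fields` is a hypothetical name. Only the state bytes that cover sensors 0..max(wanted) are unpacked.

```python
import numpy as np


def decode_fields(state: np.ndarray, wanted: list) -> np.ndarray:
    """Decode 2-bit fields for sensors 0..max(wanted) only.

    state has shape (n_cycles, n_state_bytes) and dtype uint8.
    """
    n_cols = max(wanted) + 1
    n_bytes = -(-n_cols // 4)                       # state bytes actually needed
    shifts = np.array([6, 4, 2, 0], dtype=np.uint8)
    # Unpack only the leading n_bytes columns: (n_cycles, n_bytes, 4)
    fields = (state[:, :n_bytes, None] >> shifts) & 3
    return fields.reshape(len(state), n_bytes * 4)[:, :n_cols]


state = np.array([[0b10001001, 0b00100000],
                  [0b00101010, 0b10000000]], dtype=np.uint8)
narrow = decode_fields(state, wanted=[0, 2])        # touches 1 state byte
wide = decode_fields(state, wanted=[5])             # touches 2 state bytes
print(narrow.tolist(), wide.tolist())
```

With the indices reported in this thread (sci_m_present_time at 812 out of 1696 sensors), a request whose highest index is 812 would decode 813 columns instead of 1696.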

Result (400 files, 1696 sensors/file):
  2 sensors:  37.8s → 28.7s  (vs C: 25.5s)
  15 sensors: 43.9s → 34.1s  (vs C: 46.2s)

jklymak commented Jun 23, 2026

Contributor Author

Latest commit d944a3a has some more optimizations in it.

The test script (testSpeed.py):

import dbdreader
import time

print(dbdreader.__file__)

search = '*.[de]cd'  # glob character class: matches .dcd and .ecd files
indir = 'delayed_raw_sub'
cachedir = 'cac'

dbd = dbdreader.DBD(f'{indir}/03130024.dcd', cacheDir=cachedir)
print("total sensors:", dbd.headerInfo['sensors_per_cycle'])
print("sci_m_present_time index:", dbd.parameterNames.index('sci_m_present_time'))
print("m_depth index:", dbd.parameterNames.index('m_depth'))

sources = [
    "sci_m_present_time",
    "m_gps_lat",
    "m_gps_lon",
    "m_heading",
    "m_pitch",
    "m_roll",
    "c_wpt_lat",
    "c_wpt_lon",
    "sci_water_cond",
    "sci_water_temp",
    "sci_water_pressure",
    "sci_flbbcd_chlor_units",
    "sci_flbbcd_cdom_units",
    "sci_flbbcd_bb_units",
    "sci_oxy4_oxygen",
]

sourcegroups = [["sci_m_present_time", "m_depth"],
                ["sci_m_present_time", "m_depth", "sci_water_cond"],
                ["sci_m_present_time", "m_depth", "sci_water_cond", "sci_water_pressure"],
                sources]

for source in sourcegroups:
    print(f'source={source}')
    start = time.perf_counter()
    # The timing covers both opening the files and reading the data.
    dbd = dbdreader.MultiDBD(pattern=f'{indir}/{search}',
                             cacheDir=cachedir)
    data = dbd.get_sync(*source)
    end = time.perf_counter()
    print(f'Elapsed time: {end-start:.2f} seconds')

With 400 files, the C code gives:

pixi run python testSpeed.py
/Users/jklymak/Downloads/dfo-hal1002-20260127/.pixi/envs/default/lib/python3.14/site-packages/dbdreader/__init__.py
total sensors: 1696
sci_m_present_time index: 812
m_depth index: 534
source=['sci_m_present_time', 'm_depth']
Elapsed time: 25.53 seconds
source=['sci_m_present_time', 'm_depth', 'sci_water_cond']
Elapsed time: 29.19 seconds
source=['sci_m_present_time', 'm_depth', 'sci_water_cond', 'sci_water_pressure']
Elapsed time: 33.46 seconds
source=['sci_m_present_time', 'm_gps_lat', 'm_gps_lon', 'm_heading', 'm_pitch', 'm_roll', 'c_wpt_lat', 'c_wpt_lon', 'sci_water_cond', 'sci_water_temp', 'sci_water_pressure', 'sci_flbbcd_chlor_units', 'sci_flbbcd_cdom_units', 'sci_flbbcd_bb_units', 'sci_oxy4_oxygen']
Elapsed time: 46.16 seconds

For the Python-only code:

pixi run python testSpeed.py
Python only!
/Users/jklymak/Dropbox/dbdreader/dbdreader/__init__.py
total sensors: 1696
sci_m_present_time index: 812
m_depth index: 534
source=['sci_m_present_time', 'm_depth']
Elapsed time: 28.72 seconds
source=['sci_m_present_time', 'm_depth', 'sci_water_cond']
Elapsed time: 31.94 seconds
source=['sci_m_present_time', 'm_depth', 'sci_water_cond', 'sci_water_pressure']
Elapsed time: 31.70 seconds
source=['sci_m_present_time', 'm_gps_lat', 'm_gps_lon', 'm_heading', 'm_pitch', 'm_roll', 'c_wpt_lat', 'c_wpt_lon', 'sci_water_cond', 'sci_water_temp', 'sci_water_pressure', 'sci_flbbcd_chlor_units', 'sci_flbbcd_cdom_units', 'sci_flbbcd_bb_units', 'sci_oxy4_oxygen']
Elapsed time: 34.09 seconds

So the Python version is a bit slower for few sensors and a bit faster (perhaps just noise) for more sensors.

@jklymak jklymak mentioned this pull request Jun 23, 2026