Releases: KernelTuner/kernel_tuner
Version 1.4.0
This release marks a large step for Kernel Tuner in terms of new functionality and changes. The most important additions and changes are:
- Parallel tuning on multiple GPUs and GPU clusters using Ray
- Multi-objective optimization through Pymoo
- Change of default CUDA backend from PyCuda to cuda-python
And of course many other smaller additions and fixes. Below is a more detailed overview of all merged pull requests:
What's Changed
- Add AMDSMIObserver that uses amdsmi to measure energy by @stijnh in #372
- Added the best generated LLaMEA algorithms by @fjwillemsen in #368
- Add support for template kernels in the NVCUDA backend by @stijnh in #367
- Add support for template kernels in HIP by @stijnh in #366
- Programmatic imports by @benvanwerkhoven in #359
- Add parallel tuning on multiple remote GPUs using Ray by @isazi in #328
- Multiobjective optimization by @maric-a-b in #358
- update citation info by @benvanwerkhoven in #382
- Fix how UUID/PCI bus is acquired in CUDA backends and processed by NVML by @stijnh in #383
- add support for setting sm count in nvcuda backend by @benvanwerkhoven in #384
- update changelog by @benvanwerkhoven in #387
- Add AMDSMIContinuousObserver by @stijnh in #386
- fix for using locked clocks by @benvanwerkhoven in #390
- Default cuda backend selection by @benvanwerkhoven in #385
- fix issue with nvml cleanup in case of crash, issue #377 by @benvanwerkhoven in #392
New Contributors
- @maric-a-b made their first contribution in #358
Full Changelog: 1.3.3...1.4.0
Version 1.3.3
Forgot to bump the version on the 1.3.2 release, fixing this by making a new release.
Version 1.3.2
This is a minor release that mostly fixes a few bugs and warnings. The largest addition is a set of optimization strategies based on scikit-optimize.
What's Changed
- Add initial support for scikit-optimize minimize methods (skopt) by @stijnh in #340
- Bump nbconvert from 7.16.6 to 7.17.0 in /doc by @dependabot[bot] in #360
- Handle special case where error is a tuple in cuda_error_check by @stijnh in #361
- Fix overflow dual annealing by @benvanwerkhoven in #364
- Bump pygments from 2.19.1 to 2.20.0 in /doc by @dependabot[bot] in #370
- Bump requests from 2.32.4 to 2.33.0 in /doc by @dependabot[bot] in #369
- Bump tornado from 6.5.1 to 6.5.5 in /doc by @dependabot[bot] in #365
- Fix _find_bfloat16_if_available always returning None by @stijnh in #373
- Bump nbconvert from 7.17.0 to 7.17.1 in /doc by @dependabot[bot] in #375
- Bump pytest from 8.3.5 to 9.0.3 in /doc by @dependabot[bot] in #374
- Updated cupyx namespace usage by @fjwillemsen in #376
Full Changelog: 1.3.1...1.3.2
Version 1.3.1
This release brings together several optimizations and smaller bug fixes. It also adds support for cuda-python versions 13 and higher.
What's Changed
- Fix issue 335 'ValueError: (...) is not in list' during Bayesian optimization by @stijnh in #336
- Remove error_message_searchspace_fully_observed message in BO by @stijnh in #338
- fix issue #332 by @benvanwerkhoven in #334
- Recalculate metrics for each configuration in simulation runner by @stijnh in #307
- Bump urllib3 from 2.5.0 to 2.6.0 in /doc by @dependabot[bot] in #341
- Change test-python-package.yml to use macos-latest by @stijnh in #344
- Bump filelock from 3.18.0 to 3.20.1 in /doc by @dependabot[bot] in #346
- Bump urllib3 from 2.6.0 to 2.6.3 in /doc by @dependabot[bot] in #347
- Bump virtualenv from 20.30.0 to 20.36.1 in /doc by @dependabot[bot] in #349
- Bump filelock from 3.20.1 to 3.20.3 in /doc by @dependabot[bot] in #350
- Clarify contribution guidelines regarding AI-generated code by @benvanwerkhoven in #353
- Use the new cuda-python modules by @isazi in #345
- Optimized searchspace operations by @fjwillemsen in #354
- Add new Hamming-adjacent neighborhood method by @stijnh in #313
- This fixes issue #333 on backwards compatibility with the old restrictions function by @fjwillemsen in #337
- Fix bug in simulated annealing when dealing with negative objectives by @stijnh in #331
- Replace bfloat16 dtype from bfloat16 package by one from ml_dtypes package by @stijnh in #330
- Fix evaluation count in PyATF search strategies by @stijnh in #342
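As an illustration of the Hamming-adjacent neighborhood added in #313, the sketch below assumes the name means that a neighbor differs from the current configuration in exactly one parameter, and that this parameter moves to the previous or next value in its list. That reading is an assumption, and the function name hamming_adjacent_neighbors and the example parameters are hypothetical; this is not Kernel Tuner's implementation.

```python
def hamming_adjacent_neighbors(config, tune_params):
    """Return configs that differ from `config` in exactly one parameter,
    where that parameter takes the previous or next value in its list.

    Assumed interpretation of "Hamming-adjacent"; not Kernel Tuner's code.
    """
    names = list(tune_params)
    neighbors = []
    for pos, name in enumerate(names):
        values = tune_params[name]
        idx = values.index(config[pos])
        for step in (-1, 1):
            new_idx = idx + step
            # Skip steps that fall off either end of the value list
            if 0 <= new_idx < len(values):
                neighbor = list(config)
                neighbor[pos] = values[new_idx]
                neighbors.append(tuple(neighbor))
    return neighbors


# Hypothetical tunable parameters as a dict of value lists
tune_params = {
    "block_size_x": [32, 64, 128, 256],
    "tile_size": [1, 2, 4],
}

print(hamming_adjacent_neighbors((64, 1), tune_params))
# [(32, 1), (128, 1), (64, 2)]
```

Under this reading the neighborhood stays small (at most two neighbors per parameter), which keeps local search steps cheap on large search spaces.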
Full Changelog: 1.3.0...1.3.1
Version 1.3.0
This release presents another major step forwards in particular with regard to hyperparameter tuning of the optimization strategies in Kernel Tuner. In addition, many of the optimization strategies have been made aware of constraints. This means they will initialize with only valid configurations, use the search space object to query only valid neighbors, and when needed repair invalid configs to valid neighboring ones.
In addition, the Differential Evolution strategy previously relied on scipy.optimize.differential_evolution. It has now been replaced with a brand new implementation that is better suited to discrete search spaces, including those with strings as parameter values, and that is also constraint-aware.
Finally, Kernel Tuner now also allows users to pass their own optimization algorithms as search strategies for auto-tuning. For this purpose, kernel_tuner.strategies.wrapper implements an OptAlgWrapper class that can wrap an existing optimizer.
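To make the constraint awareness described above more concrete, here is a minimal, self-contained sketch of the three behaviours: starting from a valid configuration, querying only valid neighbors, and repairing an invalid configuration. It does not use Kernel Tuner's search space object; the parameters, the restriction, and all function names are hypothetical and chosen only for illustration.

```python
import itertools
import random

# Hypothetical tunable parameters and restriction (not from the release notes)
tune_params = {"block_size_x": [16, 32, 64, 128], "block_size_y": [1, 2, 4, 8]}
names = list(tune_params)


def restriction(cfg):
    # Assumed constraint: at most 256 threads per block
    return cfg["block_size_x"] * cfg["block_size_y"] <= 256


# Enumerate the search space once and keep only valid configurations
all_configs = list(itertools.product(*tune_params.values()))
valid_configs = [c for c in all_configs if restriction(dict(zip(names, c)))]
valid_set = set(valid_configs)


def random_valid(rng):
    """Initialize with a valid configuration only."""
    return rng.choice(valid_configs)


def valid_neighbors(config):
    """Valid configurations that differ from `config` in exactly one parameter."""
    result = []
    for pos, name in enumerate(names):
        for value in tune_params[name]:
            if value != config[pos]:
                candidate = config[:pos] + (value,) + config[pos + 1:]
                if candidate in valid_set:
                    result.append(candidate)
    return result


def repair(config, rng):
    """Map an invalid configuration to a valid neighboring one if possible."""
    if config in valid_set:
        return config
    neighbors = valid_neighbors(config)
    # Fall back to any valid configuration when no valid neighbor exists
    return neighbors[0] if neighbors else random_valid(rng)


rng = random.Random(0)
start = random_valid(rng)
print(start in valid_set)      # True
print(repair((128, 8), rng))   # (16, 8)
```

A strategy built this way never spends an evaluation on a configuration that violates the restriction, which is the practical benefit of making the strategies constraint-aware.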
What's Changed
- Hyperparametertuning custom strategies by @nikivanstein in #325
- Hyperparameter tuning for custom strategies by @fjwillemsen in #329
- add support for user-defined optimization algorithms by @benvanwerkhoven in #287
- Hyperparameter tuning by @fjwillemsen in #289
- Constrained optimization by @benvanwerkhoven in #298
- Tunable constrained optimization algorithms by @fjwillemsen in #324
- Replace differential evolution strategy by @benvanwerkhoven in #322
New Contributors
- @nikivanstein made their first contribution in #325
Full Changelog: 1.2...1.3.0
Version 1.2
This release includes many fixes and upgrades in different areas, in particular search space construction and OpenMP support. Bugs related to maximizing instead of minimizing the objective were fixed; these affected all strategies, Firefly in particular. Smaller improvements cover user-friendliness, documentation, Python 3.13 compatibility, the HIP backend, and support for string-valued tunable parameters for mixed-precision tuning.
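The maximization fixes mentioned above concern how a "higher is better" objective is converted into a cost that a minimizing strategy can use. The sketch below is a generic illustration of why negation is safer than taking a reciprocal; it is not the code from the Firefly change, and the function names and scores are hypothetical.

```python
def cost_by_reciprocal(score):
    # Dividing flips the ordering only when all scores are positive
    return 1.0 / score


def cost_by_negation(score):
    # Negation reverses the ordering for any real-valued score
    return -score


# Hypothetical objective values where higher is better
scores = [-2.0, 0.5, 4.0]

best_by_negation = min(scores, key=cost_by_negation)
best_by_reciprocal = min(scores, key=cost_by_reciprocal)

print(best_by_negation)    # 4.0, the true maximum
print(best_by_reciprocal)  # -2.0, the worst score is selected
```

With a negative score in the mix, the reciprocal maps the worst value to the lowest cost, so a minimizer picks the wrong configuration. The reciprocal is also undefined for a score of zero, which negation avoids.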
What's Changed
- OpenMP by @isazi in #273
- Bump tornado from 6.4.2 to 6.5.1 in /doc by @dependabot[bot] in #309
- Resolve regex calls warnings by @emmanuel-ferdman in #308
- More user-friendly error messages for HIP backend by @benvanwerkhoven in #303
- Add support for 16-bit floats in HIP backend by @stijnh in #301
- Display the invalid identifier name on error by @emmanuel-ferdman in #310
- Bump requests from 2.32.3 to 2.32.4 in /doc by @dependabot[bot] in #311
- Change Firefly algorithm to use negation instead of division by @stijnh in #317
- Bump urllib3 from 2.3.0 to 2.5.0 in /doc by @dependabot[bot] in #316
- Change CostFunc to return +inf when objective_higher_is_better by @stijnh in #315
- Extended searchspace construction and input format support by @fjwillemsen in #278
- Fix documentation by @benvanwerkhoven in #319
- fix issue #318 by @benvanwerkhoven in #320
- Fix error 'grid divisor cannot be integer' (issue #264) by @stijnh in #306
- re-add support for user-specified starting point by @benvanwerkhoven in #297
- add default optimization direction for 'fitness' and 'cost' by @benvanwerkhoven in #323
- Improve warning on kernel source not found by @benvanwerkhoven in #321
- use searchspace to check config validity in costfunc by @benvanwerkhoven in #327
Full Changelog: 1.1.3...1.2
Version 1.1.3
This release contains a number of small bugfixes and enables support on Nvidia Blackwell GPUs.
What's Changed
- Resolve deprecation warnings of regex library by @emmanuel-ferdman in #296
- Support three-digit compute capability by @csbnw in #299
- Add support for half and bfloat16 scalars in pyCUDA backend by @stijnh in #300
- Fix issue #245 by @stijnh in #302
New Contributors
- @emmanuel-ferdman made their first contribution in #296
Full Changelog: 1.1.2...1.1.3
Version 1.1.2
This release would not have been necessary if I had not forgotten to increment the version number on the previous release that I made 20 minutes ago. Alas, we all make mistakes sometimes.
Version 1.1.1
The sole purpose of this release is to support Numpy 2.0 and newer. The main motivation is to make the examples and tutorial notebooks work again on Google Colab.
What's Changed
- Numpy2 support by @benvanwerkhoven in #295
Full Changelog: 1.1.0...1.1.1
Version 1.1.0
This release integrates many smaller changes that have been made over the past year.
The most significant new features are:
- The NCUObserver to include performance metrics from the Nvidia Profiler during tuning
- TegraObserver to read/set clock frequencies, power and temperature on Nvidia Jetson GPUs
In addition, a lot of work has been put into several backends, including OpenACC, the compiler backend, the HIP backend and so on.
Thanks to everyone who contributed to Kernel Tuner in the past year!
What's Changed
- Add Tegra Observer to control clocks on Jetson devices by @loostrum in #243
- Catch RuntimeError when importing from pyhip by @loostrum in #252
- Bump pillow from 10.2.0 to 10.3.0 by @dependabot in #249
- Read instant power in pwr_usage by @csbnw in #247
- Bump idna from 3.6 to 3.7 by @dependabot in #250
- Register observer & correct clock setting by @fjwillemsen in #242
- Compiler backend uses g++ instead of gcc by @benvanwerkhoven in #254
- Improved OpenACC support by @isazi in #248
- Small improvements to searchspaces and simulation mode by @fjwillemsen in #251
- Simplify contributing info by @benvanwerkhoven in #255
- Support Python 3.12 and drop Python 3.8 by @benvanwerkhoven in #256
- Support Python 3.12 and drop Python 3.8 (2) by @fjwillemsen in #260
- Add NCUObserver by @csbnw in #253
- Update PMTObserver for latest PMT changes by @csbnw in #261
- OpenACC bug fixing by @isazi in #262
- ESiWACE3 hackathon by @isazi in #267
- fix reading of graphics and memory clocks by @benvanwerkhoven in #271
- Directives: summer refactoring by @isazi in #269
- Tegra observer by @MartijnFr in #270
- Tegra observer with continuous observer by @benvanwerkhoven in #275
- base implementation for pmt continuous observer by @benvanwerkhoven in #276
- Add support for float16 to HIP backend by @loostrum in #280
- Fix: out-of-date PMTContinuousObserver readings by @wvbbreu in #283
- Hip local memory error handling by @MiloLurati in #284
- Replacing PyHIP with new official python wrapper of ROCm HIP by @MiloLurati in #285
- update observer to latest python bindings by @benvanwerkhoven in #279
- add support for any case spelling of block size name defaults by @benvanwerkhoven in #277
- update documentation by @benvanwerkhoven in #293
- Updated pyproject to use hip-python from testpypi by @fjwillemsen in #294
New Contributors
- @MartijnFr made their first contribution in #270
- @wvbbreu made their first contribution in #283
Full Changelog: 1.0...1.1.0