1207 fix#1266
Open
HannoSpreeuw wants to merge 23 commits into
Open
Conversation
Perform these calculations only if particles have mass, i.e. if they can exert gravity. The lower limit of 1e-30 kg is somewhat arbitrary.
Commit 460169d did not yield any speedup, possibly because of branch divergence: "Branch divergence occurs: If roughly half the particles are massless test particles, about half the threads in each warp will take the if branch while the other half skip it" This commit essentially reverts commit 460169d.
The lower limit on the mass that discerns test particles from particles that exert gravity should not be hard coded. That is why the comment has been updated.
By introducing "nj_massive" we try to enforce that only massive particles will reach the GPU.
1) "SAPPORO_TEST_PARTICLE_MASS" gets a default value of 1e-30 unless "_TINY_" as used in "stdinc.h" is set. To enable the override, one would have to set "_TINY_" as an env var, I reckon, i.e. in the current implementation it will not be copied from "src/amuse_ph4/src/stdinc.h". 2) Declaration and initialisation of "nj_massive".
Declaration of "nj_massive"; apparently this is needed both in "sapporo.h" and "sapporo_multi.h".
Particles are rearranged depending on mass in descending order. The idea is that test particles, i.e. particles with a mass less than "SAPPORO_TEST_PARTICLE_MASS" will not reach the GPU; they do not need to reach the GPU since they do not exert gravity. "nj_massive", i.e. the number of particles with a mass > "SAPPORO_TEST_PARTICLE_MASS" is computed. The reordering of particles, i.e. the fact that after reordering an index does no longer point to the same particle, may have unforeseen consequences, will check this later.
Set "nj_massive" for "gpu", which is a "sapporo_multi_struct".
We could build Sapporo Light but running yielded unforeseen results. With no test particles, the code seems a lot faster, it completes in 3s instead of 39s, that is suspicious. But with e.g. 50% test particles, the code seems to run forever, with about 35% GPU utilisation, which is twice as low as before. It seems that computations could be incorrect. To fix this, we will again send all particles to the GPU, but reimplement the mass condition. This should result in correct computations, but the speedup may vanish.
The conditional
"if (mass > SAPPORO_TEST_PARTICLE_MASS) {"
does not yield any speedup, since branch divergence occurs: if roughly half the particles are massless test particles,
about half the threads in each warp will take the if branch while the other half skip it and will simply wait.
Previously, massless particles, i.e. test particles were weeded out by sorting all the particles based on their mass, subsequently determining "nj_massive" from that sort and copying all particles up to index "nj_massive" to the GPU as j-particles, such that massless particles will not be added as j-particles. However, it may be much simpler to add a conditional to "set_j_particle": " if (mass <= SAPPORO_TEST_PARTICLE_MASS) return 0; " This may work, but remains to be tested.
With the new method, implemented in the previous commit (ae1c36e), of weeding out test particles from the j-particles, "nj_massive" has become redundant.
1) The line "if (nj == 0) return 0;" should be a safeguard against crashing for runs with only massless particles. 2) "nj_massive" has become redundant since we started with the new method of weeding out massless particles from the j-particles.
This check should be made at the caller, not at the callee.
We do not want to call "evaluate_gravity" when "nj==0". The check for this is now made at the caller, not at the callee.
j-particles are the particles that exert gravity. If they are massless, they should not become a j-particle.
When deploying PH4 with a CPU backend, this is taken care of in line 207 of idata.cc in "get_partial_acc_and_jerk":
"
// Skip particles with zero mass.
if (jdat->mass[j] > _TINY_){
"
However, this is not the best place to do this, it would be better to not create the j-particle at all.
For both the CPU and GPU backends this commit fixes this.
This means that the conditional in line 208 of idata.cc has now become redundant.
LourensVeen
reviewed
May 19, 2026
Member
LourensVeen
left a comment
There was a problem hiding this comment.
Thanks! I have some questions....
| #include <math.h> | ||
|
|
||
| #ifndef SAPPORO_TEST_PARTICLE_MASS | ||
| # if defined(_TINY_) |
Member
There was a problem hiding this comment.
Is TINY ever not defined? If so, why?
Collaborator
Author
There was a problem hiding this comment.
_TINY_ may not be defined if PH4 is not the running AMUSE code in which case the _TINY_ definition in stdinc.h may be skipped:
#define _TINY_ (pow(2.0, -52)) // double limit on powers of 2
|
|
||
| if (pmass <= _TINY_) { | ||
| if (mpi_rank == 0) { | ||
| cerr << "Warning: ignoring particle with mass=" |
Member
There was a problem hiding this comment.
Is this going to produce a line of output for every massless test particle? That could be a lot, and what does that add? After all, ignoring a particle with no mass won't make a difference to the result anyway.
Collaborator
Author
There was a problem hiding this comment.
Good point, let me remove that.
This print statement is redundant, addressing comment amusecode#1266 (comment) by @LourensVeen.
The
"
if (jdat->mass[j] > _TINY_){
"
conditional has become redundant since commit 1d0386d, i.e. since particles are only added as j-particles if they have mass.
We should not add a particle as a j-particle when massless, since j-particles are the particles that exert gravity.
This increases achieved occupancy by almost a factor four. For the "dev_evaluate_gravity" kernel, the occupancy is increased from 16.66% to 61.65%. The grid size was 16, which yielded the following recommendation from profiling " The grid for this launch is configured to execute only 16 blocks, which is less than the GPU's 64 multiprocessors. This can underutilize some multiprocessors." Now it is 256. The improvement in occupancy was investigated using ncu: " ncu --kernel-name dev_evaluate_gravity -c 100 --print-summary per-kernel -o dev_evaluate_gravity.prof python scripts/Erwan_1207_imp.py ncu -i dev_evaluate_gravity.prof.ncu-rep | less "
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1207 at the level of PH4, i.e. at the level of jdata.cc
Massless particles will no longer be added as j-particles.
This will work for both CPU and GPU backends in PH4.
Consequently, the conditional in line 208 in idata.cc
will become redundant.