Skip to content

1207 fix#1266

Open
HannoSpreeuw wants to merge 23 commits into
amusecode:mainfrom
HannoSpreeuw:1207_fix
Open

1207 fix#1266
HannoSpreeuw wants to merge 23 commits into
amusecode:mainfrom
HannoSpreeuw:1207_fix

Conversation

@HannoSpreeuw
Copy link
Copy Markdown
Collaborator

Fixes #1207 at the level of PH4, i.e. at the level of jdata.cc

Massless particles will no longer be added as j-particles.
This will work for both CPU and GPU backends in PH4.

Consequently, the conditional in line 208 in idata.cc

		// Skip particles with zero mass.
		if (jdat->mass[j] > _TINY_){

will become redundant.

Perform these calculations only if particles have mass, i.e. if they can exert gravity.
The lower limit of 1e-30 kg is somewhat arbitrary.
Commit 460169d did not yield any speedup, possibly because of branch divergence:
"Branch divergence occurs: If roughly half the particles are massless test particles,
about half the threads in each warp will take the if branch while the other half skip it"
This commit essentially reverts commit 460169d.
The lower limit on the mass that discerns test particles from particles that exert gravity should not be hard coded.
That is why the comment has been updated.
By introducing "nj_massive" we try to enforce that only massive particles will reach the GPU.
1) "SAPPORO_TEST_PARTICLE_MASS" gets a default value of 1e-30 unless "_TINY_" as used in "stdinc.h" is set. To enable the override, one would have to set "_TINY_" as an env var, I reckon, i.e. in the current implementation it will not be copied from "src/amuse_ph4/src/stdinc.h".
2) Declaration and initialisation of "nj_massive".
Declaration of "nj_massive"; apparently this is needed both in "sapporo.h" and "sapporo_multi.h".
Particles are rearranged depending on mass in descending order. The idea is that test particles, i.e. particles with a mass less than "SAPPORO_TEST_PARTICLE_MASS" will not reach the GPU; they do not need to reach the GPU since they do not exert gravity.
"nj_massive", i.e. the number of particles with a mass > "SAPPORO_TEST_PARTICLE_MASS" is computed.
The reordering of particles, i.e. the fact that after reordering an index does no longer point to the same particle, may have unforeseen consequences, will check this later.
Set "nj_massive" for "gpu", which is a "sapporo_multi_struct".
We could build Sapporo Light but running yielded unforeseen results.
With no test particles, the code seems a lot faster, it completes in 3s instead of 39s, that is suspicious.
But with e.g. 50% test particles, the code seems to run forever, with about 35% GPU utilisation, which is twice as low as before.
It seems that computations could be incorrect.
To fix this, we will again send all particles to the GPU, but reimplement the mass condition.
This should result in correct computations, but the speedup may vanish.
The conditional
"if (mass > SAPPORO_TEST_PARTICLE_MASS) {"
does not yield any speedup, since branch divergence occurs: if roughly half the particles are massless test particles,
about half the threads in each warp will take the if branch while the other half skip it and will simply wait.
Previously, massless particles, i.e. test particles were weeded out by sorting all the particles based on their mass, subsequently determining "nj_massive" from that sort and copying all particles up to index "nj_massive" to the GPU as j-particles, such that massless particles will not be added as j-particles.
However, it may be much simpler to add a conditional to "set_j_particle":
"
if (mass <= SAPPORO_TEST_PARTICLE_MASS) return 0;
"
This may work, but remains to be tested.
With the new method, implemented in the previous commit (ae1c36e), of weeding out test particles from the j-particles, "nj_massive" has become redundant.
1) The line "if (nj == 0) return 0;" should be a safeguard against crashing for runs with only massless particles.
2) "nj_massive" has become redundant since we started with the new method of weeding out massless particles from the j-particles.
This check should be made at the caller, not at the callee.
We do not want to call "evaluate_gravity" when "nj==0". The check for this is now made at the caller, not at the callee.
j-particles are the particles that exert gravity. If they are massless, they should not become a j-particle.
When deploying PH4 with a CPU backend, this is taken care of in line 207 of idata.cc in "get_partial_acc_and_jerk":
"
		// Skip particles with zero mass.
		if (jdat->mass[j] > _TINY_){
"
However, this is not the best place to do this, it would be better to not create the j-particle at all.
For both the CPU and GPU backends this commit fixes this.
This means that the conditional in line 208 of idata.cc has now become redundant.
Copy link
Copy Markdown
Member

@LourensVeen LourensVeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have some questions....

#include <math.h>

#ifndef SAPPORO_TEST_PARTICLE_MASS
# if defined(_TINY_)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is TINY ever not defined? If so, why?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_TINY_ may not be defined if PH4 is not the running AMUSE code in which case the _TINY_ definition in stdinc.h may be skipped:

#define _TINY_	   (pow(2.0, -52))	  // double limit on powers of 2

Comment thread src/amuse_ph4/src/jdata.cc Outdated

if (pmass <= _TINY_) {
if (mpi_rank == 0) {
cerr << "Warning: ignoring particle with mass="
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to produce a line of output for every massless test particle? That could be a lot, and what does that add? After all, ignoring a particle with no mass won't make a difference to the result anyway.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let me remove that.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed through commit 3a2a913.

HannoSpreeuw and others added 5 commits May 19, 2026 16:21
This print statement is redundant, addressing comment amusecode#1266 (comment) by @LourensVeen.
The
"
if (jdat->mass[j] > _TINY_){
"
conditional has become redundant since commit 1d0386d, i.e. since particles are only added as j-particles if they have mass.
We should not add a particle as a j-particle when massless, since j-particles are the particles that exert gravity.
This increases achieved occupancy by almost a factor four.

For the "dev_evaluate_gravity" kernel, the occupancy is increased from 16.66% to 61.65%. The grid size was 16, which yielded the following recommendation from profiling
"
The grid for this launch is configured to execute only 16 blocks, which is less than the GPU's 64 multiprocessors. This can underutilize some multiprocessors."

Now it is 256.
The improvement in occupancy was investigated using ncu:
"
ncu --kernel-name dev_evaluate_gravity -c 100 --print-summary per-kernel -o dev_evaluate_gravity.prof python scripts/Erwan_1207_imp.py
ncu -i dev_evaluate_gravity.prof.ncu-rep | less
"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

Ph4 gpu capable in handling test particles

2 participants