
[LoopSplitting][MicroBenchmarks] Benchmarking Loop Splitting Transformation#379

Closed
amitamd7 wants to merge 1 commit into llvm:main from amitamd7:tiwari_loop_splitting_benchmarking

amitamd7 commented on Apr 9, 2026

This PR adds benchmarks for the upstream PR that implements the #pragma omp split directive.

amitamd7 requested review from Meinersbur and fhahn on April 9, 2026 08:49
amitamd7 changed the title from "[LoopSplitting] Benchmarking Loop Splitting Transformation" to "[LoopSplitting][MicroBenchmarks] Benchmarking Loop Splitting Transformation" on Apr 9, 2026
Meinersbur (Member) left a comment

https://llvm.org/docs/AIToolPolicy.html

Contributors are expected to be transparent and label contributions that contain substantial amounts of tool-generated content.

Benchmarks in llvm-test-suite are meant to track improvements in compiler optimizations. With #pragma omp split you have already told the compiler which optimization to apply. There is very little optimization a compiler can do after that, basically applying the same optimizations 4 times, once to each generated loop. There is no expectation that it would improve in future versions of Clang. You are essentially illustrating the speed difference #pragma omp split makes, for programmers considering using it, but it is a waste of time for compiler engineers who want to improve code optimization passes. That is, I don't think we need a benchmark for split upstream.

Comment on lines +2 to +5
# Copy this directory to llvm-test-suite/MicroBenchmarks/LoopSplit/
# and add: add_subdirectory(LoopSplit) to MicroBenchmarks/CMakeLists.txt.
#
# Configure test-suite with a Clang that supports -fopenmp and -fopenmp-version=60.
Meinersbur (Member):

Suggested change: delete lines +2 to +5 (the four comment lines above).

Remove the AI-generated instructions that you obviously already applied.

amitamd7 (Author):

Thanks for pointing that out; I'll keep it in mind for future reference.


// Kernel: sum 0..(N-1) with split into four segments.
static long run_split() {
long sum = 0;
Meinersbur (Member):

The type long is only 32 bits on 32-bit platforms and on 64-bit Windows, so the sum will overflow: the expected result, 19999999900000000, does not fit in 32 bits.
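A minimal sketch of the fix the comment implies, assuming the kernel sums 0..N-1 with N = 200000000. That N is a hypothetical value inferred from the reviewer's figure (N(N-1)/2 = 19999999900000000); the diff excerpt does not show it. The helpers closed_form_sum(n) and a run_split(n) that takes n as a parameter are hypothetical illustrations, not the PR's code, and the four segments are written out by hand rather than with #pragma omp split, so no particular directive syntax is assumed. The accumulator is std::int64_t, which is exactly 64 bits wherever it is provided.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical problem size, inferred from the reviewer's figure:
// N * (N - 1) / 2 == 19999999900000000 when N == 200000000.
constexpr std::int64_t kN = 200000000;

// Closed form of 0 + 1 + ... + (n - 1), used as a reference value.
constexpr std::int64_t closed_form_sum(std::int64_t n) { return n * (n - 1) / 2; }

// The expected total does not fit in 32 bits, so a 32-bit long would overflow.
static_assert(closed_form_sum(kN) == 19999999900000000LL, "expected total");
static_assert(closed_form_sum(kN) > std::numeric_limits<std::int32_t>::max(),
              "accumulator must be wider than 32 bits");

// Hypothetical variant of run_split() that takes n as a parameter and sums
// 0..(n-1) as four consecutive segments, written out by hand (no pragma).
// std::int64_t is exactly 64 bits wherever it exists, unlike long.
std::int64_t run_split(std::int64_t n) {
  assert(n >= 0 && "n must be non-negative");
  const std::int64_t q = n / 4;
  // Segment boundaries; the last segment absorbs the remainder of n / 4.
  const std::int64_t bounds[5] = {0, q, 2 * q, 3 * q, n};
  std::int64_t sum = 0;
  for (int seg = 0; seg < 4; ++seg)
    for (std::int64_t i = bounds[seg]; i < bounds[seg + 1]; ++i)
      sum += i;
  return sum;
}
```

With this accumulator type, run_split(kN) returns 19999999900000000, matching closed_form_sum(kN) on every platform that provides std::int64_t.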

static void BM_Split(benchmark::State &state) {
long x = 0;
for (auto _ : state)
benchmark::DoNotOptimize(x += run_split());
Meinersbur (Member):

Suggested change
benchmark::DoNotOptimize(x += run_split());
auto x = run_split();
benchmark::DoNotOptimize(x);

amitamd7 (Author):

Understood. Thanks for reviewing and for sharing the doc; I'll go through it.

amitamd7 closed this on Apr 23, 2026
2 participants