Commit d78fb81
Feature/oro 0 amdadvtech merge (#43)
* Add gitignore to the repository
Signed-off-by: Chih-Chen Kao <[email protected]>
* Fix missing CUDA properties. (#16)
Signed-off-by: Chih-Chen Kao <[email protected]>
* Feature/oro 0 radix sort (#19)
* [ORO-0] Working 8 bit radix sort.
* [ORO-0] Some optimization.
* Create LICENSE
* Update README.md (#15)
* Feature/oro 0 raw get set (#19)
* [ORO-0] Rename setter and getter.
* [ORO-0] Fix when there is a dll but no device.
* [ORO-0] Deletion function.
* [ORO-0] Multi processor count.
* [ORO-0] Extended the sort to more than 8 bits. Implemented tests.
* [ORO-0] Moved temp buffer allocation out from the sort().
* [ORO-0] README. References.
* [ORO-0] Debug flag.
* Refactor the code to add the basic constructs needed to select between different scan algorithms.
Add different implementations of the scan algorithm: CPU, single WG, and all WGs.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Squashed commit of the following:
commit 3f32bea2244653d59efb3c3eaa9433018dde5835
Author: takahiroharada <[email protected]>
Date: Wed Apr 13 10:48:35 2022 -0700
[ORO-0] Fix nvrtc.
* Optimization: Implement the single-pass kernel for GPU parallel scan.
Fix a GPU memory bug.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Feature/oro 0 kernel cache (#4)
* [ORO-0] Cache kernel.
* [ORO-0] Support newer HIP builds on windows (#22)
* [ORO-0] Unit test. (#23)
* Fix LDS scan bug.
The previous implementation fails when the wavefront (warp) size is not equal to the size of a workgroup (block).
Because not all threads run simultaneously, the previous algorithm does not work for an input array larger than the wavefront size:
it performs the scan in place on the input array, so the results of one wavefront (warp) can be overwritten by work items (threads) in another wavefront (warp).
Signed-off-by: Chih-Chen Kao <[email protected]>
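The hazard described above can be reproduced on the CPU. The following is a minimal sketch, not the kernel from this commit: it assumes a Hillis-Steele style inclusive scan, models each wavefront as a group of lanes that read in lockstep and then write, and runs the wavefronts one after another. The function names and the double-buffered variant are illustrative assumptions; the commit message does not state how the kernel was actually fixed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place scan where wavefronts execute one after another (hypothetical model).
// A later wavefront may read values that an earlier wavefront already updated
// in the same step, which corrupts the result when waveSize < data.size().
std::vector<int> scanInPlaceByWavefront(std::vector<int> data, std::size_t waveSize)
{
    assert(waveSize > 0);
    const std::size_t n = data.size();
    for (std::size_t offset = 1; offset < n; offset *= 2)
    {
        for (std::size_t waveBegin = 0; waveBegin < n; waveBegin += waveSize)
        {
            const std::size_t waveEnd = std::min(n, waveBegin + waveSize);
            // Lanes of one wavefront run in lockstep: all reads, then all writes.
            std::vector<int> lane(waveEnd - waveBegin);
            for (std::size_t i = waveBegin; i < waveEnd; ++i)
                lane[i - waveBegin] = data[i] + (i >= offset ? data[i - offset] : 0);
            for (std::size_t i = waveBegin; i < waveEnd; ++i)
                data[i] = lane[i - waveBegin];
        }
    }
    return data;
}

// Same scan with separate source and destination buffers. Swapping the buffers
// after each step plays the role of a workgroup-wide barrier.
std::vector<int> scanDoubleBuffered(std::vector<int> src)
{
    const std::size_t n = src.size();
    std::vector<int> dst(n);
    for (std::size_t offset = 1; offset < n; offset *= 2)
    {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i] + (i >= offset ? src[i - offset] : 0);
        std::swap(src, dst);
    }
    return src;
}
```

With eight ones as input, the in-place version matches the expected prefix sums only when the wavefront covers the whole array; with a wavefront of four lanes the second half of the result is wrong, while the double-buffered version is correct in both cases.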
* Optimize the LDS scan algorithm. (#6)
* Optimize the LDS scan algorithm.
This version does not require a temp buffer and supports an LDS input size of up to 2 times the workgroup size.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Support an input array in LDS that is 2 times the WG size.
Signed-off-by: Chih-Chen Kao <[email protected]>
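One well-known scan with both properties (no temp buffer, two elements per work item) is the Blelloch work-efficient scan. The commit message does not say which algorithm the kernel uses, so the following CPU sketch is only an illustration under that assumption. The inner loop over `tid` stands in for the work items of one workgroup, and every outer-loop iteration corresponds to a barrier on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place exclusive scan over n = 2 * wgSize elements (n must be a power of two).
// Hypothetical CPU model: each "work item" tid owns one pair of elements per step,
// and the pairs touched by different work items never overlap within a step.
std::vector<int> blellochExclusiveScan(std::vector<int> lds)
{
    const std::size_t n = lds.size();
    assert(n >= 2 && (n & (n - 1)) == 0);
    const std::size_t wgSize = n / 2;

    // Up-sweep: build partial sums in place.
    for (std::size_t stride = 1; stride < n; stride *= 2)
    {
        for (std::size_t tid = 0; tid < wgSize; ++tid)
        {
            const std::size_t right = (2 * tid + 2) * stride - 1;
            if (right < n)
                lds[right] += lds[right - stride];
        }
    }

    // Clear the root, then down-sweep to distribute the prefix sums.
    lds[n - 1] = 0;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2)
    {
        for (std::size_t tid = 0; tid < wgSize; ++tid)
        {
            const std::size_t right = (2 * tid + 2) * stride - 1;
            if (right < n)
            {
                const int left = lds[right - stride];
                lds[right - stride] = lds[right];
                lds[right] += left;
            }
        }
    }
    return lds;
}
```

For the input `{1, 2, 3, 4}` (a workgroup of two work items) this returns `{0, 1, 3, 6}`.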
* Feature/oro 0 clean up (#7)
* Squashed commit of the following:
commit 3f32bea2244653d59efb3c3eaa9433018dde5835
Author: takahiroharada <[email protected]>
Date: Wed Apr 13 10:48:35 2022 -0700
[ORO-0] Fix nvrtc.
* [ORO-0] Clean up.
* Feature/oro 0 clean up (#10)
* Squashed commit of the following:
commit 3f32bea2244653d59efb3c3eaa9433018dde5835
Author: takahiroharada <[email protected]>
Date: Wed Apr 13 10:48:35 2022 -0700
[ORO-0] Fix nvrtc.
* [ORO-0] Clean up.
* [ORO-0] SortKernel1. Less complex. (#8)
SortKernel (occupancy: 8)
- vgpr: 128
- lds: 6704
SortKernel1 (occupancy: 9)
- vgpr: 106
- lds: 7720
* [ORO-0] Kernel execution time check.
* Fix the memory access pattern and change it to coalesced memory access. (#11)
Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Single kernel sort for small keys. (#12)
* Optimize the Count kernel for less LDS usage to achieve full occupancy (#13)
* Optimize the Count kernel to use less LDS so that it can achieve full occupancy.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Remove __threadfence_block()
Removes the boundary check in the inner loop.
The upper bound is set only once before going into the loop.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Introduce DRIVER and RTC APIs
* Disable enum-variant
* Improve paths
* Add fields
* Update Vulkan test
* Define CUDA in terms of DRIVER and RTC
* Optimize the sort kernel: single-pass 8bit sort & parallel scan in 4bit sort. (#14)
* Fix a minor issue in CountKernel to make it more robust.
Implement a single-pass 8-bit local sort.
Implement a single-pass 8-bit local sort with shared bins.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Fix nItemsPerWI and enable the version with shared LDS.
Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Print driver version.
* [ORO-0] Repro case.
* Fix SORT_WG_SIZE.
Fix stable sort order.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Optimize sort kernel to remove inner boundary check.
Adjust nItemsPerWI.
Signed-off-by: Chih-Chen Kao <[email protected]>
Co-authored-by: takahiroharada <[email protected]>
* Merging another merge (#18)
* Fix a minor issue in CountKernel to make it more robust.
Implement a single-pass 8-bit local sort.
Implement a single-pass 8-bit local sort with shared bins.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Fix nItemsPerWI and enable the version with shared LDS.
Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Print driver version.
* [ORO-0] Repro case.
* Fix SORT_WG_SIZE.
Fix stable sort order.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Optimize sort kernel to remove inner boundary check.
Adjust nItemsPerWI.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Calculate the number of WGs based on LDS and max-thread-per-WGP. (#15)
* Calculate the number of WGs based on LDS and max-thread-per-WGP.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Add a workaround for CUDA.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Optimize the sort kernel: single-pass 8bit sort & parallel scan in 4bit sort. (#14)
* Fix a minor issue in CountKernel to make it more robust.
Implement a single-pass 8-bit local sort.
Implement a single-pass 8-bit local sort with shared bins.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Fix nItemsPerWI and enable the version with shared LDS.
Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Print driver version.
* [ORO-0] Repro case.
* Fix SORT_WG_SIZE.
Fix stable sort order.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Optimize sort kernel to remove inner boundary check.
Adjust nItemsPerWI.
Signed-off-by: Chih-Chen Kao <[email protected]>
Co-authored-by: takahiroharada <[email protected]>
Co-authored-by: takahiroharada <[email protected]>
Co-authored-by: Chih-Chen Kao <[email protected]>
* Implement key-value pair sorting (#17)
* Add gitignore to the repository
Signed-off-by: Chih-Chen Kao <[email protected]>
* Fix missing CUDA properties. (#16)
Signed-off-by: Chih-Chen Kao <[email protected]>
* Add basic structure for key-value pair sorting.
Fix an error in single pass sort
Signed-off-by: Chih-Chen Kao <[email protected]>
* Add Value data in the test and sort it according to keys.
Signed-off-by: Chih-Chen Kao <[email protected]>
* Support Key only sorting.
Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Make single pass kernel non compile time switch.
* Support both Key-Only & Key-Value pair sort kernels
Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Test change.
* [ORO-0] A bug.
* [ORO-0] NVIDIA occupancy computation fix. Test change. Tweak params to use single pass sort as much as possible.
Co-authored-by: Takahiro Harada <[email protected]>
Co-authored-by: takahiroharada <[email protected]>
* [ORO-0] Revert demo code.
* Fix missing CUDA properties. (#26)
* Update Orochi.cpp
* [ORO-0] Clean up.
* [ORO-0] OroUtils. (#27)
* [ORO-0] OroUtils.
* [ORO-0] Linux build fix.
* [ORO-0] Forgot to add.
* [ORO-0] Linux build fix.
* [ORO-0] Clean up.
Co-authored-by: Chih-Chen Kao <[email protected]>
Co-authored-by: Aaryaman Vasishta <[email protected]>
Co-authored-by: Mehmet Oguz Derin <[email protected]>
* Add kernel path and include dir to the functions. (#20)
Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] BakeKernel. (#21)
* [ORO-0] BakeKernel.
* Update tools/genArgs.py
commented code removal
* Update tools/stringify.py
commented code removal
* Update tools/stringify.py
commented code removal
* Update tools/stringify.py
commented code removal
* Update tools/genArgs.py
dead code removal
* Update tools/stringify.py
dead code removal
* fix include
Signed-off-by: Chih-Chen Kao <[email protected]>
* fix script
Signed-off-by: Chih-Chen Kao <[email protected]>
* fix
Signed-off-by: Chih-Chen Kao <[email protected]>
Co-authored-by: Chih-Chen Kao <[email protected]>
* Fix Orochi CUDA API (#23)
Fix Orochi CUDA API
Signed-off-by: Chih-Chen Kao <[email protected]>
* [ORO-0] Linux build fix. (#22)
* [ORO-0] Linux build fix.
* Fix Orochi CUDA API
Signed-off-by: Chih-Chen Kao <[email protected]>
Co-authored-by: Chih-Chen Kao <[email protected]>
* Quick fix for old linux gcc which does not support std::exclusive_scan (#24)
Quick fix for old linux gcc which does not support std::exclusive_scan
Signed-off-by: Chih-Chen Kao <[email protected]>
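`std::exclusive_scan` is a C++17 addition to `<numeric>`, so older standard libraries do not provide it. A hand-written replacement is short; the sketch below shows one possible form. The function name and signature are assumptions and are not taken from the Orochi sources.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fallback for std::exclusive_scan:
// out[i] = init + in[0] + ... + in[i - 1].
template <typename T>
void exclusiveScanFallback(const std::vector<T>& in, std::vector<T>& out, T init)
{
    out.resize(in.size());
    T running = init;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        // Read the input first so the function also works when in and out alias.
        const T value = in[i];
        out[i] = running;
        running = running + value;
    }
}
```

For example, scanning `{3, 1, 4, 1, 5}` with an initial value of 0 yields `{0, 3, 4, 8, 9}`.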
* Fix the kernel cache bug. (#25)
Fix the kernel cache bug.
The function should not return previously created oroFunctions based solely on their names, because those cached functions might be invalid.
Signed-off-by: Chih-Chen Kao <[email protected]>
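The commit message does not describe the actual fix. As an illustration of the general problem only, the sketch below keys a cache on more than the function name, so two kernels that share a name but differ in source or compile options never collide. `KernelCache`, `FunctionHandle`, and the choice of key fields are all hypothetical and do not reflect the OrochiUtils implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <tuple>

using FunctionHandle = int; // hypothetical stand-in for a compiled function handle

class KernelCache
{
  public:
    // Returns the cached handle for (source, options, name), calling compile()
    // only when no entry exists for that exact combination.
    FunctionHandle get(const std::string& source, const std::string& options,
                       const std::string& name,
                       const std::function<FunctionHandle()>& compile)
    {
        const Key key{source, options, name};
        const auto it = m_entries.find(key);
        if (it != m_entries.end())
            return it->second;
        const FunctionHandle handle = compile();
        m_entries.emplace(key, handle);
        return handle;
    }

    std::size_t size() const { return m_entries.size(); }

  private:
    using Key = std::tuple<std::string, std::string, std::string>;
    std::map<Key, FunctionHandle> m_entries;
};
```

A real cache would also have to drop entries whose module or context has been destroyed; that part is omitted here.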
* [ORO-0] Remove static variables. (#26)
* [ORO-0] Remove static variables.
* [ORO-0] Applied the suggestions.
* [ORO-0] Linux regression fix.
* Fix OrochiUtils::getFunctionFromString API (#27)
Signed-off-by: Chih-Chen Kao <[email protected]>
* Adding missing assert (#28)
* Adding missing assert
* Adding more asserts
* Feature/oro 0 gpuopen merge (#31)
* Fix oroGetDeviceProperties in cuda path.
* Fix linux crash (#29)
* [ORO-0] Added missing file.
* [ORO-0] Remove printf from kernelExec and skip compilation of vulkan test on Linux (#31)
* [ORO-0] Skip compilation of vulkan test on Linux
* [ORO-0] Update kernelExec unit test - remove printf
* [ORO-0] Remove cout
* [ORO-0] Fix hipGetErrorString (#32)
* [ORO-0] Fix hipGetErrorString
It was incorrectly importing this API. Import the correct API in hipew.
* [ORO-0] Remove printf from kernelExec and skip compilation of vulkan test on Linux (#31)
* [ORO-0] Skip compilation of vulkan test on Linux
* [ORO-0] Update kernelExec unit test - remove printf
* [ORO-0] Remove cout
* [ORO-0] Add Orochi error codes mapped to HIP/CUDA (#33)
* Add missing path on Apple config. (#34)
* [ORO-0] Adding hiprtc+comgr dlls to work around the regression in 22.7.1 driver (#38)
* [ORO-0] Adding hiprtc to work around the regression in the 22.7.1 driver released on 7/26/2022.
* [ORO-0] Created win64 subdir.
* [ORO-0] Add hiprtc.dll and comgr dll
Co-authored-by: takahiroharada <[email protected]>
* fix footnote markdown format (#39)
* Fix orochi utils issue in unit tests
Co-authored-by: Aaryaman Vasishta <[email protected]>
Co-authored-by: Chih-Chen Kao <[email protected]>
Co-authored-by: NevesLucas <[email protected]>
Co-authored-by: PixelClear <[email protected]>
Signed-off-by: Chih-Chen Kao <[email protected]>
Co-authored-by: Chih-Chen Kao <[email protected]>
Co-authored-by: Aaryaman Vasishta <[email protected]>
Co-authored-by: Mehmet Oguz Derin <[email protected]>
Co-authored-by: Daniel Meister <[email protected]>
Co-authored-by: NevesLucas <[email protected]>
Co-authored-by: PixelClear <[email protected]>
1 parent 03c4676 commit d78fb81
8 files changed
Lines changed: 357 additions & 18 deletions
File tree
- Orochi
- Test
- UnitTest
- tools