Commit `4b8b2c7`: Publish Nabla Path Tracer runtime compare report (88 files changed, 4424 additions)

# Nabla Path Tracer runtime compare from Nsight Graphics

This directory contains one paired `Nsight Graphics` `GPU Trace` probe per variant of the Nabla Path Tracer.

Protocol:
- `1` run per variant
- same capture point: `frame 1000`
- same effective render path:
  - geometry: `sphere`
  - effective method: `solid angle`
- runtime numbers below come directly from the `Nsight Graphics` exports:
  - `FRAME.xls`
  - `GPUTRACE_FRAME.xls`
- measurement machine: [`measurement_machine.md`](measurement_machine.md)
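Despite the `.xls` extension, the exports reproduced later in this report are plain text rows of `metric_name value`. A minimal loader sketch under that assumption (this is inferred from the raw dump in this report, not an official Nsight format specification):

```python
# Minimal sketch: load an Nsight Graphics text export into {metric: float}.
# Assumption: each non-empty line is "<metric_name> <value>" with the
# numeric value as the last whitespace-separated token.

def load_metrics(lines):
    metrics = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split once from the right so the metric name keeps its dots/underscores.
        name, value = line.rsplit(None, 1)
        metrics[name] = float(value)
    return metrics

sample = [
    "GPUTrace.sm__throughput.avg.pct_of_peak_sustained_elapsed 38.2916",
    "FE_A.TriageSCG.gr__dispatch_count.sum 2",
]
m = load_metrics(sample)
print(m["FE_A.TriageSCG.gr__dispatch_count.sum"])  # 2.0
```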

## Variant matrix

| Case | Checkout source | Nabla | DXC | SPIRV-Headers | SPIRV-Tools | Mode |
| --- | --- | --- | --- | --- | --- | --- |
| `master_source_off` | `master_runcheck` local worktree | [`e11b118dd2e80393b5b7eb309c6abb25f51a818c`](https://github.com/Devsh-Graphics-Programming/Nabla/commit/e11b118dd2e80393b5b7eb309c6abb25f51a818c) | [`d76c7890b19ce0b344ee0ce116dbc1c92220ccea`](https://github.com/Devsh-Graphics-Programming/DirectXShaderCompiler/commit/d76c7890b19ce0b344ee0ce116dbc1c92220ccea) | [`057230db28c7f7d1d571c9e61732da44815f2891`](https://github.com/Devsh-Graphics-Programming/SPIRV-Headers/commit/057230db28c7f7d1d571c9e61732da44815f2891) | [`91ac969ed599bfd0697a5b88cfae550318a04392`](https://github.com/Devsh-Graphics-Programming/SPIRV-Tools/commit/91ac969ed599bfd0697a5b88cfae550318a04392) | local `Release`, `SOURCE`, runtime `builtins OFF` |
| `devshfixes_upstream` | `unroll_dxc_df_upstream_check` local worktree | [`c13c33662c3733b54d9014988a5ac602ab0c3245`](https://github.com/Devsh-Graphics-Programming/Nabla/commit/c13c33662c3733b54d9014988a5ac602ab0c3245) | [`74d6fbbad7388813c65ae269b20f15b4e971df9c`](https://github.com/Devsh-Graphics-Programming/DirectXShaderCompiler/commit/74d6fbbad7388813c65ae269b20f15b4e971df9c) | [`10b37414a3c9269b9bd8861cc759bd7fdf09760d`](https://github.com/Devsh-Graphics-Programming/SPIRV-Headers/commit/10b37414a3c9269b9bd8861cc759bd7fdf09760d) | [`2c75d08e3b31a673726ce6be80ab528250247064`](https://github.com/Devsh-Graphics-Programming/SPIRV-Tools/commit/2c75d08e3b31a673726ce6be80ab528250247064) | local `Release`, `SOURCE`, runtime `builtins OFF` |
| `unroll_artifact` | CI install artifact from [`run 23599197849`](https://github.com/Devsh-Graphics-Programming/Nabla/actions/runs/23599197849) | [`262a8b72f295ec95d3cf83170f1768a43972c9ab`](https://github.com/Devsh-Graphics-Programming/Nabla/commit/262a8b72f295ec95d3cf83170f1768a43972c9ab) | [`07f06e9d48807ef8e7cabc41ae6acdeb26c68c09`](https://github.com/Devsh-Graphics-Programming/DirectXShaderCompiler/commit/07f06e9d48807ef8e7cabc41ae6acdeb26c68c09) | [`c141151dd53cbd5b1ced0665ad95ae3e91e8f916`](https://github.com/Devsh-Graphics-Programming/SPIRV-Headers/commit/c141151dd53cbd5b1ced0665ad95ae3e91e8f916) | [`2a730e127a32ac8b0713f5e1490d7b9be9d1cc9a`](https://github.com/Devsh-Graphics-Programming/SPIRV-Tools/commit/2a730e127a32ac8b0713f5e1490d7b9be9d1cc9a) | CI `Release install` artifact |

## Main Nsight result

| Variant | GPU frame ms | Dispatch count | Compute active | SM throughput | PCIe write GB/s |
| --- | ---: | ---: | ---: | ---: | ---: |
| `master_source_off` | `21.4304` | `2` | `83.2501%` | `35.5388%` | `2.62710` |
| `devshfixes_upstream` | `19.6157` | `2` | `82.9923%` | `38.2916%` | `2.64694` |
| `unroll_artifact` | `21.5935` | `2` | `83.9945%` | `34.3346%` | `2.62212` |

### Runtime deltas

| Comparison | Delta ms | Delta % |
| --- | ---: | ---: |
| `devshfixes_upstream` vs `master_source_off` | `-1.8147` | `-8.47%` |
| `unroll_artifact` vs `master_source_off` | `+0.1631` | `+0.76%` |
| `unroll_artifact` vs `devshfixes_upstream` | `+1.9778` | `+10.08%` |
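The delta table follows directly from the per-variant frame times above; a small sketch reproducing it:

```python
# Reproduce the runtime-delta table from the measured GPU frame times.

frame_ms = {
    "master_source_off": 21.4304,
    "devshfixes_upstream": 19.6157,
    "unroll_artifact": 21.5935,
}

def delta(variant, baseline):
    """Delta of `variant` relative to `baseline`, in ms and percent."""
    d = frame_ms[variant] - frame_ms[baseline]
    return d, 100.0 * d / frame_ms[baseline]

for a, b in [("devshfixes_upstream", "master_source_off"),
             ("unroll_artifact", "master_source_off"),
             ("unroll_artifact", "devshfixes_upstream")]:
    d_ms, d_pct = delta(a, b)
    print(f"{a} vs {b}: {d_ms:+.4f} ms ({d_pct:+.2f}%)")
```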

## Main conclusion

The measured `devshfixes_upstream` baseline (the latest upstream refresh) is faster than `master_source_off` in this probe. At the same time, `unroll_artifact` is effectively at parity with `master_source_off` at only `+0.76%`; the remaining gap appears only against `devshfixes_upstream`.

Taken together, the measured runtime cost points at the `unroll` side of the experiment, not at the generic `DXC/SPIRV-Tools` upstream refresh. That tradeoff is aligned with the intent of the experiment: reduce shader build time aggressively while accepting a small runtime cost.

In practice this is also a strong argument for a development-oriented DXC optimization profile, for example an `-O1`-style mode. For the Nabla Path Tracer builds behind this comparison, shader-build wall time is about `10x` worse without that profile, while the measured runtime delta stays at `+0.76%` against `master_source_off` in this probe and at `+10.08%` against `devshfixes_upstream`. On other machines, a simpler `average FPS` check across multiple methods, modes, and shapes placed the same runtime cost in the `5-8%` range. That is exactly the profile proposed in the paired PRs: for development use it delivers a major build-time win while keeping the runtime impact effectively negligible in practice.

## Deeper Nsight signals from the same exports

Frame-level exports also show:
- `dispatch_count = 2` and `gr__ctas_launched_queue_sync.sum = 14401` in all three variants
- `unroll_artifact` has lower `SM throughput` than `devshfixes_upstream`
- `unroll_artifact` also shows higher total executed instructions and much higher `L1/LSU/shared` pressure than `devshfixes_upstream`

This points at a `compute-side codegen / execution-mix` difference, with higher `L1/LSU/shared` pressure on the `unroll` side.
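Signals like these can be surfaced mechanically by ranking metric rows by relative change between two exports. A minimal triage sketch, assuming both exports have already been loaded as `{metric_name: value}` dicts (the sample dicts below are illustrative placeholders, not values from the `unroll_artifact` export):

```python
# Rank metric rows by relative change between two exports, to surface
# differences like the L1/LSU/shared pressure gap described above.

def top_deltas(base, other, n=5):
    rows = []
    for name, b in base.items():
        o = other.get(name)
        if o is None or b == 0:
            continue  # skip metrics missing in one export or zero in the base
        rows.append((abs(o - b) / abs(b), name, b, o))
    rows.sort(reverse=True)
    return rows[:n]

# Placeholder sample values for illustration only.
base = {"sm__throughput": 38.3, "l1tex__throughput": 15.2}
other = {"sm__throughput": 34.3, "l1tex__throughput": 21.0}
for rel, name, b, o in top_deltas(base, other):
    print(f"{name}: {b} -> {o} ({rel:.1%} change)")
```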

## Directory map

### Runtime stats
- [`master_source_off/stats.json`](master_source_off/stats.json)
- [`devshfixes_upstream/stats.json`](devshfixes_upstream/stats.json)
- [`unroll_artifact/stats.json`](unroll_artifact/stats.json)

### Machine spec
- [`measurement_machine.md`](measurement_machine.md)

### Executable locations
- `master_source_off`: [`runnable/master_source_off_minimal/31_hlslpathtracer.exe`](runnable/master_source_off_minimal/31_hlslpathtracer.exe)
- `devshfixes_upstream`: [`runnable/devshfixes_upstream_minimal/31_hlslpathtracer.exe`](runnable/devshfixes_upstream_minimal/31_hlslpathtracer.exe)
- `unroll_artifact`: [`runnable/unroll_artifact_minimal/31_hlslpathtracer.exe`](runnable/unroll_artifact_minimal/31_hlslpathtracer.exe)

### Capture files
- [`master_source_off/run01/master_source_off_frame1000_run01.ngfx-capture`](master_source_off/run01/master_source_off_frame1000_run01.ngfx-capture)
- [`devshfixes_upstream/run01/devshfixes_upstream_frame1000_run01.ngfx-capture`](devshfixes_upstream/run01/devshfixes_upstream_frame1000_run01.ngfx-capture)
- [`unroll_artifact/run01/unroll_artifact_frame1000_run01.ngfx-capture`](unroll_artifact/run01/unroll_artifact_frame1000_run01.ngfx-capture)

### Raw Nsight exports
- [`master_source_off/run01/gpu-trace/BASE/FRAME.xls`](master_source_off/run01/gpu-trace/BASE/FRAME.xls)
- [`master_source_off/run01/gpu-trace/BASE/GPUTRACE_FRAME.xls`](master_source_off/run01/gpu-trace/BASE/GPUTRACE_FRAME.xls)
- [`devshfixes_upstream/run01/gpu-trace/BASE/GPUTRACE_FRAME.xls`](devshfixes_upstream/run01/gpu-trace/BASE/GPUTRACE_FRAME.xls)
- [`unroll_artifact/run01/gpu-trace/BASE/GPUTRACE_FRAME.xls`](unroll_artifact/run01/gpu-trace/BASE/GPUTRACE_FRAME.xls)

### Startup logs
- [`startup_devshfixes_upstream/stats.json`](startup_devshfixes_upstream/stats.json)
- [`startup_unroll_artifact/stats.json`](startup_unroll_artifact/stats.json)

`devshfixes_upstream/run01/devshfixes_upstream_frame1000_run01.log` (large diff, not rendered by default)

Two one-line text exports (file names are collapsed in this diff view):

    event_text time_ms

    GPU frame time 19.6157

Raw GPU Trace metrics export (153 rows; the values match the `devshfixes_upstream` run: the same `38.2916%` SM throughput and `2.64694` PCIe write GB/s as the table above):

    FE_B.TriageSCG.gr__cycles_active.avg.pct_of_peak_sustained_elapsed 93.1858
    HOST.TriageSCG.gpu__engine_cycles_active_any_syncce.avg.pct_of_peak_sustained_elapsed 0.475694
    FE_A.TriageSCG.gpu__scheduler_engine_asyncce0_cycles_active.avg.pct_of_peak_sustained_elapsed 0
    FE_A.TriageSCG.gpu__scheduler_engine_asyncce1_cycles_active.avg.pct_of_peak_sustained_elapsed 0.0764008
    HOST.TriageSCG.gpu__scheduler_engine_asyncce2_cycles_active.avg.pct_of_peak_sustained_elapsed 100
    GPUTrace.sm__throughput.avg.pct_of_peak_sustained_elapsed 38.2916
    rtcore__cycles_executed.avg.pct_of_peak_sustained_elapsed 0
    LTS.TriageSCG.lts__throughput.avg.pct_of_peak_sustained_elapsed 5.72954
    FBSP.TriageSCG.dramc__throughput.avg.pct_of_peak_sustained_elapsed 3.40994
    PCI.TriageSCG.pcie__throughput.avg.pct_of_peak_sustained_elapsed 16.8333
    GPUTrace.WorldPipe_throughput.avg.pct_of_peak_sustained_elapsed 0.00115412
    GPUTrace.ScreenPipe_throughput.avg.pct_of_peak_sustained_elapsed 0.132693
    fe__throughput.avg.pct_of_peak_sustained_elapsed 0.013713
    PCI.TriageSCG.pcie__read_bytes.avg.pct_of_peak_sustained_elapsed 0.579002
    PCI.TriageSCG.pcie__read_bytes.sum 1.78591e+06
    PCI.TriageSCG.pcie__read_bytes.sum.per_second 0.0910447
    PCI.TriageSCG.pcie__write_bytes.avg.pct_of_peak_sustained_elapsed 16.8333
    PCI.TriageSCG.pcie__write_bytes.sum 5.19217e+07
    PCI.TriageSCG.pcie__write_bytes.sum.per_second 2.64694
    pcie__rx_requests_aperture_bar0_op_read.sum.per_second 0.0604108
    pcie__rx_requests_aperture_bar0_op_read.sum 1185
    pcie__rx_requests_aperture_bar0_op_write.sum.per_second 0.0291603
    pcie__rx_requests_aperture_bar0_op_write.sum 572
    pcie__rx_requests_aperture_bar1_op_read.sum.per_second 0.00209016
    pcie__rx_requests_aperture_bar1_op_read.sum 41
    pcie__rx_requests_aperture_bar1_op_write.sum.per_second 0.042415
    pcie__rx_requests_aperture_bar1_op_write.sum 832
    pcie__rx_requests_aperture_bar2_op_read.sum.per_second 0.0196271
    pcie__rx_requests_aperture_bar2_op_read.sum 385
    pcie__rx_requests_aperture_bar2_op_write.sum.per_second 0.0279368
    pcie__rx_requests_aperture_bar2_op_write.sum 548
    FBSP.TriageSCG.dramc__read_throughput.avg.pct_of_peak_sustained_elapsed 3.12
    FBSP.TriageSCG.dramc__write_throughput.avg.pct_of_peak_sustained_elapsed 0.289944
    LTS.TriageSCG.lts__t_sector_throughput_srcunit_tex.avg.pct_of_peak_sustained_elapsed 2.40237
    LTS.TriageSCG.lts__t_sector_throughput_srcunit_gcc.avg.pct_of_peak_sustained_elapsed 1.62815
    LTS.TriageSCG.lts__t_sector_throughput_srcunit_pe.avg.pct_of_peak_sustained_elapsed 0.00356426
    ROP.TriageSCG.lts__t_sector_throughput_srcunit_crop.avg.pct_of_peak_sustained_elapsed 0.0296463
    ROP.TriageSCG.lts__t_sector_throughput_srcunit_zrop.avg.pct_of_peak_sustained_elapsed 0
    TriageSCG.lts__t_sector_throughput_srcunit_raster.avg.pct_of_peak_sustained_elapsed 0.330087
    LTS.TriageSCG.lts__t_sector_throughput_srcnode_fbp.avg.pct_of_peak_sustained_elapsed 0.00158123
    LTS.TriageSCG.lts__t_sector_throughput_srcnode_hub.avg.pct_of_peak_sustained_elapsed 1.69742
    LTS.TriageSCG.lts__average_t_sector_hit_rate_realtime.pct 81.2163
    SM_C.TriageSCG.smsp__inst_executed_pipe_fma.avg.pct_of_peak_sustained_elapsed 10.5253
    SM_C.TriageSCG.smsp__inst_executed_pipe_fma.sum 7.29385e+08
    SM_C.TriageSCG.smsp__inst_executed_pipe_fma.avg.per_cycle_elapsed 0.421011
    SM_C.TriageSCG.smsp__inst_executed_pipe_fma.avg.peak_sustained 4
    SM_C.TriageSCG.smsp__inst_executed_pipe_fmaheavy.avg.pct_of_peak_sustained_elapsed 6.83763
    SM_C.TriageSCG.smsp__inst_executed_pipe_fmaheavy.sum 2.36918e+08
    SM_C.TriageSCG.smsp__inst_executed_pipe_fmaheavy.avg.per_cycle_elapsed 0.136753
    SM_C.TriageSCG.smsp__inst_executed_pipe_fmaheavy.avg.peak_sustained 2
    sm__pipe_tensor_cycles_active_realtime.avg.pct_of_peak_sustained_elapsed 0
    sm__pipe_tensor_cycles_active_realtime.sum 0
    smsp__thread_inst_executed_per_inst_executed.ratio 28.292
    smsp__thread_inst_executed_per_inst_executed.pct 88.4125
    LTS.TriageSCG.lts__average_t_sector_hit_rate_srcunit_tex_realtime.pct 99.6297
    FE_B.TriageSCG.gr__ctas_launched_queue_sync.sum.per_second 0.734156
    FE_B.TriageSCG.gr__ctas_launched_queue_sync.sum 14401
    FE_B.TriageSCG.gr__ctas_launched_queue_async.sum.per_second 0
    FE_B.TriageSCG.gr__ctas_launched_queue_async.sum 0
    FE_A.TriageSCG.gr__compute_cycles_active_queue_sync.avg.pct_of_peak_sustained_elapsed 82.9923
    FE_A.TriageSCG.gr__compute_cycles_active_queue_async.avg.pct_of_peak_sustained_elapsed 0.0318724
    FE_A.TriageSCG.gr__dispatch_count.sum 2
    GPC_A.TriageSCG.raster__throughput.avg.pct_of_peak_sustained_elapsed 0.0301417
    GPC_A.TriageSCG.prop__throughput.avg.pct_of_peak_sustained_elapsed 0.131119
    ROP.TriageSCG.crop__throughput.avg.pct_of_peak_sustained_elapsed 0.132693
    ROP.TriageSCG.zrop__throughput.avg.pct_of_peak_sustained_elapsed 0
    crop__read_throughput.avg.pct_of_peak_sustained_elapsed 0.00358251
    crop__write_throughput.avg.pct_of_peak_sustained_elapsed 0.055826
    zrop__read_throughput.avg.pct_of_peak_sustained_elapsed 0
    zrop__write_throughput.avg.pct_of_peak_sustained_elapsed 0
    GPC_A.TriageSCG.prop__prop2zrop_pixels_stage_latez_realtime.avg.pct_of_peak_sustained_elapsed 0
    GPC_A.TriageSCG.raster__zcull_input_samples_realtime.avg.per_cycle_elapsed 0.0250038
    GPC_A.TriageSCG.raster__zcull_input_samples_realtime.avg.pct_of_peak_sustained_elapsed 0.0097671
    GPC_A.TriageSCG.raster__zcull_input_samples_realtime.sum 3.76678e+06
    GPC_A.TriageSCG.raster__zcull_input_samples_op_accepted_realtime.avg.per_cycle_elapsed 0.0250038
    GPC_A.TriageSCG.raster__zcull_input_samples_op_accepted_realtime.avg.pct_of_peak_sustained_elapsed 0.0097671
    GPC_A.TriageSCG.raster__zcull_input_samples_op_accepted_realtime.sum 3.76678e+06
    GPC_A.TriageSCG.prop__input_pixels_type_3d_realtime.avg.per_cycle_elapsed 0.0141987
    GPC_A.TriageSCG.prop__input_pixels_type_3d_realtime.avg.pct_of_peak_sustained_elapsed 0.0221854
    GPC_A.TriageSCG.prop__input_pixels_type_3d_realtime.sum 2.13901e+06
    GPC_A.TriageSCG.prop__prop2zrop_pixels_realtime.avg.per_cycle_elapsed 0
    GPC_A.TriageSCG.prop__prop2zrop_pixels_realtime.avg.pct_of_peak_sustained_elapsed 0
    GPC_A.TriageSCG.prop__prop2zrop_pixels_realtime.sum 0
    GPC_A.TriageSCG.prop__prop2zrop_pixels_op_passed_realtime.avg.per_cycle_elapsed 0.00611755
    GPC_A.TriageSCG.prop__prop2zrop_pixels_op_passed_realtime.avg.pct_of_peak_sustained_elapsed 0.00477934
    GPC_A.TriageSCG.prop__prop2zrop_pixels_op_passed_realtime.sum 921600
    GPC_A.TriageSCG.prop__prop2crop_pixels_realtime.avg.per_cycle_elapsed 0.014171
    GPC_A.TriageSCG.prop__prop2crop_pixels_realtime.avg.pct_of_peak_sustained_elapsed 0.0885691
    GPC_A.TriageSCG.prop__prop2crop_pixels_realtime.sum 2.13485e+06
    HUB_B.TriageSCG.pda__throughput.avg.pct_of_peak_sustained_elapsed 0.000367926
    TPC.TriageSCG.vaf__throughput.avg.pct_of_peak_sustained_elapsed 0.000587835
    GPC_B.TriageSCG.pes__throughput.avg.pct_of_peak_sustained_elapsed 0.00115412
    HUB_B.TriageSCG.pda__input_prims_realtime.avg.pct_of_peak_sustained_elapsed 0.000367926
    HUB_B.TriageSCG.pda__input_prims_realtime.avg 1888
    HUB_B.TriageSCG.pda__input_prims_realtime.sum 1888
    FE_B.TriageSCG.fe__draw_count.sum 58
    FE_B.TriageSCG.fe__pixel_shader_barriers.sum 19
    FE_B.TriageSCG.fe__output_ops_cmd_go_idle_queue_sync.sum 61
    FE_A.TriageSCG.fe__output_ops_cmd_subchsw_queue_sync.sum 28
    FE_B.TriageSCG.fe__cycles_stalled_cmd_wfi_queue_sync.avg.pct_of_peak_sustained_elapsed 3.20003
    FE_B.TriageSCG.fe__cycles_stalled_cmd_wfi_queue_sync.sum 1.0263e+06
    FE_B.TriageSCG.fe__output_ops_cmd_go_idle_queue_async.sum 1
    FE_A.TriageSCG.fe__output_ops_cmd_subchsw_queue_async.sum 1
    FE_B.TriageSCG.fe__cycles_stalled_cmd_wfi_queue_async.avg.pct_of_peak_sustained_elapsed 0.0334251
    FE_B.TriageSCG.fe__cycles_stalled_cmd_wfi_queue_async.sum 10720
    gpc__cycles_elapsed.avg.per_second 1920
    gpc__cycles_elapsed.avg 3.76621e+07
    sys__cycles_elapsed.avg.per_second 1635
    sys__cycles_elapsed.avg 3.20717e+07
    lts__cycles_elapsed.avg.per_second 1710
    lts__cycles_elapsed.avg 3.35428e+07
    dramc__cycles_elapsed.avg.per_second 10491.5
    dramc__cycles_elapsed.avg 2.05798e+08
    SM_A.TriageSCG.l1tex__throughput.avg.pct_of_peak_sustained_elapsed 15.2134
    SM_A.TriageSCG.sm__inst_executed_realtime.avg.pct_of_peak_sustained_elapsed 36.1949
    SM_A.TriageSCG.sm__inst_executed_realtime.sum 2.50824e+09
    SM_A.TriageSCG.sm__inst_executed_realtime.avg.per_cycle_elapsed 1.44779
    SM_A.TriageSCG.sm__inst_executed_realtime.avg.peak_sustained 4
    SM_A.TriageSCG.sm__inst_executed_pipe_alu_realtime.avg.pct_of_peak_sustained_elapsed 38.2916
    SM_A.TriageSCG.sm__inst_executed_pipe_alu_realtime.sum 1.32677e+09
    SM_A.TriageSCG.sm__inst_executed_pipe_alu_realtime.avg.per_cycle_elapsed 0.765832
    SM_A.TriageSCG.sm__inst_executed_pipe_alu_realtime.avg.peak_sustained 2
    SM_A.TriageSCG.sm__inst_executed_pipe_xu_realtime.avg.pct_of_peak_sustained_elapsed 10.6193
    SM_A.TriageSCG.sm__inst_executed_pipe_xu_realtime.sum 9.19873e+07
    SM_A.TriageSCG.sm__inst_executed_pipe_xu_realtime.avg.per_cycle_elapsed 0.0530964
    SM_A.TriageSCG.sm__inst_executed_pipe_xu_realtime.avg.peak_sustained 0.5
    SM_B.TriageSCG.tpc__warps_active_shader_vtg_realtime.avg.per_cycle_elapsed 0.00014169
    SM_B.TriageSCG.tpc__warps_active_shader_vtg_realtime.avg.pct_of_peak_sustained_elapsed 0.000295187
    TPC.TriageSCG.tpc__warps_active_shader_ps_realtime.avg.per_cycle_elapsed 0.0245833
    TPC.TriageSCG.tpc__warps_active_shader_ps_realtime.avg.pct_of_peak_sustained_elapsed 0.0512152
    TPC.TriageSCG.tpc__warps_active_shader_cs_realtime.avg.per_cycle_elapsed 5.4674
    TPC.TriageSCG.tpc__warps_active_shader_cs_realtime.avg.pct_of_peak_sustained_elapsed 11.3904
    TriageSCG.tpc__warps_inactive_sm_active_realtime.avg.per_cycle_elapsed 31.0897
    TriageSCG.tpc__warps_inactive_sm_active_realtime.avg.pct_of_peak_sustained_elapsed 64.7702
    TPC.TriageSCG.tpc__sm_rf_registers_allocated_shader_3d_realtime.avg.pct_of_peak_sustained_elapsed 0.0423113
    TPC.TriageSCG.tpc__sm_rf_registers_allocated_shader_3d_realtime.avg.per_cycle_elapsed 27.7291
    TPC.TriageSCG.tpc__sm_rf_registers_allocated_shader_cs_realtime.avg.pct_of_peak_sustained_elapsed 68.3443
    TPC.TriageSCG.tpc__sm_rf_registers_allocated_shader_cs_realtime.avg.per_cycle_elapsed 44790.1
    TPC.TriageSCG.tpc__l1tex_mem_shared_data_tram_bytes_allocated_realtime.avg.per_cycle_elapsed 0.38392
    TPC.TriageSCG.tpc__l1tex_mem_shared_data_compute_bytes_allocated_queue_sync_realtime.avg.per_cycle_elapsed 41510.2
    TPC.TriageSCG.tpc__l1tex_mem_shared_data_compute_bytes_allocated_queue_async_realtime.avg.per_cycle_elapsed 0
    TPC.TriageSCG.tpc__l1tex_mem_shared_data_isbe_bytes_allocated_realtime.avg.per_cycle_elapsed 0.33932
    SM_B.TriageSCG.sm__ctas_active.avg.pct_of_peak_sustained_elapsed 11.852
    SM_B.TriageSCG.sm__ctas_active.avg.per_cycle_elapsed 2.84447
    SM_B.TriageSCG.l1tex__t_sector_hit_rate.pct 84.5574
    SM_A.TriageSCG.l1tex__data_pipe_lsu_wavefronts.avg.pct_of_peak_sustained_elapsed 15.2134
    SM_A.TriageSCG.l1tex__lsu_writeback_active.avg.pct_of_peak_sustained_elapsed 8.87206
    SM_A.TriageSCG.l1tex__data_pipe_tex_wavefronts.avg.pct_of_peak_sustained_elapsed 0.0448575
    SM_A.TriageSCG.l1tex__f_wavefronts_realtime.avg.pct_of_peak_sustained_elapsed 0.0905451
    SM_A.TriageSCG.l1tex__tex_writeback_active.avg.pct_of_peak_sustained_elapsed 0.0643766
    SM_A.TriageSCG.l1tex__data_pipe_lsu_wavefronts_mem_lg.avg.pct_of_peak_sustained_elapsed 2.49725
    SM_A.TriageSCG.l1tex__data_pipe_lsu_wavefronts_mem_shared_realtime.avg.pct_of_peak_sustained_elapsed 12.7029
    SM_A.TriageSCG.l1tex__data_pipe_lsu_wavefronts_mem_surface_realtime.avg.pct_of_peak_sustained_elapsed 0.0133034
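Two headline numbers can be cross-checked directly against rows in this export. A small sketch, under the assumption that `gpc__cycles_elapsed.avg.per_second` is reported in MHz here (so `1920` means a 1.92 GHz GPC clock), which is consistent with the numbers:

```python
# Sanity checks: derive headline README numbers from raw export rows.

# GPU frame time = GPC cycles elapsed / GPC clock rate.
gpc_cycles = 3.76621e7   # gpc__cycles_elapsed.avg (row above)
gpc_mhz = 1920           # gpc__cycles_elapsed.avg.per_second (assumed MHz)
frame_ms = gpc_cycles / (gpc_mhz * 1e6) * 1e3
print(f"{frame_ms:.4f} ms")  # ~19.616 ms, matching the 19.6157 headline

# PCIe write GB/s = bytes written during the frame / frame time.
pcie_write_bytes = 5.19217e7  # pcie__write_bytes.sum (row above)
gbps = pcie_write_bytes / (frame_ms * 1e-3) / 1e9
print(f"{gbps:.5f} GB/s")     # ~2.647 GB/s, close to the reported 2.64694
```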
