Commit 249bf97
arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode
With KASAN_HW_TAGS (MTE) in synchronous mode, tag check faults are
reported immediately as Data Abort exceptions. Because faults never go
through the asynchronous path, the TFSR_EL1.TF1 bit is never set.
Reading TFSR_EL1 and executing data and instruction barriers on kernel
entry, exit, context switch and suspend are therefore unnecessary
overhead in this mode.
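To make the reasoning concrete, here is a minimal userspace model (not kernel code) of the two reporting modes. The names `tcf_mode`, `cpu_model` and `tagged_access()` are hypothetical, and the `tfsr_tf1` field only stands in for the sticky TFSR_EL1.TF1 bit. In the modelled synchronous mode a tag mismatch is counted as an immediate Data Abort and the flag is never touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the tag check fault modes; not kernel code. */
enum tcf_mode { TCF_SYNC, TCF_ASYNC };

struct cpu_model {
	enum tcf_mode mode;
	bool tfsr_tf1;    /* stands in for the sticky TFSR_EL1.TF1 bit */
	int data_aborts;  /* synchronous faults taken at the faulting access */
};

/*
 * Model one tagged access. Returns true if the pointer tag matches the
 * memory tag. On a mismatch, sync mode reports the fault immediately,
 * while async mode only records it for later inspection.
 */
static bool tagged_access(struct cpu_model *cpu, uint8_t ptr_tag,
			  uint8_t mem_tag)
{
	assert(cpu);
	if (ptr_tag == mem_tag)
		return true;
	if (cpu->mode == TCF_SYNC)
		cpu->data_aborts++;
	else
		cpu->tfsr_tf1 = true;
	return false;
}
```

With `mode = TCF_SYNC`, `tfsr_tf1` stays false however many mismatches occur, which is why reading it on every kernel entry or exit buys nothing in that mode.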
As is already done for TFSRE0_EL1 in the check_mte_async_tcf and
clear_mte_async_tcf paths, apply the same optimisation to the TFSR_EL1
handling on kernel entry/exit, context switch and suspend.
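The effect of extending the optimisation can be sketched with another hypothetical userspace model. It assumes each TFSR_EL1 check costs one register read plus one data and one instruction barrier. `check_tfsr()` and `simulate()` are illustrative names, not kernel functions. The early return mirrors the idea of skipping the work whenever the tag check mode cannot set TFSR_EL1.TF1. Asymmetric mode is included because it reports faults on writes asynchronously.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the kernel implementation. */
enum tcf_mode { TCF_SYNC, TCF_ASYNC, TCF_ASYMM };

struct overhead {
	unsigned long tfsr_reads;  /* modelled reads of TFSR_EL1 */
	unsigned long barriers;    /* modelled data + instruction barriers */
};

/* Async mode, and asymmetric mode (async for writes), can set TF1. */
static bool tfsr_may_be_set(enum tcf_mode mode)
{
	return mode == TCF_ASYNC || mode == TCF_ASYMM;
}

/* One check site: return early when the flag can never be set. */
static void check_tfsr(enum tcf_mode mode, struct overhead *o)
{
	assert(o);
	if (!tfsr_may_be_set(mode))
		return;
	o->barriers += 2;  /* one data and one instruction barrier */
	o->tfsr_reads += 1;
}

/* Apply the same guard at entry, exit, context switch and suspend. */
static struct overhead simulate(enum tcf_mode mode, unsigned long syscalls,
				unsigned long switches, unsigned long suspends)
{
	struct overhead o = { 0, 0 };

	for (unsigned long i = 0; i < syscalls; i++) {
		check_tfsr(mode, &o);  /* kernel entry */
		check_tfsr(mode, &o);  /* kernel exit */
	}
	for (unsigned long i = 0; i < switches; i++)
		check_tfsr(mode, &o);  /* context switch */
	for (unsigned long i = 0; i < suspends; i++)
		check_tfsr(mode, &o);  /* suspend */
	return o;
}
```

In this model, 1000 syscalls plus 10 context switches and one suspend cost 2011 register reads and 4022 barriers in async mode, and nothing at all in sync mode.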
All MTE kselftests pass, and the KUnit results are the same before and
after the patch.
A selection of benchmarks, including test_vmalloc, was run on an arm64
machine with v6.19 as the baseline (>0 is faster, <0 is slower; (R)/(I)
= statistically significant regression/improvement). Counting only
statistically significant changes and ignoring noise, the benchmarks
improved:
* 77 result classes were considered, with 9 wins, 0 losses and 68 ties
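The tally can be read as a simple classification. A result is a win only if it is a statistically significant improvement, a loss only if it is a significant regression, and a tie otherwise. This is how 77 classes split into 9 (I) rows, 0 (R) rows and 68 ties. The sketch below assumes the percentages are normalised so that >0 always means faster (lower time or latency, higher req/sec). The exact formula and significance test fastpath uses are assumptions, not taken from this commit, and `normalised_change()` and `classify()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

enum verdict { TIE, WIN, LOSS };

/*
 * Percentage change where >0 always means "faster". Assumed normalisation:
 * for time/latency metrics lower is better, for req/sec higher is better.
 */
static double normalised_change(double base, double patched,
				bool lower_is_better)
{
	assert(base != 0.0);
	double delta = (patched - base) / base * 100.0;
	return lower_is_better ? -delta : delta;
}

/* Only statistically significant changes count as a win or a loss. */
static enum verdict classify(double change, bool significant)
{
	if (!significant || change == 0.0)
		return TIE;
	return change > 0.0 ? WIN : LOSS;
}
```

With hypothetical inputs, a mean syscall time falling from 100 ns to 84 ns gives +16%. It would be reported as (I) only if the difference were also statistically significant.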
Results of fastpath [1] on v6.19 vs this patch:
+----------------------------+----------------------------------------------------------+------------+
| Benchmark                  | Result Class                                             | barriers   |
+============================+==========================================================+============+
| micromm/fork               | fork: p:1, d:10 (seconds)                                |  (I) 2.75% |
|                            | fork: p:512, d:10 (seconds)                              |      0.96% |
+----------------------------+----------------------------------------------------------+------------+
| micromm/munmap             | munmap: p:1, d:10 (seconds)                              |     -1.78% |
|                            | munmap: p:512, d:10 (seconds)                            |      5.02% |
+----------------------------+----------------------------------------------------------+------------+
| micromm/vmalloc            | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |     -0.56% |
|                            | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |      0.70% |
|                            | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |      1.18% |
|                            | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |     -5.01% |
|                            | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |     13.81% |
|                            | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |      6.51% |
|                            | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |     32.87% |
|                            | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         |      4.17% |
|                            | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         |      8.40% |
|                            | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         |     -0.48% |
|                            | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         |     -0.74% |
|                            | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |      0.53% |
|                            | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |     -2.81% |
|                            | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |     -2.06% |
|                            | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     |     -0.56% |
|                            | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |     -0.41% |
|                            | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |      0.89% |
|                            | random_size_alloc_test: p:1, h:0, l:500000 (usec)        |      1.71% |
|                            | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |      0.83% |
+----------------------------+----------------------------------------------------------+------------+
| schbench/thread-contention | -m 16 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |      0.05% |
|                            | -m 16 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |      0.60% |
|                            | -m 16 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 16 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.34% |
|                            | -m 16 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |     -0.58% |
|                            | -m 16 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      9.09% |
|                            | -m 16 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.74% |
|                            | -m 16 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |     -1.40% |
|                            | -m 16 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.00% |
|                            | -m 16 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |     -0.78% |
|                            | -m 16 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |     -0.11% |
|                            | -m 16 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.11% |
|                            | -m 16 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      2.64% |
|                            | -m 16 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      3.15% |
|                            | -m 16 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     17.54% |
|                            | -m 32 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |     -1.22% |
|                            | -m 32 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |      0.85% |
|                            | -m 32 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 32 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.34% |
|                            | -m 32 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |      1.05% |
|                            | -m 32 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 32 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.41% |
|                            | -m 32 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |      0.58% |
|                            | -m 32 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      2.13% |
|                            | -m 32 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |      0.67% |
|                            | -m 32 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |      2.07% |
|                            | -m 32 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |     -1.28% |
|                            | -m 32 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      1.01% |
|                            | -m 32 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      0.69% |
|                            | -m 32 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     13.12% |
|                            | -m 64 -t 1 -r 10 -s 1000, avg_rps (req/sec)              |     -0.25% |
|                            | -m 64 -t 1 -r 10 -s 1000, req_latency_p99 (usec)         |     -0.48% |
|                            | -m 64 -t 1 -r 10 -s 1000, wakeup_latency_p99 (usec)      |     10.53% |
|                            | -m 64 -t 4 -r 10 -s 1000, avg_rps (req/sec)              |     -0.06% |
|                            | -m 64 -t 4 -r 10 -s 1000, req_latency_p99 (usec)         |      0.00% |
|                            | -m 64 -t 4 -r 10 -s 1000, wakeup_latency_p99 (usec)      |      0.00% |
|                            | -m 64 -t 16 -r 10 -s 1000, avg_rps (req/sec)             |     -0.36% |
|                            | -m 64 -t 16 -r 10 -s 1000, req_latency_p99 (usec)        |      0.52% |
|                            | -m 64 -t 16 -r 10 -s 1000, wakeup_latency_p99 (usec)     |      0.11% |
|                            | -m 64 -t 64 -r 10 -s 1000, avg_rps (req/sec)             |      0.52% |
|                            | -m 64 -t 64 -r 10 -s 1000, req_latency_p99 (usec)        |      3.53% |
|                            | -m 64 -t 64 -r 10 -s 1000, wakeup_latency_p99 (usec)     |     -0.10% |
|                            | -m 64 -t 256 -r 10 -s 1000, avg_rps (req/sec)            |      2.53% |
|                            | -m 64 -t 256 -r 10 -s 1000, req_latency_p99 (usec)       |      1.82% |
|                            | -m 64 -t 256 -r 10 -s 1000, wakeup_latency_p99 (usec)    |     -5.80% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/getpid             | mean (ns)                                                | (I) 15.98% |
|                            | p99 (ns)                                                 | (I) 11.11% |
|                            | p99.9 (ns)                                               | (I) 16.13% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/getppid            | mean (ns)                                                | (I) 14.82% |
|                            | p99 (ns)                                                 | (I) 17.86% |
|                            | p99.9 (ns)                                               |  (I) 9.09% |
+----------------------------+----------------------------------------------------------+------------+
| syscall/invalid            | mean (ns)                                                | (I) 17.78% |
|                            | p99 (ns)                                                 | (I) 11.11% |
|                            | p99.9 (ns)                                               |     13.33% |
+----------------------------+----------------------------------------------------------+------------+
[1] https://gitlab.arm.com/tooling/fastpath
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Reviewed-by: David Hildenbrand (Arm) <[email protected]>
Reviewed-by: Yeoreum Yun <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
Parent: abed23c
2 files changed, 12 additions, 0 deletions