-# Candidate Benchmark Programs
+# Benchmarks
 
-This directory contains the candidate programs for the benchmark suite. They are
-candidates, not officially part of the suite yet, because we [intend][rfc] to
-record various metrics about the programs and then run a principal component
-analysis to find a representative subset of candidates that doesn't contain
-effectively duplicate workloads.
-
-[rfc]: https://github.com/bytecodealliance/rfcs/pull/4
-
-## Building
-
-Build an individual benchmark program via:
-
-```
-$ ./build.sh path/to/benchmark/dir/
-```
-
-Build all benchmark programs by running:
-
-```
-$ ./build-all.sh
-```
-
-## Minimal Technical Requirements
-
-In order for the benchmark runner to successfully execute a Wasm program and
-record its execution, the program must:
-
-* Export a `_start` function of type `[] -> []`.
-
-* Import `bench.start` and `bench.end` functions, both of type `[] -> []`.
-
-* Call `bench.start` exactly once during the execution of its `_start`
-  function. This is when the benchmark runner will start recording execution
-  time and performance counters.
-
-* Call `bench.end` exactly once during execution of its `_start` function, after
-  `bench.start` has already been called. This is when the benchmark runner will
-  stop recording execution time and performance counters.
-
-* Provide reproducible builds via Docker (see [`build.sh`](./build.sh)).
-
-* Be located in a `sightglass/benchmarks/$BENCHMARK_NAME` directory. Typically
-  the benchmark is named `benchmark.wasm`, but benchmarks with multiple files
-  should use names like `<benchmark name>-<subtest name>.wasm` (e.g.,
-  `libsodium-chacha20.wasm`).
-
-* Read its input workloads from files that live in the same directory as the
-  `.wasm` benchmark program. The benchmark program is run within the directory
-  where it lives on the filesystem, with that directory pre-opened in WASI. The
-  workload must be read via a relative file path.
-
-  If, for example, the benchmark processes JSON input, then its input workload
-  should live at `sightglass/benchmarks/$BENCHMARK_NAME/input.json`, and it
-  should open that file as `"./input.json"`.
-
-* Define the expected `stdout` output in a `<benchmark name>.stdout.expected`
-  sibling file located next to the `benchmark.wasm` file (e.g.,
-  `benchmark.stdout.expected`). The runner will assert that the actual
-  execution's output matches the expectation.
-
-* Define the expected `stderr` output in a `<benchmark name>.stderr.expected`
-  sibling file located next to the `benchmark.wasm` file. The runner will assert
-  that the actual execution's output matches the expectation.
-
-Many of the above requirements can be checked by running the `.wasm` file
-through the `validate` command:
-
-```
-$ cargo run -- validate path/to/benchmark.wasm
-```
-
-## Compatibility Requirements for Native Execution
-
-Sightglass can also measure the performance of a subset of benchmarks compiled
-to native code (i.e., not WebAssembly). Compiling these benchmarks without
-changing their source code involves a delicate interface with the [native
-engine] and imposes some additional requirements beyond the [Minimal Technical
-Requirements] noted above:
-
-[native engine]: ../engines/native
-[Minimal Technical Requirements]: #minimal-technical-requirements
-
-* Generate an ELF shared library linked to the [native engine] shared library,
-  which provides definitions for `bench_start` and `bench_end`.
-
-* Rename the `main` function to `native_entry`. For C- and C++-based source this
-  can be done with a simple define directive passed to `cc` (e.g.,
-  `-Dmain=native_entry`).
-
-* Provide reproducible builds via a `Dockerfile.native` file (see
-  [`build-native.sh`](./build-native.sh)).
-
-Note that support for native execution is optional: adding a WebAssembly
-benchmark does not imply the need to support its native equivalent, and CI
-will not fail if it is not included.
-
-## Additional Requirements
-
-> Note: these requirements are lifted directly from [the benchmarking
-> RFC][rfc].
-
-In addition to the minimal technical requirements, a benchmark program should
-meet the following requirements to be useful to Wasmtime and Cranelift
-developers:
-
-* Candidates should be real, widely used programs, or at least extracted kernels
-  of such programs. These programs are ideally taken from domains where Wasmtime
-  and Cranelift are currently used, or domains where they are intended to be a
-  good fit (e.g. serverless compute, game plugins, client Web applications,
-  server Web applications, audio plugins, etc.).
-
-* A candidate program must be deterministic (modulo Wasm nondeterminism like
-  `memory.grow` failure).
-
-* A candidate program must have two associated input workloads: one small and
-  one large. The small workload may be used by developers locally to get quick,
-  ballpark numbers for whether further investment in an optimization is worth
-  it, without waiting for the full, thorough benchmark suite to complete.
-
-* Each workload must have an expected result, so that we can validate executions
-  and avoid accepting "fast" but incorrect results.
-
-* Compiling and instantiating the candidate program and then executing its
-  workload should take *roughly* one to six seconds total.
-
-  > Napkin math: We want the full benchmark suite to run in a reasonable amount
-  > of time, say twenty to thirty minutes, and we want somewhere around ten to
-  > twenty programs altogether in the benchmark suite to balance diversity,
-  > simplicity, and time spent in execution versus compilation and
-  > instantiation. Additionally, for good statistical analyses, we need *at
-  > least* 30 samples (ideally more like 100) from each benchmark program. That
-  > leaves an average of about one to six seconds for each benchmark program to
-  > compile, instantiate, and execute the workload.
-
-* Inputs should be given through I/O and results reported through I/O. This
-  ensures that the compiler cannot optimize the benchmark program away.
-
-* Candidate programs should only import WASI functions. They should not depend
-  on any other non-standard imports, hooks, or runtime environment.
-
-* Candidate programs must be open source under a license that allows
-  redistributing, modifying, and redistributing modified versions. This makes
-  distributing the benchmark easy, allows us to rebuild Wasm binaries as new
-  versions are released, and lets us do source-level analysis of benchmark
-  programs when necessary.
-
-* Repeated executions of a candidate program must yield independent samples
-  (ignoring priming Wasmtime's code cache). If the execution times keep getting
-  longer and longer, or exhibit harmonics, the samples are not independent, and
-  this can invalidate any statistical analyses of the results we perform. We can
-  easily check for this property with either [the chi-squared
-  test](https://en.wikipedia.org/wiki/Chi-squared_test) or [Fisher's exact
-  test](https://en.wikipedia.org/wiki/Fisher%27s_exact_test).
-
-* The corpus of candidates should include programs that use a variety of
-  languages, compilers, and toolchains.
+The benchmarks here have been copied from
+[Sightglass](https://github.com/bytecodealliance/sightglass/benchmarks). In
+general, the benchmarks here will mostly stay consistent with the set of
+benchmarks in that repository.