Speed up the "dump" phase (Pass 2) of osm2rdf (#122)
Speed up the "dump" phase (Pass 2) of `osm2rdf` via a number of improvements:
1. Move `osmium::Buffer` objects directly into dedicated worker threads instead of copying them, and operate on the original `libosmium` objects the entire time. Besides avoiding the copy, this also eliminates synchronization overhead (previously, each OSM object was a separate task). This was non-trivial to achieve, because several `libosmium` handlers were previously applied one after another to a finished `osmium::Buffer`, augmenting its objects in a pipeline-like fashion until they reached the handler that dispatched them to the worker threads. Several of these handlers assumed that objects arrive ordered by OSM ID, which is no longer guaranteed when buffers are processed in parallel. We solve this by moving these handlers into a separate internal pass.
2. Replace our internal calls to the `zlib` C API with `zlib-ng`.
3. Improve the handling and formatting of attributes. In particular, avoid copies by operating directly on the raw C strings in the `osmium::Buffer`. Also, drop `strftime` and generate the timestamp strings directly.
4. Cherry-pick the shorter `spatialjoin` IDs proposed in #108 to avoid writing full IRIs to disk.
5. Update `libspatialjoin`, which now folds the ID directly into the sweeper event if it fits into 64 bits (as it does for every OSM node), eliminating the writing and lookup of IDs for points.
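To illustrate the buffer-level hand-off in item 1, here is a minimal, self-contained sketch. It does not use `libosmium`: `Buffer` is a hypothetical move-only stand-in for `osmium::Buffer`, and `BufferQueue` and `processInParallel` are illustrative names, not `osm2rdf` functions. It shows two things: the lock is taken once per buffer rather than once per OSM object, and buffers reach the workers in arbitrary order, so any handler that relies on ID-sorted input has to run in a separate pass.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for osmium::Buffer: a move-only batch of OSM objects
// (reduced here to their IDs).
struct Buffer {
  std::vector<int64_t> ids;

  Buffer() = default;
  Buffer(Buffer&&) = default;
  Buffer& operator=(Buffer&&) = default;
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;
};

// Bounded multi-consumer queue of whole buffers (one lock per buffer).
class BufferQueue {
 public:
  explicit BufferQueue(std::size_t capacity) : capacity_(capacity) {}

  void push(Buffer&& buf) {
    std::unique_lock<std::mutex> lock(mutex_);
    notFull_.wait(lock, [this] { return queue_.size() < capacity_; });
    queue_.push_back(std::move(buf));
    notEmpty_.notify_one();
  }

  // Signals that no more buffers will be pushed.
  void close() {
    std::lock_guard<std::mutex> lock(mutex_);
    closed_ = true;
    notEmpty_.notify_all();
  }

  // Blocks until a buffer is available; returns std::nullopt once the queue
  // is closed and fully drained.
  std::optional<Buffer> pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    notEmpty_.wait(lock, [this] { return closed_ || !queue_.empty(); });
    if (queue_.empty()) {
      return std::nullopt;
    }
    std::optional<Buffer> buf(std::move(queue_.front()));
    queue_.pop_front();
    notFull_.notify_one();
    return buf;
  }

 private:
  std::size_t capacity_;
  std::deque<Buffer> queue_;
  bool closed_ = false;
  std::mutex mutex_;
  std::condition_variable notEmpty_;
  std::condition_variable notFull_;
};

// Hands whole buffers to `numThreads` workers. Buffers are processed in
// arbitrary order, so `handle` must be thread-safe and must not assume that
// objects arrive sorted by OSM ID.
void processInParallel(std::vector<Buffer> buffers, std::size_t numThreads,
                       const std::function<void(Buffer&)>& handle) {
  assert(numThreads > 0);
  BufferQueue queue(2 * numThreads);  // bounded: the reader cannot run far ahead
  std::vector<std::thread> workers;
  workers.reserve(numThreads);
  for (std::size_t i = 0; i < numThreads; ++i) {
    workers.emplace_back([&queue, &handle] {
      while (std::optional<Buffer> buf = queue.pop()) {
        handle(*buf);  // the worker owns the buffer while handling it
      }
    });
  }
  for (Buffer& buf : buffers) {
    queue.push(std::move(buf));  // moved, never copied
  }
  queue.close();
  for (std::thread& w : workers) {
    w.join();
  }
}
```

Because each worker owns a moved-in buffer until it is done with it, no object inside it is copied or shared. The capacity of `2 * numThreads` is an arbitrary choice for this sketch.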
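For item 3, generating timestamp strings without `strftime` could look roughly like the sketch below. It converts seconds since the Unix epoch (what `osmium::Timestamp` stores) into a UTC civil date using Howard Hinnant's `civil_from_days` algorithm, then writes the digits straight into a caller-provided buffer. The function names and the `YYYY-MM-DDTHH:MM:SS` layout are assumptions for illustration. The format `osm2rdf` actually emits may differ, for example in a trailing zone designator.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Writes the two decimal digits of v (0..99) to out; returns one past the end.
inline char* writeTwoDigits(unsigned v, char* out) {
  out[0] = static_cast<char>('0' + v / 10);
  out[1] = static_cast<char>('0' + v % 10);
  return out + 2;
}

// Formats `secs` since 1970-01-01T00:00:00 UTC as "YYYY-MM-DDTHH:MM:SS" into
// `out` (needs room for 19 chars, no terminator). Returns one past the end.
// Assumes the year lies in 0..9999.
char* formatIsoTimestamp(int64_t secs, char* out) {
  // Split into whole days and second-of-day, flooring for negative inputs.
  int64_t days = secs / 86400;
  int64_t rem = secs % 86400;
  if (rem < 0) {
    rem += 86400;
    --days;
  }
  // civil_from_days (Howard Hinnant): days since epoch -> year/month/day.
  days += 719468;
  const int64_t era = (days >= 0 ? days : days - 146096) / 146097;
  const unsigned doe = static_cast<unsigned>(days - era * 146097);
  const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
  const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
  const unsigned mp = (5 * doy + 2) / 153;
  const unsigned day = doy - (153 * mp + 2) / 5 + 1;
  const unsigned month = mp < 10 ? mp + 3 : mp - 9;
  const int64_t year =
      static_cast<int64_t>(yoe) + era * 400 + (month <= 2 ? 1 : 0);
  assert(year >= 0 && year <= 9999);

  const unsigned y = static_cast<unsigned>(year);
  const unsigned s = static_cast<unsigned>(rem);
  out = writeTwoDigits(y / 100, out);
  out = writeTwoDigits(y % 100, out);
  *out++ = '-';
  out = writeTwoDigits(month, out);
  *out++ = '-';
  out = writeTwoDigits(day, out);
  *out++ = 'T';
  out = writeTwoDigits(s / 3600, out);
  *out++ = ':';
  out = writeTwoDigits(s / 60 % 60, out);
  *out++ = ':';
  out = writeTwoDigits(s % 60, out);
  return out;
}

// Convenience wrapper returning a std::string.
std::string isoTimestampString(int64_t secs) {
  char buf[19];
  char* end = formatIsoTimestamp(secs, buf);
  return std::string(buf, end);
}
```

For example, `isoTimestampString(1000000000)` yields `2001-09-09T01:46:40`. Unlike `strftime`, this path involves no `struct tm`, no locale, and no format-string parsing, which is plausibly where the savings in a hot attribute-writing loop come from.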
In total, these changes **sped up the dump phase for `switzerland-latest.osm.pbf` by a factor of almost 5** (4 min -> 50 s, i.e. 240 s / 50 s = 4.8), using `--output-compression gz` and otherwise the default options.
Additionally, improve the progress bar by weighting individual tasks by their estimated duration. In particular, each way is now weighted by its number of member nodes.
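A minimal sketch of such a weighted progress bar follows. It assumes that a node counts with weight 1 and a way with its number of member nodes. `WeightedProgress` and `wayWeight` are hypothetical names, not the `osm2rdf` implementation. Workers add the weight of each finished task to a shared atomic counter, and the displayed percentage is the completed weight divided by the total weight estimated up front.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical progress tracker: each finished task contributes its estimated
// cost instead of a flat 1.
class WeightedProgress {
 public:
  explicit WeightedProgress(uint64_t totalWeight) : total_(totalWeight) {
    assert(totalWeight > 0);  // at least one task is expected
  }

  // Thread-safe: called by a worker after finishing a task of this weight.
  void advance(uint64_t weight) {
    done_.fetch_add(weight, std::memory_order_relaxed);
  }

  // Completed share of the total weight in percent, clamped to 100.
  double percent() const {
    const uint64_t done = done_.load(std::memory_order_relaxed);
    if (done >= total_) {
      return 100.0;
    }
    return 100.0 * static_cast<double>(done) / static_cast<double>(total_);
  }

 private:
  uint64_t total_;
  std::atomic<uint64_t> done_{0};
};

// Hypothetical weighting: a way counts as its number of member nodes (at
// least 1, so that degenerate ways still advance the bar).
inline uint64_t wayWeight(std::size_t numMemberNodes) {
  return numMemberNodes == 0 ? 1 : numMemberNodes;
}
```

With a flat weight of 1 per task, a way with thousands of member nodes would advance the bar as much as a single node. Weighting by member count keeps the bar closer to proportional to the actual work done.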