Byte-accurate source positions in TreeSink

> Relates to [#48](https://github.com/servo/html5ever/issues/48) and [#492](https://github.com/servo/html5ever/issues/492).

[Cadmus](https://github.com/OGKevin/cadmus) is reader software for e-readers.

The original HTML parsing implementation was handwritten; it is not spec-complete and contains some minor bugs.
While working on https://github.com/OGKevin/cadmus/issues/343, I decided to try html5ever as a replacement for at least the parsing part.

It turns out that html5ever does not provide a way to determine the exact position of a node within the source document.
Cadmus needs this in order to:

- Save and restore reading positions across sessions
- Persist bookmarks and annotations
- Resolve `#anchor-id` URI fragment links

For all of these to work correctly across re-parses, the offset stored on each node
must be the **byte position of that node's opening token in the source string**. It
needs to be stable and comparable to the raw byte sizes of the EPUB spine entries.
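
To illustrate the requirement, here is a minimal sketch of how such offsets could be used once they are available. It does not use html5ever at all: `opening_token_offset` is a naive `str::find` stand-in for a real tokenizer, and `SpinePosition`, `to_global`, and `from_global` are hypothetical names invented for this example, not Cadmus or html5ever APIs. It assumes the raw byte size of each spine entry is known.

```rust
/// Position of a node inside a multi-file book (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpinePosition {
    /// Index of the spine entry (document) that contains the node.
    pub spine_index: usize,
    /// Byte offset of the node's opening token within that entry's source.
    pub byte_offset: usize,
}

/// Naive stand-in for a tokenizer: byte offset of the first occurrence of
/// `open_tag` in `source`. `str::find` returns a byte index, not a char index.
pub fn opening_token_offset(source: &str, open_tag: &str) -> Option<usize> {
    source.find(open_tag)
}

/// Converts a per-entry position into a book-wide byte offset by adding the
/// raw byte sizes of all preceding spine entries.
pub fn to_global(spine_sizes: &[usize], pos: SpinePosition) -> Option<usize> {
    let size = *spine_sizes.get(pos.spine_index)?;
    if pos.byte_offset >= size {
        return None; // offset lies outside the entry
    }
    let preceding: usize = spine_sizes[..pos.spine_index].iter().sum();
    Some(preceding + pos.byte_offset)
}

/// Inverse of `to_global`: finds the spine entry containing a book-wide offset.
pub fn from_global(spine_sizes: &[usize], global: usize) -> Option<SpinePosition> {
    let mut start = 0;
    for (spine_index, &size) in spine_sizes.iter().enumerate() {
        if global < start + size {
            return Some(SpinePosition {
                spine_index,
                byte_offset: global - start,
            });
        }
        start += size;
    }
    None // offset lies past the end of the book
}
```

For the source `<p>é</p><p id="a">x</p>`, the second paragraph's opening token starts at byte 9 but at character 8, because `é` occupies two bytes in UTF-8. Only the byte offset can be added to or compared against the raw byte sizes of the spine entries.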

For rendering dictionary HTML, html5ever already covers the use case, since no position tracking is needed,
but without source positions it can't be used for EPUB rendering.

Would there be interest in adding the ability to record byte offsets while parsing a document? Byte offsets
would be the most stable way to refer to a position within a document, and they do not depend on
which parser is being used.

Byte-accurate source positions in TreeSink #734
