Relates to #48 and #492.
Cadmus is reader software for e-readers.
Its original HTML parsing implementation was hand-written; it is not spec-complete and contains some minor bugs.
While working on OGKevin/cadmus#343, I decided to try html5ever as a replacement for at least the parsing part.
It turns out that html5ever does not provide a way to know a node's exact position within the source document.
This is needed to:
- Save and restore reading positions across sessions
- Persist bookmarks and annotations
- Resolve `#anchor-id` URI fragment links
For all of these to work correctly across re-parses, the offset stored on each node
must be the byte position of that node's opening token in the source string. It
needs to be stable and comparable to the raw byte sizes of the EPUB spine entries.
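Here is a rough illustration of why the offsets must line up with the raw spine entry sizes. It is a minimal Rust sketch, assuming each saved position is a spine index plus a byte offset into that entry's raw source. The `Position` type and the `to_global`/`from_global` helpers are hypothetical and are not part of Cadmus or html5ever.

```rust
/// Hypothetical saved position: which spine entry, plus the byte offset of
/// the node's opening `<` within that entry's raw source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Position {
    spine_index: usize,
    byte_offset: usize,
}

/// Convert a per-entry position into a book-wide byte offset by summing
/// the raw byte sizes of all preceding spine entries.
fn to_global(pos: Position, spine_sizes: &[usize]) -> Option<usize> {
    let size = *spine_sizes.get(pos.spine_index)?;
    if pos.byte_offset >= size {
        return None; // offset does not fall inside this entry
    }
    let before: usize = spine_sizes[..pos.spine_index].iter().sum();
    Some(before + pos.byte_offset)
}

/// Inverse of `to_global`: find the spine entry containing a book-wide offset.
fn from_global(offset: usize, spine_sizes: &[usize]) -> Option<Position> {
    let mut start = 0;
    for (spine_index, &size) in spine_sizes.iter().enumerate() {
        if offset < start + size {
            return Some(Position { spine_index, byte_offset: offset - start });
        }
        start += size;
    }
    None // past the end of the book
}

fn main() {
    let spine_sizes = [1_200, 5_000, 800];
    let saved = Position { spine_index: 1, byte_offset: 340 };
    let global = to_global(saved, &spine_sizes).unwrap();
    assert_eq!(global, 1_540);
    // Round-trips only because offsets are measured in the same unit (bytes)
    // as the spine entry sizes.
    assert_eq!(from_global(global, &spine_sizes), Some(saved));
}
```

If a parser reported offsets in characters, or relative to a re-serialized tree, this round trip would silently drift for any non-ASCII chapter.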
For rendering dictionary HTML, html5ever covers the use case because no position tracking is needed,
but it can't be used for EPUB rendering.
Would there be interest in adding an option to record byte offsets while parsing a document? That
would be the most stable way to refer to a position within a document, independent of
which parsing system is used.
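To make the request concrete, this sketch records the byte offset of each start tag's `<` while scanning the source. It is a deliberately naive stand-in written only for illustration: the `StartTag` type and `start_tags` function are hypothetical and do not reflect html5ever's tokenizer or its API. The idea is that a spec-compliant tokenizer would carry the same offset on each start tag it emits.

```rust
/// One start tag found in the source, with the byte offset of its `<`.
#[derive(Debug, PartialEq, Eq)]
struct StartTag {
    offset: usize,
    name: String,
}

/// Naive scan for start tags. NOT spec-compliant: it ignores attribute
/// quoting, `<script>`/`<style>` raw text and CDATA; it only skips comments.
fn start_tags(src: &str) -> Vec<StartTag> {
    let bytes = src.as_bytes();
    let mut tags = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'<' {
            i += 1;
            continue;
        }
        if src[i..].starts_with("<!--") {
            // Jump past the end of the comment (or to EOF if unterminated).
            i = match src[i + 4..].find("-->") {
                Some(end) => i + 4 + end + 3,
                None => bytes.len(),
            };
            continue;
        }
        // Treat `<` followed by an ASCII letter as a start tag; end tags
        // (`</...`) and stray `<` characters are skipped.
        let name_len = bytes[i + 1..]
            .iter()
            .take_while(|b| b.is_ascii_alphanumeric())
            .count();
        if name_len > 0 && bytes[i + 1].is_ascii_alphabetic() {
            let name = src[i + 1..i + 1 + name_len].to_ascii_lowercase();
            tags.push(StartTag { offset: i, name });
        }
        i += 1;
    }
    tags
}

fn main() {
    let tags = start_tags("<p>Hi <!-- <b> --><em>x</em></p>");
    assert_eq!(
        tags,
        vec![
            StartTag { offset: 0, name: "p".to_string() },
            StartTag { offset: 18, name: "em".to_string() },
        ]
    );
    // Offsets are bytes, not chars: "é" is two bytes in UTF-8.
    assert_eq!(start_tags("é<DIV>")[0], StartTag { offset: 2, name: "div".to_string() });
}
```

Because the offset is taken from the raw input rather than from the built tree, it stays the same across re-parses and would be the same regardless of which parser produced the tree.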