OPENNLP-1850: Offset/alignment layer — Alignment, AlignedText, buildAligned (1b/5) #1109

Draft
krickert wants to merge 1 commit into OPENNLP-1850-1a-engine from OPENNLP-1850-1b-alignment

@krickert

Contributor

Part 1b of the OPENNLP-1850 stack: the offset/alignment layer, split out of the former foundation PR (#1103) on review request so the conceptually hard part gets a focused read on its own. Builds on the engine in #1108.

Adds:

  • Alignment (bidirectional edit-sequence, andThen-composable) and AlignedText
  • the OffsetAwareNormalizer capability interface and TextNormalizer.buildAligned()
  • the *Aligned CharClass variants, the offset-aware rungs, and the line-break-preserving rung
  • the dense span-mapping tests: binary-search span mapping, expansion/deletion edge cases, and andThen composition including the insertion-inside-an-expansion case

Base: OPENNLP-1850-1a-engine (#1108). Stack: 1a → 1b (this) → tokenizer → DL → docs.
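
For readers new to the idea, here is a minimal, self-contained sketch of what an edit-sequence alignment with binary-search span mapping can look like. It is not the code in this PR: `AlignmentSketch`, `Edit`, and `toOriginalSpan` are hypothetical names, and the two normalizations it applies (collapsing whitespace runs, expanding `ß` to `ss`) are chosen only to show one contraction and one expansion. It maps in one direction only (normalized to original) and does not cover pure deletions or andThen composition.

```java
import java.util.ArrayList;
import java.util.List;

public class AlignmentSketch {

    /** One edit: original[origStart, origEnd) was replaced by normalized[normStart, normEnd). */
    record Edit(int origStart, int origEnd, int normStart, int normEnd) {}

    final String normalized;
    final List<Edit> edits = new ArrayList<>(); // sorted by normStart, non-overlapping

    /** Normalizes the text and records an edit wherever the output differs from the input. */
    AlignmentSketch(String original) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < original.length()) {
            char c = original.charAt(i);
            if (Character.isWhitespace(c)) {
                int runStart = i;
                while (i < original.length() && Character.isWhitespace(original.charAt(i))) {
                    i++;
                }
                // Contraction: a whitespace run becomes one space. A lone ' ' is unchanged.
                if (i - runStart > 1 || c != ' ') {
                    edits.add(new Edit(runStart, i, out.length(), out.length() + 1));
                }
                out.append(' ');
            } else if (c == '\u00DF') {
                // Expansion: one original char (sharp s) becomes two normalized chars.
                edits.add(new Edit(i, i + 1, out.length(), out.length() + 2));
                out.append("ss");
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        normalized = out.toString();
    }

    /** Binary search: index of the last edit with normStart <= offset, or -1 if none. */
    private int lastEditAtOrBefore(int offset) {
        int lo = 0, hi = edits.size() - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (edits.get(mid).normStart() <= offset) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    private int toOriginal(int offset, boolean isSpanEnd) {
        int idx = lastEditAtOrBefore(offset);
        if (idx < 0) {
            return offset; // before the first edit the two texts are identical
        }
        Edit e = edits.get(idx);
        if (offset >= e.normEnd()) {
            return e.origEnd() + (offset - e.normEnd()); // unchanged text after the edit
        }
        if (offset == e.normStart()) {
            return e.origStart(); // boundary just before the replaced text
        }
        // Strictly inside an expansion: widen to cover the whole original range.
        return isSpanEnd ? e.origEnd() : e.origStart();
    }

    /** Maps a half-open span in the normalized text to a half-open span in the original. */
    int[] toOriginalSpan(int normStart, int normEnd) {
        return new int[] {toOriginal(normStart, false), toOriginal(normEnd, true)};
    }

    public static void main(String[] args) {
        String original = "Hello \t\n  world,   again";
        AlignmentSketch a = new AlignmentSketch(original);
        int start = a.normalized.indexOf("world");
        int[] span = a.toOriginalSpan(start, start + "world".length());
        System.out.println(a.normalized);
        System.out.println(original.substring(span[0], span[1]));
    }
}
```

The design choice worth noting is that a span boundary landing inside an expansion has no exact counterpart in the original text, so the sketch widens the span to the whole replaced range instead of guessing a midpoint. Whether the PR's Alignment resolves that case the same way is an assumption here.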

…gned, *Aligned (1b)

The conceptually hard half of the former foundation PR, split out on review request: the
bidirectional Alignment edit-sequence and AlignedText, the OffsetAwareNormalizer capability
interface, TextNormalizer.buildAligned(), the *Aligned CharClass variants and the offset-aware
rungs, the line-break-preserving rung, and the dense span-mapping tests (binary-search span
mapping, expansion/deletion edge cases, andThen composition including the insertion-in-expansion
case). Builds on the engine in 1a.
krickert force-pushed the OPENNLP-1850-1b-alignment branch from 9af6d92 to 9dc7d51 on June 24, 2026 11:54