fix: keep Russian root match from absorbing conjunctions #146
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4956df1f95
  "category": "OBSCENE_MAT",
  "severity": "high",
- "source": "ху[йяиёею]",
+ "source": "(?:хуй|хуя|хуи|хуё|хуе|хую)",
Restore reviewed хуей/ху-ей loose matches
With this source split into fixed alternatives, stretch: true can only repeat the selected final letter, so inputs like хуей or ху-ей now only produce a хуе prefix match; the loose boundary check then rejects it because the same token still has a trailing й. These are existing reviewed loose corpus cases in tests/loose-corpus.spec.ts that should be fully masked, so this change creates a bypass while fixing the following-conjunction case.
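To make the regression concrete, here is a minimal sketch of the behavior described above. It assumes that `stretch: true` lets each atom repeat across optional separators; `stretchAtom`, `SEP`, `oldLoose`, `newLoose`, and `looseMatch` are hypothetical names for illustration and are not taken from the repository.

```typescript
// Hypothetical model of loose stretching (an assumption, not the project's code):
// each atom may repeat, with optional whitespace or hyphens between repeats.
const SEP = "[-\\s]*";

function stretchAtom(atom: string): string {
  return `${atom}(?:${SEP}${atom})*`;
}

// Old source ху[йяиёею]: the final atom is a character class, so a repeat
// may pick a different letter from the class each time.
const oldLoose = new RegExp(["х", "у", "[йяиёею]"].map(stretchAtom).join(SEP));

// New source: fixed alternatives, so the final atom is one specific letter.
const newLoose = new RegExp(
  ["хуй", "хуя", "хуи", "хуё", "хуе", "хую"]
    .map((word) => word.split("").map(stretchAtom).join(SEP))
    .join("|")
);

function looseMatch(re: RegExp, input: string): string | null {
  const m = re.exec(input);
  return m ? m[0] : null;
}

console.log(looseMatch(oldLoose, "хуей"));  // "хуей": whole token covered
console.log(looseMatch(newLoose, "хуей"));  // "хуе": trailing й left outside the match
console.log(looseMatch(newLoose, "ху-ей")); // "ху-е": same prefix-only shape
```

Under this model the new source stops one letter short, which is the prefix match that the loose boundary check then rejects.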
  "category": "OBSCENE_MAT",
  "severity": "high",
- "source": "ху[йяиёею]",
+ "source": "(?:хуй|хуя|хуи|хуё|хуе|хую)",
Stop absorbing same-letter following words
Because each explicit alternative is still stretched, a following standalone word that starts with the same final letter is consumed as another repeat of that final atom. For example, привет хуи и мир still gets a loose range over хуи и rather than just хуи (and хуя я ... has the same shape), so the range leak this patch is meant to fix remains for base variants whose last letter matches the next word.
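The remaining leak can be reproduced with a small sketch. It assumes that stretching lets the final letter repeat across whitespace or hyphens; `GAP`, `stretchWord`, `looseAlternatives`, and `looseRange` are hypothetical names, not the library's API.

```typescript
// Hypothetical stretch model (an assumption for illustration only):
// every letter may repeat, with optional whitespace or hyphens in between.
const GAP = "[-\\s]*";

const stretchWord = (word: string): string =>
  word
    .split("")
    .map((ch) => `${ch}(?:${GAP}${ch})*`)
    .join(GAP);

const looseAlternatives = new RegExp(
  ["хуй", "хуя", "хуи", "хуё", "хуе", "хую"].map(stretchWord).join("|")
);

// Returns the [start, end) range of the first loose match, or null.
function looseRange(input: string): [number, number] | null {
  const m = looseAlternatives.exec(input);
  return m ? [m.index, m.index + m[0].length] : null;
}

const leak = "привет хуи и мир";
const range = looseRange(leak);
// The standalone "и" is consumed as a repeat of the final letter.
console.log(range && leak.slice(range[0], range[1])); // "хуи и"

const fixedCase = "хуй и мир";
const fixedRange = looseRange(fixedCase);
// "и" is not a repeat of "й", so this range ends after the token.
console.log(fixedRange && fixedCase.slice(fixedRange[0], fixedRange[1])); // "хуй"
```

In this model the `хуй и` case is fixed, but any variant whose last letter equals the next word still over-extends.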
Summary
- Keep the `huy` base rule from treating a following standalone conjunction as part of the same loose stretched root.
Validation
- `npm run check` with a temporary npm cache
- `npm run benchmark:profanity` on `origin/main`
- `npm run benchmark:profanity` on this branch
Benchmark Evidence
Baseline `origin/main`:
This branch:
Runtime behavior changes only for the Russian base root boundary case where a following standalone conjunction was previously absorbed by loose stretching.
Compatibility Notes
- `хуй и` range now ends after the obscene token.
Closes #144
No publish, no merge, no tag/release.