runic-transcriber

Transcribe English words into runes via their pronunciation, not by letter-for-letter transliteration. English text is run through a phonemic intermediate representation, so spelling irregularities are resolved by sound:

knight  ->  ᚾᚪᛁᛏ      (/naɪt/: the silent k and gh have no reflex)
vision  ->  ᚠᛁᛋᚳᚢᚾ    (/ˈvɪʒən/: v->f, ʒ->sc)

Target system: the Anglo-Saxon Futhorc, the historical extension of the Elder Futhark adapted to the sounds of Old English.

Installation

uv tool install git+https://github.com/ultimatile/runic-transcriber

Usage

# Transcribe words passed as arguments
runic-transcriber knight vision

# Read from stdin
echo "hello world" | runic-transcriber

# Strict mode: fail instead of approximating sounds Futhorc lacks
runic-transcriber --strict vision   # exits 2: /v/ has no exact rune (merges with f)

Non-letter characters (spaces, digits, punctuation) pass through unchanged. An out-of-vocabulary word causes the command to exit with an error (see Limitations).
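
The pass-through rule can be pictured with a small sketch. It is not the project's code: `passthrough_transcribe` and the two-word `toy` table are hypothetical, and it assumes a "word" is simply a run of ASCII letters, which the real tokenizer may define differently (for example around apostrophes).

```python
import re

def passthrough_transcribe(text, lookup):
    """Replace each run of ASCII letters via lookup(); copy everything else as-is."""
    return re.sub(r"[A-Za-z]+", lambda m: lookup(m.group(0)), text)

# Hypothetical stand-in for the per-word lookup, covering two words only.
toy = {"cat": "ᚳᚫᛏ", "hat": "ᚻᚫᛏ"}

print(passthrough_transcribe("cat, hat 42!", lambda w: toy[w.lower()]))
# -> ᚳᚫᛏ, ᚻᚫᛏ 42!
```

The comma, space, digits, and exclamation mark never reach the lookup, so they come out exactly as they went in.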

Limitations

  • Unknown words fail. A word the pronunciation dictionary doesn't know — uncommon or technical words, and most names — produces an error instead of runes.
  • Accented and non-English spellings fail. résumé, café, naïve and the like aren't recognized.
  • American pronunciation only. Words are sounded out in General American English; other accents aren't reflected.
  • By default, some distinct sounds share a rune. Futhorc has no rune for several English sounds (v, the th in this, z, sh, the s in measure); best-effort mode writes each with its nearest rune, so words that sound different can come out identical. --strict mode rejects such a word instead.
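
To see why merged sounds make different words collide, here is a toy sketch. The `RUNES` table is hypothetical and holds only pairings visible in this README's examples (F and V both to ᚠ, AE to ᚫ, N to ᚾ); the real mappings live in data/futhorc.toml. The phoneme lists are written out by hand.

```python
# Toy table, not the project's data: /f/ and /v/ deliberately share one rune.
RUNES = {"F": "ᚠ", "V": "ᚠ", "AE": "ᚫ", "N": "ᚾ"}

def render(phonemes):
    # Drop ARPABET stress digits ("AE1" -> "AE") before the lookup.
    return "".join(RUNES[p.rstrip("012")] for p in phonemes)

fan = render(["F", "AE1", "N"])
van = render(["V", "AE1", "N"])
print(fan, van, fan == van)
# -> ᚠᚫᚾ ᚠᚫᚾ True
```

Under this table "fan" and "van" are indistinguishable once written, which is the trade-off best-effort mode accepts.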

Library API

from runic_transcriber import transcribe, word_to_runes, word_to_phonemes

transcribe("knight")              # -> "ᚾᚪᛁᛏ"
transcribe("cat hat")             # -> "ᚳᚫᛏ ᚻᚫᛏ"
word_to_runes("vision")           # -> "ᚠᛁᛋᚳᚢᚾ"
word_to_phonemes("cat")           # -> ["K", "AE1", "T"]
transcribe("vision", strict=True) # raises UntranscribableError

How it works

A three-layer pipeline:

| Layer | Step | Implementation |
| --- | --- | --- |
| L1 | English word -> ARPABET phonemes | CMU Pronouncing Dictionary, bundled as data/cmudict.dict |
| L2 | phonemes -> abstract phoneme sequence rounded onto the target inventory | cost table in data/futhorc.toml |
| L3 | abstract phoneme -> rune glyph | the cost table's values |
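
The three layers can be sketched end to end with toy data. Everything here is illustrative rather than the project's implementation: `parse_dict_line`, `PRONUNCIATIONS`, `RUNES`, and `to_runes` are hypothetical names, the dictionary lines are assumed to follow a `word PH PH ...` layout, and dropping stress digits is only a guess at part of what L2 does.

```python
def parse_dict_line(line):
    """Split one dictionary entry such as 'cat K AE1 T' into (word, phonemes)."""
    word, *phonemes = line.split()
    return word, phonemes

# L1: word -> ARPABET phonemes (two hand-copied entries, not the full dictionary).
PRONUNCIATIONS = dict(parse_dict_line(line) for line in ["cat K AE1 T", "hat HH AE1 T"])

# L2/L3: abstract phoneme -> rune glyph (toy table built from this README's examples).
RUNES = {"K": "ᚳ", "HH": "ᚻ", "AE": "ᚫ", "T": "ᛏ"}

def to_runes(word):
    phonemes = PRONUNCIATIONS[word.lower()]         # L1; KeyError for unknown words
    abstract = [p.rstrip("012") for p in phonemes]  # L2 (assumed): drop stress digits
    return "".join(RUNES[p] for p in abstract)      # L3: look up each glyph

print(to_runes("cat"), to_runes("hat"))
# -> ᚳᚫᛏ ᚻᚫᛏ
```

Keeping the dictionary lookup separate from the rune table means the pronunciation source and the target script can each be swapped without touching the other.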

Each phoneme maps with a cost tier recording how faithful the rune is:

| cost | meaning |
| --- | --- |
| 0 | exact: the phoneme has its own rune |
| 1 | merge: a contrast English has but Futhorc does not |
| 2 | decompose: one phoneme becomes several runes |
| 3 | approximate: nearest fit, no principled rune |

The per-phoneme mappings live in data/futhorc.toml. By default (best-effort) every word renders, approximating where needed; in --strict mode a word is rejected unless every phoneme has a cost-0 rune.
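
A minimal sketch of how cost tiers could drive the strict check follows. The `TABLE` layout, the tier assigned to each phoneme, and the local `UntranscribableError` class are assumptions made for illustration; the real schema of data/futhorc.toml and the package's own exception may differ.

```python
from enum import IntEnum

class Cost(IntEnum):
    EXACT = 0
    MERGE = 1
    DECOMPOSE = 2
    APPROXIMATE = 3

class UntranscribableError(ValueError):
    """Local stand-in; the real exception is defined by the package."""

# Hypothetical table: phoneme -> (runes, cost tier).
TABLE = {
    "F": ("ᚠ", Cost.EXACT),
    "V": ("ᚠ", Cost.MERGE),   # /v/ merges with /f/
    "AE": ("ᚫ", Cost.EXACT),
    "N": ("ᚾ", Cost.EXACT),
}

def render(phonemes, strict=False):
    out = []
    for p in phonemes:
        runes, cost = TABLE[p]
        if strict and cost != Cost.EXACT:
            raise UntranscribableError(f"/{p}/ has no exact rune (cost {int(cost)})")
        out.append(runes)
    return "".join(out)

print(render(["V", "AE", "N"]))  # best-effort -> ᚠᚫᚾ
try:
    render(["V", "AE", "N"], strict=True)
except UntranscribableError as err:
    print("rejected:", err)      # -> rejected: /V/ has no exact rune (cost 1)
```

Storing the cost next to each rune lets one table serve both modes: best-effort ignores the tier, strict refuses anything above 0.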

Development

uv run pytest -v

Licenses

This project is released under the MIT License — see LICENSE.

It bundles the CMU Pronouncing Dictionary as data/cmudict.dict for the L1 lookup, taken verbatim from the upstream cmusphinx/cmudict repository. CMUdict is Copyright (C) 1993-2015 Carnegie Mellon University and may be redistributed with attribution; its full license terms are retained in data/cmudict.LICENSE.
