lang: zh-Hant falls back to Simplified UI strings instead of Traditional #14417

@cderv

Description

Quarto's subtag-prefix resolution splits lang on "-" and loads translation files from most general to most specific:

export async function readLanguageTranslations(
  translationFile: string,
  lang?: string,
): Promise<{ language: FormatLanguage; files: string[] }> {
  // read and parse yaml if it exists (track files read)
  const files: string[] = [];
  // read the original file
  const {
    file,
    language,
  } = await translationCache(translationFile);
  if (file) {
    files.push(file);
  }
  // determine additional variations to read
  const ext = extname(translationFile);
  let [dir, stem] = dirAndStem(translationFile);
  stem = escapeRegExp(stem);
  const variations: string[] = [];
  if (lang) {
    // enumerate variations dictated by this lang
    const subtags = lang.split("-");
    for (let i = 0; i < subtags.length; i++) {
      variations.push(subtags.slice(0, i + 1).join("-"));
    }
  } else {
    // enumerate all variations
    const glob = stem + "-*" + ext;
    const variationRe = new RegExp(
      "^" + stem + "-(.*?)" + ext,
    );
    for (
      const entry of expandGlobSync(glob, {
        root: dir,
        includeDirs: false,
        caseInsensitive: true,
      })
    ) {
      const match = entry.name.match(variationRe);
      if (match) {
        variations.push(match[1]);
      }
    }
  }
  // read the variations
  for (const variation of variations) {
    const {
      file: variationFile,
      language: translations,
    } = await translationCache(join(dir, stem + "-" + variation + ext));
    if (variationFile) {
      files.push(variationFile);
    }
    // const translations = await maybeReadYaml(
    //   join(dir, stem + "-" + variation + ext),
    // );
    Object.keys(translations).forEach((key) => {
      // top level entries use the variation key
      if (kLanguageDefaultsKeys.includes(key)) {
        language[variation] = language[variation] || {};
        (language[variation] as FormatLanguage)[key] = translations[key];
        // objects use variation key + subkey
      } else if (typeof translations[key] === "object") {
        const targetKey = variation + "-" + key;
        language[targetKey] = language[targetKey] || {};
        language[targetKey] = {
          ...language[targetKey] as Record<string, unknown>,
          ...translations[key] as Record<string, unknown>,
        };
      }
    });
  }
  return { language, files };
}

For lang: zh-Hant, this loads _language-zh.yml (Simplified) and then looks for _language-zh-Hant.yml, which doesn't exist. Result: Traditional-script readers get UI strings in Simplified Chinese, the wrong script.
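To make the lookup order concrete, here is a minimal sketch of just the prefix enumeration from the function above, isolated from file I/O. The helper names enumerateVariations and candidateFiles are hypothetical (they are not Quarto functions), and the _language stem and .yml extension are assumed from the file names mentioned in this issue.

```typescript
// Hypothetical helper mirroring the subtag loop in readLanguageTranslations:
// build every prefix of the tag, most general first.
function enumerateVariations(lang: string): string[] {
  const subtags = lang.split("-");
  const variations: string[] = [];
  for (let i = 0; i < subtags.length; i++) {
    variations.push(subtags.slice(0, i + 1).join("-"));
  }
  return variations;
}

// Hypothetical helper: map each variation to the file name that would be probed.
// The stem and extension defaults are assumptions for illustration.
function candidateFiles(
  lang: string,
  stem = "_language",
  ext = ".yml",
): string[] {
  return enumerateVariations(lang).map((v) => `${stem}-${v}${ext}`);
}

console.log(candidateFiles("zh-Hant"));
// prints [ "_language-zh.yml", "_language-zh-Hant.yml" ]
```

Because "zh" is always the first prefix, the Simplified file is read for any zh-* tag, and nothing later in the list overrides it when the more specific file is missing.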

Quarto's _language-zh-TW.yml is Traditional in script but uses Taiwan-specific vocabulary (added deliberately in #6512: OpenCC script conversion followed by manual region-specific corrections). A generic zh-Hant file does not exist.

Design question

How should lang: zh-Hant resolve?

  • (a) Fallback rule: if _language-zh-Hant.yml doesn't exist, try _language-zh-TW.yml. Taiwan vocabulary is closer to HK/MO than Simplified is, and this matches Pandoc's posture of shipping one generic zh-Hant file. Could be documented as best-effort until a proper generic file exists.
  • (b) Wait for a contributed generic _language-zh-Hant.yml.
  • (c) Add _language-zh-Hant.yml as a copy of the TW file. Behaviorally equivalent to (a) but less honest about regional scope.
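For discussion, option (a) could look roughly like the sketch below. This is not existing Quarto code: kVariationFallbacks, ResolvedVariation, and resolveVariationFiles are hypothetical names, and file existence is passed in as a callback so the sketch stays free of file I/O. It assumes the fallback file's strings would be stored under the requested variation key (zh-Hant), not under zh-TW.

```typescript
// Hypothetical fallback table: a variation with no file of its own
// borrows the file of another variation.
const kVariationFallbacks: Record<string, string> = {
  "zh-Hant": "zh-TW",
};

interface ResolvedVariation {
  variation: string; // key the translations would be stored under
  file: string; // file that would actually be read
}

function resolveVariationFiles(
  lang: string,
  exists: (file: string) => boolean,
  stem = "_language",
  ext = ".yml",
): ResolvedVariation[] {
  const subtags = lang.split("-");
  const resolved: ResolvedVariation[] = [];
  for (let i = 0; i < subtags.length; i++) {
    const variation = subtags.slice(0, i + 1).join("-");
    const direct = `${stem}-${variation}${ext}`;
    if (exists(direct)) {
      // An exact file always wins over a fallback.
      resolved.push({ variation, file: direct });
      continue;
    }
    const fallback = kVariationFallbacks[variation];
    if (fallback !== undefined) {
      const fallbackFile = `${stem}-${fallback}${ext}`;
      if (exists(fallbackFile)) {
        resolved.push({ variation, file: fallbackFile });
      }
    }
  }
  return resolved;
}

// Usage with an assumed set of shipped files.
const shipped = new Set(["_language-zh.yml", "_language-zh-TW.yml"]);
console.log(resolveVariationFiles("zh-Hant", (f) => shipped.has(f)));
```

With this shape, a contributed _language-zh-Hant.yml (option b) would take precedence automatically once it exists, since the direct lookup is tried before the fallback table.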

Repro

---
lang: zh-Hant
format: html
---

Render the document and inspect the generated UI strings (TOC title, "Copy to clipboard" tooltip, section titles in the HTML article). They render in Simplified Chinese.

From #14409. Related: #14416.
