The concept of a "word" is difficult to define; it usually refers to a grammatical unit smaller than a phrase, consisting of one or more syllables.
Word separators differ across languages, and specs should not assume that words are always separated by spaces. Even for the same language, ancient and modern usage may differ.
I think there are two places where related guidelines could be added: one is 6.1 Choosing text units for segmentation, indexing, etc., and the other is 9. Typographic support (possibly under 9.9 Miscellaneous).
Here are some examples:
In Arabic, short words such as "and" (و) are written attached to the following word, with no space between them. For example, الجامعات والكليات means "universities and colleges": it contains three words but only one space. In typesetting, such words can be treated as part of the word they are attached to.
Many scripts, such as Balinese, Batak, Tai Lue, and Khmer, have no word separators, and where one word ends and the next begins can be subjective. Spaces may appear in text written in these scripts, but they may separate phrases rather than words.
Also, in Vietnamese written with the Latin alphabet and in the Fraser script, spaces separate syllables, not words.
In scripts such as Chinese, Japanese, and Tibetan, there are no spaces between words at all, apart from a few exceptions such as textbooks for foreigners.
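To make the problem concrete, here is a minimal Python sketch (illustrative only, not taken from any spec) that counts "words" by splitting on whitespace, which is exactly the assumption specs should avoid, and compares the result with word counts for three of the examples above. The helper name `naive_word_count` is hypothetical, and the expected word counts are my own hand annotations (for Vietnamese, "Hà Nội" is treated as one place name made of two syllables).

```python
def naive_word_count(text: str) -> int:
    """Count 'words' by splitting on whitespace: the flawed assumption."""
    return len(text.split())


# (label, text, hand-annotated word count) -- the counts are assumptions
samples = [
    ("Arabic", "الجامعات والكليات", 3),  # universities + and + colleges, one space
    ("Vietnamese", "Hà Nội", 1),          # one place name, two space-separated syllables
    ("Chinese", "我是学生", 3),           # 我 / 是 / 学生 ("I am a student"), no spaces
]

for label, text, words in samples:
    tokens = naive_word_count(text)
    print(f"{label}: whitespace tokens = {tokens}, words = {words}")
```

In every case the whitespace count disagrees with the word count: it undercounts Arabic (2 vs 3) and Chinese (1 vs 3) and overcounts Vietnamese (2 vs 1), which is why segmentation needs language-aware rules rather than a space check.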