Re: [csswg-drafts] [css-text] Questionable Thai words

Related to this general discussion, FWIW, I've been working on some text describing the different approaches to line-breaking, which may become an article at some point. It includes the following table, in which the names denote scripts (it's just an excerpt, and doesn't include the information about archaic scripts). The term 'word' here represents a vague concept that can cover one or more syllables, and of course special rules apply to pretty much all scripts, affecting which characters can and can't start and end a line.

  | Space as word separator | Other word separator | Syllable separator | No word or syllable separator
-- | -- | -- | -- | --
**Wraps words** | Hangul*, Arabic, Armenian, Bengali, Cherokee, Cyrillic, Devanagari, Greek, Gujarati, Gurmukhi, Hebrew, Kannada, Latin, Malayalam, Mandaic, N’Ko, Oriya, Sinhala, Syriac, Tamil, Telugu, Thaana, Tifinagh**, UCAS, Coptic?, Glagolitic, Georgian, Newa?, Mongolian?, Limbu?, Meetei Mayak?, Mro?, Old Chiki?, Chakma?, Lepcha?, Saurashtra?, Masaram Gondi?, Tai Viet**, Pau Cin Hau, Adlam, Osage?, Deseret | Ethiopic, Samaritan |   | Khmer, Lao, Myanmar, Thai, Tai Le?, Tai Tham?
**Wraps syllables** | Sundanese, Buginese?, Cham, Lisu*** |   | Tibetan | Balinese, Javanese, Batak
**Wraps characters** | Hangul* |   |   | Chinese, Japanese, Yi?, Vai

Notice that this divides the problem space slightly differently from the spec. Note, in particular, that when morphological analysis is applied to determine line breaks, it's not always a question of wrapping 'words': several scripts simply wrap at syllable boundaries, whether or not those syllables are complete words. Determining those syllable boundaries, however, may also require some understanding of the text. For example, unless the application understands the text to some extent, it may be difficult to tell whether a character representing a nasal sound carries an inherent vowel (i.e., is a syllable in itself) or is just the final consonant of a syllable.
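As a rough illustration of how the separator columns of the table turn into break opportunities, here is a minimal Python sketch. It assumes only three separator characters (U+0020 SPACE, U+1361 ETHIOPIC WORDSPACE, and U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG) plus a naive break-anywhere rule for character-wrapping scripts; the function name and strategy labels are hypothetical. It ignores the rules about what can start or end a line, and it does not attempt the 'no separator' cases such as Thai, Khmer, Lao or Myanmar, where finding breaks needs dictionary lookup or morphological analysis.

```python
# Hypothetical sketch: break opportunities for the "separator" columns of the
# table, plus naive per-character wrapping. Line-start/line-end rules are ignored.
SEPARATORS = {
    "space": {"\u0020"},     # SPACE, e.g. Latin, Cyrillic, Arabic
    "ethiopic": {"\u1361"},  # ETHIOPIC WORDSPACE (traditional word separator)
    "tibetan": {"\u0F0B"},   # TIBETAN MARK INTERSYLLABIC TSHEG (syllable separator)
}


def break_opportunities(text: str, strategy: str) -> list[int]:
    """Return offsets i where a line may break before text[i]."""
    if strategy == "character":
        # Chinese/Japanese style: a break is possible between any two characters.
        return list(range(1, len(text)))
    if strategy == "dictionary":
        # Thai, Khmer, Lao, Myanmar, etc.: needs a dictionary or morphological
        # analysis, which this sketch does not attempt.
        raise NotImplementedError("dictionary-based segmentation not implemented")
    seps = SEPARATORS[strategy]
    # Break *after* each separator, but never report a break at the end of the text.
    return [i + 1 for i, ch in enumerate(text[:-1]) if ch in seps]


print(break_opportunities("line breaking", "space"))  # [5]
print(break_opportunities("\u0F56\u0F7C\u0F51\u0F0B\u0F66\u0F90\u0F51", "tibetan"))  # [4]
print(break_opportunities("\u6F22\u5B57", "character"))  # [1]
```

In the Tibetan case the break falls after the tsheg, at a syllable boundary rather than a word boundary, which is the distinction the 'Wraps syllables' row draws.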

-- 
GitHub Notification of comment by r12a
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/2455#issuecomment-375196330 using your GitHub account

Received on Thursday, 22 March 2018 06:45:51 UTC