Re: [csswg-drafts] [css-text] Questionable Thai words

Whether there is a word-break at a particular point is not as clear-cut in Thai as in English. At some points, you can say there definitely isn't a word-break, for example, within a syllable. At other points, you can say there definitely is. But there's a large grey area, where native Thai speakers will disagree, and it really depends on what you mean by "word" and what you are trying to do.

The main grey area is compound words. Compound words are a lot more common in Thai than in, say, English, because native Thai words (not derived from Pali, Sanskrit, Khmer etc.) are monosyllabic. In a compound word, whether you can have a word-break is a matter of degree depending on the semantics: to what extent is the meaning of the compound derivable from the meaning of its components?

So for example: การเขียน is a compound word meaning "writing", which is composed of "การ", a word that is used to create a noun phrase from a verb, and "เขียน", which means "to write". There's a little bit more to the meaning of "การเขียน" than its two components, but not a lot, so breaking between การ and เขียน is not as good as breaking between การเขียน and ภาษาไทย, but it is pretty OK, and whether a typographer would break there in practice depends on the typographic situation.

I just looked at a Thai weekly news magazine (Matichon weekly) on my breakfast table, which is set with quite narrow columns, and it breaks compounds quite aggressively (i.e. approximately และ•ตัวอย่าง•การ•เขียน•ภาษา•ไทย). You could even break ตัว•อย่าง but that would start to be a little strange.
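One way to picture "a matter of degree" is to give each potential break point a penalty and let the typographic situation decide how high a penalty is acceptable. The following is only a rough sketch under that assumption: the names `SEGMENTS` and `break_units` and the penalty numbers are invented for illustration and are not taken from any specification or dictionary.

```python
from typing import List, Tuple

# Each entry is (piece, penalty for breaking AFTER this piece).
# Penalties are made-up illustrative values:
#   0 = uncontroversial break, 1 = loose compound, 3 = starts to look strange.
SEGMENTS: List[Tuple[str, int]] = [
    ("และ", 0),
    ("ตัว", 3),     # ตัว•อย่าง: possible, but a little strange
    ("อย่าง", 0),
    ("การ", 1),     # การ•เขียน: pretty OK in narrow columns
    ("เขียน", 0),   # การเขียน | ภาษาไทย: a better break
    ("ภาษา", 1),
    ("ไทย", 0),
]


def break_units(segments: List[Tuple[str, int]], max_penalty: int) -> List[str]:
    """Join pieces into unbreakable units, allowing a break after a piece
    only when its penalty does not exceed max_penalty."""
    units: List[str] = []
    current = ""
    for text, penalty_after in segments:
        current += text
        if penalty_after <= max_penalty:
            units.append(current)
            current = ""
    if current:  # trailing pieces with no permitted break after them
        units.append(current)
    return units


if __name__ == "__main__":
    for level in (0, 1, 3):
        print(level, "•".join(break_units(SEGMENTS, level)))
```

With `max_penalty=1` this yields the six units the magazine appears to use; with `max_penalty=0` the compounds stay whole, which would suit wider columns.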

Another complication is that what you want for line breaking is not necessarily what you want for selection or cursor movement by word.

-- 
GitHub Notification of comment by jclark
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/2455#issuecomment-375151681 using your GitHub account

Received on Thursday, 22 March 2018 01:47:48 UTC