Re: [csswg-drafts] [css-text] Clarify what ligatures are optional

Ok, this is a bit like a ball of spaghetti. Let me try to unravel it a little.

First of all, the word បាទ used for the example is in the Khmer script, not Myanmar (see https://r12a.github.io/uniview/?charlist=%E1%9E%94%E1%9E%B6%E1%9E%91). 
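
For anyone who wants to check the script without following the link, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is just for illustration, and the code points are the ones encoded in the uniview URL above.

```python
import unicodedata

# The example word បាទ, written as explicit code points
# (taken from the charlist in the uniview URL).
word = "\u1794\u17B6\u1791"

def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]

for row in describe(word):
    print(*row)
```

Every name begins with KHMER rather than MYANMAR, and the second character is a spacing combining mark (category `Mc`), which is relevant to the grapheme cluster point below.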

Second, the ligation occurs between a base character and a combining character, and my understanding is that these should not normally be separated by `text-justify:inter-character`, since together they constitute a basic grapheme cluster. According to https://drafts.csswg.org/css-text-3/#valdef-text-justify-inter-character, the units around which spacing occurs are what CSS calls 'typographic character units'. These are defined at https://drafts.csswg.org/css-text-3/#typographic-character-unit, and the definition implies that a grapheme cluster is usually not broken (i.e. you wouldn't want to separate បា).

Third, however, in Thai (which is a different script again), the corresponding vowel sign is apparently separated from its base in similar, though not identical, circumstances. This is actually described in an example in the spec (note that the example involves an additional combining character, the anusvara):

> In other scripts such as Thai or Lao, even though for line-breaking the typographic character matches Unicode’s default grapheme clusters, for letter-spacing the relevant unit is less than a [UAX29] grapheme cluster, and may require decomposition or other substitutions before spacing can be inserted. 

> For instance, to properly letter-space the Thai word คำ (U+0E04 + U+0E33), the U+0E33 needs to be decomposed into U+0E4D + U+0E32, and then the extra letter-space inserted before the U+0E32: คํ า. 
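
The decomposition step in that spec example can be sketched in Python. This is only an illustration, not how any browser implements it: the function name `letter_space_sara_am` and the use of a plain space as the gap are my own assumptions, and the sketch ignores cases such as an intervening tone mark. It relies on the fact that U+0E33 has a compatibility decomposition to U+0E4D + U+0E32, which `unicodedata.normalize("NFKD", ...)` exposes.

```python
import unicodedata

SARA_AM = "\u0E33"  # THAI CHARACTER SARA AM

def letter_space_sara_am(text, gap=" "):
    """Decompose each U+0E33 and insert `gap` before the resulting U+0E32.

    Hypothetical helper: real letter-spacing happens in the layout engine,
    not by inserting characters into the text.
    """
    out = []
    for ch in text:
        if ch == SARA_AM:
            # NFKD yields U+0E4D (nikhahit) followed by U+0E32 (sara aa).
            nikhahit, sara_aa = unicodedata.normalize("NFKD", ch)
            out.append(nikhahit + gap + sara_aa)
        else:
            out.append(ch)
    return "".join(out)

spaced = letter_space_sara_am("\u0E04\u0E33")  # คำ
print([f"U+{ord(c):04X}" for c in spaced])
```

The result keeps the small circle attached to the consonant and moves only the sara aa part across the gap, matching the คํ า shown in the spec.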

I'm not sure whether the presence of the anusvara (the small circle) is required to trigger this behaviour. That combination is also possible in Khmer (បាំ) and also has the special joining form of BA, but the example given is the simpler form.

So i guess there are a number of questions to ask here:

1. Is the reporter of the original bug concerned about Myanmar or Khmer text? Note that the equivalent in Myanmar, ပာဒ်, doesn't involve such a ligated form afaik, so i think they mean Khmer.

2. Does Khmer orthography expect U+1794 KHMER LETTER BA and U+17B6 KHMER VOWEL SIGN AA to be separated when `text-justify:inter-character` is applied?

3. Do the same rules apply for tracking as for justification? (The example of Thai above is actually in the context of letter-space, rather than justification iirc.)

4. If BA + AA are separated in Khmer, is the expectation that the special joining form of BA be retained or not? (I would assume not.)

We are in the process of putting together a group of experts in Southeast Asian scripts. I'll raise an issue asking them what the expected behaviour should be.





-- 
GitHub Notification of comment by r12a
Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/2644#issuecomment-387723314 using your GitHub account

Received on Wednesday, 9 May 2018 12:33:01 UTC