RE: [css-text] Clusters for letter spacing in Thai and other complex scripts

Ok, what do you think about this:

In some scripts such as Thai or Lao, the UA may apply the additional spacing within a _character_, sometimes together with decomposition. In other scripts such as Myanmar, the UA may disallow applying the additional spacing between specific pairs of _characters_, such as those within a syllable.

I haven't run this by my co-editor yet, but does this look good to you?

/koji

From: James Clark [mailto:jjc@jclark.com]
Sent: Wednesday, October 2, 2013 12:32 PM
To: Koji Ishii
Cc: www-style@w3.org
Subject: Re: [css-text] Clusters for letter spacing in Thai and other complex scripts

I really don't think this works for Thai/Lao (I can't speak for Burmese).

The fundamental problem is that UAX#29, which defines grapheme clusters, is about _text_ segmentation.  The boundaries that UAX#29 deals with are boundaries between characters.  It is defined in terms of Normalization Form D, which means you do canonical decomposition but not compatibility decomposition. SARA AM (U+0E33) is perceived by Thai users as a single character; it has only a compatibility decomposition.  Letter spacing in Thai requires decomposing the glyph for SARA AM into two glyphs and putting space between these two glyphs.  The point at which letter space needs to be inserted in Thai thus does not correspond to a boundary between characters of the type that UAX#29 deals with.
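
The decomposition point above can be checked directly. The following is a minimal sketch using only Python's standard `unicodedata` module; the helper `names` is introduced here for illustration and is not part of any spec. It shows that SARA AM is unchanged by canonical decomposition (NFD) and splits into two characters only under compatibility decomposition (NFKD):

```python
import unicodedata

SARA_AM = "\u0e33"  # THAI CHARACTER SARA AM

# Canonical decomposition leaves SARA AM untouched.
nfd = unicodedata.normalize("NFD", SARA_AM)

# Compatibility decomposition splits it into NIKHAHIT + SARA AA.
nfkd = unicodedata.normalize("NFKD", SARA_AM)


def names(s):
    """Return the Unicode character names for each code point in s."""
    return [unicodedata.name(c) for c in s]


print(names(nfd))
print(names(nfkd))
```

So any segmentation that works on canonically decomposed text still sees SARA AM as one indivisible unit, with no boundary inside it at which space could be inserted.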

The reason I am going on about this is that I believe it shows that the conceptual model that CSS is proposing for letter spacing is fundamentally not the right model, because it cannot accommodate the needs of some scripts.  The current CSS model is that you analyze character sequences to determine where to insert letter space.  I suggest that a better way to think about letter-spacing is as a parameter to the shaping process which maps from sequences of characters to sequences of positioned glyphs.
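
To make the "parameter to the shaping process" idea concrete, here is a deliberately toy sketch. Everything in it is an assumption for illustration: the `Glyph` type, the `shape` function, the fixed advance width, and the rule that nonspacing marks (general category Mn) sit on the previous base glyph. A real shaper relies on font data and also reorders marks, which this ignores. The only point is that letter space is applied between spacing glyphs after SARA AM has been split, rather than between characters:

```python
import unicodedata
from dataclasses import dataclass

NIKHAHIT = "\u0e4d"
SARA_AA = "\u0e32"
SARA_AM = "\u0e33"


@dataclass
class Glyph:
    char: str  # character the glyph stands for (toy one-to-one mapping)
    x: float   # horizontal position of the glyph origin


def shape(text, advance=10.0, letter_spacing=0.0):
    """Toy shaper: map characters to positioned glyphs.

    Hypothetical model: every spacing glyph has the same advance, and
    nonspacing marks (category Mn) are placed at the x position of the
    preceding spacing glyph without moving the pen.
    """
    glyphs = []
    pen = 0.0        # where the next spacing glyph would start
    base_x = 0.0     # position of the most recent spacing glyph
    seen_base = False
    for ch in text:
        # SARA AM is rendered as two glyphs: a mark plus a spacing vowel.
        parts = [NIKHAHIT, SARA_AA] if ch == SARA_AM else [ch]
        for part in parts:
            if unicodedata.category(part) == "Mn":
                glyphs.append(Glyph(part, base_x))
            else:
                if seen_base:
                    pen += letter_spacing  # space between spacing glyphs
                glyphs.append(Glyph(part, pen))
                base_x = pen
                pen += advance
                seen_base = True
    return glyphs


# NO NU + MAI THO + SARA AM: three characters, four glyphs.
word = "\u0e19\u0e49\u0e33"
positions = [g.x for g in shape(word, advance=10.0, letter_spacing=2.0)]
```

In this model the extra 2 units land between the NO NU cluster (with both marks on it) and the SARA AA glyph, that is, inside what the character stream treats as the single character SARA AM.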

James

On Sep 28, 2013, at 1:23 PM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

On Sep 27, 2013, at 10:55 AM, Andrew Cunningham <lang.support@gmail.com> wrote:


On 27 September 2013 11:05, James Clark <jjc@jclark.com> wrote:

I tried to explain in my last message why I thought tailoring the definition of extended grapheme cluster was not the right approach, and suggested an alternative approach.  Let me try to put it more succinctly: the clusters of glyphs between which Thai/Lao letter-spacing inserts space are different from the clusters of glyphs corresponding to extended grapheme clusters.  The spec for letter-spacing should allow for the fact that for some scripts the typographically correct points to insert letter space do not correspond to boundaries between extended grapheme clusters.

+1

Yeah, I understand what you want to explain. But practically speaking:


 1.  Use A, but implementers can tailor A in any way they need
 2.  Use anything, but use A as baseline

These two look like the same thing to me. No?

/koji

Received on Wednesday, 2 October 2013 17:00:24 UTC