Re: [css-text] Clusters for letter spacing in Thai and other complex scripts

From: James Clark <jjc@jclark.com>
Date: Wed, 2 Oct 2013 12:32:28 +0900
Message-ID: <2629717672530955735@unknownmsgid>
To: Koji Ishii <kojiishi@gluesoft.co.jp>
Cc: "www-style@w3.org" <www-style@w3.org>

I really don't think this works for Thai/Lao (I can't speak for Burmese).

The fundamental problem is that UAX#29, which defines grapheme clusters, is
about _text_ segmentation.  The boundaries that UAX#29 deals with are
boundaries between characters.  It is defined in terms of Normalization
Form D, which means you do canonical decomposition but not compatibility
decomposition. SARA AM (U+0E33) is perceived by Thai users as a single
character; it has only a compatibility decomposition.  Letter spacing in
Thai requires decomposing the glyph for SARA AM into two glyphs and putting
space between these two glyphs.  The point at which letter space needs to
be inserted in Thai thus does not correspond to a boundary between
characters of the type that UAX#29 deals with.
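
The difference between the two kinds of decomposition can be checked
directly. The following is a minimal sketch using Python's standard
unicodedata module; the helper name codepoints is an assumption made for
illustration only. It shows that NFD leaves SARA AM intact, while NFKD
splits it into NIKHAHIT (U+0E4D) followed by SARA AA (U+0E32).

```python
import unicodedata

SARA_AM = "\u0e33"  # THAI CHARACTER SARA AM

def codepoints(s):
    """Return the code points of s in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(c):04X}" for c in s]

# Canonical decomposition: SARA AM has none, so it is unchanged.
nfd = unicodedata.normalize("NFD", SARA_AM)

# Compatibility decomposition: SARA AM becomes NIKHAHIT + SARA AA.
nfkd = unicodedata.normalize("NFKD", SARA_AM)

print(codepoints(nfd))   # ['U+0E33']
print(codepoints(nfkd))  # ['U+0E4D', 'U+0E32']
```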

The reason I am going on about this is that I believe it shows that the
conceptual model that CSS is proposing for letter spacing is fundamentally
not the right model, because it cannot accommodate the needs of some
scripts.  The current CSS model is that you analyze character sequences to
determine where to insert letter space.  I suggest that a better way to
think about letter-spacing is as a parameter to the shaping process which
maps from sequences of characters to sequences of positioned glyphs.
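
To make the suggestion concrete, here is a toy sketch of letter spacing as
a shaping parameter. It is not a real shaper, and every name in it (Glyph,
shape, MARKS, the fixed advance of 10 units) is an assumption made for
illustration. Real Thai shaping also reorders tone marks and uses font
metrics, which this ignores. The point is only that space is added between
positioned base glyphs after SARA AM has been split, rather than between
character clusters.

```python
from dataclasses import dataclass

SARA_AM = "\u0e33"
NIKHAHIT = "\u0e4d"
SARA_AA = "\u0e32"

# Thai marks drawn above or below a base glyph (illustrative list, assumed).
MARKS = set(
    "\u0e31\u0e34\u0e35\u0e36\u0e37\u0e38\u0e39\u0e3a"
    "\u0e47\u0e48\u0e49\u0e4a\u0e4b\u0e4c\u0e4d\u0e4e"
)

@dataclass
class Glyph:
    char: str   # stands in for a glyph id
    x: float    # horizontal position of the glyph origin

def shape(text, advance=10.0, letter_spacing=0.0):
    """Map characters to positioned glyphs (toy model, fixed advance)."""
    # Step 1: glyph-level decomposition of SARA AM into two glyphs.
    chars = []
    for ch in text:
        chars.extend([NIKHAHIT, SARA_AA] if ch == SARA_AM else [ch])

    # Step 2: position glyphs; marks stack on the previous base glyph,
    # and letter spacing is added between successive base glyphs.
    glyphs = []
    pen = 0.0
    for ch in chars:
        if ch in MARKS and glyphs:
            glyphs.append(Glyph(ch, glyphs[-1].x))
        else:
            if glyphs:
                pen += advance + letter_spacing
            glyphs.append(Glyph(ch, pen))
    return glyphs

# KO KAI + SARA AM: the nikhahit stays on the consonant, and the
# letter space falls between the consonant and the sara aa glyph.
for g in shape("\u0e01\u0e33", letter_spacing=5.0):
    print(f"U+{ord(g.char):04X} at x={g.x}")
```

In this sketch the single character SARA AM yields two glyphs at different
positions, so the inserted space lands inside what UAX#29 treats as one
character.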

James

On Sep 28, 2013, at 1:23 PM, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:


On Sep 27, 2013, at 10:55 AM, Andrew Cunningham <lang.support@gmail.com>
wrote:

On 27 September 2013 11:05, James Clark <jjc@jclark.com> wrote:


> I tried to explain in my last message why I thought tailoring the
> definition of extended grapheme cluster was not the right approach, and
> suggested an alternative approach.  Let me try to put it more succinctly:
> the clusters of glyphs between which Thai/Lao letter-spacing inserts space
> are different from the clusters of glyphs corresponding to extended grapheme
> clusters.  The spec for letter-spacing should allow for the fact that for
> some scripts the typographically correct points to insert letter space do
> not correspond to boundaries between extended grapheme clusters.
>

+1


Yeah, I understand what you want to explain. But practically speaking:


   1. Use A, but implementers can tailor A in any way they need
   2. Use anything, but use A as baseline


These two look like the same thing to me. No?

/koji
Received on Wednesday, 2 October 2013 03:32:57 UTC
