
Re: [css-text] Clusters for letter spacing in Thai and other complex scripts

From: Andrew Cunningham <lang.support@gmail.com>
Date: Tue, 24 Sep 2013 23:08:21 -0700
Message-ID: <CAGJ7U-Xt2Shz=2m9ub4_iL_FyUODay-M-GsPzphrKKgdo_zrkQ@mail.gmail.com>
Cc: "www-style@w3.org" <www-style@w3.org>
In Cham, Leke, Myanmar and other scripts, possible break points that are not at
word boundaries would occur at syllable boundaries. Given that some languages
written in those scripts have syllable-final consonants, operating purely on
the basis of grapheme clusters can be problematic, since a syllable may not be
limited to a single grapheme cluster.
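
A rough illustration of the point, as a sketch only: the helper below,
approx_clusters, is a hypothetical name and is NOT a UAX29 implementation. It
simply attaches nonspacing and enclosing marks (general categories Mn and Me)
to the preceding character, which happens to be enough for this one Myanmar
example of a consonant followed by a final consonant carrying ASAT.

```python
import unicodedata


def approx_clusters(text):
    """Crude stand-in for grapheme clustering (assumption, not UAX29):
    each nonspacing or enclosing mark joins the character before it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# MYANMAR LETTER KA, MYANMAR LETTER NA, MYANMAR SIGN ASAT:
# one closed syllable whose final consonant NA is marked with ASAT.
syllable = "\u1000\u1014\u103A"
clusters = approx_clusters(syllable)
print(len(clusters), [[f"U+{ord(c):04X}" for c in cl] for cl in clusters])
```

The single syllable comes out as two clusters (KA on its own, then NA plus
ASAT), so a break opportunity placed at every cluster boundary would fall
inside the syllable.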


On 24 September 2013 19:51, James Clark <jjc@jclark.com> wrote:

> I don't think this is the right way to fix the problem.  The UAX29
> definition of extended grapheme cluster works just fine in Thai/Lao
> when what you want is a grapheme cluster.  The issue is that letter
> spacing is not a logical operation on characters; it's a visual
> operation on glyphs. In other words, there are two distinct kinds of
> cluster:
>
> a) there is a _logical_ cluster of _characters_, which is used for
> selection, cursor movement and other editing operations; this is the
> UAX29 extended grapheme cluster
>
> b) there is also a _visual_ cluster of _glyphs_, which is what you
> need for letter-spacing
>
> In many scripts, these coincide, but Thai/Lao shows that they don't
> always do so.
>
> So I think the right approach is to fix the definition of
> letter-spacing to say that the units between which you add extra space
> will not always correspond exactly to extended grapheme clusters:
> implementations should do the typographically correct thing for a
> particular script.
>
> James
>
> > On Sep 24, 2013, at 8:07 AM, fantasai <fantasai.lists@inkedblade.net> wrote:
> >
> >> On 09/21/2013 10:37 PM, James Clark wrote:
> >> The Editor's Draft for CSS Text says that letter-spacing is applied between
> >> adjacent "characters", where a "character" is defined as a UAX29 extended
> >> grapheme cluster. This doesn't do the right thing in Thai in the case of
> >> SARA AM (U+0E33).
> >>
> >> For example, to properly letter-space the word คำ (0E04 + 0E33), [...]
> >
> > The spec does explicitly allow for tailoring; UAX29 is just a
> > baseline requirement. However I'm not sure that the tailoring
> > you're describing is quite in line with the kind that's allowed
> > by UAX29, so I've broadened the wording a bit and added your
> > example here:
> >
> >  http://dev.w3.org/csswg/css-text/#grapheme-cluster
> >
> > If you have a spec that I can point to normatively, I'm happy
> > to do that. :) But otherwise, I think the example will have
> > to suffice.
> >
> > Please let me know
> >  1. If this is satisfactory, or you want something else.
> >  2. If it's ok for us to use your wording verbatim.
> >
> > Thanks!
> >
> > ~fantasai
>
>


-- 
Andrew Cunningham
Project Manager, Research and Development
(Social and Digital Inclusion)
Public Libraries and Community Engagement
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Australia

Ph: +61-3-8664-7430
Mobile: 0459 806 589
Email: acunningham@slv.vic.gov.au
          lang.support@gmail.com

http://www.openroad.net.au/
http://www.mylanguage.gov.au/
http://www.slv.vic.gov.au/
Received on Wednesday, 25 September 2013 06:08:49 UTC
