Re: [css-text] I18N-ISSUE-308: Definition of 'grapheme cluster'

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Thu, 17 Jul 2014 17:14:14 -0700
Message-ID: <53C866D6.2080500@ix.netcom.com>
To: "Phillips, Addison" <addison@lab126.com>, John Cowan <cowan@mercury.ccil.org>
CC: fantasai <fantasai.lists@inkedblade.net>, W3C Style <www-style@w3.org>, www International <www-international@w3.org>

On 7/17/2014 4:27 PM, Phillips, Addison wrote:
>
> Yep. That's one of the things that CharMod antedates which the new 
> definition could benefit from (as fantasai's already does)
>
> (Typed on my Fire HDX)
>

The descriptions share a weakness: they tacitly assume that each 
language has only one way to partition a sequence of characters into 
clusters. That is clearly not the case, as the discussion of first 
element formatting (drop caps) has shown. The clustering there may well 
differ from the clustering used for cursor movement or backspace.
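
A minimal Python sketch of that point, under loudly stated assumptions. 
The names partition, attach_marks and never_attach are hypothetical. 
"Cluster" here means a base character plus any following characters of 
general category M*, which is a simplification and not the UAX #29 
algorithm. The per-code-point policy merely stands in for a backspace 
that removes one code point at a time.

```python
import unicodedata

def attach_marks(ch):
    # Assumed cursor-movement policy: combining marks (Mn, Mc, Me)
    # stay with the preceding base character.
    return unicodedata.category(ch).startswith("M")

def never_attach(ch):
    # Assumed backspace policy: every code point is its own unit.
    return False

def partition(text, attaches):
    """Split text into units; `attaches(ch)` says whether ch joins the previous unit."""
    units = []
    for ch in text:
        if units and attaches(ch):
            units[-1] += ch
        else:
            units.append(ch)
    return units

sample = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT + 'a'
cursor_units = partition(sample, attach_marks)     # 2 units
backspace_units = partition(sample, never_attach)  # 3 units
```

The same string yields two different partitions, depending only on 
which policy is passed in.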

Second, I'm working on a project where the task is defined differently 
from UAX #29. We are not interested in finding all the breaks; 
conversely, we need to find all the clusters and then decide whether 
they are well-formed, so as to disallow sequences that are structurally 
nonsense in certain complex scripts. I mention this as a "by-the-way" 
so that, in your attempt to rationalize the definition of cluster, you 
don't inadvertently make that task any more difficult than it already 
is. UAX #29 makes it hard enough by admitting "ill-formed" clusters as 
part of the "default".
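
To make the "find the clusters, then judge them" task concrete, here is 
a rough Python sketch. It is not the project's actual code. The 
clustering is a simplification (a base plus following M* characters, 
not UAX #29), and the well-formedness rule is a hypothetical stand-in 
for real script-specific rules. All function names are made up for 
illustration.

```python
import unicodedata

def is_mark(ch):
    # General categories Mn, Mc and Me: nonspacing, spacing and enclosing marks.
    return unicodedata.category(ch).startswith("M")

def simple_clusters(text):
    """Group each non-mark character with the marks that follow it (simplified)."""
    clusters = []
    for ch in text:
        if clusters and is_mark(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def is_well_formed(cluster):
    """Hypothetical rule: a cluster needs a base, and no mark may repeat back to back."""
    if is_mark(cluster[0]):
        return False  # a mark with no base to attach to
    marks = cluster[1:]
    return all(a != b for a, b in zip(marks, marks[1:]))

def ill_formed_clusters(text):
    """Return the clusters that the hypothetical rule would disallow."""
    return [c for c in simple_clusters(text) if not is_well_formed(c)]
```

Note that the simplified clusterer, like the default, happily emits a 
cluster for a stray leading mark or a doubled mark. Rejecting those is 
left entirely to the separate validation step.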

A./
>
>
> On July 17, 2014, at 4:21PM, John Cowan wrote:
>
> Phillips, Addison scripsit:
>
> > What the Unicode Standard actually defines is default grapheme
> > clustering. Some languages require tailoring to this default. For
> > example, a Slovak user might wish to treat the default pair of grapheme
> > clusters "ch" as a single grapheme cluster.
>
> It may be worth taking into account that current versions of UTR 29
> have split default grapheme clusters into legacy grapheme clusters (for
> backward compatibility) and extended grapheme clusters (which incorporate
> spacing as well as non-spacing combining marks, and are recommended).
>
> -- 
> John Cowan http://www.ccil.org/~cowan
> cowan@ccil.org
> If a traveler were informed that such a man [as Lord John Russell] was
> leader of the House of Commons, he may well begin to comprehend how the
> Egyptians worshiped an insect.  --Benjamin Disraeli
Received on Friday, 18 July 2014 00:14:35 UTC
