- From: Asmus Freytag <asmusf@ix.netcom.com>
- Date: Thu, 17 Jul 2014 17:14:14 -0700
- To: "Phillips, Addison" <addison@lab126.com>, John Cowan <cowan@mercury.ccil.org>
- CC: fantasai <fantasai.lists@inkedblade.net>, W3C Style <www-style@w3.org>, www International <www-international@w3.org>
- Message-ID: <53C866D6.2080500@ix.netcom.com>
On 7/17/2014 4:27 PM, Phillips, Addison wrote:
>
> Yep. That's one of the things that CharMod antedates which the new
> definition could benefit from (as fantasai's already does)
>
> (Typed on my Fire HDX)
>

There is a weakness in the descriptions in that it is tacitly assumed that for each language there is only a single way to partition a sequence of characters into clusters. This is clearly not the case, as the discussion of first element formatting (drop caps) has shown. The clustering there may well be different from that for cursor movement or back-space.

Second, I'm working on a project where the task is defined differently from UAX#29. We are not interested in finding all the breaks, but, conversely, we need to find all the clusters and then decide whether they are well-formed, so as to disallow sequences that are structurally nonsense in certain complex scripts.

I'm mentioning this as a "by-the-way" so that in your attempt to rationalize the definition of cluster you don't inadvertently make that task any more difficult than it already is by virtue of UAX#29 admitting "ill-formed" clusters as part of the "default".

A./

>
> On July 17, 2014, at 4:21PM, John Cowan wrote:
>
> Phillips, Addison scripsit:
>
> > What the Unicode Standard actually defines is default grapheme
> > clustering. Some languages require tailoring to this default. For
> > example, a Slovak user might wish to treat the default pair of grapheme
> > clusters "ch" as a single grapheme cluster.
>
> It may be worth taking into account that current versions of UTR 29
> have split default grapheme clusters into legacy grapheme clusters (for
> backward compatibility) and extended grapheme clusters (which incorporate
> spacing as well as non-spacing combining marks, and are recommended).
>
> --
> John Cowan http://www.ccil.org/~cowan
> cowan@ccil.org
> If a traveler were informed that such a man [as Lord John Russell] was
> leader of the House of Commons, he may well begin to comprehend how the
> Egyptians worshiped an insect. --Benjamin Disraeli
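
To make the two points in this thread concrete (the legacy/extended split, and finding clusters first and then judging them), here is a minimal sketch in Python. It is not the UAX #29 algorithm: it approximates clustering with general categories from the standard `unicodedata` module rather than the Grapheme_Cluster_Break property, and the names `simple_clusters` and `is_well_formed`, as well as the single well-formedness rule, are hypothetical and chosen only for illustration.

```python
import unicodedata

# General categories for combining marks: non-spacing, enclosing, spacing.
NON_SPACING = {"Mn", "Me"}
ALL_MARKS = {"Mn", "Me", "Mc"}


def simple_clusters(text, extended=True):
    """Split text into rough clusters: one character plus following marks.

    ASSUMPTION: this only approximates UAX #29. With extended=False,
    spacing marks (Mc) start a new cluster, loosely mimicking legacy
    grapheme clusters; with extended=True they attach to the base.
    """
    marks = ALL_MARKS if extended else NON_SPACING
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in marks:
            clusters[-1] += ch  # attach the mark to the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


def is_well_formed(cluster):
    """Hypothetical structural rule: a cluster must not begin with a mark."""
    return unicodedata.category(cluster[0]) not in ALL_MARKS


# DEVANAGARI LETTER KA followed by VOWEL SIGN I (a spacing mark).
ki = "\u0915\u093F"
legacy = simple_clusters(ki, extended=False)   # two clusters
extended = simple_clusters(ki, extended=True)  # one cluster

# A stray combining acute accent at the start of the text still forms a
# cluster of its own, which the validation step can then reject.
checked = [(c, is_well_formed(c)) for c in simple_clusters("\u0301a")]
```

The design point is the separation of the two passes: segmentation never fails, so a lone combining mark still comes back as a cluster, and a script-specific check afterwards decides whether each cluster is acceptable. A real validator would need per-script structural rules well beyond the one shown here.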
Received on Friday, 18 July 2014 00:14:34 UTC