
Re: [css-text] I18N-ISSUE-313: Definition of grapheme clusters

From: John Cowan <cowan@mercury.ccil.org>
Date: Fri, 24 Jan 2014 16:46:57 -0500
To: "Phillips, Addison" <addison@lab126.com>
Cc: "CSS WWW Style (www-style@w3.org)" <www-style@w3.org>, www International <www-international@w3.org>
Message-ID: <20140124214657.GB19502@mercury.ccil.org>

Phillips, Addison scripsit:

>     "A grapheme cluster is what a language user considers to be a
>     character or a basic unit of the script."
>     "The UA may further tailor the definition as required by
>     typographical tradition."
>     Example 1
> 
>     I think a grapheme cluster should be defined in the CSS spec as
>     follows: A grapheme cluster is a sequence of characters as defined
>     by the Unicode specification that should be treated as a unit
>     for typographic processing. This generally approximates to what a
>     language user considers to be a letter or basic unit of the script.
> 
>     I don't think applications should redefine what a grapheme cluster
>     is; that definition is established by the Unicode standard. Rather,
>     we should say that applications sometimes require additional
>     rules beyond the use of 'grapheme clusters' in order to handle
>     the typographic traditions of particular scripts.

The definition of "grapheme cluster" in the Unicode Glossary defers to
UAX 29, but the current revision (23) of that UAX doesn't actually have
a formal definition of "grapheme cluster", except as a cover term for
default grapheme clusters, extended grapheme clusters, and tailored
grapheme clusters, which *are* defined.

It does, however, introduce the informal term "user-perceived character",
and says that grapheme clusters (by implication, of one of the above
varieties) are an approximation to user-perceived characters.
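
To make the distinction between these varieties concrete, here is a minimal
Python sketch. It is emphatically not the UAX 29 algorithm: it assumes that a
code point joins the preceding cluster when its General_Category is Mn or Me,
and, in "extended" mode, also Mc (spacing combining marks). The function name
approx_clusters and its extended flag are hypothetical, chosen only for this
illustration, and the real rules (Hangul syllables, CR LF, Prepend, ZWJ,
regional indicators, and so on) are ignored.

```python
import unicodedata


def approx_clusters(text, extended=True):
    """Split text into rough approximations of grapheme clusters.

    Assumption (a simplification, not the UAX 29 rules): a code point is
    appended to the previous cluster if its general category is Mn or Me,
    and, when extended=True, also Mc.
    """
    attach = {"Mn", "Me"}
    if extended:
        attach.add("Mc")

    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in attach:
            # Combining mark: glue it onto the preceding cluster.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters


# "e" followed by U+0301 COMBINING ACUTE ACCENT (category Mn):
# two code points, one cluster in either mode.
accented = "e\u0301"
print(len(accented), approx_clusters(accented))

# DEVANAGARI LETTER KA followed by VOWEL SIGN I (U+093F, category Mc):
# the two modes disagree about where the boundary falls.
ki = "\u0915\u093f"
print(len(approx_clusters(ki, extended=False)),
      len(approx_clusters(ki, extended=True)))
```

Under these assumptions the Devanagari pair splits into two units in the
non-extended mode and stays as one unit in the extended mode, which is the
sort of difference that makes an unqualified "grapheme cluster" ambiguous.
Neither result is guaranteed to match what a reader of the script perceives
as a character; both are approximations.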

This seems to me like good terminology to follow.

-- 
I could dance with you till the cows            John Cowan
come home.  On second thought, I'd              http://www.ccil.org/~cowan
rather dance with the cows when you             cowan@ccil.org
come home.  --Rufus T. Firefly
Received on Friday, 24 January 2014 21:47:23 UTC
