Re: [css-text] I18N-ISSUE-313: Definition of grapheme clusters

Phillips, Addison wrote:

>     "A grapheme cluster is what a language user considers to be a
>     character or a basic unit of the script."
>     "The UA may further tailor the definition as required by
>     typographical tradition."
>     Example 1
>     I think a grapheme cluster should be defined in the CSS spec as
>     follows: A grapheme cluster is a sequence of characters as defined
>     by the Unicode specification that should be treated as a unit
>     for typographic processing. This generally approximates to what a
>     language user considers to be a letter or basic unit of the script.
>     I don't think applications should redefine what a grapheme cluster
>     is; that definition is established by the Unicode standard. Rather,
>     we should say that applications sometimes require additional
>     rules beyond the use of 'grapheme clusters' in order to handle
>     the typographic traditions of particular scripts.

The definition of "grapheme cluster" in the Unicode Glossary defers to
UAX 29, but the current revision (23) of that UAX doesn't actually have
a formal definition of "grapheme cluster", except as a cover term for
default grapheme clusters, extended grapheme clusters, and tailored
grapheme clusters, which *are* defined.

It does, however, introduce the informal term "user-perceived character",
and says that grapheme clusters (by implication, of one of the above
varieties) are an approximation to user-perceived characters.

This seems to me like good terminology to follow.
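
To make the gap between code points and user-perceived characters concrete, here is a rough Python sketch. It is not the UAX 29 algorithm: the helper name approx_clusters is my own, and it only attaches combining marks (general categories Mn, Mc, Me) to the preceding base character. The real rules also cover Hangul syllables, CR LF, and other cases this ignores.

```python
import unicodedata


def approx_clusters(text):
    """Group each base character with the combining marks that follow it.

    Deliberately simplified illustration, NOT the UAX 29 rules.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me are combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "été" spelled with combining acute accents (U+0301)
word = "e\u0301te\u0301"
print(len(word))                   # 5 code points
print(len(approx_clusters(word)))  # 3 approximate user-perceived characters
```

Even this crude grouping shows why a spec needs some term other than "character" for the unit a reader actually perceives.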

Received on Friday, 24 January 2014 21:47:25 UTC