RE: [css-text] I18N-ISSUE-308: Definition of 'grapheme cluster'

I have reviewed the latest text located here:

I generally like the improvements in section 1.3.1 ("Characters and Letters"), although I note that this section covers much of the same ground as CharMod:Fundamentals [1]. Inventing new definitions of the same terms gives users an opportunity to become confused. Before I delve into issue 308 directly, I suggest that you reference CharMod directly as a source for further details on the various ideas of "character": this is what CharMod is for.

===> Regarding the definition of grapheme cluster, I am satisfied by the changes you have made to the description, which is now much more complete. I am closing this issue as satisfied.

I should point out that CharMod also has a definition of "grapheme cluster" that might be suitable as a reference. Our own document, Charmod-Norm, which had an updated Working Draft published this week [2], also needs to define grapheme cluster. The more closely our various definitions coincide, the better.

The WD I mention above only has a placeholder, but my editor's copy [3] has the following grapheme cluster definition, which is very modestly adapted from CharMod's:

A grapheme cluster is a sequence of one or more Unicode characters that form a single user-perceived "character". Grapheme clusters divide the text into units that correspond more closely than character strings to the user's perception of where the character boundaries occur in a visually rendered text. A discussion of grapheme clusters is given at the end of Section 2.10 of the Unicode Standard, [UNICODE]; a formal definition is given in Unicode Standard Annex #29 [UTR29]. What the Unicode Standard actually defines is default grapheme clustering. Some languages require tailoring to this default. For example, a Slovak user might wish to treat the default pair of grapheme clusters "ch" as a single grapheme cluster. Note that the interaction between the language of string content and the end-user's preferences might be complex.
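To make the difference between default and tailored clustering concrete, here is a minimal Python sketch. It is an illustration only and not the UAX #29 algorithm: `approximate_clusters` merely attaches combining marks to the preceding character, and `tailor_slovak` is a hypothetical tailoring that merges the default pair "c" + "h" into one cluster. Both function names are assumptions made for this example and do not come from any specification.

```python
import unicodedata


def approximate_clusters(text):
    """Group each character with the combining marks that follow it.

    Rough approximation only: real default grapheme clustering (UAX #29)
    also covers Hangul syllables, emoji sequences, CRLF, and more.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me are the combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def tailor_slovak(clusters):
    """Hypothetical tailoring: treat adjacent "c" + "h" as one cluster."""
    tailored = []
    for cluster in clusters:
        if tailored and tailored[-1].lower() == "c" and cluster.lower() == "h":
            tailored[-1] += cluster
        else:
            tailored.append(cluster)
    return tailored


# "e" followed by U+0301 COMBINING ACUTE ACCENT: two characters, one cluster.
accented = "e\u0301"
default = approximate_clusters("chata")
tailored = tailor_slovak(default)
```

With the default behaviour the Slovak word "chata" yields five clusters, starting with "c" and "h"; after the tailoring step it yields four, starting with "ch". A real implementation would select the tailoring from the content language and the user's preferences, which is the complex interaction the definition warns about.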

I intend to look carefully at your version when considering further edits to the above. At present I don't think that borrowing our text would be helpful to you.





> -----Original Message-----
> From: fantasai []
> Sent: Tuesday, June 24, 2014 1:50 AM
> To: Phillips, Addison; CSS WWW Style (
> Cc: www International
> Subject: Re: [css-text] I18N-ISSUE-308: Definition of 'grapheme cluster'
> On 01/24/2014 10:15 AM, Phillips, Addison wrote:
> > State:
> >      OPEN WG comment
> > Product:
> >      CSS3-text
> > Raised by:
> >      Addison Phillips
> > Opened on:
> >      2013-12-06
> > Description:
> >      1. Section 1.3: The description of "grapheme cluster" feels abbreviated
> >         and terse. Of particular concern to me is this sentence:
> >
> >      --
> >      The UA may further tailor the definition as required by typographical
> >      tradition.
> >      --
> >
> >      We think this could be clearer, perhaps by saying something similar to:
> >
> >      --
> >      The UA may extend grapheme cluster boundaries as required by the
> >      typographical traditions, as identified by the content's language.
> >      [See discussion of "extended grapheme cluster" in Section 3 of UAX#29]
> >      --
> The suggested text is not an improvement, it's worse:
>    - Replacement of "tailor" with "extend" is incorrect, since sometimes
>      (as in Thai) they are decomposed.
>    - Tailorings do not always depend on the content language. They may
>      depend on one or more of the following:
>        - script
>        - content language
>        - font style
>      and possibly
>        - typesetting preferences, in cases where multiple options are
>          considered valid and reasonable
> Rejecting this comment as no change, since I think the dictionary definition of
> "typographic tradition" is sufficiently precise.
> Note that exact tailorings are out-of-scope for the CSS spec. If a spec is needed,
> it should be requested as an expansion of UAX29.
> ~fantasai

Received on Thursday, 17 July 2014 22:45:56 UTC