
RE: [css-text] I18N-ISSUE-308: Definition of 'grapheme cluster'

From: Phillips, Addison <addison@lab126.com>
Date: Thu, 17 Jul 2014 22:45:07 +0000
To: fantasai <fantasai.lists@inkedblade.net>, "CSS WWW Style (www-style@w3.org)" <www-style@w3.org>
CC: www International <www-international@w3.org>
Message-ID: <7C0AF84C6D560544A17DDDEB68A9DFB52C21B776@ex10-mbx-36009.ant.amazon.com>

I have reviewed the latest text, located here:

http://dev.w3.org/csswg/css-text-3/#characters


I generally like the improvements in section 1.3.1 ("Characters and Letters"), although I note that this is, to a great extent, what CharMod:Fundamentals [1] already does. Inventing new definitions of the same terms creates an opportunity for readers to become confused. Before I turn to issue 308 directly, I suggest that you reference CharMod as the source for further details on the various ideas of "character": this is what CharMod is for.

===> Regarding the definition of grapheme cluster, I am satisfied by the changes you have made to the description, which are much more complete. I am closing this issue as satisfied.

I should point out that CharMod also has a definition of "grapheme cluster" that might be suitable as a reference. Our own document, Charmod-Norm, which had an updated Working Draft published this week [2], also needs to define grapheme cluster. The more closely our various definitions coincide, the better.

The WD I mention above has only a placeholder, but my editor's copy [3] has the following grapheme cluster definition, which is very modestly adapted from CharMod's:

--
A grapheme cluster is a sequence of one or more Unicode characters that form a single user-perceived "character". Grapheme clusters divide the text into units that correspond more closely than character strings to the user's perception of where the character boundaries occur in a visually rendered text. A discussion of grapheme clusters is given at the end of Section 2.10 of the Unicode Standard, [UNICODE]; a formal definition is given in Unicode Standard Annex #29 [UTR29]. What the Unicode Standard actually defines is default grapheme clustering. Some languages require tailoring to this default. For example, a Slovak user might wish to treat the default pair of grapheme clusters "ch" as a single grapheme cluster. Note that the interaction between the language of string content and the end-user's preferences might be complex.
--
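
To make the distinction between default and tailored clustering concrete, here is a minimal Python sketch. It is an illustration only, not the UAX #29 algorithm: the function names `approx_grapheme_clusters` and `tailor_clusters` are hypothetical, the default pass merely attaches combining marks (general category M*) to the preceding code point, and the Slovak tailoring is modeled as a simple set of letter sequences to merge.

```python
import unicodedata
from typing import Iterable, List


def approx_grapheme_clusters(text: str) -> List[str]:
    """Approximate default clustering (NOT a full UAX #29 implementation).

    Assumption: a code point whose general category starts with "M"
    (Mn, Mc, Me) joins the preceding cluster; anything else starts a new one.
    """
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def tailor_clusters(clusters: List[str], merged: Iterable[str]) -> List[str]:
    """Merge adjacent default clusters whose concatenation is in `merged`.

    Hypothetical tailoring hook: e.g. {"ch"} for a Slovak user.
    """
    merged_set = {m.lower() for m in merged}
    out: List[str] = []
    i = 0
    while i < len(clusters):
        pair = "".join(clusters[i:i + 2])
        if i + 1 < len(clusters) and pair.lower() in merged_set:
            out.append(pair)
            i += 2
        else:
            out.append(clusters[i])
            i += 1
    return out


# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one cluster.
accented = approx_grapheme_clusters("e\u0301")

# Slovak "chlieb": "c" and "h" are separate clusters by default,
# and a single cluster once the "ch" tailoring is applied.
default = approx_grapheme_clusters("chlieb")
tailored = tailor_clusters(default, {"ch"})
```

In this sketch `default` holds six clusters and `tailored` holds five, which mirrors the "ch" example in the definition above. A real implementation would take its boundaries from UAX #29 and its tailorings from locale data rather than from a hand-written set.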

I intend to look carefully at your version when considering further edits to the above. I don't currently think that borrowing our text would be helpful to you.

Regards,

Addison

[1] http://www.w3.org/TR/charmod

[2] http://www.w3.org/TR/charmod-norm 
[3] http://inter-locale.com/w3c/Overview(27).html 


> -----Original Message-----
> From: fantasai [mailto:fantasai.lists@inkedblade.net]
> Sent: Tuesday, June 24, 2014 1:50 AM
> To: Phillips, Addison; CSS WWW Style (www-style@w3.org)
> Cc: www International
> Subject: Re: [css-text] I18N-ISSUE-308: Definition of 'grapheme cluster'
> 
> On 01/24/2014 10:15 AM, Phillips, Addison wrote:
> > State:
> >      OPEN WG comment
> > Product:
> >      CSS3-text
> > Raised by:
> >      Addison Phillips
> > Opened on:
> >      2013-12-06
> > Description:
> >      1. Section 1.3: The description of "grapheme cluster" feels abbreviated
> >         and terse. Of particular concern to me is this sentence:
> >
> >      --
> >      The UA may further tailor the definition as required by typographical tradition.
> >      --
> >
> >      We think this could be clearer, perhaps by saying something similar to:
> >
> >      --
> >      The UA may extend grapheme cluster boundaries as required by the typographical
> >      traditions, as identified by the content's language. [See discussion of
> >      "extended grapheme cluster" in Section 3 of UAX#29]
> >      --
> 
> The suggested text is not an improvement, it's worse:
> 
>    - Replacement of "tailor" with "extend" is incorrect, since sometimes
>      (as in Thai) they are decomposed.
> 
>    - Tailorings do not always depend on the content language. They may
>      depend on one or more of the following:
>        - script
>        - content language
>        - font style
>      and possibly
>        - typesetting preferences, in cases where multiple options are
>          considered valid and reasonable
> 
> Rejecting this comment as no change, since I think the dictionary definition of
> "typographic tradition" is sufficiently precise.
> 
> Note that exact tailorings are out-of-scope for the CSS spec. If a spec is needed,
> it should be requested as an expansion of UAX29.
> 
> ~fantasai

Received on Thursday, 17 July 2014 22:45:53 UTC
