RE: [css-text] I18N-ISSUE-308: Definition of 'grapheme cluster'

From: Phillips, Addison <addison@lab126.com>
Date: Sun, 20 Apr 2014 21:41:18 +0000
To: Koji Ishii <kojiishi@gluesoft.co.jp>
CC: "CSS WWW Style (www-style@w3.org)" <www-style@w3.org>, www International <www-international@w3.org>
Message-ID: <7C0AF84C6D560544A17DDDEB68A9DFB517E350DA@ex10-mbx-36009.ant.amazon.com>
> Referring to UAX#29 here is a good idea, but could you confirm your intention
> of the suggested change?

The concern here was that the statement as written is exceedingly vague. There are as many "typographic traditions" as there are languages and scripts, so some guidance on what to do seemed warranted.

> * “further tailor” to “extend grapheme cluster boundaries” looks like you’re
> suggesting to prohibit shrinking grapheme cluster boundaries, but I suppose it’s
> not your intention, is it? Isn’t “tailor” more appropriate word to use here, in
> terms of giving more flexibilities to implementers, and it’s the word widely
> used in UAX#29?

In the main, we do mean "extend", since that's what usually needs to happen. I can't, offhand, think of a case where a cluster needs to be reduced in size, but that doesn't mean there isn't one. "Tailor", as a result, is probably the better word choice.
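One way to see the extend/shrink distinction: treat a segmentation as a set of boundary offsets. "Extending" clusters then means removing offsets from the default set, while "shrinking" would add new ones. The following is a minimal sketch, assuming the default UAX#29 clusters have already been computed (that step is not implemented here). It uses a hypothetical tailoring that keeps a Devanagari consonant + virama + consonant sequence together as one unit. The function names and the rule itself are illustrative only and are not taken from CSS Text or UAX#29.

```python
# Hypothetical tailoring sketch: "extending" grapheme clusters only removes
# boundaries from the default segmentation; it never adds new ones.

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def is_devanagari_consonant(ch: str) -> bool:
    # Simplification (assumption): only the basic consonants U+0915..U+0939.
    return "\u0915" <= ch <= "\u0939"


def extend_conjuncts(clusters: list[str]) -> list[str]:
    """Merge a cluster that ends in a virama with a following cluster that
    starts with a consonant, so a conjunct such as KA+VIRAMA+SSA stays whole."""
    result: list[str] = []
    for cluster in clusters:
        if (
            result
            and result[-1].endswith(VIRAMA)
            and cluster
            and is_devanagari_consonant(cluster[0])
        ):
            result[-1] += cluster  # drop the boundary between the two clusters
        else:
            result.append(cluster)
    return result


def boundaries(clusters: list[str]) -> set[int]:
    """Return the code point offsets at which each cluster ends."""
    offsets: set[int] = set()
    pos = 0
    for cluster in clusters:
        pos += len(cluster)
        offsets.add(pos)
    return offsets


# Assumed default UAX#29 output for KA, VIRAMA, SSA, TA.
default = ["\u0915\u094D", "\u0937", "\u0924"]
tailored = extend_conjuncts(default)
# Extending is a strict reduction of the boundary set.
assert boundaries(tailored) < boundaries(default)
```

Here the default boundaries are {2, 3, 4} and the tailored ones are {3, 4}. A tailoring that split clusters would instead produce offsets not present in the default set, which is the case the word "tailor" leaves room for.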

> * Is your intention of adding “as identified by the content’s language” to
> prohibit tailoring unless content language is specified? My thought was that it’s
> better not to have such restrictions from I18N perspective. Do I misunderstand
> your suggestion?

Different languages and cultures have different "typographic traditions". So there needed to be some kind of indication in the text about what a "typographic tradition" is and how to apply it. 

Since these traditions are linked to different languages or cultures (and are neither wholly generalized nor can they be inferred solely from the script/code points in the text), the user agent needs to infer them from the data available in the page or text, most likely from language tags (if any exist). In the absence of language tags, it is still possible to apply language-specific tailoring (by guessing the language or assuming some default).
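A rough illustration of connecting content to a tailoring: pick a rule set from the element's language tag, falling back to untailored behavior when no tag is available. This is a minimal sketch under stated assumptions. The tailoring table and its identifiers are hypothetical, the truncation loop is a simplified form of the RFC 4647 "Lookup" fallback, and the language guessing mentioned above is not shown.

```python
# Hypothetical sketch: choose a grapheme-cluster tailoring from the content
# language, using the default rules when no usable tag is present.

from typing import Optional

# Hypothetical tailoring identifiers; a real UA would map these to rule sets.
TAILORINGS = {
    "hi": "devanagari-conjuncts",
    "ta": "tamil-clusters",
}
DEFAULT_TAILORING = "uax29-default"


def select_tailoring(lang_tag: Optional[str]) -> str:
    """Pick a tailoring by progressively truncating a BCP 47 tag
    (simplified from RFC 4647 Lookup), falling back to the default."""
    if not lang_tag:
        # No language information: apply the untailored rules.
        return DEFAULT_TAILORING
    subtags = lang_tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in TAILORINGS:
            return TAILORINGS[candidate]
        subtags.pop()
        # Lookup-style truncation also drops a single-character subtag
        # (e.g. the "x" private-use singleton) left at the end.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return DEFAULT_TAILORING


print(select_tailoring("hi-IN"))  # truncates to "hi"
print(select_tailoring(None))     # no tag: default rules
```

Because the selection depends only on the tag and a shared table, two implementations with the same table would segment the same content identically, which is the kind of interoperability the sentence is meant to encourage.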

The goal was not to prohibit or restrict grapheme boundary tailoring, but to provide some way for implementations to connect code to content. Otherwise I read this sentence as saying, basically, "The UA can split the text wherever it feels it is convenient to do so and no guarantee of interoperability of selection is provided."

In the ideal case, UAX#29 would supply a complete description of grapheme boundary selection, including tailorings (perhaps via CLDR) and we would just point there. In the absence of that, it makes sense to me to try to enforce a certain level of interoperability, while permitting the development of better text segmentation, particularly in some of the Indic scripts that are known to have unaddressed corner cases.

Addison

> -----Original Message-----
> From: Koji Ishii [mailto:kojiishi@gluesoft.co.jp]
> Sent: Sunday, April 20, 2014 8:43 AM
> To: Phillips, Addison
> Cc: CSS WWW Style (www-style@w3.org); www International
> Subject: Re: [css-text] I18N-ISSUE-308: Definition of 'grapheme cluster'
> 
> Thank you for the feedback and the suggestion.
> 
> Referring to UAX#29 here is a good idea, but could you confirm your intention
> of the suggested change?
> 
> * “further tailor” to “extend grapheme cluster boundaries” looks like you’re
> suggesting to prohibit shrinking grapheme cluster boundaries, but I suppose it’s
> not your intention, is it? Isn’t “tailor” more appropriate word to use here, in
> terms of giving more flexibilities to implementers, and it’s the word widely
> used in UAX#29?
> * Is your intention of adding “as identified by the content’s language” to
> prohibit tailoring unless content language is specified? My thought was that it’s
> better not to have such restrictions from I18N perspective. Do I misunderstand
> your suggestion?
> 
> /koji
> 
> 
> On Jan 25, 2014, at 3:15 AM, Phillips, Addison <addison@lab126.com> wrote:
> 
> > State:
> >    OPEN WG comment
> > Product:
> >    CSS3-text
> > Raised by:
> >    Addison Phillips
> > Opened on:
> >    2013-12-06
> > Description:
> >    1. Section 1.3: The description of "grapheme cluster" feels abbreviated and terse. Of particular concern to me is this sentence:
> >
> >    --
> >    The UA may further tailor the definition as required by typographical tradition.
> >    --
> >
> >    We think this could be clearer, perhaps by saying something similar to:
> >
> >    --
> >    The UA may extend grapheme cluster boundaries as required by the typographical traditions, as identified by the content's language. [See discussion of "extended grapheme cluster" in Section 3 of UAX#29]
> >    --

Received on Sunday, 20 April 2014 21:41:58 UTC