
Re: [css3-text] grapheme clusters across element boundary

From: Peter Moulder <peter.moulder@monash.edu>
Date: Tue, 17 Jan 2012 11:43:34 +1100
To: www-style@w3.org
Message-id: <20120117004334.GA19918@bowman.infotech.monash.edu.au>

On Mon, Jan 16, 2012 at 03:15:04PM -0800, fantasai wrote:

> Added:
>   | The rendering characteristics of a <i>character</i> divided by an
>   | element boundary is undefined: it may be rendered as belonging to
>   | either side of the boundary, or as some approximation of belonging
>   | to both. Authors should avoid dividing grapheme clusters by element
>   | boundaries.
> 
> How's that?

I hope this doesn't distract from the question of what behaviour to
prescribe, but I don't know whether a better time will arise to point out
that we need to clarify the word "character" in the above text: it isn't
clear whether it's intended to include u + combining umlaut (sense (2) of
‘character’ in http://unicode.org/glossary/), or whether the rule applies
only to separating the code units that encode a single code point
(sense (3), the case being discussed more recently in this thread).
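
To make the two readings concrete, here is a rough sketch (illustration
only, not proposed spec text).  It assumes UTF-16 as the encoding form
whose code units are being counted, and it approximates grapheme clusters
by counting code points with combining class 0, which is much cruder than
the UAX #29 segmentation rules.  The function names are made up for this
example.

```python
import unicodedata

# "u" followed by U+0308 COMBINING DIAERESIS: two code points that a
# reader perceives as one character.
U_UMLAUT = "u\u0308"

# U+1D11E MUSICAL SYMBOL G CLEF: one code point outside the BMP, so it
# takes two UTF-16 code units (a surrogate pair).
G_CLEF = "\U0001D11E"


def code_points(s):
    """Number of Unicode code points (Python 3 strings are indexed by code point)."""
    return len(s)


def utf16_code_units(s):
    """Number of 16-bit code units in the UTF-16 encoding of s."""
    return len(s.encode("utf-16-le")) // 2


def approx_grapheme_clusters(s):
    """Crude cluster count: code points that are not combining marks.

    Assumption: this stands in for real UAX #29 segmentation only for
    simple base + combining-mark sequences.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


for label, s in (("u + combining umlaut", U_UMLAUT), ("G clef", G_CLEF)):
    print(label, approx_grapheme_clusters(s), code_points(s), utf16_code_units(s))
```

The first string is one cluster made of two code points; the second is one
code point made of two code units.  Those are the two different things an
element boundary could "divide".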

(I realize that an example or two may well have been intended to clarify
this, and it's understandable not to compose examples before the actual
behaviour has been decided.)

I think the normative text might as well use more specific Unicode
terminology, even if examples help to convey an intuitive understanding.
(So presumably it should mention either "grapheme cluster" or "code
point".  It might also help to use a phrase involving "code units", such
as the phrase used above, if that helps clarify what it means for an
element boundary to "divide" that character.)
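
As a sketch of what "divide" could mean under each term (again purely
illustrative): the hypothetical helper below takes a boundary position
counted in UTF-16 code units and reports what, if anything, it splits.
It assumes well-formed input, and it treats any code point with a
non-zero combining class as continuing the previous cluster, which only
approximates UAX #29.

```python
import unicodedata


def classify_split(text, unit_index):
    """Classify a boundary placed just before UTF-16 code unit unit_index."""
    data = text.encode("utf-16-le")
    n_units = len(data) // 2
    if not 0 < unit_index < n_units:
        return "not inside the text"

    # Read the code units on either side of the boundary.
    prev_unit = int.from_bytes(data[2 * unit_index - 2:2 * unit_index], "little")
    next_unit = int.from_bytes(data[2 * unit_index:2 * unit_index + 2], "little")

    # High surrogate before the boundary and low surrogate after it:
    # the boundary falls between the code units of one code point.
    if 0xD800 <= prev_unit <= 0xDBFF and 0xDC00 <= next_unit <= 0xDFFF:
        return "splits a code point"

    # Otherwise the remainder is well-formed UTF-16; look at the code
    # point that follows the boundary.
    next_char = data[2 * unit_index:].decode("utf-16-le")[0]
    if unicodedata.combining(next_char) != 0:
        return "splits a grapheme cluster"

    return "divides nothing"


print(classify_split("u\u0308", 1))      # boundary between u and the umlaut
print(classify_split("\U0001D11E", 1))   # boundary inside a surrogate pair
print(classify_split("ab", 1))           # ordinary boundary
```

Wording based on "code point" would cover only the second case; wording
based on "grapheme cluster" would presumably need to cover the first as
well.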


Again, don't let me distract from the discussion of what actual behaviour
to require.

pjrm.
Received on Tuesday, 17 January 2012 00:44:00 GMT
