Re: [css3-text] tweak the definition of a grapheme cluster a bit for UTF-16

From: fantasai <fantasai.lists@inkedblade.net>
Date: Mon, 16 Jan 2012 18:45:36 -0800
Message-ID: <4F14E0D0.5050507@inkedblade.net>
To: www-style@w3.org

On 01/16/2012 04:49 PM, Kang-Hao (Kenny) Lu wrote:
> 1. UAs using UTF-16 as internal storage treat a non-BMP character as two
> grapheme clusters. I am aware that this is unlikely to happen so I'll
> stop talking about this possibility.

This is a non-issue, afaict. When Unicode defines a grapheme cluster, it
doesn't define it differently based on what encoding is being used.
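
To illustrate the encoding-independence point, here is a minimal sketch in
Python. The choice of U+1D11E (MUSICAL SYMBOL G CLEF) and the variable names
are mine, not from the thread. It shows that one non-BMP character is a single
code point whatever the storage, even though UTF-16 needs two code units for it:

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP (illustrative choice).
ch = "\U0001D11E"

# Python's str is indexed by code point, so this is the encoding-neutral count.
code_points = len(ch)

# Count the storage units each encoding needs for the same character.
utf16_units = len(ch.encode("utf-16-le")) // 2   # 16-bit code units
utf8_bytes = len(ch.encode("utf-8"))             # 8-bit code units

print(code_points, utf16_units, utf8_bytes)  # prints "1 2 4"
```

The two UTF-16 code units are a storage detail; the text still contains one
character, and so one grapheme cluster.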

> 2. UAs render content with isolated surrogate differently. This already
> happened[1]. If you find other ways to address this problem (by either
> marking it as undefined or forbid certain behavior) then I think I am
> satisfied. That is, I don't want WebKit's behavior to fall into the "UA
> may further tailor the definition (grapheme cluster) as allowed by
> Unicode." allowance. UAs should not be allowed count a element starting
> with an isolated surrogate as having zero grapheme clusters so to speak.

CSS doesn't define how text maps to glyphs in the font, only which font
and font features you use to do it. So I would think that mapping is up
to the Unicode and font specs, not up to CSS. I do expect that would
make WebKit's rendering wrong.

>> We assume Unicode in CSS
> First of all, can you point to me which spec has a statement like this?
> I couldn't find such a statement in either CSS2.1 or CSS3 Text.

AFAIK it's an assumption that is not stated. :) It mainly affects the
interpretation of U+XXXX notation, and anywhere we refer to the Unicode
properties of the text.

> If CSS were in pure Unicode, then this suggested that the document tree,
> the terminology used in CSS2.1, is in pure Unicode, then we wouldn't
> have been presented questions like what should UA do if a non-BMP
> character crosses the element boundary, as being discussed by Boris and
> Glenn. HTML is unlikely to be the layer to address this problem too (how
> would HTML+DOM gives CSS a document tree in pure Unicode?)

Seems to me that would fall under the "grapheme cluster split by an
element boundary" case, no?
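
As a rough illustration of that boundary case, here is a sketch assuming the
document tree stores text as UTF-16 code units and an element boundary falls
between the two halves of a surrogate pair. The character U+1D11E and the
helper `decodes_alone` are hypothetical, chosen only for this example:

```python
# Same illustrative non-BMP character, encoded as big-endian UTF-16.
ch = "\U0001D11E"
units = ch.encode("utf-16-be")

# Pretend an element boundary falls between the two 16-bit code units.
high, low = units[:2], units[2:]   # 0xD834 and 0xDD1E

def decodes_alone(fragment: bytes) -> bool:
    """Return True if the fragment is well-formed UTF-16 on its own."""
    try:
        fragment.decode("utf-16-be")
        return True
    except UnicodeDecodeError:
        return False

print(high.hex(), low.hex())                      # prints "d834 dd1e"
print(decodes_alone(high), decodes_alone(low))    # prints "False False"
print((high + low).decode("utf-16-be") == ch)     # prints "True"
```

Neither half is a character by itself; only the rejoined pair is, which is why
the split looks like a grapheme cluster divided by an element boundary.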

Received on Tuesday, 17 January 2012 02:46:22 UTC
