
[css3-text] letter-spacing and degenerated grapheme clusters (was: [css3-text] tweak the definition of a grapheme cluster a bit for UTF-16)

From: Kang-Hao (Kenny) Lu <kennyluck@csail.mit.edu>
Date: Tue, 17 Jan 2012 19:58:22 +0800
Message-ID: <4F15625E.5080204@csail.mit.edu>
To: fantasai <fantasai.lists@inkedblade.net>
CC: WWW Style <www-style@w3.org>
(12/01/17 10:45), fantasai wrote:
> This is a non-issue, afaict. When Unicode defines a grapheme cluster, it
> doesn't do it differently based on what encoding is being used.

Fine. See below however.

>> 2. UAs render content with isolated surrogates differently. This already
>> happened[1]. If you find other ways to address this problem (by either
>> marking it as undefined or forbidding certain behavior) then I think I am
>> satisfied. That is, I don't want WebKit's behavior to fall into the "UA
>> may further tailor the definition (grapheme cluster) as allowed by
>> Unicode." allowance. UAs should not be allowed to count an element starting
>> with an isolated surrogate as having zero grapheme clusters, so to speak.
> CSS doesn't define how text maps to glyphs in the font, only which font
> and font features you use to do it. So I would think that mapping is up
> to the Unicode and font specs, not up to CSS. I do expect that would
> make WebKit's rendering wrong.

Well, I was trying to "prove" that WebKit is wrong here with the
machinery in this spec, using the "\udf06Test" example and the following
logic:

* Establish the fact that this text run has at least four grapheme clusters.
* Write a test that uses "letter-spacing: 1em;" and check if this text
run is at least 3em long.
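
The arithmetic behind this test can be sketched roughly as follows. This is
only an illustration, not how any UA computes layout: the helper name
min_letter_spacing_extent is made up, it assumes one grapheme cluster per
code point (true for this particular string, not in general), and it assumes
letter-spacing contributes one gap between each pair of adjacent clusters.

```python
def min_letter_spacing_extent(text: str, spacing_em: float) -> float:
    """Lower bound, in em, on the width added by letter-spacing alone.

    Assumption: every code point is its own grapheme cluster, so a lone
    surrogate such as U+DF06 counts as one (degenerate) cluster.
    """
    clusters = list(text)             # simplification of UAX #29 segmentation
    gaps = max(len(clusters) - 1, 0)  # one gap between adjacent clusters
    return gaps * spacing_em


sample = "\udf06Test"  # lone low surrogate followed by four letters
print(len(sample))                             # number of code points
print(min_letter_spacing_extent(sample, 1.0))  # bound if the surrogate counts
print(min_letter_spacing_extent("Test", 1.0))  # bound if it is dropped
```

Even if the isolated surrogate is dropped, the remaining four clusters give
three gaps, which is where the "at least 3em" check comes from.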

Since WebKit fails this, I raised this issue in the hope that we can make
the definition of a grapheme cluster as precise as possible so that the
first statement holds, even if the example contains malformed characters.

In the process, I found a related issue: are UAs allowed not to count
grapheme clusters if the underlying font system drops them? For example,
Opera12alpha* does not render replacement characters, so the following
two are rendered identically:

<div style="letter-spacing: 1em;">A&#xfffd;B</div>
<!-- U+FFFD REPLACEMENT CHARACTER or any character the system doesn't
support -->
<div style="letter-spacing: 1em;">AB</div>

Or, to draw it:


although I would expect it to be at least


long according to the spec. A similar example:

<div style="letter-spacing: 1em;">AB</div>
<div style="letter-spacing: 1em;">A&#x200b;B</div> <!-- U+200B ZERO WIDTH SPACE -->

IE, FF9 and Opera12alpha show the above two lines differently (A--B and
A--|--B) while WebKit renders them identically. Is WebKit allowed to exhibit
this behavior because the underlying font system doesn't provide a glyph?
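
To make the two behaviors concrete, here is a toy model in Python. It is an
assumption-laden sketch, not a description of any engine: the function name
spacing_gaps is hypothetical, each code point is treated as one grapheme
cluster, and "the font system provides no glyph" is approximated by "the
character has general category Cf".

```python
import unicodedata


def spacing_gaps(text: str, drop_format_chars: bool) -> int:
    """Count the letter-spacing gaps in `text` under one of two models.

    drop_format_chars=False: every code point is a cluster that takes spacing.
    drop_format_chars=True:  Cf characters (e.g. U+200B) are treated as absent.
    """
    kept = [
        ch for ch in text
        if not (drop_format_chars and unicodedata.category(ch) == "Cf")
    ]
    return max(len(kept) - 1, 0)


print(spacing_gaps("AB", False))        # plain "AB"
print(spacing_gaps("A\u200bB", False))  # U+200B counted: the A--|--B case
print(spacing_gaps("A\u200bB", True))   # U+200B dropped: the A--B case
```

Under the first model the zero width space adds a second gap; under the
second the two lines come out the same width, which is what WebKit shows.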

If that's the case, I would ask

1. The spec to clarify that if the font system can't provide a glyph
for a grapheme cluster, the grapheme cluster is allowed to be treated as
if it were a degenerate one (the term in UAX#29). You probably want to
clarify letter-spacing for degenerate grapheme clusters (mainly for Cc
and Cf, since U+200B is Cf) too.

2. The spec to define what should happen for

<a><b><c>TE</c>(no glyph)</b>ST</a>

with

a { letter-spacing: 0.1em; }
b { letter-spacing: 0.2em; }

The result could be TE[0.2em]ST or TE[0.1em]ST, because there are two
boundaries.

3. The spec to say, in Appendix G, that default spacing (or at least
letter-spacing) probably happens after font/glyph selection. I am not an
implementer so I have no idea if this makes sense or not.

And then sadly my proof certainly won't work, and I'll skip bombing the
list with other insane cases like what should happen if there is a
non-BMP character crossing an element boundary when letter-spacing is
specified on the root element.

Realistically speaking, I should just file a bug in WebKit's Bugzilla if
I really care about this...

*Opera12alpha seems quite unstable with regard to font handling.

Received on Tuesday, 17 January 2012 11:58:52 UTC
