Re: [css3-text] letter-spacing and degenerated grapheme clusters (was: [css3-text] tweak the definition of a grapheme cluster a bit for UTF-16)

On 17 Jan 2012, at 11:58, Kang-Hao (Kenny) Lu wrote:

> In this process, there's a relevant issue here I found. Namely, are UAs
> allowed not to count grapheme clusters if the underlying font systems
> drop these?

I don't think so.

> For example, Opera12alpha* does not render replacement
> characters and the following two are equivalent:
> 
> <div style="letter-spacing: 1em;">A&#xfffd;B</div>
> <!-- U+FFFD REPLACEMENT CHARACTER or any character the system doesn't
> support -->
> <div style="letter-spacing: 1em;">AB</div>

That seems clearly wrong, IMO. The presence of the character should not be ignored, even if it can't be rendered "properly".

> 
> Or, to draw it:
> 
> A--B
> 
> although I kind of expect it to be at least
> 
> A--|--B
> 
> long according to spec.

Yes.

> Similar example,
> 
> <div style="letter-spacing: 1em;">AB</div>
> <div style="letter-spacing: 1em;">A&#x200b;B</div> <!-- U+200B ZERO WIDTH
> SPACE -->
> 
> IE, FF9 and Opera12alpha show the above two lines differently (A--B and
> A--|--B) while WebKit gives the same thing. Is WebKit allowed to exhibit
> this behavior because the underlying font system doesn't provide a glyph?

I don't think so; ZERO WIDTH SPACE is a separate grapheme cluster and therefore is affected by letter-spacing.

(By way of contrast, ZERO WIDTH NON-JOINER extends the preceding cluster, and so the two lines

  <div style="letter-spacing: 1em;">AB</div>
  <div style="letter-spacing: 1em;">A&#x200c;B</div>

would be expected to render identically, with no extra space in the second case.)
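
To make the contrast concrete, here is a rough sketch of how many clusters (and hence how many letter-spacing gaps) each test string would be expected to have. It is a toy model, not a UAX#29 implementation: the `EXTENDING` table and the `clusters` helper are made up for illustration and only classify the characters discussed in this thread.

```python
# Toy model (an assumption for illustration): only ZWNJ is treated as
# extending the preceding cluster; every other character starts a new one.
EXTENDING = {"\u200c"}  # U+200C ZERO WIDTH NON-JOINER


def clusters(text):
    """Split text into clusters using the toy table above."""
    out = []
    for ch in text:
        if out and ch in EXTENDING:
            out[-1] += ch  # attach to the preceding cluster
        else:
            out.append(ch)  # start a new cluster
    return out


samples = {
    "AB": "AB",
    "A U+200B B": "A\u200bB",  # ZERO WIDTH SPACE
    "A U+200C B": "A\u200cB",  # ZERO WIDTH NON-JOINER
    "A U+FFFD B": "A\ufffdB",  # REPLACEMENT CHARACTER
}
for label, text in samples.items():
    n = len(clusters(text))
    print(f"{label}: {n} clusters, {n - 1} gaps")
```

Under this model the ZWNJ line has the same single gap as plain AB, while the ZWSP and U+FFFD lines each have two gaps, regardless of whether any font has a glyph for them.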

> 
> If that's the case, I would ask
> 
> 1. The spec to clarify and say if the font system can't provide a glyph
> for a grapheme cluster, the grapheme cluster is allowed to be treated as
> if it's a degenerate one (the term in UAX#29). You probably want to
> clarify letter-spacing for degenerate grapheme clusters (mainly for Cc
> and Cf, since U+200B is Cf) too.

If the font system can't provide a glyph for a grapheme cluster, the UA should render some kind of placeholder (an empty box, a question mark, a black diamond with a white question mark, some hint as to the kind of character or its Unicode value, etc.) so that the user at least knows something is present, rather than pretending the character simply isn't there.

> 2. The spec to define what should happen for
> 
> <a><b><c>TE</c>(no glyph)</b>ST</a> for a { letter-spacing: 0.1em; } b {
> letter-spacing: 0.2em; }
> 
> TE[0.2em]ST or TE[0.1em]ST
> 
> because there are two boundaries.

Perhaps it should result in T[0.2em]E[0.15em]S[0.1em]T, because the right-hand sidebearing of E is increased by half the letter-spacing of element b, while the left-hand sidebearing of S is increased by half the letter-spacing of element a.
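
As a worked version of that arithmetic, here is a minimal sketch. It assumes the split-in-half model just described (each character adds half of its own element's letter-spacing on each side), treats every character as one cluster, and leaves the glyphless cluster out, as in the quoted question. The `gaps` function is a made-up name for illustration, not anything from the spec.

```python
from fractions import Fraction


def gaps(runs):
    """Return the space (in em) between each adjacent pair of characters.

    runs: list of (text, letter_spacing_in_em) in document order.
    Assumption: each character contributes half of its element's
    letter-spacing to the gap on either side of it.
    """
    spacing = [Fraction(ls) for text, ls in runs for _ in text]
    return [(a + b) / 2 for a, b in zip(spacing, spacing[1:])]


# "TE" takes 0.2em from element b, "ST" takes 0.1em from element a.
runs = [("TE", "0.2"), ("ST", "0.1")]
print([float(g) for g in gaps(runs)])  # [0.2, 0.15, 0.1]
```

Fractions are used only so that the middle gap comes out as exactly 0.15em rather than a floating-point approximation.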

> 
> 3. The spec to say, in Appendix G, that default spacing (or at least
> letter-spacing) probably happens after font/glyph selection. I am not an
> implementer so I have no idea if this is making sense or not.

I don't think that's correct. Letter-spacing should apply between grapheme clusters, and grapheme cluster boundaries should depend only on the Unicode characters in the text, not on the particular fonts that happen to be used.

JK

Received on Tuesday, 17 January 2012 12:38:19 UTC