Re: [css3-syntax] CSS escape sequences

On 12 Jan 2012, at 10:35, Mathias Bynens wrote:

> http://www.w3.org/TR/css3-syntax/#characters defines CSS escape
> sequences of the form `\000026` or `\26 `, both of which decode to
> `&`.
> 
> WebKit browsers don’t support this syntax for characters outside the
> BMP (https://bugs.webkit.org/show_bug.cgi?id=76152). For example,
> `\1d306 ` or `\01d306` are supposed to be escape sequences for the
> “tetragram for centre” symbol (U+1D306), but they don’t work in
> WebKit.
> 
> There seems to be another way to escape these characters, namely by
> breaking them up in UTF-16 code units: `\d834\df06 `. All browsers
> except Gecko (https://bugzilla.mozilla.org/show_bug.cgi?id=717529)
> seem to support this, even though this isn’t mentioned in the spec.
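
For reference, here is a minimal Python sketch of the escape rule the quoted text describes. It assumes the CSS 2.1 form: a backslash, one to six hex digits, and an optional single whitespace terminator, with CR LF counting as one. The name `decode_css_escapes` is hypothetical and is not taken from any browser's tokenizer. Non-hex escapes such as `\;` are deliberately left alone.

```python
import re

# A CSS hex escape: a backslash, one to six hex digits, then an optional
# single whitespace character that ends the escape (CR LF counts as one).
# Non-hex escapes such as "\;" are deliberately not handled here.
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")


def decode_css_escapes(s: str) -> str:
    """Replace each hex escape with the character whose code point it names.

    chr() raises ValueError for values above 0x10FFFF; this sketch does not
    try to recover from that.
    """
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


# Both spellings from the spec decode to "&"; the trailing space is consumed.
assert decode_css_escapes(r"\000026") == "&"
assert decode_css_escapes(r"\26 x") == "&x"
# The non-BMP examples from the quoted message.
assert decode_css_escapes(r"\1d306 ") == "\U0001D306"
assert decode_css_escapes(r"\01d306") == "\U0001D306"
```

A decoder like this reads each escape as its own code point. Given `\d834\df06 `, it therefore produces two lone surrogate characters, not U+1D306.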

This looks suspiciously like an (inadvertent?) artifact of the browser's use of UTF-16 as the encoding form for its internal strings. Suppose a browser happened to use UTF-8 as its internal string format; should it then treat "\f0\9d\8c\86" (the UTF-8 bytes of U+1D306) as meaning U+1D306? (Of course not. But that would be analogous to treating "\d834\df06" that way just because the browser happens to use UTF-16 internally.)
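
To make the analogy concrete, the Python sketch below computes both internal representations of U+1D306. The helper names are hypothetical. The UTF-16 surrogate pair comes from the standard surrogate arithmetic, and the UTF-8 bytes come from Python's built-in codec.

```python
from typing import List


def utf16_code_units(cp: int) -> List[int]:
    """Split a code point into UTF-16 code units (a surrogate pair above U+FFFF)."""
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000                     # 20-bit offset into the supplementary planes
    return [0xD800 + (v >> 10),          # high (lead) surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]        # low (trail) surrogate: bottom 10 bits


def utf8_bytes(cp: int) -> List[int]:
    """Encode a code point as UTF-8 bytes via Python's built-in codec."""
    return list(chr(cp).encode("utf-8"))


cp = 0x1D306  # U+1D306 TETRAGRAM FOR CENTRE
print(" ".join(f"{u:04x}" for u in utf16_code_units(cp)))  # d834 df06
print(" ".join(f"{b:02x}" for b in utf8_bytes(cp)))        # f0 9d 8c 86
```

Neither `d834 df06` nor `f0 9d 8c 86` is a character code. Each is an encoding-specific serialization of the single code point 1D306.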

> Should the spec be changed to reflect reality?

CSS backslash-hexadecimal character escapes are supposed to represent ISO 10646 character codes, *NOT* UTF-16 code units.

As such, I think interpreting "\d834\df06" as the character U+1D306 should be considered a bug, and the spec should perhaps be clarified with a note explicitly prohibiting this behavior.
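
One way to make that prohibition concrete, sketched in Python under my own assumptions, is to read each escape as a code point on its own. Values that are not characters (surrogates U+D800 to U+DFFF, and anything above U+10FFFF) become U+FFFD instead of being paired up. The replacement-character policy is an assumption of this sketch, not something the current spec says, and the function names are hypothetical.

```python
import re

# Same escape grammar as CSS 2.1: backslash, 1-6 hex digits, and an
# optional single whitespace terminator (CR LF counts as one).
HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?")

REPLACEMENT = "\uFFFD"


def escape_to_char(hex_digits: str) -> str:
    """Read one escape as an ISO 10646 code point, never as a UTF-16 code unit.

    Assumed policy (not in the current spec): surrogate code points
    (U+D800..U+DFFF) and values above U+10FFFF are not characters, so they
    become U+FFFD instead of being combined with an adjacent escape.
    """
    cp = int(hex_digits, 16)
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        return REPLACEMENT
    return chr(cp)


def decode_css_escapes_strict(s: str) -> str:
    """Decode every hex escape in s independently of its neighbours."""
    return HEX_ESCAPE.sub(lambda m: escape_to_char(m.group(1)), s)


print(ascii(decode_css_escapes_strict(r"\1d306 ")))     # '\U0001d306'
print(ascii(decode_css_escapes_strict(r"\d834\df06 ")))  # '\ufffd\ufffd'
```

The key design choice is that no escape ever looks at its neighbour. That is exactly what rules out reinterpreting `\d834\df06` as a surrogate pair.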

JK

Received on Thursday, 12 January 2012 13:23:22 UTC