
Re: [css3-syntax] CSS escape sequences

From: Jonathan Kew <jonathan@jfkew.plus.com>
Date: Thu, 12 Jan 2012 13:17:44 +0000
Message-Id: <7D68184F-986D-49E8-9DC2-CA6E666DC8A8@jfkew.plus.com>
To: www-style <www-style@w3.org>
On 12 Jan 2012, at 10:35, Mathias Bynens wrote:

> http://www.w3.org/TR/css3-syntax/#characters defines CSS escape
> sequences of the form `\000026` or `\26 `, both of which decode to
> `&`.
> 
> WebKit browsers don't support this syntax for characters outside the
> BMP (https://bugs.webkit.org/show_bug.cgi?id=76152). For example,
> `\1d306 ` or `\01d306` are supposed to be escape sequences for the
> "tetragram for centre" symbol (U+1D306), but they don't work in
> WebKit.
> 
> There seems to be another way to escape these characters, namely by
> breaking them up in UTF-16 code units: `\d834\df06 `. All browsers
> except Gecko (https://bugzilla.mozilla.org/show_bug.cgi?id=717529)
> seem to support this, even though this isn't mentioned in the spec.
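The two escape forms quoted above can be illustrated with a minimal Python sketch. It assumes a simplified reading of the escape grammar: 1 to 6 hex digits after the backslash, optionally ended by one whitespace character. The function names `css_hex_escape_value` and `utf16_surrogates` are hypothetical and not taken from any browser. The second function shows where `\d834\df06` comes from: it is the UTF-16 surrogate pair for U+1D306.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def css_hex_escape_value(escape: str) -> int:
    """Return the number named by one CSS hex escape such as '\\26 ' or '\\000026'.

    Simplified sketch: reads up to 6 hex digits after the backslash and
    ignores anything that follows (e.g. a terminating space).
    """
    if not escape.startswith("\\"):
        raise ValueError("escape must start with a backslash")
    digits = ""
    for ch in escape[1:7]:  # at most 6 hex digits
        if ch not in HEX_DIGITS:
            break
        digits += ch
    if not digits:
        raise ValueError("not a hex escape")
    return int(digits, 16)


def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low
```

With this sketch, `\26 ` and `\000026` both give 0x26 (`&`), `\1d306 ` and `\01d306` both give 0x1D306, and `utf16_surrogates(0x1D306)` returns `(0xD834, 0xDF06)`, the two values spelled as `\d834\df06`.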

This looks suspiciously like an (inadvertent?) artifact of the use of UTF-16 as the encoding form for strings within the browser. Suppose a browser happened to use UTF-8 as its internal string format; should it then treat "\f0\9d\8c\86" as meaning U+1D306? (Of course not. But that would be analogous to treating "\d834\df06" that way just because the browser happens to use UTF-16 internally.)
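The analogy can be made concrete: the same character U+1D306 has different code units in UTF-8 and UTF-16, and only the code point itself is independent of the encoding form. The following minimal sketch uses Python's built-in codecs. The helper names `code_units` and `as_css_escapes` are hypothetical and exist only for this illustration.

```python
def code_units(cp: int) -> dict[str, list[int]]:
    """List the code units of one character under different encoding forms."""
    ch = chr(cp)
    utf8 = list(ch.encode("utf-8"))  # 8-bit units
    raw16 = ch.encode("utf-16-be")   # big-endian, no byte-order mark
    utf16 = [int.from_bytes(raw16[i:i + 2], "big") for i in range(0, len(raw16), 2)]
    return {"code point": [cp], "utf-8": utf8, "utf-16": utf16}


def as_css_escapes(units: list[int]) -> str:
    """Spell each unit as a backslash-hex escape, as in the examples above."""
    return "".join(f"\\{u:x}" for u in units)
```

For `cp = 0x1D306`, `as_css_escapes(code_units(cp)["utf-8"])` yields `\f0\9d\8c\86` and `as_css_escapes(code_units(cp)["utf-16"])` yields `\d834\df06`. Both are artifacts of an encoding form. Only `\1d306` names the character itself.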

> Should the spec be changed to reflect reality?

CSS backslash-hexadecimal character escapes are supposed to represent ISO 10646 character codes, *NOT* UTF-16 code units.

As such, I think interpreting "\d834\df06" as the character U+1D306 should be considered a bug, and the spec should perhaps be clarified with a note explicitly prohibiting this behavior.
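The position argued here can be sketched as a decoder that maps every escape to exactly one ISO 10646 code point and never combines adjacent escapes. This is a minimal, simplified sketch, not any browser's implementation. Replacing surrogates, zero, and values above U+10FFFF with U+FFFD is an assumption of this sketch. The message itself only argues that `\d834\df06` must not become U+1D306. Newline handling and a two-character `\r\n` terminator are also deliberately omitted.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
WHITESPACE = " \t\n\r\f"


def decode_css_escapes(text: str) -> str:
    """Decode backslash escapes; each hex escape names one code point (simplified)."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] != "\\" or i + 1 >= n:
            out.append(text[i])  # ordinary character (or a trailing lone backslash)
            i += 1
            continue
        j = i + 1
        while j < n and j - (i + 1) < 6 and text[j] in HEX_DIGITS:
            j += 1  # consume up to 6 hex digits
        if j == i + 1:
            out.append(text[i + 1])  # non-hex escape: the character stands for itself
            i += 2
            continue
        cp = int(text[i + 1:j], 16)
        if j < n and text[j] in WHITESPACE:
            j += 1  # a single whitespace character terminates the escape
        # Assumption of this sketch: surrogates, zero and out-of-range values
        # are not characters, so they become U+FFFD instead of being paired up.
        if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            out.append("\uFFFD")
        else:
            out.append(chr(cp))
        i = j
    return "".join(out)
```

Under this sketch, `\1d306 ` and `\01d306` both decode to U+1D306, while `\d834\df06 ` decodes to two U+FFFD characters rather than U+1D306.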

JK
Received on Thursday, 12 January 2012 13:23:22 GMT
