- From: Jonathan Kew <jonathan@jfkew.plus.com>
- Date: Thu, 12 Jan 2012 13:17:44 +0000
- To: www-style <www-style@w3.org>
On 12 Jan 2012, at 10:35, Mathias Bynens wrote:

> http://www.w3.org/TR/css3-syntax/#characters defines CSS escape sequences of the form `\000026` or `\26 `, both of which decode to `&`.
>
> WebKit browsers don’t support this syntax for characters outside the BMP (https://bugs.webkit.org/show_bug.cgi?id=76152). For example, `\1d306 ` or `\01d306` are supposed to be escape sequences for the “tetragram for centre” symbol (U+1D306), but they don’t work in WebKit.
>
> There seems to be another way to escape these characters, namely by breaking them up into UTF-16 code units: `\d834\df06 `. All browsers except Gecko (https://bugzilla.mozilla.org/show_bug.cgi?id=717529) seem to support this, even though this isn’t mentioned in the spec.

This looks suspiciously like an (inadvertent?) artifact of using UTF-16 as the encoding form for strings within the browser. Suppose a browser happened to use UTF-8 as its internal string format; should it then treat `\f0\9d\8c\86` as meaning U+1D306? Of course not. But that would be analogous to treating `\d834\df06` that way just because the browser happens to use UTF-16 internally.

> Should the spec be changed to reflect reality?

CSS backslash-hexadecimal character escapes are supposed to represent ISO 10646 character codes, *NOT* UTF-16 code units. As such, I think interpreting `\d834\df06` as the character U+1D306 should be considered a bug, and the spec should perhaps be clarified with a note explicitly prohibiting this behavior.

JK
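A minimal CSS sketch of the escape forms under discussion (the selectors and `content` declarations are illustrative, not part of the original thread):

```css
/* Spec-conformant escapes for U+1D306 (“tetragram for centre”). */
.six-digit::before  { content: '\01d306'; } /* six hex digits: no terminator needed */
.terminated::before { content: '\1d306 '; } /* shorter escape, terminated by a space
                                               that the tokenizer consumes */

/* The surrogate-pair form at issue: read as ISO 10646 code points,
   these are two escapes for U+D834 and U+DF06 (lone surrogates),
   not one escape for U+1D306. */
.surrogate-pair::before { content: '\d834\df06'; }
```

Under the interpretation argued for above, only the first two rules should render the tetragram; the third relies on UTF-16-specific behavior the spec does not sanction.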
Received on Thursday, 12 January 2012 13:23:22 UTC