Re: [css3-syntax] CSS escape sequences

From: Simon Sapin <simon.sapin@kozea.fr>
Date: Thu, 12 Jan 2012 14:16:21 +0100
Message-ID: <4F0EDD25.6010900@kozea.fr>
To: www-style@w3.org

On 12/01/2012 11:35, Mathias Bynens wrote:
> There seems to be another way to escape these characters, namely by
> breaking them up in UTF-16 code units: `\d834\df06 `. All browsers
> except Gecko (https://bugzilla.mozilla.org/show_bug.cgi?id=717529)
> seem to support this, even though this isn’t mentioned in the spec.


Isn’t this an accident of using UCS-2 internally (a fixed-width 16-bit
encoding) and pretending it is UTF-16? (Or the reverse...)

The CSS syntax is defined in terms of Unicode/ISO 10646 code points.
UTF-16 surrogate pairs like 0xd834-0xdf06 only exist when code points
are serialized as UTF-16 code units; they are not code points that
stand for characters themselves.
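
To make the relationship concrete, here is a minimal sketch of how a
supplementary code point maps to a surrogate pair. The helper name
to_surrogate_pair is hypothetical (it is not from any spec or library);
the arithmetic is the standard UTF-16 encoding rule.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into
    its UTF-16 high and low surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1D306)
print(hex(high), hex(low))  # 0xd834 0xdf06

# The same two code units appear when the code point is serialized
# as big-endian UTF-16.
encoded = chr(0x1D306).encode("utf-16-be")
print(encoded.hex())  # d834df06
```

The pair d834/df06 from the quoted escape is just this serialization of
U+1D306; it only appears once an encoding is chosen.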

For example, the fact that len(u'\U0001d306') is 2 on some builds of
Python is a bug in Python (it should be 1), not a reality of how Unicode
works. (I use Python syntax for the example, but the same bug exists on
many other platforms.)
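
A minimal sketch of the distinction, assuming Python 3.3 or later (an
assumption about the interpreter, where strings are indexed by code
point regardless of build). The variable names are only illustrative.

```python
s = "\U0001D306"  # a single supplementary code point

# Length in code points: what the string actually contains.
code_points = len(s)

# Length in UTF-16 code units: 2 bytes each, so divide the byte count.
utf16_code_units = len(s.encode("utf-16-le")) // 2

# Length in UTF-8 bytes, for comparison with another serialization.
utf8_bytes = len(s.encode("utf-8"))

print(code_points, utf16_code_units, utf8_bytes)  # 1 2 4
```

The 2 is a property of the UTF-16 serialization, just as the 4 is a
property of UTF-8; neither is the length of the string in code points.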

Simon Sapin
Received on Thursday, 12 January 2012 13:22:42 UTC
