Re: For review: Character encodings in HTML and CSS

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 11 Feb 2010 16:36:20 +0100
To: Richard Ishida <ishida@w3.org>
Cc: www-international@w3.org
Message-ID: <20100211163620713994.84b5cc03@xn--mlform-iua.no>

Upon rereading what CSS21 says about escaping, I think the entire 
paragraph on CSS escaping makes some simplifications that perhaps do 
not simplify much:

  ]] CSS. The escape mechanism for representing characters in CSS is a 
backslash followed by a hexadecimal number representing the Unicode 
code point value. Note that these escapes are terminated by a space, 
rather than a semi-colon. [[

First, using a single (white)space character as the termination 
character is, I think, something many find confusing in itself. This 
seems to be indirectly confirmed by the Charmod document 
<http://www.w3.org/TR/2005/REC-charmod-20050215/>, which says:

  ]] C044  [S]  Escape syntax should require either explicit end 
delimiters or a fixed number of characters in each character escape. 
Escape syntaxes where the end is determined by any character outside 
the set of characters admissible in the character escape itself should 
be avoided.
These character escapes are not clear visually, and can cause an editor 
to insert spurious line-breaks when word-wrapping on spaces. Forms like 
SPREAD's &UABCD; [SPREAD] or XML's &#xhhhh;, where the character escape 
is explicitly terminated by a semicolon, are much better. [[

The Charmod document doesn't discuss (white)space as a termination 
character, but it seems evident that (white)space can be visually 
unclear: it is difficult to tell a terminating space from a "normal" 
space, something which CSS21 notes when it specifies how Unicode 
escapes may be terminated:

]]
  If a character in the range [0-9a-fA-F] follows the hexadecimal
  number, the end of the number needs to be made clear. There are 
  two ways to do that:
  1. with a space (or other white space character): "\26 B"
     ("&B"). In this case, user agents should treat a "CR/LF" pair
     (U+000D/U+000A) as a single white space character.
  2. by providing exactly 6 hexadecimal digits: "\000026B" ("&B")
     In fact, these two methods may be combined. Only one white 
     space character is ignored after a hexadecimal escape. Note
     that this means that a "real" space after the escape sequence
     must itself either be escaped or doubled. 
[[
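
To make the quoted rules concrete, here is a minimal Python sketch of 
how a hexadecimal escape could be decoded under them. It is only an 
illustration, not how any user agent actually does it: the function 
name is hypothetical, it handles only hexadecimal escapes (not other 
backslash escapes), and replacing zero or out-of-range code points 
with U+FFFD is my own assumption, as CSS21 leaves that case undefined.

```python
import re

# A backslash, 1 to 6 hex digits, then at most one white space character.
# "\r\n" is listed first so that a CR/LF pair is consumed as a single unit.
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})(\r\n|[ \t\r\n\f])?")


def decode_css_hex_escapes(text):
    """Replace CSS hexadecimal escapes in text with their characters."""

    def replace(match):
        code_point = int(match.group(1), 16)
        # Assumption: map zero and out-of-range values to U+FFFD.
        if code_point == 0 or code_point > 0x10FFFF:
            return "\ufffd"
        return chr(code_point)

    return _HEX_ESCAPE.sub(replace, text)


print(decode_css_hex_escapes(r"\26 B"))      # space terminates the escape
print(decode_css_hex_escapes(r"\000026B"))   # six digits, no terminator
print(decode_css_hex_escapes(r"\26  B"))     # only the first space is eaten
```

Note that without a terminator, "\26B" is read as the single code 
point U+026B rather than as "&" followed by "B", which is exactly the 
ambiguity the quoted passage is about.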

Rather than trying to make CSS escape termination analogous to HTML 
NCRs by recommending a termination character, it seems simpler to me 
to recommend that authors provide exactly 6 hexadecimal digits (padded 
with leading zeros), since one then does not need the confusing 
whitespace terminator.
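
As a rough sketch of what that recommendation amounts to in practice, 
a tool that generates CSS could always emit the six-digit form. The 
function name below is hypothetical and the snippet is only meant to 
show the zero padding, not to be a complete CSS serializer.

```python
def css_escape_six_digits(char):
    """Return the fixed-width, six-digit CSS escape for one character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    # Zero-pad the code point to six hex digits so no terminator is needed.
    return "\\{:06X}".format(ord(char))


# The escape can be followed directly by a hex-digit letter such as "B".
print(css_escape_six_digits("&") + "B")
```

Because the escape always has six digits, whatever follows it, 
including a hexadecimal digit or a "real" space, is taken literally.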
-- 
leif halvard silli
Received on Thursday, 11 February 2010 15:36:56 GMT
