- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Sun, 12 Mar 2006 15:22:35 +1100
Hi,

In section 8.2.1 Tokenising Entities, for a numeric character reference, it states:

| If one or more characters match the range, then take them all and
| interpret the string of characters as a number (either hexadecimal
| or decimal as appropriate), and return a character token for the
| Unicode character whose codepoint is that number. If the number is
| not a valid Unicode character (e.g. if the number is higher than
| 1114111), or if the number is zero, then return a character token for
| the U+FFFD REPLACEMENT CHARACTER character instead.

This does not cover the characters in the range from #x80 to #x9F, which have historically been treated as code points from the Windows-1252 repertoire rather than as the control characters from Unicode. AFAIK, this is already interoperably implemented in all browsers.

Characters in the range from #x01 to #x19 (except for whitespace characters) are not treated interoperably across platforms. On Windows, Firefox, IE and Opera all displayed characters from some repertoire I couldn't identify. On Mac, however, all the browsers displayed either nothing or a box (a placeholder character). I think these should all return U+FFFD.

The use of characters in either of these ranges should be an easy parse error.

-- 
Lachlan Hunt
http://lachy.id.au/
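
For illustration only, here is a minimal Python sketch of the handling proposed above. It is not text from the spec. The function name `resolve_numeric_reference`, the `(character, is_parse_error)` return shape, the exact set of whitespace code points, the treatment of surrogates, and the fallback to U+FFFD for the few #x80 to #x9F values that Windows-1252 leaves undefined are all assumptions made for the example.

```python
REPLACEMENT = "\ufffd"

# Assumption: "whitespace characters" means tab, line feed, form feed
# and carriage return. The message does not enumerate them.
WHITESPACE = {0x09, 0x0A, 0x0C, 0x0D}


def resolve_numeric_reference(digits, hexadecimal=False):
    """Return (character, is_parse_error) for the digits of a numeric
    character reference. Hypothetical helper, for illustration only."""
    number = int(digits, 16 if hexadecimal else 10)

    # Rule quoted from the spec: zero or a number above 1114111 (0x10FFFF)
    # gives U+FFFD. The quoted text does not say whether this is a parse
    # error, so it is not flagged here.
    if number == 0 or number > 0x10FFFF:
        return REPLACEMENT, False

    # Assumption: surrogates count as "not a valid Unicode character".
    if 0xD800 <= number <= 0xDFFF:
        return REPLACEMENT, False

    # Proposal: #x80 to #x9F are read as Windows-1252 bytes, and flagged.
    if 0x80 <= number <= 0x9F:
        try:
            return bytes([number]).decode("cp1252"), True
        except UnicodeDecodeError:
            # Assumption: bytes Windows-1252 leaves undefined become U+FFFD.
            return REPLACEMENT, True

    # Proposal: #x01 to #x19, except whitespace, return U+FFFD, and flagged.
    if 0x01 <= number <= 0x19 and number not in WHITESPACE:
        return REPLACEMENT, True

    return chr(number), False
```

Under these assumptions, `resolve_numeric_reference("128")` yields the euro sign with the parse-error flag set, while `resolve_numeric_reference("65")` yields "A" with no error.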
Received on Saturday, 11 March 2006 20:22:35 UTC