- From: <smontagu@smontagu.org>
- Date: Tue, 15 Nov 2005 11:12:53 -0800 (PST)
- To: "Ian Hickson" <ian@hixie.ch>
- Cc: "Boris Zbarsky" <bzbarsky@mit.edu>, "www-style Mailing List" <www-style@w3.org>
>> I'm wondering what happens when \nnnnnnn escapes (backslash followed by
>> numbers) are used and the resulting character is invalid (eg it's a high
>> or low surrogate, or is above 0x00110000). Should the escape be treated
>> as U+FFFD? Or should this be considered an error and error recovery
>> (skipping a declaration or whatever needs to happen at that point in
>> parsing) happen? Or something else?
>
> The spec doesn't say. It also doesn't say what should happen with \0
> (indeed it calls that one out explicitly). I suggest treating them all as
> U+FFFD, and only dropping the rule if U+FFFD would cause the rule to be
> dropped at that point. (The idea is that a literal reading of 2.1 suggests
> that no codepoints can be invalid except 0, and so they should be treated
> the same way valid-but-unknown characters would be.)

In the case of values above 0x00110000, they are not "codepoints", at least
by the Unicode definition D4b in
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
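
For concreteness, here is a minimal Python sketch of the U+FFFD suggestion quoted above. It is only one reading of that proposal, not something CSS 2.1 specifies. The function name decode_css_escape is hypothetical, and it assumes the caller has already stripped the backslash and collected the hex digits. Note that the Unicode codespace ends at 0x10FFFF, so 0x110000 itself is already out of range.

```python
REPLACEMENT = "\uFFFD"       # U+FFFD REPLACEMENT CHARACTER
MAX_CODE_POINT = 0x10FFFF    # last value in the Unicode codespace


def decode_css_escape(hex_digits: str) -> str:
    """Map the hex digits of a backslash escape to a single character.

    Hypothetical helper: zero, surrogates, and values outside the
    codespace all become U+FFFD instead of triggering error recovery.
    """
    value = int(hex_digits, 16)
    if value == 0:
        # \0 is left undefined by the spec; treated like the other cases here.
        return REPLACEMENT
    if value > MAX_CODE_POINT:
        # Not a code point at all under the Unicode definition.
        return REPLACEMENT
    if 0xD800 <= value <= 0xDFFF:
        # High or low surrogate: a code point, but not a valid character.
        return REPLACEMENT
    return chr(value)
```

With this approach the parser never needs a special error path for escapes: whether the rule is dropped depends only on whether U+FFFD is acceptable at that point in the grammar.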
Received on Tuesday, 15 November 2005 19:13:03 UTC