- From: Geoffrey Sneddon <foolistbar@googlemail.com>
- Date: Sat, 07 Sep 2013 11:54:47 +0100
- To: "Kang-Hao (Kenny) Lu" <kanghaol@oupeng.com>
- Cc: WHAT Working Group <whatwg@whatwg.org>
On 06/09/2013 04:05, Kang-Hao (Kenny) Lu wrote:
> (2013/09/06 6:08), Geoffrey Sneddon wrote:
>> The phrasing content section states:
>>
>>> Text nodes and attribute values must consist of Unicode characters,
>>> must not contain U+0000 characters, must not contain permanently
>>> undefined Unicode characters (noncharacters), and must not contain
>>> control characters other than space characters. This specification
>>> includes extra constraints on the exact value of Text nodes and
>>> attribute values depending on their precise context.
>>
>> And the pre-processing the input-stream section states:
>>
>>> Any occurrences of any characters in the ranges U+0001 to U+0008,
>>> U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters
>>> U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE,
>>> U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF,
>>> U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE,
>>> U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF,
>>> U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse
>>> errors. These are all control characters or permanently undefined
>>> Unicode characters (noncharacters).
>>
>> Note the first uses "Unicode characters", the second "characters"; the
>> former excludes surrogates as a conformance requirement.
>>
>> Note that every disallowed non-surrogate character is a parse error.
>
> Except U+0000 or am I missing something?

This is handled inline in the parser, as noted in the preprocessing
section. It sometimes gets passed through as U+0000, sometimes gets
changed to U+FFFD, sometimes gets ignored, but always creates a parse
error.

>> Therefore, it would make sense to make surrogates parse errors.
>>
>> It should be noted that they can only occur in the input stream if they
>> come from script (as they cannot be decoded from the input byte stream
>> as the decoders will never emit a surrogate).
>
> which means that this seems ... cumbersome ... to implement in a
> conformance checker. Which reminds me, does
>
> # Conformance checkers must report at least one parse error
> # condition to the user if one or more parse error conditions exist
> # in the document and must not report parse error conditions if none
> # exist in the document. Conformance checkers may report more than
> # one parse error condition if more than one parse error condition
> # exists in the document.
>
> mean validator.nu and Firefox view source are non-conforming because
> they do nothing about document.write() ?
>
> I think we should exempt conformance checkers from scripts instead.

They already are. From the "Conformance classes" section:

> Conformance checkers must check that the input document conforms when
> parsed without a browsing context (meaning that no scripts are run,
> and that the parser's scripting flag is disabled), and should also
> check that the input document conforms when parsed with a browsing
> context in which scripts execute, and that the scripts never cause
> non-conforming states to occur other than transiently during script
> execution itself. (This is only a "SHOULD" and not a "MUST"
> requirement because it has been proven to be impossible. [COMPUTABLE])

(I feel like being pedantic and pointing out that this is untrue: it has
not been proven impossible to do, it has been proven impossible to do in
general. It wouldn't be that hard to design a conformance checker to
check "<html><script>document.write("<p>")</script>".)

On the other hand, a JS console can reasonably report parse errors from
script, so the parse errors are still worthwhile to have.

/Geoffrey.
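
For illustration only, the character list quoted above from the
pre-processing section could be checked roughly as follows. This is a
sketch, not spec text: the function names and the `include_surrogates`
switch are made up here, and the switch only models the proposal of also
treating surrogates as parse errors. U+0000 is deliberately left out,
since it is handled inline in the parser rather than in pre-processing.

```python
def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_surrogate(cp: int) -> bool:
    """U+D800..U+DFFF; can only reach the input stream from script."""
    return 0xD800 <= cp <= 0xDFFF


def is_preprocessing_parse_error(cp: int) -> bool:
    """True for the code points listed in the quoted pre-processing text."""
    return (
        0x0001 <= cp <= 0x0008
        or cp == 0x000B
        or 0x000E <= cp <= 0x001F
        or 0x007F <= cp <= 0x009F
        or is_noncharacter(cp)
    )


def find_problem_code_points(text: str, include_surrogates: bool = False):
    """Return (index, code point) pairs that would be flagged.

    include_surrogates is a hypothetical switch for the proposed change.
    """
    problems = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if is_preprocessing_parse_error(cp) or (
            include_surrogates and is_surrogate(cp)
        ):
            problems.append((index, cp))
    return problems
```

With the switch off, a lone surrogate such as "\ud800" passes unflagged,
which is the gap between the two quoted requirements; with it on, the
surrogate is reported like any other disallowed character.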
Received on Saturday, 7 September 2013 10:55:35 UTC