Re: [whatwg] Parse errors for invalid characters

(2013/09/06 6:08), Geoffrey Sneddon wrote:
> The phrasing content section states:
> 
>> Text nodes and attribute values must consist of Unicode characters,
>> must not contain U+0000 characters, must not contain permanently
>> undefined Unicode characters (noncharacters), and must not contain
>> control characters other than space characters. This specification
>> includes extra constraints on the exact value of Text nodes and
>> attribute values depending on their precise context.
> 
> And the pre-processing the input-stream section states:
> 
>> Any occurrences of any characters in the ranges U+0001 to U+0008,
>> U+000E to U+001F, U+007F to U+009F, U+FDD0 to U+FDEF, and characters
>> U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE,
>> U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF,
>> U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE,
>> U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF,
>> U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse
>> errors. These are all control characters or permanently undefined
>> Unicode characters (noncharacters).
> 
> Note the first uses "Unicode characters", the second "characters" — the
> former excludes surrogates as a conformance requirement.
> 
> Note that every disallowed non-surrogate character is a parse error.

Except U+0000, or am I missing something?
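
To make the question concrete, here is a rough sketch that encodes only the
list quoted above from the pre-processing section. The function name
is_preprocessing_parse_error is made up for illustration, and the sketch
says nothing about how other parts of the spec treat a given character.

```python
def is_preprocessing_parse_error(cp: int) -> bool:
    """True if code point `cp` is in the list quoted from the
    pre-processing section (hypothetical helper, not spec text)."""
    # U+0001..U+0008 and U+000E..U+001F
    if 0x0001 <= cp <= 0x0008 or 0x000E <= cp <= 0x001F:
        return True
    # U+007F..U+009F and U+FDD0..U+FDEF
    if 0x007F <= cp <= 0x009F or 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+000B is listed on its own
    if cp == 0x000B:
        return True
    # U+FFFE, U+FFFF, and the last two code points of every plane
    # up to U+10FFFF: the low 16 bits are FFFE or FFFF.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


# Neither U+0000 nor a surrogate such as U+D800 is in the quoted list.
print(is_preprocessing_parse_error(0x0000))
print(is_preprocessing_parse_error(0xD800))
print(is_preprocessing_parse_error(0x000B))
```

Going by the quoted list alone, both U+0000 and the surrogate range
U+D800 to U+DFFF fall outside it.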

> Therefore, it would make sense to make surrogates parse errors.
> 
> It should be noted that they can only occur in the input stream if they
> come from script (as they cannot be decoded from the input byte stream
> as the decoders will never emit a surrogate).

which means that this seems ... cumbersome ... to implement in a
conformance checker. Which reminds me, does

   # Conformance checkers must report at least one parse error
   # condition to the user if one or more parse error conditions exist
   # in the document and must not report parse error conditions if none
   # exist in the document. Conformance checkers may report more than
   # one parse error condition if more than one parse error condition
   # exists in the document.

mean validator.nu and Firefox view source are non-conforming because
they do nothing about document.write()?

I think we should instead exempt conformance checkers from having to
account for content inserted by scripts.
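
As a loose illustration of why a checker that only sees the byte stream
would never hit the surrogate case, here is a small sketch. It uses
Python's built-in UTF-8 decoder as a stand-in, which is an assumption on
my part: it is not the decoder the spec defines, though it likewise
refuses to emit surrogates. The helper name decoded_has_surrogate is made
up for the example.

```python
def decoded_has_surrogate(data: bytes, encoding: str = "utf-8") -> bool:
    """Decode `data` the way a byte-stream-only checker might, replacing
    malformed sequences, and report whether any surrogate code point
    (U+D800..U+DFFF) survives into the decoded text."""
    text = data.decode(encoding, errors="replace")
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


# The byte sequence ED A0 80 would be U+D800 in a lax UTF-8 reading;
# Python's strict decoder treats it as malformed and replaces it.
print(decoded_has_surrogate(b"<p>\xed\xa0\x80</p>"))

# A script, by contrast, can build a string holding a lone surrogate
# directly, without going through any decoder.
from_script = "<p>" + chr(0xD800) + "</p>"
print(any(0xD800 <= ord(ch) <= 0xDFFF for ch in from_script))
```

So a lone surrogate can only be seen by a checker that actually runs the
script and inspects what it writes.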


Cheers,
Kenny
-- 
Web Specialist, Opera Sphinx Game Force, Oupeng Browser, Beijing
Try Oupeng: http://www.oupeng.com/

Received on Friday, 6 September 2013 03:06:27 UTC