Re: What makes illegal characters non-conformant from Henri Sivonen on 2009-09-23 (public-html-comments@w3.org from September 2009)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 23 Sep 2009 21:03:43 +0300
To: Henry S. Thompson <ht@inf.ed.ac.uk>
Cc: "Anne van Kesteren" <annevk@opera.com>, public-html-comments@w3.org
Message-Id: <15795833-1583-416E-A916-C2C04977F478@iki.fi>

On Sep 23, 2009, at 20:34, Henry S. Thompson wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Anne van Kesteren writes:
>
>> http://whatwg.org/html5#misinterpreted-for-compatibility
>
> That's about agents, not documents.

What happens here is that Validator.nu is out of date and doesn't
misinterpret US-ASCII for compatibility, so the US-ASCII decoder finds
a bad byte.

However, what actually makes the document non-conforming (though it is
not the reason Validator.nu gives) is the sentence "The character
encoding name given must be the name of the character encoding used to
serialize the file." under http://www.whatwg.org/specs/web-apps/current-work/#charset

The byte 0x80 is not valid in US-ASCII. Thus, US-ASCII isn't the name  
of the encoding used.
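
To illustrate the point about 0x80, here is a minimal Python sketch. It
is not Validator.nu's code; Python's built-in "ascii" and "cp1252"
codecs merely stand in for a strict US-ASCII decoder and for the
Windows-1252 decoder that a user agent would substitute when
misinterpreting US-ASCII for compatibility. The helper name decodes_as
and the sample bytes are made up for the example.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical document fragment containing the byte 0x80.
sample = b"price: \x80 5"

# A strict US-ASCII decoder rejects 0x80, so the bytes are not US-ASCII.
strict_ascii_ok = decodes_as(sample, "ascii")

# Windows-1252 assigns 0x80 to a character, so the same bytes decode there.
windows_1252_ok = decodes_as(sample, "cp1252")

# The character that Windows-1252 maps 0x80 to (the euro sign, U+20AC).
euro = b"\x80".decode("cp1252")
```

Under these assumptions the strict check fails while the
compatibility-style decode succeeds, which matches the distinction
between what the decoder accepts and what the declared encoding name
promises.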

Note that for encodings that aren't "misinterpreted for compatibility"  
the reasoning would be that the normative requirements of the encoding  
become part of the conformance criteria by reference. Since
Validator.nu is out of date and treats US-ASCII like any non-special
encoding, that by-reference reasoning is why it complains.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 23 September 2009 18:04:34 UTC