Re: What makes illegal characters non-conformant

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 23 Sep 2009 21:03:43 +0300
Cc: "Anne van Kesteren" <annevk@opera.com>, public-html-comments@w3.org
Message-Id: <15795833-1583-416E-A916-C2C04977F478@iki.fi>
To: Henry S. Thompson <ht@inf.ed.ac.uk>

On Sep 23, 2009, at 20:34, Henry S. Thompson wrote:

> Anne van Kesteren writes:
>
>> http://whatwg.org/html5#misinterpreted-for-compatibility
>
> That's about agents, not documents.


What happens here is that Validator.nu is out of date and doesn't
misinterpret US-ASCII for compatibility, so its US-ASCII decoder finds
a bad byte.

However, what makes the document non-conforming (though this isn't the
reason Validator.nu gives for calling it non-conforming) is the
sentence "The character encoding name given must be the name of the
character encoding used to serialize the file." under http://www.whatwg.org/specs/web-apps/current-work/#charset

The byte 0x80 is not valid in US-ASCII. Thus, US-ASCII isn't the name  
of the encoding used.
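
To make the byte-level point concrete, here is a minimal sketch in
Python. The helper name and the sample bytes are hypothetical
illustrations, not anything taken from Validator.nu or the spec. It
only relies on the fact that US-ASCII covers the range 0x00-0x7F.

```python
def first_non_ascii_byte(data: bytes):
    """Return (offset, byte) of the first byte outside 0x00-0x7F, or None."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            return offset, byte
    return None


# Hypothetical file contents labelled as US-ASCII but containing 0x80.
sample = b"price: \x80 5"

# A non-None result means the bytes cannot have been serialized as US-ASCII.
print(first_non_ascii_byte(sample))
```

Any non-None result means the declared name cannot be the name of the
encoding that was actually used to serialize the file.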

Note that for encodings that aren't "misinterpreted for compatibility",
the reasoning would be that the normative requirements of the encoding
become part of the conformance criteria by reference. Since
Validator.nu is out of date and treats US-ASCII like any non-special
encoding, that is why it complains.
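
The difference between the two behaviors can be sketched roughly as
follows. This is an assumption-laden illustration in Python, not
Validator.nu's code: the function name and the override table are
hypothetical, and the table holds only a small subset of the
"misinterpreted for compatibility" entries.

```python
# Hypothetical subset of the spec's "misinterpreted for compatibility" table.
COMPAT_OVERRIDES = {
    "us-ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
}


def decode_declared(data: bytes, declared: str, apply_overrides: bool) -> str:
    """Decode data using the declared label, optionally applying overrides."""
    label = declared.lower()
    if apply_overrides:
        # An up-to-date agent swaps in the compatibility encoding.
        label = COMPAT_OVERRIDES.get(label, label)
    # Strict decoding: raises UnicodeDecodeError on an invalid byte.
    return data.decode(label)


# With the override, 0x80 decodes as windows-1252 (the euro sign).
print(repr(decode_declared(b"\x80", "US-ASCII", apply_overrides=True)))

# Without it, the strict US-ASCII decoder rejects the byte.
try:
    decode_declared(b"\x80", "US-ASCII", apply_overrides=False)
except UnicodeDecodeError as err:
    print("bad byte:", err.reason)
```

Either way the document stays non-conforming, because the override
changes how an agent reads the bytes, not which encoding name the
document is allowed to declare.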

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 23 September 2009 18:04:34 GMT
