Re: Ill-formed Validator response from olivier Thereaux on 2008-01-02 (www-validator@w3.org from January 2008)

From: olivier Thereaux <ot@w3.org>
Date: Wed, 2 Jan 2008 22:05:17 +0900
To: Henri Sivonen <hsivonen@iki.fi>
Cc: W3C Validator Community <www-validator@w3.org>
Message-Id: <0B61B43F-905B-422F-A523-6D55CB0DF230@w3.org>

Hi Henri,

Thanks for reporting this.

On Jan 2, 2008, at 21:20 , Henri Sivonen wrote:
> http://validator.w3.org/check?uri=http%3A%2F%2Fphilip.html5.org%2Fmisc%2Fchars.html&charset=iso-8859-1&output=soap12

OK, that's an interesting case. If I understand correctly, this is how  
it was constructed:

* take a document that claims to be utf-8, but isn't. A bit more  
metadata on the test case would help.
* force the validator to interpret that as iso-8859-1
  (note that if left to its own devices, the validator will refuse to  
validate the document, as it can't decode it as utf-8)
* the forced transcoding creates something ugly, which is then  
displayed in the error source
* that's bad, especially since it trips up a number of parsers, which  
seem to think that the data stops there
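
To make the steps above concrete, here is a rough Python sketch of what
I think is going on. The byte string is made up for illustration and is
not taken from the actual test page, and the <source> element is just a
stand-in for wherever the validator puts the error source in its output:

```python
import xml.etree.ElementTree as ET

# Hypothetical input, NOT the real test case: 0xE9 on its own is not
# valid UTF-8, and 0x01 is a control character.
raw = b"caf\xe9 \x01 end"


def decodes_as(data, encoding):
    """Return True if the bytes decode strictly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def parses_as_xml(text):
    """Return True if the text is accepted as XML element content."""
    try:
        # <source> is an assumed wrapper element, for illustration only.
        ET.fromstring("<source>" + text + "</source>")
        return True
    except ET.ParseError:
        return False


utf8_ok = decodes_as(raw, "utf-8")          # strict utf-8 decoding fails
latin1_ok = decodes_as(raw, "iso-8859-1")   # every byte maps to a character

# Forcing iso-8859-1 always "works", even when the result is garbage.
forced = raw.decode("iso-8859-1")

# The decoded control character is not allowed in XML 1.0 content.
xml_ok = parses_as_xml(forced)
```

If that is roughly the mechanism, the forced decoding itself never
fails; the trouble only shows up once the decoded text is copied into
the XML output.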

Is that a proper assessment of what's happening? I am not an expert in  
Unicode and your report is a bit terse. :)

Ideally, please report this to Bugzilla with more details and  
information; that would be very helpful.

Thanks.
-- 
olivier

Received on Wednesday, 2 January 2008 13:05:21 UTC