Character data messed up in 0.8.0 beta 1

On Thu, 19 Apr 2007, olivier Thereaux wrote:

> It is with great pleasure and excitement that we are starting a Beta Test 
> period for the W3C Markup Validator:
>
> http://validator-test.w3.org/

I haven't found many differences yet (it's faster, but probably just 
because of smaller load), but this one is rather serious:

When testing a page in ISO-8859-1 encoding, the echo of a source line in 
an error message has the non-ASCII characters replaced by malformed data, 
displayed by IE 7 as small rectangles, by Firefox 2 as U+FFFD (a white
question mark in a black lozenge).

Test page: http://www.cs.tut.fi/~jkorpela/test/val.html

The reason is apparently that the beta version echoes the source line "as 
is", even though the source is ISO-8859-1 encoded and the validator's 
report page is UTF-8 encoded.

This doesn't happen in the production version validator.w3.org, which 
seems to convert the source to UTF-8 before echoing it.
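
The suspected difference can be sketched as follows. This is only an 
illustration in Python, not the validator's actual code; the sample line 
and the variable names are made up for the example.

```python
# Hypothetical source line; "café" stands in for any non-ASCII content.
source_bytes = "<p>café</p>".encode("iso-8859-1")

# Echoing "as is": the ISO-8859-1 bytes land in a page declared as UTF-8,
# so the lone byte 0xE9 is malformed and shows up as U+FFFD.
echoed_as_is = source_bytes.decode("utf-8", errors="replace")

# Converting first: decode with the source encoding, then re-encode as UTF-8
# before writing the line into the report page.
converted = source_bytes.decode("iso-8859-1").encode("utf-8")
echoed_converted = converted.decode("utf-8")

print(repr(echoed_as_is))
print(repr(echoed_converted))
```

The first result contains U+FFFD in place of the "é", which matches what 
Firefox 2 displays; the second keeps the character intact.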

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Thursday, 19 April 2007 10:01:25 UTC