Re: The error was: utf8 "\xF6" does not map to Unicode from Jukka K. Korpela on 2010-02-04 (www-validator@w3.org from February 2010)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Thu, 4 Feb 2010 18:42:13 +0200
To: <kogentis@googlemail.com>, <www-validator@w3.org>
Message-ID: <B89EC1FFD41947248A0A1EE6F194C8D5@JukanPC>

Michael Schratz wrote:

> Validating kogentis.de <http://kogentis.de/>
>
> While validating my site this error stops the process:
>
> "Sorry, I am unable to validate this document because on line *65* it
> contained one or more bytes that I cannot interpret as |utf-8|

That's a rather good error message, is it not?

> The error was: utf8 "\xF6" does not map to Unicode"
>
> The source code doesn't contain these characters...

The error message does not refer to any characters. It refers to bytes 
(octets) that do not constitute a representation of any character in the 
declared character encoding. In other words, it is a data error.
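
To illustrate the difference between bytes and characters, here is a minimal Python sketch. The sample bytes and the helper name `is_valid_utf8` are made up for illustration and are not taken from the page in question. The byte 0xF6 can never appear in well-formed UTF-8, whereas in ISO 8859-1 the same byte represents "ö".

```python
# Hypothetical sample: the word "schön" as ISO 8859-1 bytes.
raw = b"sch\xf6n"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same bytes under two different interpretations.
print(is_valid_utf8(raw))           # the 0xF6 byte is not valid UTF-8
print(raw.decode("iso-8859-1"))     # read as ISO 8859-1 it is a letter
print("schön".encode("utf-8"))      # in UTF-8 the letter takes two bytes
```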

How you fix this is a different matter. The document appears to contain ISO 
8859-1 encoded data, which is (in the general case) malformed when served as 
UTF-8. The UTF-8 encoding is declared in the HTTP headers, so you need to 
find out how to change those headers or how to change the document's 
encoding.
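
If changing the document's encoding is the route taken, the conversion itself is simple. The following is only a sketch under the assumption that the whole file really is ISO 8859-1; the function name `reencode_latin1_as_utf8` is hypothetical. If the file already contains some UTF-8 sequences, this would double-encode them, so check first.

```python
def reencode_latin1_as_utf8(data: bytes) -> bytes:
    """Interpret the bytes as ISO 8859-1 and re-encode them as UTF-8.

    Assumes the input is entirely ISO 8859-1. Every byte value is a
    valid ISO 8859-1 character, so the decode step cannot fail, which
    also means it cannot detect input that was not ISO 8859-1.
    """
    return data.decode("iso-8859-1").encode("utf-8")


# Hypothetical fragment with one ISO 8859-1 encoded "ö".
original = b"<p>sch\xf6n</p>"
converted = reencode_latin1_as_utf8(original)

# The converted bytes now decode cleanly as UTF-8.
print(converted.decode("utf-8"))
```

The other route leaves the file alone and changes the declared encoding instead, so that the HTTP Content-Type header says charset=ISO-8859-1. How to do that depends on the server.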

> it seems, it's a
> part code of thickbox-library...

Whatever. If you do Print Preview, you will see cluelessness in action. The 
offending data is in an element that is (meant to be) hidden when viewed on 
screen, shown in print media, presenting something completely different from 
the page content on screen. Of course, the sensible thing is to get rid of 
such childishness, deleting the part that contains the text, which makes no 
sense. But if you don't want to do that, the technical answer is that the 
validator simply reports a data error and you have to take it from there.

The error is at character level, so it's actually independent of any HTML 
markup.
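
Since the check works on raw bytes, the offending spots can be located without parsing any HTML. A rough sketch, with a made-up sample and a hypothetical helper name `find_invalid_utf8_lines`, that reports the first bad byte on each line:

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line number, byte offset in line, byte value) for the
    first byte on each line that breaks UTF-8 decoding."""
    problems = []
    # Splitting on the newline byte is safe: 0x0A never occurs inside
    # a multi-byte UTF-8 sequence.
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            problems.append((line_no, err.start, line[err.start]))
    return problems


# Hypothetical two-line document; the second line has a stray 0xF6.
sample = b"<p>ok</p>\n<p>sch\xf6n</p>\n"
for line_no, offset, value in find_invalid_utf8_lines(sample):
    print(f"line {line_no}, byte offset {offset}: 0x{value:02X}")
```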

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/

Received on Thursday, 4 February 2010 16:43:28 UTC