Re: The error was: utf8 "\xBA" does not map to Unicode

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Mon, 24 Sep 2007 10:51:08 +0300 (EEST)
To: www-validator@w3.org
cc: tammyglaser798@earthlink.net
Message-ID: <Pine.SOC.4.64.0709241032390.29575@mustatilhi.cs.tut.fi>

On Mon, 24 Sep 2007, Frank Ellermann wrote:

> Tammy Glaser wrote:
>
>> when I refer to it from the actual page by clicking the referral link, I get the validation error
>> listed in the subject.
>
> Your document claims to be UTF-8, and apparently your server supports 
> this theory.

Actually the server does not send encoding information in HTTP headers, so 
browsers and other user agents (such as validators) are supposed to 
believe the <meta> tag, and they do. Thus, changing

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

to

<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

would remove the error. This can be (informally) seen by using the option 
for overriding the encoding in the validator's user interface (the 
Encoding menu).
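The same effect can be sketched outside the validator, with Python standing in for the decoding step. The sample bytes below are a made-up line, not the actual page content, and try_decode is a hypothetical helper name; the point is only that the octet BA on its own cannot be decoded as UTF-8 but can be decoded as windows-1252.

```python
# Hypothetical sample line containing the octet BA; not taken from the real page.
raw = b"Temperature: 72\xba F"


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short error note if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason} at byte offset {exc.start}"


# Declared as UTF-8: the lone octet BA is not a valid UTF-8 sequence.
utf8_result = try_decode(raw, "utf-8")

# Declared as windows-1252: every octet maps to some character, so decoding succeeds.
cp1252_result = try_decode(raw, "windows-1252")

print(utf8_result)
print(cp1252_result)
```

Note that successful decoding only means the bytes are legal in the declared encoding, not that they denote the character the author intended.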

(There is a similar problem later with the punctuation apostrophe.
For it, declaring the encoding as windows-1252 would do, and so would e.g. 
the entity reference &rsquo;.)

But there's more... not really about validation but about proper use of 
characters.

> But in line 126 (reported by the validator) you use a degree character that's
> not encoded as UTF-8.  There are various ways to fix this, two ideas:
>
> - you can simply replace this character by &#186; (decimal for hex. BA)
> - you could also try to declare windows-1252 where you now have UTF-8

The octet BA is indeed apparently meant to denote the degree sign,
but what it really means in windows-1252 (and in iso-8859-1) is the 
masculine ordinal indicator, which is a superscripted letter "o", often 
underlined. That character can be represented as &#186;. But it's not the 
correct character.

The degree sign is U+00B0, representable in windows-1252 as octet B0 and 
generally (in any encoding) in HTML as &#176; or, alternatively, as &deg;.
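The difference between the two characters can be checked with a short Python sketch using only the standard library. The variable names are illustrative; the sketch decodes the octets BA and B0 as windows-1252, looks up their Unicode names, and resolves the HTML references mentioned above.

```python
import html
import unicodedata

# Decode the two octets under windows-1252.
octet_ba = b"\xba".decode("windows-1252")
octet_b0 = b"\xb0".decode("windows-1252")

# Look up the Unicode character names for each.
name_ba = unicodedata.name(octet_ba)  # MASCULINE ORDINAL INDICATOR
name_b0 = unicodedata.name(octet_b0)  # DEGREE SIGN

# Resolve the HTML references: &#186; is the ordinal indicator,
# while &#176; and &deg; both denote the degree sign U+00B0.
ordinal_ref = html.unescape("&#186;")
degree_numeric = html.unescape("&#176;")
degree_named = html.unescape("&deg;")

print(f"BA -> U+{ord(octet_ba):04X} {name_ba}")
print(f"B0 -> U+{ord(octet_b0):04X} {name_b0}")
```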

PS. It is inappropriate to use the "W3C HTML 4.01!" icon when the page 
does not in fact validate. On the other hand, such icons are worse than 
useless anyway, partly because the few visitors who understand them 
may well know that the claims they make are often false.

(I'm CC'ing the OP, who might not read the list.)

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Monday, 24 September 2007 07:51:24 UTC
