Re: utf8 validator confusion

From: Sam Ruby <rubys@intertwingly.net>
Date: Mon, 22 Jun 2009 07:29:19 -0400
Message-ID: <4A3F6B0F.8030902@intertwingly.net>
To: David Dorward <david@dorward.me.uk>
CC: Sean <sean@mediamice.net>, www-validator@w3.org

David Dorward wrote:
> Sean wrote:
>> My encoding is UTF-8 but My Server is showing as UTF-8.
> 
> Your server says:
> 
>  "Content-type: text/xml"
> 
> If the Atom feed is using UTF-8, it should be:
> 
>  "Content-type: application/xml; charset=utf-8"
> 
> On the subject of which - you seem to be using a hybrid of Atom and RSS. I
> suggest you switch to straight Atom - it should simplify matters (and,
> AFAIK, RSS offers nothing you can't find in Atom). If you switch to Atom,
> use application/atom+xml rather than a generic application/xml.
> 
> The feed itself, however, says:
> 
>  "<?xml version="1.0" encoding="iso-8859-1"?>"
> 
> If it is UTF-8 it should read:
> 
>  "<?xml version="1.0" encoding="utf-8"?>"
> 
> ... or, since UTF-8 is the default:
> 
>  "<?xml version="1.0"?>"
> 
> ... or, since 1.0 is the default:
> 
>  ""
> 
>> Also if characters are used then it fails validation. Eg This “ type of
>> “character” , or £50, or ?.
> 
> After eyeballing the feed, I can't see any data in there which isn't
> straight ASCII so I can't tell if it is using UTF-8, ISO-8859-1 or
> something else.
> 
> Whatever encoding you are actually using doesn't match the one that the
> validator thinks you are using (which I assume, given the xml prolog,
> would be ISO-8859-1).

I see curly quotes and an en dash, all from the windows-1252 code page.
Since these characters are not part of either ASCII or iso-8859-1, the
feed validator displays them in raw hex, highlighted in red.
Alternatively, people have developed scripts that convert the
Windows-centric code page into a more cross-platform friendly form, for
example:

http://www.fourmilab.ch/webtools/demoroniser/
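
As a rough illustration of the same idea (this is not how the demoroniser
itself works, and the sample bytes are made up rather than taken from the
feed), a minimal Python sketch, assuming the feed really was saved as
windows-1252, could decode it with that code page and re-encode it as UTF-8:

```python
# Hypothetical sample: bytes as a Windows editor might have saved them.
raw = b"This \x93type\x94 of character \x96 or \xa350"

# Read as iso-8859-1, bytes 0x93, 0x94 and 0x96 land in the C1 control
# range (U+0080..U+009F), so they are not printable characters there.
as_latin1 = raw.decode("iso-8859-1")
suspicious = [hex(ord(ch)) for ch in as_latin1 if 0x80 <= ord(ch) <= 0x9F]

# Read as windows-1252, the same bytes are curly quotes and an en dash.
as_cp1252 = raw.decode("cp1252")

# Re-encode as UTF-8 so the bytes match a prolog declaring encoding="utf-8".
utf8_bytes = as_cp1252.encode("utf-8")

print(suspicious)
print(as_cp1252)
```

The output would then need to be served with a matching charset and XML
prolog, as David describes above.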

- Sam Ruby
Received on Monday, 22 June 2009 11:29:58 GMT
