Re: UTF-8

From: Michael Everson <everson@evertype.com>
Date: Tue, 11 Dec 2001 13:19:38 +0000
Message-Id: <p05101000b83bb6c0ae55@[193.120.28.157]>
To: Nick Kew <nick@webthing.com>
Cc: <www-validator@w3.org>
At 20:49 +0000 2001-12-10, Nick Kew wrote:
>
>>  I have a lot of pages with a few Latin 1 (non ASCII) characters in
>>  them. I want to convert them all to UTF-8. This isn't always
>>  straightforward.
>
>Won't iconv do it?

What is that?
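
For reference, iconv is a command-line tool for converting text between character encodings; a typical invocation is `iconv -f ISO-8859-1 -t UTF-8 old.html > new.html` (the file names here are placeholders). The same conversion can be sketched in Python. This is only a minimal illustration, assuming the input is purely Latin-1; the function name `latin1_to_utf8` is made up for the example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are entirely Latin-1 as UTF-8 (hypothetical helper)."""
    # Every byte value maps to a Latin-1 character, so this decode never fails.
    return data.decode("latin-1").encode("utf-8")


sample = b"caf\xe9"  # "café" in Latin-1: a single byte E9 for the e-acute
print(latin1_to_utf8(sample))  # the e-acute becomes the two bytes C3 A9
```

Note that this is not safe for a file that is already partly UTF-8: the bytes that were already converted would be converted a second time.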

>>  "Sorry, I am unable to validate this document because on line 63 it
>>  contained some byte(s) that I cannot interpret as utf-8. Please check
>>  both the content of the file and the character encoding indication. "
>
>That'll be when the parser refuses your document outright because
>it's incompatible with your declared charset.  It also means that
>the source is (technically at least) too broken even to try and
>display.

Oh, come on! I declare UTF-8, grand. In plain text, UTF-8 looks like 
ASCII with some Latin-1 characters in it in pairs or triplets. What's 
wrong with the Validator is that if even ONE of the UTF-8 characters 
wasn't turned from a single Latin-1 character into a pair of Latin-1 
characters, then it chokes. Now my browsers display it easily enough.
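
The "pairs" effect, and the way one unconverted Latin-1 byte breaks strict decoding, can be illustrated with a small Python sketch. It is only an illustration under assumed sample data; the helper name `first_bad_offset` is hypothetical and says nothing about how the Validator itself works.

```python
from typing import Optional


def first_bad_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# One character, two UTF-8 bytes; read as Latin-1 they show up as a pair.
pair = "\u00e9".encode("utf-8")   # b'\xc3\xa9'
print(pair.decode("latin-1"))     # two Latin-1 characters: A-tilde, copyright sign

# First word already converted (C3 A9), second still a lone Latin-1 byte (E9).
mixed = b"caf\xc3\xa9 and caf\xe9\n"
print(first_bad_offset(mixed))    # offset of the stray E9 byte
```

A strict decoder stops at the stray byte, while a lenient one (as in many browsers) may substitute or guess and carry on.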

>>  But the Validator is broken. It doesn't display the source, and so I
>>  have NO IDEA how to find line 63.
>
>Erm - open your document in a text editor?

My editors wrap lines and things. They don't number them. One can't 
always see the single broken character easily.

The point is that it is extremely useful for all the other validation 
processes, where the numbered lines are listed and the little ^^ 
carets show you where the error is. But on the UTF-8 check this 
useful source display doesn't happen, and that's what I would like 
you good folks to fix.
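
The kind of display being asked for can be sketched in Python as follows. This is a rough illustration, not the Validator's code: the function name `report_invalid_utf8` is hypothetical, only the first bad byte on each line is flagged, and the caret only lines up in a monospaced font with no wide or combining characters before it.

```python
from typing import List


def report_invalid_utf8(data: bytes) -> List[str]:
    """List each line containing invalid UTF-8, with a caret under the first bad byte."""
    report = []
    # bytes.splitlines() handles LF, CR and CRLF line endings.
    for lineno, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # Everything before err.start decoded cleanly, so this cannot fail.
            prefix = line[:err.start].decode("utf-8")
            # Undecodable bytes are shown as U+FFFD so the line stays printable.
            shown = line.decode("utf-8", errors="replace")
            report.append(f"{lineno}: {shown}")
            report.append(" " * (len(str(lineno)) + 2 + len(prefix)) + "^")
    return report


page = b"all fine here\ncaf\xc3\xa9 caf\xe9 x\n"
for row in report_invalid_utf8(page):
    print(row)
```

Here the second line is reported with its number, and the caret sits under the replacement character that stands in for the unconverted byte.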

>BTW: do you have a need to convert, or is this an exercise?

Yes, I am working on converting my whole site. My showpiece is 
http://www.evertype.com/standards/iso15924/document/scriptbib.html. 
There is a lot more than Latin 1 in that.

By the way, by way of introduction, I'm one of the authors of the 
Unicode Standard, work on the Macintosh platform, and have been 
eagerly awaiting decent Unicode tools for years now. Validator is a 
great tool, but it falls down here in terms of helping people make 
sure their pages are in UTF-8.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com
Received on Tuesday, 11 December 2001 08:19:44 GMT
