- From: Michael Everson <everson@evertype.com>
- Date: Tue, 11 Dec 2001 13:19:38 +0000
- To: Nick Kew <nick@webthing.com>
- Cc: <www-validator@w3.org>
At 20:49 +0000 2001-12-10, Nick Kew wrote:

>> I have a lot of pages with a few Latin 1 (non ASCII) characters in
>> them. I want to convert them all to UTF-8. This isn't always
>> straightforward.
>
>Won't iconv do it?

What is that?

>> "Sorry, I am unable to validate this document because on line 63 it
>> contained some byte(s) that I cannot interpret as utf-8. Please check
>> both the content of the file and the character encoding indication. "
>
>That'll be when the parser refuses your document outright because
>it's incompatible with your declared charset. It also means that
>the source is (technically at least) too broken even to try and
>display.

Oh, come on! I declare UTF-8, grand. In plain text, UTF-8 looks like ASCII with some Latin-1 characters in it in pairs or triplets. What's wrong with the Validator is that if even ONE of the UTF-8 characters wasn't turned from a single Latin-1 character into a pair of Latin-1 characters, then it chokes. Now my browsers display it easily enough.

>> But the Validator is broken. It doesn't display the source, and so I
>> have NO IDEA how to find line 63.
>
>Erm - open your document in a text editor?

My editors wrap lines and things. They don't number them. One can't always see the single broken character easily. The point is that it is extremely useful for all the other validation processes, where the numbered lines are listed and the little ^^ carets show you where the error is. But on the UTF-8 check this useful source display doesn't happen, and that's what I would like you good folks to fix.

>BTW: do you have a need to convert, or is this an exercise?

Yes, I am working on converting my whole site. My showpiece is http://www.evertype.com/standards/iso15924/document/scriptbib.html. There is a lot more than Latin 1 in that.

By the way, by way of introduction, I'm one of the authors of the Unicode Standard, work on the Macintosh platform, and have been eagerly awaiting decent Unicode tools for years now. Validator is a great tool, but it falls down here in terms of helping people make sure their pages are in UTF-8.

--
Michael Everson *** Everson Typography *** http://www.evertype.com
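
The display requested in the message above (a numbered source line with a caret under the byte that cannot be read as UTF-8) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Validator works: the names `find_invalid_utf8`, `format_report`, `repair_mixed` and the error-handler name `latin1-fallback` are hypothetical, and the repair step assumes that every stray byte was meant as Latin-1, which is a guess that can be wrong for a given file.

```python
import codecs


def find_invalid_utf8(data: bytes):
    """Yield (line_number, byte_column, line) for each undecodable byte run."""
    # UTF-8 multi-byte sequences never contain 0x0A, so splitting on it is safe.
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        offset = 0
        while offset < len(line):
            try:
                line[offset:].decode("utf-8")
                break  # the rest of this line is valid UTF-8
            except UnicodeDecodeError as err:
                yield line_number, offset + err.start, line
                offset += err.end  # resume after the bad byte(s)


def format_report(data: bytes) -> str:
    """Return validator-style output: location, source line, caret."""
    out = []
    for line_number, column, line in find_invalid_utf8(data):
        # Show the line with U+FFFD standing in for undecodable bytes.
        shown = line.decode("utf-8", errors="replace")
        # Caret position in characters, not bytes.
        caret_at = len(line[:column].decode("utf-8", errors="replace"))
        out.append(f"line {line_number}, byte {column + 1}: 0x{line[column]:02X}")
        out.append(shown)
        out.append(" " * caret_at + "^")
    return "\n".join(out)


def _latin1_fallback(err):
    """Hypothetical error handler: reinterpret undecodable bytes as Latin-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end


# Assumption: the handler name "latin1-fallback" is made up for this sketch.
codecs.register_error("latin1-fallback", _latin1_fallback)


def repair_mixed(data: bytes) -> str:
    """Decode as UTF-8, treating each stray byte as a Latin-1 character."""
    return data.decode("utf-8", errors="latin1-fallback")
```

For a file that is purely Latin-1, with nothing yet converted, `iconv -f ISO-8859-1 -t UTF-8` on the command line is enough. The sketch targets the mixed case described above, where most characters are already UTF-8 and a few single Latin-1 bytes remain. Re-encoding the result of `repair_mixed` with `.encode("utf-8")` gives bytes that decode cleanly, though a Latin-1 pair that happens to form a valid UTF-8 sequence would be left as it is.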
Received on Tuesday, 11 December 2001 08:19:44 UTC