UTF-8 from Michael Everson on 2001-12-10 (www-validator@w3.org from December 2001)

From: Michael Everson <everson@evertype.com>
Date: Mon, 10 Dec 2001 11:11:36 +0000
To: www-validator@w3.org
Message-Id: <p05101005b83a481ccc50@[193.120.28.70]>

W3C colleagues,

I have a lot of pages with a few Latin-1 (non-ASCII) characters in 
them. I want to convert them all to UTF-8. This isn't always 
straightforward.
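
One way to do the conversion is sketched below in Python. This is only 
an illustration under stated assumptions: the function name to_utf8 is 
hypothetical, and the input is assumed to be either entirely Latin-1 or 
already entirely valid UTF-8.

```python
def to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8, assuming it is either Latin-1 or already UTF-8."""
    try:
        # Pure ASCII and already-converted files decode cleanly; leave them alone.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Every byte value is a valid Latin-1 character, so this cannot fail.
        return data.decode("latin-1").encode("utf-8")
```

A file that mixes converted and unconverted characters falls into the 
second branch, and its already-converted characters get encoded twice, 
which is one reason the conversion isn't always straightforward.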

Where the Validator fails BADLY is this: if I am converting to UTF-8 
and I miss one of the characters (usually this means there is a 
single Latin-1 byte in the file instead of a two-byte UTF-8 pair), I 
get a very unhelpful message like this:

"Sorry, I am unable to validate this document because on line 63 it 
contained some byte(s) that I cannot interpret as utf-8. Please check 
both the content of the file and the character encoding indication."

But the Validator is broken. It doesn't display the source, and so I 
have NO IDEA how to find line 63.
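
Until the Validator shows the source, the offending line can be located 
locally. The following is a minimal sketch in Python, not anything the 
Validator itself provides; the function name find_invalid_utf8 is 
hypothetical, and it reports only the first bad byte on each line.

```python
def find_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset, byte_value) for each line that is not valid UTF-8."""
    problems = []
    # Splitting on the newline byte is safe: 0x0A never occurs inside a
    # multi-byte UTF-8 sequence.
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first undecodable byte in this line.
            problems.append((line_no, err.start, line[err.start]))
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for line_no, offset, value in find_invalid_utf8(handle.read()):
            print(f"line {line_no}, byte offset {offset}: 0x{value:02X}")
```

Run against a page with one stray Latin-1 byte, this prints the line 
number and the position of that byte within the line.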

If W3C wants us to use UTF-8, this tool needs to be fixed ASAP. I 
heard a rumour that no one has worked on the Validator for quite a 
while, but a fix on this particular point would really help.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com

Received on Monday, 10 December 2001 06:11:45 UTC