Re: UTF-8 from Nick Kew on 2001-12-10 (www-validator@w3.org from December 2001)

From: Nick Kew <nick@webthing.com>
Date: Mon, 10 Dec 2001 20:49:53 +0000 (GMT)
To: Michael Everson <everson@evertype.com>
cc: <www-validator@w3.org>
Message-ID: <20011210204345.F1490-100000@fenris.webthing.com>

On Mon, 10 Dec 2001, Michael Everson wrote:

> I have a lot of pages with a few Latin 1 (non ASCII) characters in
> them. I want to convert them all to UTF-8. This isn't always
> straightforward.

Won't iconv do it?
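
(Something like `iconv -f ISO-8859-1 -t UTF-8 old.html > new.html`, where the file names are placeholders.) If iconv isn't to hand, the same conversion can be sketched in Python. This is only a minimal sketch, assuming the input really is pure ISO-8859-1; the function names and paths are made up for illustration, and a file that is already partly UTF-8 would come out double-encoded.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8.

    Assumption: the input is pure Latin-1. Bytes that are already
    UTF-8 would be double-encoded by this function.
    """
    # Every byte value is a valid Latin-1 character, so decode never fails.
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: convert one file, writing to a new path."""
    with open(src_path, "rb") as src:
        converted = latin1_to_utf8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Writing to a separate output file rather than converting in place means a mistaken run cannot damage the original.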

> Where the Validator fails BADLY is that if I am converting to UTF-8
> and I miss one of the characters (usually this means there is a
> single Latin 1 character in the file instead of a pair) I get a very
> unhelpful message like this:
>
> "Sorry, I am unable to validate this document because on line 63 it
> contained some byte(s) that I cannot interpret as utf-8. Please check
> both the content of the file and the character encoding indication. "

That'll be when the parser refuses your document outright because
its bytes are incompatible with your declared charset.  It also means
that the source is (technically at least) too broken even to try to
display.

> But the Validator is broken. It doesn't display the source, and so I
> have NO IDEA how to find line 63.

Erm - open your document in a text editor?
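
If counting lines by hand is a nuisance, a few lines of Python can point at the offending lines. A rough sketch, assuming lines end in LF (or CRLF); the function name is invented here, and the line numbers may not match the Validator's if the file uses bare CR line endings.

```python
from typing import List, Tuple


def find_invalid_utf8_lines(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (line_number, raw_line) for each line that is not valid UTF-8."""
    bad = []
    # Splitting on b"\n" is safe: the byte 0x0A never appears inside
    # a multi-byte UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad
```

Read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in; a stray single Latin-1 byte such as 0xE9 will cause its line to be reported.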

BTW: do you have a need to convert, or is this an exercise?

-- 
Nick Kew

Site Valet - the essential service for anyone with a website.
<URL:http://valet.webthing.com/>

Received on Monday, 10 December 2001 15:50:02 UTC