Re: UTF-8

From: Nick Kew <nick@webthing.com>
Date: Mon, 10 Dec 2001 20:49:53 +0000 (GMT)
To: Michael Everson <everson@evertype.com>
cc: <www-validator@w3.org>
Message-ID: <20011210204345.F1490-100000@fenris.webthing.com>

On Mon, 10 Dec 2001, Michael Everson wrote:

> I have a lot of pages with a few Latin 1 (non ASCII) characters in
> them. I want to convert them all to UTF-8. This isn't always
> straightforward.

Won't iconv do it?
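
If iconv isn't to hand, the same conversion can be sketched in a few lines of Python. This is only a sketch: the function name latin1_to_utf8 is made up for illustration, and it assumes the whole input really is Latin 1 (ISO-8859-1). Run it over a file that has already been partly converted and the existing UTF-8 pairs will be encoded a second time.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes from ISO-8859-1 (Latin 1) to UTF-8.

    Assumption: every byte in `data` is Latin 1. Every byte value is a
    valid Latin 1 character, so decoding never fails, which also means
    input that is already UTF-8 will be silently double-encoded.
    """
    text = data.decode("iso-8859-1")
    return text.encode("utf-8")


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert one file, reading and writing raw bytes so that line
    endings are left exactly as they were."""
    with open(src_path, "rb") as src:
        converted = latin1_to_utf8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

Writing to a separate output file, rather than in place, keeps the original around in case the input turns out not to be pure Latin 1.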

> Where the Validator fails BADLY is that if I am converting to UTF-8
> and I miss one of the characters (usually this means there is a
> single Latin 1 character in the file instead of a pair) I get a very
> unhelpful message like this:
> "Sorry, I am unable to validate this document because on line 63 it
> contained some byte(s) that I cannot interpret as utf-8. Please check
> both the content of the file and the character encoding indication. "

That'll be when the parser refuses your document outright because
it's incompatible with your declared charset.  It also means that
the source is (technically at least) too broken even to try and validate.

> But the Validator is broken. It doesn't display the source, and so I
> have NO IDEA how to find line 63.

Erm - open your document in a text editor?
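
Or, if hunting by eye is tedious, a short script can report every line that fails to decode as UTF-8. The following is a rough sketch, not what the Validator does internally; the function name find_invalid_utf8_lines is invented here, and it assumes lines are separated by a newline byte (LF).

```python
def find_invalid_utf8_lines(data: bytes):
    """Return a list of (line_number, byte_offset, offending_bytes) for
    each line of `data` that is not valid UTF-8.

    Line numbers are 1-based; byte_offset is 0-based within the line.
    Only the first bad sequence on each line is reported.
    """
    problems = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start and err.end delimit the bytes that could not be decoded
            problems.append((line_number, err.start, line[err.start:err.end]))
    return problems


def report(path: str) -> None:
    """Print one line per problem found in the file at `path`."""
    with open(path, "rb") as handle:
        for line_number, offset, bad in find_invalid_utf8_lines(handle.read()):
            print(f"line {line_number}, byte {offset}: {bad!r}")
```

A stray Latin 1 character usually shows up as a single byte in the range 0xA0 to 0xFF that is not followed by a valid continuation byte, which is exactly the case the decoder rejects.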

BTW: do you have a need to convert, or is this an exercise?

Nick Kew

Site Valet - the essential service for anyone with a website.
Received on Monday, 10 December 2001 15:50:02 UTC
