Re: W3C validator and charset

From: Alan J. Flavell
Date: Mon, Aug 09 1999

Date: Mon, 9 Aug 1999 05:21:27 -0400 (EDT)
From: "Alan J. Flavell" <>
To: Dan Connolly <>
Message-ID: <>
Subject: Re: W3C validator and charset

On Sun, 8 Aug 1999, Dan Connolly wrote:

> Found this in my mailbox... I think it's been fixed.

I don't believe so, nor does the changes list for the W3C validator show
any mention of such a fix. 

  Character encoding: utf-8 
     Level of HTML: HTML 4.0 Transitional. 

Below are the results of attempting to parse this document with an SGML
parser.

Error at line 32:
                 non SGML character number 146

The WDG's validator does not have this problem, and has no difficulty
processing the same page. 
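
To illustrate why octet 146 is legitimate here, the following is a minimal
sketch in Python, not the validator's actual code. It assumes the checker
flags every octet in the range 128-159 as though the document were
iso-8859-1. The helper name c1_range_octets and the choice of U+2192 as the
sample character are hypothetical, picked only because its utf-8 encoding
happens to contain octet 146.

```python
# Sketch only: mimics a checker that treats octets 128-159 as illegal,
# which is correct for iso-8859-1 but not for utf-8, koi8-r, or Windows codes.

def c1_range_octets(data: bytes) -> list:
    """Return the octets in data that fall in the range 128-159 decimal."""
    return [octet for octet in data if 128 <= octet <= 159]

# U+2192 RIGHTWARDS ARROW is a perfectly legal character.
arrow = "\u2192"
encoded = arrow.encode("utf-8")      # three octets: 0xE2 0x86 0x92

# A naive octet-level check flags two of the three octets,
# including 146 (0x92), although the utf-8 sequence is well formed.
flagged = c1_range_octets(encoded)

# Decoding with the declared charset recovers the intended character.
decoded = encoded.decode("utf-8")

# The same octet value is a printable character in other 8-bit codes,
# but an unassigned control position in iso-8859-1.
as_windows = bytes([146]).decode("cp1252")    # right single quotation mark
as_koi8r = bytes([146]).decode("koi8_r")      # a block-drawing character
as_latin1 = bytes([146]).decode("iso-8859-1") # C1 control, not printable

print(flagged, repr(decoded), repr(as_windows), repr(as_koi8r))
```

The point of the sketch is that the check has to be made on characters
after decoding with the declared charset, not on raw octets.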

> If not, please report the problem to

Here we are.

all the best

> 20 Jul 1998 "Alan J. Flavell" wrote:
> > 
> > Greetings, I wonder whether I could interest you in this one.
> > 
> > It appears that the W3C validator doesn't respond appropriately to
> > documents that are sent to it with a non-iso-8859-1 charset, for example
> > utf-8.  It rejects octets in the range 128-159 decimal as if they were
> > illegal, when in this charset they are perfectly legal and indeed
> > necessary.  Apparently the same would be true even with 8-bit character
> > codes in which this range of octet values is assigned to printable
> > characters, such as koi8-r, Mac, or Windows codes.
> > 
> > So it seems that there may be some functionality missing from the
> > validator in this area.
> > 
> > I know that A.Prilop has made several attempts to bring this to the
> > attention of Gerald O, but apparently without receiving any kind of
> > answer.  As I've had discussions with you in the past on this topic
> > area, I wondered whether I could interest you in this issue.  It's a
> > pity for this otherwise excellent service to fail us just when there's a
> > growing interest in using utf-8 document coding (although, of course, a
> > more-general solution, that permitted codings such as koi8-r etc.  to be
> > validated too, not only the unicode encodings, would be ideal).
> > 
> > I have to admit I'm not aware of whether the underlying software has
> > this functionality already; at the very least I would have thought the
> > validator pages might mention this shortcoming, if it cannot be fixed
> > quickly.
> > 
> > all the best
> -- 
> Dan Connolly, W3C
> tel:+1-512-310-2971 (office, mobile)
> (put your tel# in the Subject:)


  "I have no problem with cute and clever.  In fact I actually _like_ cute
  and clever.  I don't think it's clever to be cute in such a way as to
  make the pages less useful.        But then I'm not a graphic designer."
                        -     Calum I Mac Leod on