
Re: possible bug in the validator?

From: Liam Quinn <liam@htmlhelp.com>
Date: Wed, 4 Sep 2002 12:33:54 -0400 (EDT)
To: Lloyd Wood <l.wood@eim.surrey.ac.uk>
cc: <www-validator@w3.org>
Message-ID: <Pine.LNX.4.33L2.0209041227420.13463-100000@localhost.localdomain>

On Wed, 4 Sep 2002, Lloyd Wood wrote:

> On Wed, 4 Sep 2002, Liam Quinn wrote:
>
> > On Wed, 4 Sep 2002, Lloyd Wood wrote:
> >
> > > On Wed, 4 Sep 2002, Olivier Thereaux wrote:
> > >
> > > > On Wed, Sep 04, 2002, Lloyd Wood wrote:
> > > > > > Your server is sending the header
> > > > > > Content-Type: text/html; charset=us-ascii
> > > > > > which overrides the charset specified within the HTML document.
> > > > >
> > > > > Surely the charset in the document should take precedence?
> > > >
> > > > No, see http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
> > >
> > > Thanks.
> > >
> > > I can't decide if that's a very subtle way to get Content-Type used
> > > properly, or just very very broken.
> >
> > Well, consider a server (such as the original poster's) that transcodes
> > HTML pages on-the-fly according to the capabilities of the client.  It's
> > trivial for the server to set the charset in the HTTP header, but changing
> > a <meta> tag within the HTML document is much more difficult, especially
> > when almost all HTML documents are invalid.
>
> In your example, how does the server know what charset to transcode
> the page _from_?
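The precedence rule Olivier points to (HTML 4.01, section 5.2.2) can be sketched as a small Python resolver. This is an illustration only: the function names are made up, the Content-Type parsing is deliberately simplistic, and the iso-8859-2 value merely stands in for whatever the document's own <meta> tag declared. The order it encodes comes from the spec: the HTTP charset parameter first, then a <meta http-equiv="Content-Type"> declaration, then a charset attribute on the element that linked to the resource.

```python
from typing import Optional


def charset_param(content_type: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, if any."""
    if not content_type:
        return None
    # Parameters follow the media type and are separated by ';'.
    for param in content_type.split(";")[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            value = value.strip().strip('"').strip()
            return value.lower() or None
    return None


def effective_charset(http_content_type: Optional[str],
                      meta_content_type: Optional[str] = None,
                      link_charset: Optional[str] = None) -> Optional[str]:
    """Pick the document charset in HTML 4.01 (section 5.2.2) priority order:
    1. charset parameter of the HTTP Content-Type header
    2. charset in <meta http-equiv="Content-Type" content="...">
    3. charset attribute on the element that linked to the resource
    Returns None when none applies; a user agent may then fall back on heuristics.
    """
    for candidate in (charset_param(http_content_type),
                      charset_param(meta_content_type),
                      link_charset.lower() if link_charset else None):
        if candidate:
            return candidate
    return None


# The case from this thread: the server's header beats the <meta> tag.
print(effective_charset("text/html; charset=us-ascii",
                        "text/html; charset=iso-8859-2"))  # us-ascii
```

So a document that declares one encoding in its <meta> tag is still decoded as us-ascii if the server says so in the header, which is exactly what the validator reported.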

The server could use something like Apache's AddCharset directive, or it
could use language-specific heuristics to detect the original character
encoding.  I don't know what the original poster's server does, but if you
understand Czech (I don't), the answer may be here:

http://www.csacek.cz/
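To make the transcoding scenario concrete, here is a rough Python sketch. It is not a description of how any particular server (including the one above) actually works. It assumes a hypothetical extension-to-charset table playing the role Apache's AddCharset directive plays in configuration, assumes ISO-8859-1 as the default stored encoding, and takes the first usable charset from the client's Accept-Charset header while ignoring q-values. The point it illustrates: the server only rewrites one HTTP header, and any <meta> tag inside the page can stay untouched because the header takes precedence.

```python
from typing import Dict, Optional, Tuple

# Hypothetical extension -> stored-charset table (stands in for AddCharset).
SOURCE_CHARSETS: Dict[str, str] = {
    ".iso8859-2": "iso-8859-2",
    ".cp1250": "cp1250",
    ".utf8": "utf-8",
}


def source_charset(path: str, default: str = "iso-8859-1") -> str:
    """Guess the stored encoding of a file from its extension (assumed default otherwise)."""
    for ext, charset in SOURCE_CHARSETS.items():
        if path.lower().endswith(ext):
            return charset
    return default


def pick_target(accept_charset: Optional[str], fallback: str = "utf-8") -> str:
    """Return the first charset in Accept-Charset that Python can encode text to.
    q-values are ignored for brevity."""
    if accept_charset:
        for item in accept_charset.split(","):
            name = item.split(";")[0].strip().lower()
            if name == "*":
                return fallback
            try:
                "".encode(name)  # raises LookupError for unknown or non-text codecs
                return name
            except LookupError:
                continue
    return fallback


def transcode(path: str, body: bytes,
              accept_charset: Optional[str]) -> Tuple[Dict[str, str], bytes]:
    """Re-encode a stored page for the client and label it in the HTTP header."""
    src = source_charset(path)
    dst = pick_target(accept_charset)
    text = body.decode(src)
    # Characters the target charset cannot represent become &#NNN; references.
    out = text.encode(dst, errors="xmlcharrefreplace")
    headers = {"Content-Type": f"text/html; charset={dst}"}
    return headers, out


headers, out = transcode("page.html.iso8859-2",
                         "Čeština".encode("iso-8859-2"),
                         "us-ascii, utf-8;q=0.5")
print(headers["Content-Type"])  # text/html; charset=us-ascii
print(out)                      # b'&#268;e&#353;tina'
```

Using numeric character references for unrepresentable characters keeps the output valid in the target charset without touching the markup around it.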

-- 
Liam Quinn
Received on Wednesday, 4 September 2002 12:33:56 GMT
