Re: possible bug in the validator?

From: Lloyd Wood <l.wood@eim.surrey.ac.uk>
Date: Wed, 4 Sep 2002 17:43:31 +0100 (BST)
To: Liam Quinn <liam@htmlhelp.com>
cc: www-validator@w3.org
Message-ID: <Pine.SOL.4.43.0209041741500.20773-100000@artemis.ee.surrey.ac.uk>

> > On Wed, 4 Sep 2002, Liam Quinn wrote:
> >
> > > On Wed, 4 Sep 2002, Lloyd Wood wrote:
> > >
> > > > On Wed, 4 Sep 2002, Olivier Thereaux wrote:
> > > >
> > > > > On Wed, Sep 04, 2002, Lloyd Wood wrote:
> > > > > > > Your server is sending the header
> > > > > > > Content-Type: text/html; charset=us-ascii
> > > > > > > which overrides the charset specified within the HTML document.
> > > > > >
> > > > > > Surely the charset in the document should take precedence?
> > > > >
> > > > > No, see http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
> > > >
> > > > Thanks.
> > > >
> > > > I can't decide if that's a very subtle way to get Content-Type used
> > > > properly, or just very very broken.
> > >
> > > Well, consider a server (such as the original poster's) that transcodes
> > > HTML pages on-the-fly according to the capabilities of the client.  It's
> > > trivial for the server to set the charset in the HTTP header, but changing
> > > a <meta> tag within the HTML document is much more difficult, especially
> > > when almost all HTML documents are invalid.
> >
> > In your example, how does the server know what charset to transcode
> > the page _from_?
>
> The server could use something like Apache's AddCharset directive, or it
> could use language-specific heuristics to detect the original character
> encoding.

Wouldn't it simply look at the charset in the document's meta tag?

AddDefaultCharset effectively sets the charset parameter of Content-Type.
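
For reference, the precedence rule in section 5.2.2 of the HTML 4 spec
(HTTP charset parameter first, then the meta declaration) can be sketched
roughly as below. This is only an illustration, not the validator's code:
the function names and the regular expression are my own assumptions, and
the meta sniffing is deliberately naive.

```python
import re
from email.message import Message

# Naive pattern for a charset inside a <meta> tag (illustrative assumption,
# not a full HTML parser).
META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
                     re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(body):
    """Return the charset declared in a <meta> tag near the top, or None."""
    match = META_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, body):
    """HTTP header charset wins; the meta declaration is only a fallback."""
    return header_charset(content_type) or meta_charset(body)


page = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(effective_charset("text/html; charset=us-ascii", page))
print(effective_charset("text/html", page))
```

With the header from the original report, the first call picks the
server's charset and ignores the meta tag; the second falls back to the
meta tag because the header carries no charset parameter.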

L.

<http://www.ee.surrey.ac.uk/Personal/L.Wood/><L.Wood@surrey.ac.uk>
Received on Wednesday, 4 September 2002 12:43:43 GMT