Re: possible bug in the validator?

On Wed, 4 Sep 2002, Lloyd Wood wrote:

> > > On Wed, 4 Sep 2002, Liam Quinn wrote:
> > >
> > > > On Wed, 4 Sep 2002, Lloyd Wood wrote:
> > > >
> > > > > On Wed, 4 Sep 2002, Olivier Thereaux wrote:
> > > > >
> > > > > > On Wed, Sep 04, 2002, Lloyd Wood wrote:
> > > > > > > > Your server is sending the header
> > > > > > > > Content-Type: text/html; charset=us-ascii
> > > > > > > > which overrides the charset specified within the HTML document.
> > > > > > >
> > > > > > > Surely the charset in the document should take precedence?
> > > > > >
> > > > > > No, see http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
> > > > >
> > > > > Thanks.
> > > > >
> > > > > I can't decide if that's a very subtle way to get Content-Type used
> > > > > properly, or just very very broken.
> > > >
> > > > Well, consider a server (such as the original poster's) that transcodes
> > > > HTML pages on-the-fly according to the capabilities of the client.  It's
> > > > trivial for the server to set the charset in the HTTP header, but changing
> > > > a <meta> tag within the HTML document is much more difficult, especially
> > > > when almost all HTML documents are invalid.
> > >
> > > In your example, how does the server know what charset to transcode
> > > the page _from_?
> >
> > The server could use something like Apache's AddCharset directive, or it
> > could use language-specific heuristics to detect the original character
> > encoding.
>
> Wouldn't it simply look at the charset in the document's meta tag?

I don't know.  It would depend on the server.

If I were writing a server, I'd want to avoid the expense and difficulty
of parsing HTML.
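
As a rough illustration of the precedence rule discussed above (HTTP
header first, <meta> only as a fallback), here is a minimal Python sketch.
The function names and the regular expression are hypothetical, not taken
from any real server or from the validator, and the <meta> lookup is a
crude byte scan rather than real HTML parsing:

```python
import re
from email.message import Message

# Crude scan for a charset declared in a <meta> tag. This is an assumption
# made for illustration; it is NOT a real HTML parser.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # lowercased, None if absent


def effective_charset(content_type_header, body_bytes):
    """Pick the charset: the HTTP header wins, <meta> is only a fallback."""
    if content_type_header:
        http_charset = charset_from_content_type(content_type_header)
        if http_charset:
            return http_charset
    # Only look inside the document when the header gave no charset.
    match = META_CHARSET.search(body_bytes[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None


page = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(effective_charset("text/html; charset=us-ascii", page))
print(effective_charset("text/html", page))
```

With the header from the original report, the document's own declaration
is never consulted, so a server that sets the header correctly does not
need to touch the markup at all.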

-- 
Liam Quinn

Received on Wednesday, 4 September 2002 12:57:52 UTC