Re: charset parameter

At 14:03 01/07/25 +0100, Lloyd Wood wrote:
>On Wed, 25 Jul 2001, Terje Bless wrote:
>
> > The issue is that the transport protocol sez that an absence of an explicit
> > charset parameter on the Content-Type means "ISO-8859-1"; HTML or XML rules
> > don't apply here. When it comes time to parse the markup, you already have
> > a charset; the XML/HTML rules do not govern HTTP.
>
>well, that's handy.

But as I wrote, it's not correct.


>I've always wondered how you define the charset for the line that
>defines the charset so that you can interpret it.

The HTTP headers are defined to be in ASCII. For the 'in-document'
information, either you assume ASCII (for HTML) or you use more
complicated heuristics (see Appendix F of the XML specification). The
validator currently assumes ASCII (or anything compatible with it).
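
To make the bootstrapping concrete, here is a rough Python sketch of
the two lookups described above. It is not the validator's actual
code: the function names, the regular expression and the 1024-byte
scan limit are all assumptions made for illustration. The point is
only that the charset declaration can be found by scanning raw bytes,
provided the real encoding is ASCII-compatible.

```python
import re
from email.message import Message

# Hypothetical pattern: looks for charset=... inside a <meta ...> tag,
# matching on raw bytes so no decoding is needed beforehand.
META_RE = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def charset_from_content_type(header_value):
    """Return the explicit charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # lowercased; None when absent


def charset_from_html_prefix(body, limit=1024):
    """Scan the first bytes of an HTML document for a META charset.

    Assumes the document is in ASCII or an ASCII-compatible encoding,
    so that the declaration itself is readable as ASCII bytes.
    The 1024-byte limit is an arbitrary choice for this sketch.
    """
    match = META_RE.search(body[:limit])
    return match.group(1).decode("ascii").lower() if match else None
```

An encoding that is not ASCII-compatible (UTF-16, say) defeats the
byte scan, which is where the more complicated heuristics come in.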


> > In practice you have to decide between "Assume ISO-8859-1 as that's what
> > /people/ tend to assume" or "Assume nothing as people will get it wrong
> > some part of the time".
>
>I don't see how you can ever assume nothing.

Well, for the validator, 'assume nothing' just means 'document
doesn't validate'. That's quite easy :-).
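
As a sketch of the two policies under discussion (again illustrative
Python, not what the validator really runs; resolve_charset,
NoCharsetError and the assume_latin1 flag are made-up names), 'assume
nothing' simply turns a missing charset into an error instead of a
silent default. The sketch also assumes that an explicit HTTP charset
takes precedence over an in-document one.

```python
class NoCharsetError(ValueError):
    """Raised when no charset was declared anywhere (hypothetical name)."""


def resolve_charset(http_charset, document_charset, assume_latin1=False):
    """Pick the charset to use for parsing, or refuse.

    http_charset:     charset parameter from the Content-Type header, or None
    document_charset: charset found inside the document, or None
    assume_latin1:    if True, fall back to ISO-8859-1 when nothing is declared
    """
    if http_charset:
        return http_charset
    if document_charset:
        return document_charset
    if assume_latin1:
        return "iso-8859-1"  # the "what people tend to assume" policy
    # The "assume nothing" policy: no declaration means no validation.
    raise NoCharsetError("no charset declared; document cannot be validated")
```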

Regards,   Martin.

Received on Wednesday, 25 July 2001 22:23:27 UTC