Re: charset parameter

On 25.07.01 at 14:03, Lloyd Wood <l.wood@eim.surrey.ac.uk> wrote:

>I've always wondered how you define the charset for the line that defines
>the charset so that you can interpret it.

For the HTTP header fields it's fairly simple; they're US-ASCII, period. For
that bogosity called "META" the waters are substantially more muddy.
Especially since there aren't any clear rules for whether the charset in
the META element overrides the one in the HTTP header... Or vice versa...
Or what this means for the case when the charset in the HTTP header is
there by inference (as a default, not explicitly)...

In short: it's a mess. :-(

I think Martin, Björn, and I are all in agreement on this in general; but
the current discussion is about niggling details like which spec takes
precedence, IETF or W3C, or what the least inappropriate (or "most
appropriate" if you're the cup-is-half-full type) behaviour for the
Validator would be in various cases.


>>In practice you have to decide between "Assume ISO-8859-1 as that's what
>>/people/ tend to assume" or "Assume nothing as people will get it wrong
>>some part of the time".
>
>I don't see how you can ever assume nothing.

"I'm sorry, but that Document Type is not in my Catalog. I cannot Validate
this document" and "I'm sorry, but that Character Encoding is not in my
database. I cannot Validate this document." or "I'm sorry, but I was unable
to determine the Character Encoding based on available information. Please
make your Character Encoding explicit in the HTTP headers".

Etc. :-)

To "assume nothing" in this context means that if we cannot get a clear,
unambiguous indication, we abort instead of guessing or, in this case,
instead of interpreting the internally inconsistent specifications (that's
the HTML-WG's job ;D).
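
For illustration only, here is a rough Python sketch of what "assume
nothing" could look like. It is not the Validator's actual code. The names
(determine_charset, CharsetUndetermined, KNOWN_CHARSETS) and the short list
of encodings are made up for the example, and accepting a META charset when
the HTTP header has none is my assumption, not something the specs settle.

```python
import re


class CharsetUndetermined(Exception):
    """Raised when no clear, unambiguous charset indication is available."""


# Hypothetical "database" of encodings this sketch knows how to handle.
KNOWN_CHARSETS = {"us-ascii", "iso-8859-1", "utf-8"}

# Matches a charset parameter, quoted or unquoted, at the start of the
# parameter string or after a ";" separator.
_CHARSET_RE = re.compile(r'(?:^|;)\s*charset\s*=\s*"?([^";\s]+)"?', re.IGNORECASE)


def explicit_charset(content_type):
    """Return the charset named explicitly in a Content-Type value, or None."""
    if not content_type:
        return None
    # Only the parameters after the media type are of interest.
    _, _, params = content_type.partition(";")
    match = _CHARSET_RE.search(params)
    return match.group(1).lower() if match else None


def determine_charset(http_content_type, meta_content_type=None):
    """Return an explicit charset, or abort instead of guessing."""
    header = explicit_charset(http_content_type)
    meta = explicit_charset(meta_content_type)

    # The specs do not say clearly which one wins, so a conflict is an error.
    if header and meta and header != meta:
        raise CharsetUndetermined(
            "HTTP header says %s but META says %s" % (header, meta))

    # Assumption: a lone explicit declaration from either source is accepted.
    charset = header or meta
    if charset is None:
        # No implied default (such as ISO-8859-1) is ever filled in.
        raise CharsetUndetermined(
            "unable to determine the Character Encoding; "
            "please make it explicit in the HTTP headers")
    if charset not in KNOWN_CHARSETS:
        raise CharsetUndetermined(
            "Character Encoding %s is not in my database" % charset)
    return charset
```

With this, determine_charset("text/html; charset=UTF-8") returns "utf-8",
while a bare "text/html" with no META declaration raises
CharsetUndetermined rather than falling back to a default.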

Received on Thursday, 26 July 2001 00:26:05 UTC