- From: Terje Bless <link@pobox.com>
- Date: Thu, 26 Jul 2001 04:43:59 +0200
- To: W3C Validator <www-validator@w3.org>
On 25.07.01 at 14:03, Lloyd Wood <l.wood@eim.surrey.ac.uk> wrote:

>I've always wondered how you define the charset for the line that defines
>the charset so that you can interpret it.

For the HTTP header fields it's fairly simple; they're US-ASCII, period. For that bogosity called "META" the waters are substantially more muddy. Especially since there aren't any clear rules for whether the charset in the META element overrides the one in the HTTP header... Or vice versa... Or what this means for the case when the charset in the HTTP header is there by inference (as a default, not explicitly)...

In short: it's a mess. :-(

I think Martin, Björn, and I are all in agreement on this in general; but the current discussion is about niggling details like which spec takes precedence, IETF or W3C, or what the least inappropriate (or "most appropriate" if you're the cup-is-half-full type) behaviour for the Validator would be in various cases.

>>In practice you have to decide between "Assume ISO-8859-1 as that's what
>>/people/ tend to assume" or "Assume nothing as people will get it wrong
>>some part of the time".
>
>I don't see how you can ever assume nothing.

"I'm sorry, but that Document Type is not in my Catalog. I cannot Validate this document" and "I'm sorry, but that Character Encoding is not in my database. I cannot Validate this document." or "I'm sorry, but I was unable to determine the Character Encoding based on available information. Please make your Character Encoding explicit in the HTTP headers". Etc. :-)

To "assume nothing" in this context means that if we cannot get a clear, unambiguous indication, we abort instead of guessing or, in this case, instead of interpreting the internally inconsistent specifications (that's the HTML-WG's job ;D).
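
The "assume nothing" policy above could be sketched roughly as follows, in Python. This is only an illustration, not the Validator's actual code: the names `determine_charset`, `charset_param`, `CharsetError`, and `KNOWN_CHARSETS` are hypothetical, and treating a disagreement between the HTTP header and META as an error is just one possible reading of "abort instead of guessing".

```python
import re
from typing import Optional


class CharsetError(Exception):
    """Raised when no single, unambiguous charset can be determined."""


# Hypothetical stand-in for the validator's database of known encodings.
KNOWN_CHARSETS = {"us-ascii", "iso-8859-1", "utf-8"}

# Matches the charset parameter of a Content-Type style value.
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([^\s;"\'>]+)', re.IGNORECASE)


def charset_param(value: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter, or None if there is none."""
    if not value:
        return None
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def determine_charset(http_content_type: Optional[str],
                      meta_content: Optional[str]) -> str:
    """Return the declared charset, or raise instead of guessing."""
    http_cs = charset_param(http_content_type)
    meta_cs = charset_param(meta_content)

    # Assumption: an explicit conflict is reported, not resolved by precedence.
    if http_cs and meta_cs and http_cs != meta_cs:
        raise CharsetError(
            f"HTTP header says {http_cs!r} but META says {meta_cs!r}")

    charset = http_cs or meta_cs
    if charset is None:
        # No default such as ISO-8859-1 is inferred here.
        raise CharsetError("unable to determine the Character Encoding; "
                           "make it explicit in the HTTP headers")
    if charset not in KNOWN_CHARSETS:
        raise CharsetError(f"Character Encoding {charset!r} is not in the database")
    return charset
```

For example, `determine_charset("text/html; charset=UTF-8", None)` returns `"utf-8"`, while `determine_charset("text/html", None)` raises `CharsetError` rather than falling back to a default.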
Received on Thursday, 26 July 2001 00:26:05 UTC