- From: Kjetil Torgrim Homme <kjetilho@ifi.uio.no>
- Date: Sun, 08 Jun 2003 10:45:37 +0200
- To: W3C Validator <www-validator@w3.org>
[Terje Bless]:
> No argument from me there. In fact I consider it a bug in HTML 4.0
> that they meddle with what is IMO the provenance of HTTP, and a
> bug in HTTP that they meddle with what is the provenance of MIME.

agreed.

> > do you really think so? I find that very hard to believe,
> > especially since HTML 4 isn't even an IETF standard.
>
> And the W3C isn't, and doesn't claim to be, a recognized standards
> body. But given this is the _W3C_ Markup Validator we kinda have
> to accept its authority as given, non? :-)

:-)

> But my point was that even if both documents were produced under
> the aegis of the IETF, if HTML passed IETF Last Call with no
> substantive complaints then it would have quite legally superseded
> this provisio from HTTP.

ah. yes, if it was a proposed standard.

> If this was not acceptable to the IETF, the Area Director or the
> RFC Editor should have addressed the issue prior to publication as
> a standards track RFC.

exactly.

> Case in point; RFC1036 (netnews) manages the neat trick of saying
> a) that it borrows a majority of its syntax from RFC822 (email),
> b) that where the two diverge RFC822 is to be considered
> authorative, _and_ c) goes merrily on its way superseding and
> modifying both syntax and semantics of common header
> fields. RFC1036 is still considered authorative (albeit badly out
> of touch with reality) within the IETF.

this is a bit off topic, but 1036 never was a proposed standard. it
was also written a long time ago, in 1987, when the IETF's procedures
were less stringent.

but I'm not sure I see the conflict anyway. an RFC-1036 message
MUST parse as an RFC-822 message, but not the other way around. for
instance, the Message-ID header has a more restrictive syntax (no
spaces allowed), but any RFC-1036 msg-id is allowed by RFC-822.
similarly with References, where RFC-1036 only allows msg-ids
separated by a single space, but RFC-822 also allows atoms and
quoted-strings.

> I agree; one of the two must yield. We have implemented a solution
> based on RFC2616 yielding. Think OO; we import HTTP and override
> its CharsetDefaulting method instead of throwing a
> InvalidAccessException. :-)

that's a new subclass, so it is no longer HTTP... :-)

> I'll grant that the issue is debateable though. Ours is but one of
> (at least) two valid interpretations. And I'm not even certain
> everyone involved in the validator is in perfect agreement on this
> either. The status quo is probably best described as the rough
> consensus somewhat biased by what I percieved the least-harmfull /
> overall-most-usefull behaviour was, given the circumstances.

I propose this order:

  explicit HTTP charset
  META HTTP-EQUIV charset attribute
  implicit HTTP (== ISO-8859-1)

that's it. any guesswork should not influence parsing or the
valid/invalid status. however, feel free to add big flashing
warnings if the file starts with 0xFE 0xFF (or 0xFF 0xFE, ugh), or if
_all_ 8-bit characters are part of valid UTF-8 sequences, etc.

--
Kjetil T.
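For illustration, here is a minimal Python sketch of the proposed order. It is not the validator's implementation, and the names `resolve_charset`, `charset_from_content_type` and `_MetaCharsetFinder` are hypothetical. It assumes an ASCII-compatible body, so the META element can be found by scanning the bytes as ISO-8859-1 (a UTF-16 document would need separate handling). The BOM and UTF-8 heuristics only add warnings and never change the charset chosen by the three-step order.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Step 3 fallback: the implicit HTTP default for text/* (RFC 2616).
IMPLICIT_HTTP_CHARSET = "iso-8859-1"


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: lowercased charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


class _MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: records the charset of the first <meta http-equiv="Content-Type">."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (val or "") for name, val in attrs}
        if attr.get("http-equiv", "").lower() == "content-type":
            self.charset = charset_from_content_type(attr.get("content"))


def resolve_charset(http_content_type: Optional[str], body: bytes) -> Tuple[str, str, List[str]]:
    """Return (charset, source, warnings) using a strict precedence order."""
    # 1. explicit HTTP charset
    charset = charset_from_content_type(http_content_type)
    source = "http"
    if charset is None:
        # 2. META HTTP-EQUIV; ISO-8859-1 maps every byte, so decoding never fails.
        finder = _MetaCharsetFinder()
        finder.feed(body.decode("iso-8859-1"))
        finder.close()
        charset, source = finder.charset, "meta"
    if charset is None:
        # 3. implicit HTTP default
        charset, source = IMPLICIT_HTTP_CHARSET, "implicit-http"

    # Heuristics: warnings only, they never affect the chosen charset.
    warnings: List[str] = []
    if body.startswith((b"\xfe\xff", b"\xff\xfe")):
        warnings.append("document starts with a UTF-16 byte order mark")
    if any(b >= 0x80 for b in body):
        try:
            body.decode("utf-8")
        except UnicodeDecodeError:
            pass
        else:
            if charset != "utf-8":
                warnings.append("all 8-bit bytes form valid UTF-8; charset may be mislabelled")
    return charset, source, warnings
```

With no charset parameter in the HTTP header, `resolve_charset("text/html", body)` falls through to the META element. Only when that is also missing does it use the implicit ISO-8859-1 default.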
Received on Sunday, 8 June 2003 04:45:41 UTC