Re: default charset broken

From: Terje Bless <link@pobox.com>
Date: Sun, 8 Jun 2003 08:21:31 +0200
To: W3C Validator <www-validator@w3.org>
cc: Kjetil Torgrim Homme <kjetilho@ifi.uio.no>
Message-ID: <f02000001-1026-7266EB70997911D7B1DF0030657B83E8@[]>

Kjetil Torgrim Homme <kjetilho@ifi.uio.no> wrote:

>my argument is that the recommendation is invalid, since they
>commit a layering violation, and contradict an RFC which was
>standard track at the time of publication.

No argument from me there. In fact I consider it a bug in HTML 4.0 that they
meddle with what is IMO the province of HTTP, and a bug in HTTP that they
meddle with what is the province of MIME. I would like to see that section
of HTML 4.0 excised entirely, possibly replaced by one that says you should
use US-ASCII+CharEscapes OR UTF-8 with explicit labelling. If it were anywhere
near feasible I would also like to see RFC2616 superseded with snippage along
those same lines.
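
To make the two options concrete, here is a minimal Python sketch. It assumes
"CharEscapes" means HTML numeric character references, and both function
names are hypothetical, made up for this illustration; nothing here is taken
from the validator.

```python
from typing import Tuple


def ascii_with_char_escapes(text: str) -> bytes:
    """Option 1: pure US-ASCII bytes; every other character becomes &#NNN;."""
    # "xmlcharrefreplace" is a standard Python codec error handler that
    # substitutes a decimal numeric character reference.
    return text.encode("ascii", errors="xmlcharrefreplace")


def utf8_with_label(text: str) -> Tuple[str, bytes]:
    """Option 2: UTF-8 bytes plus a Content-Type value that names the charset."""
    return "text/html; charset=utf-8", text.encode("utf-8")


escaped = ascii_with_char_escapes("blåbær")
header, body = utf8_with_label("blåbær")
```

Either way the recipient never has to guess: the first form survives any
ASCII-compatible default, and the second states its encoding outright.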

I don't make the rules, I just try to make them stick. :-)

>>>I'll reiterate: when it comes to specifying how HTTP works, the HTTP
>>>RFC trumps the HTML spec.
>>Says who? Even within the IETF you'd have trouble making that stick.
>do you really think so?  I find that very hard to believe, especially
>since HTML 4 isn't even an IETF standard.

And the W3C isn't, and doesn't claim to be, a recognized standards body. But
given this is the _W3C_ Markup Validator we kinda have to accept its authority
as given, non? :-)

But my point was that even if both documents were produced under the aegis of
the IETF, if HTML passed IETF Last Call with no substantive complaints then it
would have quite legally superseded this proviso from HTTP. If this was not
acceptable to the IETF, the Area Director or the RFC Editor should have
addressed the issue prior to publication as a standards track RFC.

Case in point: RFC1036 (netnews) manages the neat trick of saying a) that it
borrows a majority of its syntax from RFC822 (email), b) that where the two
diverge RFC822 is to be considered authoritative, _and_ c) goes merrily on its
way superseding and modifying both syntax and semantics of common header
fields. RFC1036 is still considered authoritative (albeit badly out of touch
with reality) within the IETF.

This situation appears to persist in USEFOR (son-of-1036bis) and RFC2822.

And just to be clear, I'm not saying this is as it should be; I'm saying this
is how it _is_, regardless of what I think of it, and that was the reason why
we chose to implement this area of the validator in this particular way.

>you agreed there is a contradiction between the two documents, and so
>one of the two must yield.  I think it is obvious that the established
>standard must have precedence.

I agree; one of the two must yield. We have implemented a solution based on
RFC2616 yielding. Think OO; we import HTTP and override its CharsetDefaulting
method instead of throwing an InvalidAccessException. :-)
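
Spelled out as a rough Python sketch of that analogy: the class and method
names are hypothetical, and this is not the validator's actual code. The
base class stands in for the RFC2616 rule that unlabelled "text/*" content
is ISO-8859-1; the subclass stands in for the HTML 4 reading that no default
may be assumed for HTML. What a caller does when no charset comes back is
deliberately left out.

```python
from typing import Optional


class HttpRules:
    """Stand-in for the RFC2616 reading: unlabelled text/* is ISO-8859-1."""

    def default_charset(self, media_type: str) -> Optional[str]:
        if media_type.lower().startswith("text/"):
            return "ISO-8859-1"
        return None

    def effective_charset(self, media_type: str,
                          declared: Optional[str] = None) -> Optional[str]:
        # An explicit label always wins; the default applies only without one.
        return declared or self.default_charset(media_type)


class HtmlRules(HttpRules):
    """Stand-in for the HTML 4 reading: assume no default for text/html."""

    def default_charset(self, media_type: str) -> Optional[str]:
        if media_type.lower() == "text/html":
            return None  # override: RFC2616 yields for HTML documents
        return super().default_charset(media_type)


http_view = HttpRules().effective_charset("text/html")
html_view = HtmlRules().effective_charset("text/html")
labelled = HtmlRules().effective_charset("text/html", "utf-8")
```

Under these assumptions http_view is "ISO-8859-1", html_view is None, and an
explicitly labelled document is treated the same way by both.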

I'll grant that the issue is debatable though. Ours is but one of (at least)
two valid interpretations. And I'm not even certain everyone involved in the
validator is in perfect agreement on this either. The status quo is probably
best described as a rough consensus, somewhat biased by what I perceived to be
the least harmful and overall most useful behaviour, given the circumstances.

-- 
"When you have no nails your hammer grows restless, and you begin to throw
 sideways glances at screws and pieces of string."    -- Jarkko Hietaniemi

Received on Sunday, 8 June 2003 02:21:36 UTC