Re: LINK TYPE=override/type from Jukka Korpela on 1998-01-22 (www-html@w3.org from January 1998)

From: Jukka Korpela <jkorpela@cc.hut.fi>
Date: Thu, 22 Jan 1998 09:49:07 +0200 (EET)
To: www-html@w3.org
Message-ID: <Pine.OSF.3.96.980122084918.1538A-100000@torvi.hut.fi>
On Wed, 21 Jan 1998, Neil St.Laurent wrote:

> A simple question about using
> <LINK REL="Something" HREF="somwhere" TYPE="content/type">
> Does the TYPE here override any returned type from the HTTP 
> connection?

Perhaps not so simple. The HTML 4.0 specification is very vague here.

It just says that the value of TYPE shall be a MIME type (Internet
media type), called "content type" in the HTML 4.0 context. In fact,
the description of the TYPE attribute of the LINK element only refers
to the IANA _registry_ of content types (and mentions that text/css is to
be accepted too:-), not to the real thing which consists of the relevant
RFCs such as RFC 2046 (which is mentioned elsewhere in the spec, though).

But there seems to be no statement about the real _meaning_ of giving
the TYPE attribute. Obviously, it should specify the true MIME type
of the referred resource. Questions:
a) Can (or should) a user agent trust this e.g. by ignoring such LINK
   elements where the specified MIME type is not supported by the user
   agent? Or should it (for http: URLs) at least do a HEAD request
   to check what the server says?
b) If a user agent detects MIME type mismatch between the TYPE value
   and the Content-Type header, what should it do? I'd say it should
   try to give some warning. But which type should it use to determine
   the processing method?
c) Are user agents required to check for type mismatches when possible?
I can see _no_ answer to these in the HTML 4.0 spec, but perhaps
I have missed something.

The _comments_ in the DTD have in principle no normative value, but
perhaps we should assume, in the absence of any normative guidance, that
the text "advisory content type" there means that the TYPE value
is advisory only. This _might_ be read as implying that the answer to c)
is "no", the answer to b) is to use the Content-Type header, and the
answer to a) is that it cannot be trusted.

If this is so, I'm afraid I can't see the point of using TYPE
(when the HREF value is an HTTP URL).

> That is, considering
> <LINK REL="Stylesheet" HREF="neils.css" TYPE="text/css">
> 
> Say the server is setup incorrectly and returns 
> "application/octet-stream" as the type instead.  Does the user agent 
> use the TYPE from the LINK element as its type?  That is, will 
> "text/css" be the type regardless of what the server returns?

Someone might say that a server which conforms to HTTP specifications
must not return text/css since it is not a registered MIME type. :-)

This question is perhaps somewhat different from the general one,
since stylesheets are expected to be handled by the browser itself
if at all. It seems to me that the intended use of the TYPE attribute
is to allow a browser to pick up only those stylesheets which are
written in a language known to it, ignoring others. In principle,
there might be style sheet languages which clash in the sense that
a style sheet might syntactically conform to both but with different
semantics. In that case, the TYPE attribute value might be used to
select the correct interpretation; here, of course, a mismatch with
HTTP headers is possible, too.
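
To make the "pick up only known style sheet languages" reading concrete,
here is a minimal sketch in Python. It is only an illustration of that
reading, not anything the HTML 4.0 spec prescribes. The class name, the
SUPPORTED_STYLE_TYPES set and the decision to accept a LINK that has no
TYPE at all are my own assumptions.

```python
from html.parser import HTMLParser

# Assumption: the style sheet languages this hypothetical user agent understands.
SUPPORTED_STYLE_TYPES = {"text/css"}


class StylesheetLinkCollector(HTMLParser):
    """Sort LINK REL=Stylesheet elements by whether their TYPE is supported."""

    def __init__(self):
        super().__init__()
        self.accepted = []  # HREFs the agent would go on to retrieve
        self.skipped = []   # HREFs ignored because of an unsupported TYPE

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "stylesheet" not in rel_values or href is None:
            return
        # Drop any parameters after ';' and compare case-insensitively.
        declared = (attributes.get("type") or "").split(";", 1)[0].strip().lower()
        # Assumption: with no TYPE there is nothing to filter on, so accept.
        if not declared or declared in SUPPORTED_STYLE_TYPES:
            self.accepted.append(href)
        else:
            self.skipped.append(href)
```

Feeding it a document with one text/css link and one link in an unknown
style sheet language would leave the first in "accepted" and the second
in "skipped", without any HTTP request having been made.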

Perhaps this problem might be solved by a formal statement - an official
interpretation - from the W3C.
_My_ suggestion is:
a) The TYPE value _may_ be used by a user agent to ignore resources
with a MIME type which is not supported by the user agent. A user
agent _may_ alternatively retrieve the actual resource and its
announced MIME type and act according to it.
b) In the case of detected mismatch between the TYPE value and
the MIME type with which the resource is actually served (determined
e.g. from HTTP headers), the latter is to be obeyed. A user agent
may, however, report the situation to the user and allow him to select
between the alternatives.
c) User agents are _not_ required to check that the TYPE value (if
present) and the MIME type with which the resource is served (if present)
actually match. It is however strongly recommended that such checks
are made and any mismatches are reported to the user. A user agent
may inspect the resource itself to decide which of the MIME types
is more plausible.
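
Suggestions a) to c) could be sketched roughly as follows in Python.
This is a sketch of my proposal only, not of any existing browser. The
names may_skip, effective_type and Decision are hypothetical, and
falling back to the TYPE value when the server announces no type is an
extra assumption that the suggestion itself does not cover.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    effective_type: Optional[str]  # type to process the resource as, if known
    mismatch: bool                 # True if TYPE and the served type disagree


def normalize(media_type):
    """Strip parameters such as '; charset=...' and lowercase the type."""
    if not media_type:
        return None
    return media_type.split(";", 1)[0].strip().lower() or None


def may_skip(declared_type, supported):
    """Rule a): a link whose TYPE is unsupported MAY be ignored unfetched."""
    declared = normalize(declared_type)
    return declared is not None and declared not in supported


def effective_type(declared_type, served_type):
    """Rules b) and c): the served type is obeyed; a mismatch is only flagged."""
    declared = normalize(declared_type)
    served = normalize(served_type)
    if served is None:
        # Assumption: with no announced type, the TYPE value is all we have.
        return Decision(declared, False)
    mismatch = declared is not None and declared != served
    return Decision(served, mismatch)
```

For the misconfigured-server case, effective_type("text/css",
"application/octet-stream") yields application/octet-stream with the
mismatch flag set, so the agent can warn the user or let him choose.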

One might say that b) is impractical, since mismatches typically
result from servers not being configured properly, which is rather
common. I suppose most servers nowadays would still send a .css file
with Content-Type of text/plain or application/octet-stream, depending
on which is the default for unrecognized file name extensions.
In principle and in the long run, on the other hand, I would say that it
is better to trust the information attached to a resource itself than
information attached to a _reference_ to a resource. To take a simple
example: Let's assume that someone puts plain text files onto the Web,
later adds markup to them so that they become HTML files, without
changing the file names (since there might be a large number of links
around). Now, if someone has linked to them with <A TYPE="text/plain"
HREF=...>, should the browser still display them as plain text although
the server says Content-Type: text/html?

Yucca, http://www.hut.fi/u/jkorpela/
Received on Thursday, 22 January 1998 02:49:30 UTC