Re: Type Attribute from Ernest Cline on 2003-11-17 (www-html@w3.org from November 2003)

From: Ernest Cline <ernestcline@mindspring.com>
Date: Sun, 16 Nov 2003 23:48:26 -0500
To: "Lachlan Hunt" <lhunt07@postoffice.csu.edu.au>, "W3C HTML List" <www-html@w3.org>
Message-ID: <410-220031111744826109@mindspring.com>
I've taken the time to take a thorough look at the type attribute.
I've reached somewhat different and considerably more detailed
conclusions than I had before, which I explain in detail below.
These conclusions are:

1) The type attribute is not needed for resources retrieved using
   HTTP or other protocols that provide a mechanism to indicate
   the MIME type(s) of the resource.
2) For those protocols for which a type attribute is needed,
   a single-valued type attribute containing just one MIME type
   is sufficient. Thus, in the interest of simplicity and consistency,
   the type attribute should keep its HTML4 format of a single type.
3) The type attribute, when present, should be used to determine
   whether the resource will be retrieved by the user agent.
4) XHTML2 should define what happens if a retrieved resource
   does not match the type attributed to it via the type attribute
   or other method.  At a minimum, a user agent must be able to
   provide an error message to the user and to present what
   would be presented, had the resource not been retrieved.
   Additional options as to what to do might be offered to the
   user if they so choose.

Please point out any place in my reasoning that is faulty.
I have made some assumptions that seem reasonable to me but might
not be, such as that by the time XHTML2 actually starts to be
implemented, it can be assumed that all HTTP servers will support
HTTP 1.1.

> [Original Message]
> From: Lachlan Hunt <lhunt07@postoffice.csu.edu.au>
>
>   Oskar Welzl wrote:
>
> >you see, the main difference between a descriptive HTML 4-@type
> >and the advisory/prescriptive @type in the XHTML 2.0 draft shows
> >when you consider 
> >
> ><span src="img.gif" type="image/png">hey?! what is it now??</span>
> >
> >let us assume the image is image/gif, not image/png. the author
> >simply made a mistake.
> 
>   I think this is the author's problem, not a problem with XHTML.  
> Authors need to take more care anyway when writing XHTML 2.0,
> since its rules (particularly structure rules) are more strict
> than HTML was.  We definitely want to stay as far away from those 
> tag-soup-browser's style of parsing, and rendering, of HTML as
> much as possible.  So the above example should not be too much,
> if any, concern for XHTML 2.0.

Let me start off by taking a look at what would happen if type were
advisory and if it were prescriptive by asking some questions and
providing my answers to them.

* Why should we want an advisory type attribute?
To provide a way for the UA to offer the user a choice of formats.
* Can this be achieved by other means?
An HTTP 1.1 OPTIONS request on the URL "image.gif" could work, but that
assumes that the URL uses the http: or https: scheme.  Some
protocols provide no way of gaining information about the resource
other than loading the resource and inspecting it.  However, any
protocol that offers multiple versions of a resource that are
referred to by the same URL should provide such information.

* Why should we want a prescriptive type attribute?
To enable the UA to load the resource only if it is capable of
handling it, or to load only a specific resource type.
* Can this be achieved by other means?
In HTTP, limiting the resource to types the UA finds acceptable can be
done by using the Accept header.  In other protocols this usually
requires determining the type by mapping the filename suffix to a MIME
type and/or by inspecting the resource's content.  If a specific format
is considered essential, that argues that the resource should have a URL
specific to that format, perhaps in addition to a generic URL that
handles any format.
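As a rough sketch of the Accept-header approach, here is how a UA might
apply the May 2003 draft's rule (section 6.6) of intersecting the type
attribute's list with its own acceptable types before building the
Accept header. The function names are hypothetical, and q-values and
media type parameters are ignored to keep it short.

```python
from __future__ import annotations


def covers(media_range: str, mime_type: str) -> bool:
    """True if an Accept-style media range (e.g. 'image/*') covers mime_type."""
    media_range, mime_type = media_range.lower(), mime_type.lower()
    if media_range == "*/*":
        return True
    range_major, range_minor = media_range.split("/", 1)
    major, minor = mime_type.split("/", 1)
    return range_major == major and range_minor in ("*", minor)


def intersect_accept(declared: list[str], ua_accepts: list[str]) -> list[str]:
    """Keep the declared types the UA can handle, in the author's order."""
    return [t for t in declared if any(covers(r, t) for r in ua_accepts)]


def accept_header(declared: list[str], ua_accepts: list[str]) -> str | None:
    """Build an Accept header value, or None if nothing acceptable remains."""
    common = intersect_accept(declared, ua_accepts)
    return ", ".join(common) if common else None


ua = ["image/gif", "image/png", "text/*"]
print(accept_header(["image/png"], ua))      # image/png
print(accept_header(["image/svg+xml"], ua))  # None
```

If the intersection is empty, a UA following this sketch would not need
to send the request at all, rather than waiting for a 406 response.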

HTTP Summary:  As far as HTTP is concerned, any conceivable use of the
type attribute can be achieved without resorting to the use of the type
attribute.  Therefore, the real benefit of the type attribute is for
use with other protocols, and any usage of the type attribute for HTTP
should be considered secondary and designed to meet the needs of
resource retrieval via non-HTTP protocols.

* What usage of the type attribute is the most useful for non-HTTP URLs?

First, are there any other protocols in use that, like HTTP, allow
multiple versions of the same resource to use the same URL?  I am not
aware of any, but that is not the same as there being none.  However,
I think that it is safe to assume that these other protocols, if any,
must have some mechanism for selecting a specific version and for letting
the user agent know which versions are available.  Thus, just as with
HTTP, the type attribute is redundant with such a protocol, and
therefore the type attribute should be designed to supply MIME type
information for protocols that provide no mechanism to determine it.

Only a restricted use case is left to consider: retrieval of a resource
via a protocol that provides but a single version of the resource per URL,
but no information on the type of the resource.

For any such protocol, a single MIME type is sufficient.  Multivalued
type attributes are redundant.  Not only that, but consider this:

<span type="text/x-format1,text/x-format2" src="example.txt">
</span>

Suppose that example.txt meets the requirements to fit either MIME type
and that the user agent has different methods of presenting both types.
Which is to be preferred, assuming that the protocol in use provides
no information about the file type?
* q-values are specific to HTTP and thus not suitable for generic use.
* Preferring the type whose valid forms are a subset of the other only
  makes sense if one type is indeed a subset of the other.
* Preference by position can already be expressed by nesting:

<span type="text/x-format1" src="example.txt">
 <span type="text/x-format2" src="example.txt">
 </span>
</span>

While the brevity of the first form is desirable, how often is such
a case really going to occur?  As I have already pointed out, the
type attribute is unnecessary for HTTP or similar protocols that
allow for multiple versions of the same resource to be referred to
by the same URL.  Rarely can a resource with a type other than text/*
be used for multiple MIME types.  
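To make the nesting alternative concrete, here is a minimal sketch
assuming a hypothetical Span model (not any real DOM API): the UA
presents the outermost resource whose type it supports and otherwise
falls through to the element's content, so position expresses
preference without a multivalued attribute.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical model of a <span> with optional type/src and fallback children."""
    type: str | None = None
    src: str | None = None
    children: list[Span | str] = field(default_factory=list)


def render(node: Span | str, supported: set[str]) -> str:
    """Present the outermost embeddable resource the UA supports, else fall back."""
    if isinstance(node, str):
        return node  # plain text content is the final fallback
    # Simplification for this sketch: a src without a type counts as unusable.
    if node.src is not None and node.type in supported:
        return f"[{node.type} resource {node.src}]"
    # The resource is not usable here, so present the element's content instead.
    return "".join(render(child, supported) for child in node.children)


doc = Span("text/x-format1", "example.txt",
           [Span("text/x-format2", "example.txt", ["plain fallback"])])
print(render(doc, {"text/x-format2"}))  # [text/x-format2 resource example.txt]
print(render(doc, set()))               # plain fallback
```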

The question that remains is what role should a type attribute serve?

Well, obviously the user agent should use the type to determine whether
to attempt to access the resource.  If it doesn't, then what is the
point of having a type attribute?

Once the user agent has retrieved the resource, if the resource is of
the indicated type, the user agent obviously uses it.  But what if, as
in the example Oskar gave, the resource is not of its advertised type?
I'll discuss this below, since it also applies to resources sent via
HTTP with an incorrect Content-Type.

Non-HTTP Summary: The type attribute should be a single-valued
attribute used to determine whether the resource should be accessed.
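A minimal sketch of that decision, assuming that http: and https: are
the only schemes treated as reporting their own MIME type;
SELF_DESCRIBING_SCHEMES, base_type and should_fetch are illustrative
names only.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Assumption for this sketch: these schemes report the MIME type themselves
# (via Content-Type), so the type attribute is not consulted for them.
SELF_DESCRIBING_SCHEMES = {"http", "https"}


def base_type(mime: str) -> str:
    """Drop parameters such as '; charset=utf-8' and normalize case."""
    return mime.split(";", 1)[0].strip().lower()


def should_fetch(url: str, type_attr: str | None, supported: set[str]) -> bool:
    """Decide whether to retrieve a resource, given a single-valued type attribute."""
    if urlsplit(url).scheme.lower() in SELF_DESCRIBING_SCHEMES:
        return True  # leave it to the Accept header and Content-Type
    if type_attr is None:
        return True  # no declared type: the only option is to fetch and inspect
    return base_type(type_attr) in supported


print(should_fetch("ftp://example.org/img.gif", "image/png", {"image/gif"}))  # False
```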

> > in HTML 4, it hardly matters. the UA will probably try to fetch the
> > file, anyway, with its default accept-header. no problem.
> 
>   This is very much like the browser saying "I don't care what you've 
> told me, I'll just do what I think is right".  i.e. Tag-soup style
> parsing!
>
> > according to XHTML 2.0 (may 2003 draft), the UA "must" change its
> > accept header to image/png only. (from the draft, 6.6: "The user
> > agent must combine this list it with its own list of acceptable
> > media
> > types by taking the intersection")
> 
>   Again, this is both the author's error and concern, not that of XHTML 
> and the browser.  When the author finds that a 406 response is being 
> returned, or at least sees that the image won't load, I'm sure the
> author will find and fix the problem (well... hopefully).
>
> > this example is to illustrate why the XHTML 2 way of using @type
> > is far from being "advisory" only. it's a firm 'must', not a
> > 'should better' or 'could'.
> 
>   What's the point of the attribute, if the browser essentially
> ignores it anyway, and just sends off its request with its default
> accept header?

There is another issue here that is clouded by the argument over how
advisory type should be.  What if a resource is not of the type
attributed to it, either by the type attribute or by the Content-Type
given in the response when HTTP is used to get the resource?

There are four options I can see occurring here.

1) The user agent ignores the resource and does what it would have done
   had the resource not been accessed.
   (i.e., in Oskar's example, the alt text is presented.)

2) The user agent considers the resource invalid and presents an error
   message in some manner. It does not do what it would have done had
   the resource not been accessed.
   (i.e., in Oskar's example, only an error message is given.)

3) The user agent considers the resource invalid and presents an error
   message in some manner. It also does what it would have done had
   the resource not been accessed.
   (i.e., both the error message and the alt text are presented.)

4) The user agent tries to determine if it is a resource that it knows
   how to handle.  If it can handle it, it acts as if the "correct"
   type was given.  (i.e., in Oskar's example, it presents the GIF.)
   If it can't, it acts according to one of the other three options.
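The four options can be sketched as a single dispatch function.
Everything here (MismatchPolicy, on_mismatch, the error wording) is
hypothetical; "detected" stands for whatever type the UA managed to
determine by inspection, and option 4 falls back to option 3 by default,
the choice argued for at the end of this message.

```python
from __future__ import annotations

from enum import Enum


class MismatchPolicy(Enum):
    IGNORE = 1              # option 1: present only the fallback content
    ERROR_ONLY = 2          # option 2: present only an error message
    ERROR_AND_FALLBACK = 3  # option 3: error message plus the fallback content
    SNIFF = 4               # option 4: use the detected type if the UA handles it


def on_mismatch(policy: MismatchPolicy, declared: str, detected: str | None,
                supported: set[str], fallback: str,
                sniff_fallback: MismatchPolicy = MismatchPolicy.ERROR_AND_FALLBACK,
                ) -> list[str]:
    """Return what the UA presents when a resource is not of its declared type."""
    if policy is MismatchPolicy.SNIFF:
        if detected is not None and detected in supported:
            return [f"render as {detected}"]
        if sniff_fallback is MismatchPolicy.SNIFF:
            raise ValueError("option 4 needs one of options 1-3 as its fallback")
        policy = sniff_fallback  # could not identify or handle it: use 1, 2 or 3
    error = f"error: resource is not of declared type {declared}"
    if policy is MismatchPolicy.IGNORE:
        return [fallback]
    if policy is MismatchPolicy.ERROR_ONLY:
        return [error]
    return [error, fallback]  # MismatchPolicy.ERROR_AND_FALLBACK


# Oskar's example: declared image/png, but the bytes are really a GIF.
ok = {"image/gif", "image/png"}
print(on_mismatch(MismatchPolicy.SNIFF, "image/png", "image/gif", ok,
                  "hey?! what is it now??"))  # ['render as image/gif']
```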

The difficulty with option 4 is that a resource could be validly
interpreted as any of several MIME types.  (This is most likely to occur
with the various subtypes of text/*, but it is not impossible to construct
examples involving wildly different file types such as image/gif and
message/rfc822.)  On the other hand, this is the behavior people have come
to expect from user agents: to make every attempt to resolve the "problem"
and do something.  Not only that, but option 4 gives the user agent the
same behavior once the resource has been retrieved, regardless of whether
a type attribute has been specified, so option 4 would in one sense be
the simplest for a UA to implement.  Whether it would be the best is
another question, and one on which I could accept either answer.
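To illustrate the ambiguity option 4 has to cope with, here is a rough
content-sniffing sketch that returns every candidate type rather than
picking one. The PNG, GIF and JPEG signatures are the standard leading
bytes of those formats; the printable-ASCII test for text/plain is a
deliberately crude assumption, and candidate_types is a hypothetical name.

```python
from __future__ import annotations

# Standard leading "magic" bytes for a few image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]

# Printable ASCII plus tab, LF and CR.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def candidate_types(data: bytes) -> list[str]:
    """Return every MIME type the content could plausibly be, not just one."""
    found = [mime for magic, mime in SIGNATURES if data.startswith(magic)]
    # Crude heuristic: all-printable ASCII could also be treated as plain text.
    if data and all(b in TEXT_BYTES for b in data):
        found.append("text/plain")
    return found


print(candidate_types(b"GIF89a"))  # ['image/gif', 'text/plain']
```

A six-byte resource consisting only of "GIF89a" yields both image/gif
and text/plain, which is exactly the situation in which the UA needs a
rule for choosing, or falls back to one of options 1 to 3.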

However, even if option 4 is used, a user agent will have to choose one
of the other three as its fallback option if it cannot determine the
type of the resource.  Which of the three should be used?

Personally, I prefer requiring user agents to default to option 3,
while allowing them to offer the user a choice of option 1 or 2.
Option 3 informs the user that there is a problem in getting the
resource while still providing a version of the intended information.
If the user does not desire the full information, that is their choice.
Received on Sunday, 16 November 2003 23:49:05 UTC