- From: Ernest Cline <ernestcline@mindspring.com>
- Date: Sun, 16 Nov 2003 23:48:26 -0500
- To: "Lachlan Hunt" <lhunt07@postoffice.csu.edu.au>, "W3C HTML List" <www-html@w3.org>
I've taken the time to take a thorough look at the type attribute. I've reached somewhat different and considerably more detailed conclusions than I had before, which I explain in detail below. These conclusions are:

1) The type attribute is not needed for resources retrieved using HTTP or other protocols that provide a mechanism to indicate the MIME type(s) of the resource.

2) For those protocols for which a type attribute is needed, a single-valued type attribute containing a single MIME type is sufficient. Thus, in the interest of simplicity and consistency, the type attribute should keep its HTML4 format of a single type.

3) The type attribute, when present, should be used to determine whether the resource will be retrieved by the user agent.

4) XHTML2 should define what happens if a retrieved resource does not match the type attributed to it via the type attribute or other method. At a minimum, a user agent must be able to provide an error message to the user and to present what would have been presented had the resource not been retrieved. Additional options as to what to do might be offered to the user if they so choose.

Please point out any place in my reasoning that is faulty. I have made some assumptions, such as that by the time XHTML2 actually starts to be implemented, all HTTP servers can be assumed to support HTTP 1.1. While these assumptions seem reasonable to me, they might not be.

> [Original Message]
> From: Lachlan Hunt <lhunt07@postoffice.csu.edu.au>
>
> Oskar Welzl wrote:
>
> >you see, the main difference between a descriptive HTML 4-@type
> >and the advisory/prescriptive @type in the XHTML 2.0 draft shows
> >when you consider
> >
> ><span src="img.gif" type="image/png">hey?! what is it now??</span>
> >
> >let us assume the image is image/gif, not image/png. the author
> >simply made a mistake.
>
> I think this is the author's problem, not a problem with XHTML.
> Author's need to take more care any way when writing XHTML 2.0,
> since its rules (particularly structure rules) are more strict
> than HTML was. We definitely want to stay as far away from those
> tag-soup-browser's style of parsing, and rendering, of HTML as
> much as possible. So the above example should not be too much,
> if any, concern for XHTML 2.0.

Let me start off by taking a look at what would happen if type were advisory and if it were prescriptive, by asking some questions and providing my answers to them.

* Why should we want an advisory type attribute?

To provide a way for the UA to offer the user a choice of formats.

* Can this be achieved by other means?

An HTTP 1.1 OPTIONS done on the URL "img.gif" could work, but that assumes that the URL uses the http: or https: protocols. Some protocols provide no way of gaining information about the resource other than loading the resource and inspecting it. However, any protocol that offers multiple versions of a resource that are referred to by the same URL should provide such information.

* Why should we want a prescriptive type attribute?

To enable the UA to load the resource only if it is capable of handling the resource, or to load only a specific resource type.

* Can this be achieved by other means?

In HTTP, limiting the resource to what the UA finds acceptable can be done by using the Accept header. In other protocols this usually requires determining the type by mapping the file suffix to a MIME type and/or by inspecting the resource's content. If a specific format is considered essential, that argues that the resource should have a URL specific to that format, perhaps in addition to a generic URL that handles any format.

HTTP Summary: As far as HTTP is concerned, any conceivable use of the type attribute can be achieved without resorting to the use of the type attribute.
Therefore, the real benefit of the type attribute is for use with other protocols, and any usage of the type attribute for HTTP should be considered secondary and designed to meet the needs of resource retrieval via non-HTTP protocols.

* What usage of the type attribute is the most useful for non-HTTP URLs?

First, are there any other protocols in use that allow, as HTTP does, multiple versions of the same resource to use the same URL? I am not aware of any, but that is not the same as there being none. However, I think that it is safe to assume that these other protocols, if any, must have some mechanism for selecting a specific version and for letting the user agent know which versions are available. Thus, just as with HTTP, the use of a type attribute is redundant with such a protocol, and therefore the type attribute should be designed to support determining MIME type info for protocols that provide no mechanism to determine this.

Only a restricted use case is left to consider: retrieval of a resource via a protocol that provides only a single version of the resource per URL, but no information on the type of the resource. For any such protocol, a single MIME type is sufficient. Multivalued type attributes are redundant. Not only that, but consider this:

  <span type="text/x-format1,text/x-format2" src="example.txt">
  </span>

Suppose that example.txt meets the requirements to fit either MIME type and that the user agent has different methods of presenting the two types. Which is to be preferred, assuming that the protocol in use provides no information about the file type?

* q-values are specific to HTTP and thus not suitable for generic use.
* Preferring the type whose valid forms are a subset of the other only makes sense if one type is indeed a subset of the other.
* Position can already be handled via:

  <span type="text/x-format1" src="example.txt">
    <span type="text/x-format2" src="example.txt">
    </span>
  </span>

While the brevity of the first form is desirable, how often is such a case really going to occur? As I have already pointed out, the type attribute is unnecessary for HTTP or similar protocols that allow for multiple versions of the same resource to be referred to by the same URL. Rarely can a resource with a type other than text/* be used for multiple MIME types.

The question that remains is what role a type attribute should serve. Well, obviously the user agent should use the type to determine whether to attempt to access the resource. If it doesn't, then what is the point of having a type attribute? Once the user agent has retrieved the resource, if it is of the indicated type, the user agent obviously uses it. But what if, as in the example Oskar gave, the resource is not of its advertised type? I'll discuss this below, since it also applies to resources sent via HTTP with an incorrect Content-Type.

non-HTTP summary: The type attribute should be a single-valued attribute used to determine if the resource should be accessed.

> > in HTML 4, it hardly matters. the UA will probably try to fetch the
> > file, anyway, with its default accept-header. no problem.
>
> This is very much like the browser saying "I don't care what you've
> told me, I'll just do what I think is right". i.e. Tag-soup style
> parsing!
>
> > according to XHTML 2.0 (may 2003 draft), the UA "must" change its
> > accept header to image/png only. (from the draft, 6.6: "The user
> > agent must combine this list it with its own list of acceptable media
> > types by taking the intersection")
>
> Again, this is both the author's error and concern, not that of XHTML
> and the browser. When the author finds that a 406 response is being
> returned, or at least sees that the image won't load, I'm sure the
> author will find and fix the problem (well... hopefully).
>
> > this example is to illustrate why the XHTML 2 way of using @type
> > is far from being "advisory" only. it's a firm 'must', not a
> > 'should better' or 'could'.
>
> What's the point of the attribute, if the browser essentially
> ignores it anyway, and just sends off it's request with it's default
> accept header?

There is another issue here that is clouded by the argument over how advisory type should be. What if a resource is not of the type it is claimed to be, whether by the type attribute or by the Content-Type given in the response if HTTP is being used to get the resource? There are four options I can see occurring here.

1) The user agent ignores the resource and does what it would have done had the resource not been accessed. (i.e., in Oskar's example, the alt text is presented.)

2) The user agent considers the resource invalid and presents an error message in some manner. It does not do what it would have done had the resource not been accessed. (i.e., in Oskar's example, only an error message is given.)

3) The user agent considers the resource invalid and presents an error message in some manner. It also does what it would have done had the resource not been accessed. (i.e., both the error message and the alt text are presented.)

4) The user agent tries to determine if it is a resource that it knows how to handle. If it can handle it, it acts as if the "correct" type was given. (i.e., in Oskar's example, it presents the GIF.) If it can't, it acts according to one of the other three options.

The difficulty with option 4 is that it is possible that a resource could be validly interpreted as any of several MIME types. (This is most likely to occur with the various subtypes of text/*, but it is not impossible to construct examples involving wildly different file types such as image/gif and text/rfc822.)
On the other hand, this is the behavior people have come to expect from user agents: to make every attempt to resolve the "problem" and do something. Not only that, but option 4 gives the user agent the same behavior once the resource has been retrieved, regardless of whether a type attribute has been specified, so option 4 would in one sense be the simplest for a UA to implement. Whether it would be the best is another question, and one on which I could agree with either answer.

However, even with option 4 used, a user agent will have to choose one of the other three as its fallback option if it cannot determine the type of the resource. Which of the three should be used? Personally, I prefer requiring user agents to default to option 3 while allowing them to offer the user a choice of option 1 or 2. Option 3 allows the user to be informed that there is a problem in getting the resource, while still providing a version of the intended information. If the user does not desire the full information, that is their choice.
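
To make the proposal concrete, here is a rough Python sketch of how a user agent might combine conclusion 3 (the type attribute decides whether the resource is retrieved) with option 4 falling back to option 3. It is only an illustration under stated assumptions, not something taken from the XHTML2 draft: the names SUPPORTED, Outcome, sniff and handle are hypothetical, the UA is assumed to handle only GIF and PNG, and the already-retrieved bytes are passed in directly instead of being fetched.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: the only MIME types this imaginary user agent can present.
SUPPORTED = {"image/gif", "image/png"}


@dataclass
class Outcome:
    fetched: bool               # was the resource retrieved at all?
    rendered_as: Optional[str]  # MIME type used for presentation, if any
    error: Optional[str]        # error message shown to the user, if any
    show_fallback: bool         # present the element content (e.g. alt text)


def sniff(data: bytes) -> Optional[str]:
    """Crude content inspection; recognizes only GIF and PNG signatures."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return None


def handle(declared: str, data: bytes, try_option4: bool = True) -> Outcome:
    # Conclusion 3: the declared type decides whether to retrieve at all.
    if declared not in SUPPORTED:
        return Outcome(False, None, None, True)

    actual = sniff(data)
    if actual == declared:
        return Outcome(True, declared, None, False)

    # Option 4: the resource is mistyped, but the UA knows how to handle it.
    if try_option4 and actual in SUPPORTED:
        return Outcome(True, actual, None, False)

    # Option 3 as the fallback: report the error and show the fallback too.
    message = f"resource declared as {declared} is not of that type"
    return Outcome(True, None, message, True)


# Oskar's example: a GIF labelled as image/png.
gif_bytes = b"GIF89a" + b"\x00" * 16
print(handle("image/png", gif_bytes))
print(handle("image/png", gif_bytes, try_option4=False))
```

With try_option4 left on, Oskar's mislabelled GIF is presented as a GIF; with it off, the user gets both the error message and the alt text, which is the default I argued for above. Options 1 and 2 would differ only in which of the error and show_fallback fields are set.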
Received on Sunday, 16 November 2003 23:49:05 UTC