- From: Mike Kelly <mike@mykanjo.co.uk>
- Date: Sat, 17 Oct 2009 03:11:36 +0100
- To: Smylers <Smylers@stripey.com>, public-html@w3.org
Smylers wrote:
> Mike Kelly writes:
>
>> HTML does not currently provision for hyperlinks to indicate a
>> specific content-type preference for the Accept header of a given
>> request.
>
> That's true, but URLs are distributed in many ways other than as
> hyperlinks in HTML documents, most of which don't have any way of
> indicating that.
>
>> This is an important feature for developers who wish to leverage HTTP
>> content-negotiation,
>
> Surely even if HTML provided for this such developers would still be
> hampered by URLs being passed around without content types (and users
> not being used to them). For example URLs are commonly communicated
> via:
>
> * plain-text e-mail messages
> * instant messaging and Twitter messages
> * URL-shortening services
> * adverts on the side of buses
> * T-shirts
>
> All of the above involve either a browser being passed a URL (sans
> content type) from an external application, or the URL being entered by
> a user.
>
> Any site which requires a certain content type to be supplied to serve
> the desired content will be serving the wrong content.

A 'cold start' request to a URI, out of the context of a particular
application flow, should revert to the UA's generic Accept header. The
significance of a URI is to identify a resource. There isn't a situation
where the server risks serving the 'wrong' content in response to a
given request, provided the server's conneg logic is sensible. There
should not be too much confusion for a user if clicking a URI (or
entering it into a location bar manually) causes a browser window to
load an HTML page. A good server-side implementation should be aware of,
and appropriately accommodate, use cases in which a user may wish to
progress from the default browser 'landing page' (HTML representation)
to other available formats. Implementations of this kind are more than
likely to be aware of this, since conneg'd representations are involved
in their design.
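To illustrate the fallback described above, here is a minimal sketch of
server-side conneg that lands on HTML for a generic browser Accept
header. The function names and the list of available types are
hypothetical, and the matching is deliberately simplified (it ignores
media-range specificity and never answers 406), so treat it as an
illustration rather than a complete implementation of the HTTP rules.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        fields = [f.strip() for f in part.split(";")]
        q = 1.0  # q defaults to 1 when not given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((fields[0].lower(), q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range, content_type):
    """Return True if content_type falls inside media_range."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return content_type.startswith(media_range[:-1])
    return media_range == content_type


def choose(accept_header, available, default="text/html"):
    """Pick a representation; fall back to the default 'landing page'."""
    for media_range, q in parse_accept(accept_header or "*/*"):
        if q <= 0:
            continue
        candidates = [ct for ct in available if matches(media_range, ct)]
        if candidates:
            # When several types match (e.g. */*), prefer the default.
            return default if default in candidates else candidates[0]
    return default  # simplification: a real server might send 406 here


# Hypothetical representations of the /blog resource.
AVAILABLE = ["text/html", "application/atom+xml"]

# A typical generic browser header, as sent on a 'cold start' request.
BROWSER_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

cold_start = choose(BROWSER_ACCEPT, AVAILABLE)
feed_link = choose("application/atom+xml", AVAILABLE)
```

With these assumptions, a pasted or typed URI yields the HTML
representation, while a request carrying an explicit Atom preference
yields the feed from the same URI.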
I do agree that there would be a plain-text URI issue if the sender of
the URI wished to specify a 'non-default' representation - however, this
*trade-off* against the benefits should be at the developer's discretion
and in the context of a particular system - right now the choice is
taken out of their hands. There are also client-side solutions that
could be introduced over the longer term to mitigate this problem.

> This feature would also break bookmarks: a user could bookmark a page's
> URL, believing that the URL identifies that page, yet on later visiting
> that bookmark being served different content.

The browser should have the request object available when storing a
bookmark, and could easily solve this type of issue by storing bookmarks
as HTML documents.

>> ... and require HTML hyperlinks that specify requests to URIs with a
>> specific Accept header preference. There are use cases in which the
>> distinction between a resource's representations are relevant to the
>> flow of an html driven application, e.g. the difference to a browser
>> between an atom and an html representation of a blog resource.
>>
>> <a href="/blog" type="text/html">My blog (HTML)</a>
>> <a href="/blog" type="application/atom+xml">My blog (Atom Feed)</a>
>
> Many blogs seem to manage with different URLs for their HTML content and
> their feeds, so this requirement can't apply to all blogs in general.
> Please could you clarify precisely the situation which leads to this
> requirement, where two separate URLs wouldn't work?

Apologies, maybe the paragraph below this is not clear enough - it is
not that using separate URIs "doesn't work", just that it may be
sub-optimal for a particular system that would benefit more from a
strictly standardized distinction between resources and representations.
A clear distinction between the two allows intermediaries to make
valuable, automated assumptions about the significance of a request.
Importantly - these assumptions are made in light of the definitions
outlined in the HTTP spec, increasing interop and removing coupling
between components.

>> Without a formal mechanism in HTML which can specify to UAs the
>> contextual content-type preference for a given hyperlink, HTML is not
>> a viable hypermedia format for systems which must rigorously leverage
>> HTTP conneg - this /could/ be achieved with representation specific
>> URIs (i.e. format 'suffixes', URI parameters etc.) but there are
>> situations in which conneg is a superior solution, particularly in
>> terms of the system as a whole, taking into account intermediaries
>> such as caches.
>
> In what way does it help for a cache to cache a blog's homepage and feed
> labelled with the same URL compared with caching them with separate
> URLs? A client retrieving one of them doesn't care whether the other
> one happens to be cached; surely from the cache's point of view they are
> entirely independent?

The benefits are realized in terms of automated cache invalidation.
Modifying a resource should automatically invalidate all of its
representations.
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.10)

In a server-side reverse proxy cache scenario (a common use case for
large-scale web applications), being able to rely on this automatic
mechanism as the sole method of cache invalidation ensures that the
cache is refreshed as infrequently and simply as possible, and that
destination server usage is kept to a minimum. This kind of efficiency
gain can dramatically reduce operating costs, which is particularly true
in new 'pay-as-you-process' elastic computing infrastructures.

If representations are treated as resources then an automatic cache
invalidation mechanism is not viable and must be coupled to a specific
application. E.g.:

What, from the perspective of a cache invalidation mechanism, does

POST /blog.html

mean for the other 'representations'

/blog.atom
/blog.rss

..? Nothing!
Because a cache will not recognize that these are representations of the
same resource, since they are each identified as separate resources and
given their own URI. If conneg is used, visibility is greatly increased
and the cache can automatically invalidate all of the representations.
E.g.:

POST /blog
Content-Type: text/html
....

would invalidate:

/blog
Content-Type: application/atom+xml
Content-Type: application/rss+xml
Content-Type: application/json

>> It seems a shame that this, perfectly valid, use of HTTP is not
>> allowed to system developers that must implement HTML driven
>> applications.
>
> If HTML were to provide for this, it still wouldn't be usable because of
> the uses of URLs outside of HTML. As such, implementing this feature
> would be a disservice to HTML developers, misleading them into thinking
> it's viable, whereas actually using separate URLs works better.

It's not a perfect solution to all problems - it's a trade-off. If
highly efficient automated caching is more valuable to your system than
being able to avoid the highly risky world of plain-text URIs and grumpy
Twitter users, then there is an obvious choice to be made. This
trade-off can only be made in context; it doesn't make sense to try to
govern it via the HTML5 spec.

>> Furthermore - it does not seem that potential enabling solutions would
>> cause incompatibility with existing HTML applications currently not
>> concerned with conneg.
>
> Existing deployed browsers don't have this feature. If a developer were
> to use HTML like you suggested above it may then work for him in his
> browser, while making his blog's feed URL completely unavailable to
> anybody with an older browser.

True, but my point was actually that if browsers suddenly began using
the type attribute to modify their Accept header, that shouldn't break
any existing application.

- Mike
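As an illustration of the invalidation argument above, here is a toy
sketch of a cache that stores variants under their request URI and drops
every variant of that URI when it sees POST, PUT or DELETE, in the
spirit of RFC 2616 section 13.10. The VariantCache class and its methods
are hypothetical and are not taken from any real cache; the bodies are
placeholder strings.

```python
class VariantCache:
    """Toy cache holding representations keyed by URI, then content type."""

    INVALIDATING_METHODS = {"POST", "PUT", "DELETE"}

    def __init__(self):
        self._store = {}  # uri -> {content_type: body}

    def put(self, uri, content_type, body):
        self._store.setdefault(uri, {})[content_type] = body

    def get(self, uri, content_type):
        return self._store.get(uri, {}).get(content_type)

    def handle_request(self, method, uri):
        """Drop all cached variants of `uri` when an unsafe method passes."""
        if method in self.INVALIDATING_METHODS:
            self._store.pop(uri, None)


# Conneg: one URI, several representations.
conneg = VariantCache()
conneg.put("/blog", "text/html", "<html>old</html>")
conneg.put("/blog", "application/atom+xml", "<feed>old</feed>")
conneg.handle_request("POST", "/blog")
# Both variants of /blog are now gone from the cache.

# Suffixed URIs: each 'representation' is a separate resource to the cache.
suffixed = VariantCache()
suffixed.put("/blog.html", "text/html", "<html>old</html>")
suffixed.put("/blog.atom", "application/atom+xml", "<feed>stale</feed>")
suffixed.handle_request("POST", "/blog.html")
# /blog.atom is untouched, so the cache keeps serving the stale feed.
```

In the suffixed case the cache has no protocol-level way to know that
/blog.atom is related to /blog.html, so invalidating it would need
application-specific logic.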
Received on Saturday, 17 October 2009 02:12:13 UTC