Re: Hyperlinks and content negotiation from Mike Kelly on 2009-10-19 (public-html@w3.org from October 2009)

From: Mike Kelly <mike@mykanjo.co.uk>
Date: Mon, 19 Oct 2009 11:20:07 +0100
To: public-html@w3.org
Message-ID: <4ADC3D57.70002@mykanjo.co.uk>

Smylers wrote:
> Mike Kelly writes:
>
>   
>> Smylers wrote:
>>
>>     
>>> Mike Kelly writes:
>>>
>>> Content negotiation could succeed if only those who know what they are
>>> doing touch it, that typical authors aren't somehow tempted to start
>>> playing with it.  That's possible, but not certain.  I don't know how
>>> we'd gather data either way.
>>>       
>> Content negotiation exists as a standardized feature in the HTTP spec.
>>     
>
> HTML5 is designed on what is useful or expedient in practice, not what
> is in other specs (though often there is a big overlap, because the
> other specs also match reality).
>   

Why?

It shouldn't be HTML's place to dictate what parts of protocols are 
deemed 'useful or expedient' - that is inefficient and socially 
irresponsible. If you have a problem with other protocols "in practice", 
then take that up with the relevant controlling bodies.

If you disagree with this, please could you explain why such 
interventionism is in the best interests of web architecture, and/or 
the world in general.


> That HTTP has this feature doesn't matter; what matters is whether
> authors would use it, and whether its existence would cause harm.
>   

If it improves caching efficiency, many large-scale web applications are 
likely to leverage it.

What are your suggestions as to why the existence of an *optional* 
attribute would cause harm? The argument "Some 'non-expert' authors 
might experiment with it and get it wrong because it's not something 
they have seen before" is very weak and could be applied to just about 
any proposed new feature.

>   
>> If there are aspects of HTTP that you think are unnecessary or wrong,
>> and need addressing - this should be taken up with the relevant bodies
>> controlling the HTTP spec - any other approach to 'solving' these
>> /perceived/ problems, regardless of intention, is bad (and
>> potentially damaging) governance practice.
>>     
>
> HTML5 covers features which are useful to web authors.  In some cases
> that involves interacting with or specifying features in other specs,
> such as HTTP.  But just because HTML5 relies on HTTP, it doesn't
> follow that HTML5 has to enable a way of using every feature that HTTP
> provides.
>
> Equally, nor does it follow that not providing for a particular HTTP
> feature in HTML5 is labelling it "unnecessary or wrong"; it's simply one
> which is deemed not to be relevant enough for HTML5.
>   

Efficient web architecture should be the primary objective - your 
attitude seems rather "isolationist", which is a shame given the level 
of responsibility HTML5 holds.

It may not be directly labelling it "unnecessary or wrong", but that is 
the implication if you have been provided with a relevant use case and 
example benefit(s). Regardless, I am sure you are well aware that you 
are rendering these features non-viable for the majority of web 
applications (which rely on HTML as the primary method of delivery) by 
not provisioning for them.


>   
>> Meanwhile; I think it would be most productive for HTML to recognize
>> its important role in driving HTTP applications, and look to provide
>> (where possible) standardized mechanisms by which developers can
>> leverage all relevant features of HTTP.
>>     
>
> Well obviously you think that would be productive, since you want the
> features!
>
> I can counter that by pointing out I think it would be unproductive.
>
> That doesn't really get us anywhere -- what you think and what I think
> are equally valid.
>
> That's why it's better to have data for this sort of thing: if a feature
> would be useful to many HTML authors, safe, backwards-compatible, etc
> then it can be added on its own merits, without needing to be treated as a
> special case for being in some other spec.
>   

HTTP isn't just "some other spec" - it's the key protocol for the web. 
That is not an 'appeal to authority'; it is blatant fact.

Web applications rely heavily on hypermedia - and, unfortunately, HTML 
happens to be the primary format for this. That is why I wrote that HTML 
needs to "recognize its important role in driving HTTP applications".

>   
>> I think conneg is a relevant, valuable feature of HTTP that HTML5 is
>> capable of provisioning for, at relatively little risk/cost.
>>     
>
> In that case try to think of ways showing how valuable it would be, and
> how low the risk.
>
>   

I've shown how it can be used, and the caching efficiency it allows. 
It would make more sense if you simply pointed to the reasons why 
introducing this would /not/ be 'low risk' - then we can discuss those 
concerns, and hopefully reach an agreement.
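
To make the mechanism under discussion concrete, here is a minimal 
sketch in Python of server-driven negotiation on the Accept header for 
a single resource. It is an illustration only: the function names and 
the list of available types are assumptions, not anything from the 
specs or from earlier messages, and the matching is simplified (it 
ranks purely by q-value and ignores the precedence rules for more 
specific media ranges).

```python
# Representations a hypothetical /blog resource can produce.
AVAILABLE = ["text/html", "application/atom+xml"]


def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the client gives no quality value
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def matches(media_range, media_type):
    """True if a media range such as text/* or */* covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header, available=AVAILABLE):
    """Pick the representation to serve, or None (i.e. 406 Not Acceptable)."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None
```

A feed reader sending "Accept: application/atom+xml" and a browser 
sending "Accept: text/html" would then get different representations 
of the same URI, and the response would carry "Vary: Accept" so that 
caches keep the two apart.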

>>>> It is not that using separate URIs "doesn't work", just that it
>>>> may be sub-optimal for a particular system that would benefit
>>>> more from a strictly standardized distinction between resources
>>>> and representations.  A clear distinction between the two allows
>>>> intermediaries to make valuable, automated assumptions about the
>>>> significance of a request.
>>>>         
>>> Please could you be more specific about these assumptions and their
>>> value.  HTML5 is designed by finding problems that need to be solved
>>> first, and then looking for solutions to those problems.
>>>
>>> (In this case it sounds like content negotiation may be the only
>>> solution to the particular problem, but for the rigor of the spec we
>>> don't want to add features without being sure what they are for and
>>> that they are the best way of solving the problem.)
>>>
>>>       
>>>>> In what way does it help for a cache to cache a blog's homepage
>>>>> and feed labelled with the same URL compared with caching them
>>>>> with separate URLs?
>>>>>           
>>>> The benefits are realized in terms of automated cache
>>>> invalidation.  Modifying a resource should automatically
>>>> invalidate all of its representations.
>>>>         
>>> Thanks -- that makes sense.  You mention "assumptions" in plural
>>> above, so I presume there are others?
>>>       
>> Plural for caches in the sense that various HTTP request methods could
>> cause invalidation (i.e. POST/PUT/DELETE)
>>     
>
> Surely that's true for all URLs -- that one method can cause
> invalidation for other methods applies just as much to a single page?
> It isn't specifically an advantage of two different formats of the same
> page sharing a URL.
>   

I think you may have misunderstood the point being made here.

> So the problem you actually want to solve is to invalidate all formats
> of some content when any of them change.
>
> Content negotiation would solve that for content whose formats have
> different media types.  It doesn't solve it for content available in
> multiple formats, all of which are HTML (for example a long article
> which is available either paginated or as a single page, or content with
> a 'printer friendly' version, or content in multiple human languages).
>   

On pagination: that is an implementation detail. Clearly there are ways 
of identifying resources that make it hard to leverage a cache 
invalidation mechanism. It depends on the problem you are trying to 
solve, and whether compromise is possible - it's a trade-off. For 
example, a single/printer-friendly page could be an empty shell with 
JavaScript code that renders the whole document via Ajax calls to each 
page.

These are decisions that should be made in the context of a particular 
problem by developers.

On multiple human languages: there is an Accept-Language header in HTTP 
for this purpose. That is an equally valid form of content negotiation; 
however, its practical use is more limited because translations are 
rarely controlled mechanically by the server (i.e. one translation 
is not automatically generated from another). This is a situation in 
which it *does* make sense to identify them as separate resources. I'm 
happy to explore why this is the case in more detail if it is not 
immediately clear. If you felt that an equally strong case could be made 
for language (or even encoding) as for content-type, then perhaps an 
additional attribute for that could also be considered?
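
As a rough sketch of the "separate resources" approach for languages, 
assuming hypothetical URIs and a simplified prefix match on language 
tags, the Accept-Language header could be used only to choose which 
translation's own URI to point the client at. The names below are 
invented for illustration.

```python
# Hypothetical URIs: each translation is identified as its own resource.
TRANSLATIONS = {"en": "/articles/42.en", "fr": "/articles/42.fr"}
DEFAULT_LANGUAGE = "en"


def parse_accept_language(header):
    """Return (language_tag, q) pairs from an Accept-Language header, best first."""
    ranges = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0  # no q parameter means full preference
        name, _, value = params.partition("=")
        if name.strip().lower() == "q":
            try:
                q = float(value)
            except ValueError:
                q = 0.0
        ranges.append((tag, q))
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def pick_translation(header):
    """Return the URI of the best matching translation, or the default one."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            break  # wildcard: any language will do, so use the default
        # Try the full tag first, then progressively shorter prefixes
        # (en-gb falls back to en).
        candidate = tag
        while candidate:
            if candidate in TRANSLATIONS:
                return TRANSLATIONS[candidate]
            candidate = candidate.rpartition("-")[0]
    return TRANSLATIONS[DEFAULT_LANGUAGE]
```

Because each translation has its own URI here, updating the French 
text does not need to invalidate a cached copy of the English one.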

>   
>>>> It's not a perfect solution to all problems - it's a trade-off.
>>>> If highly-efficient automated caching is more valuable to your
>>>> system than being able to avoid the highly risky world of plain
>>>> text URIs and grumpy twitter users, then there is an obvious
>>>> choice to be made.
>>>>         
>>> That sounds fair enough.  Do you have any evidence of the numbers of
>>> developers who would choose the cache-invalidation advantage over
>>> the plain-text URL advantage?
>>>       
>> No - but that is not at all surprising given that it isn't a viable
>> option right now!
>>     
>
> Evidence would include things like this being a common problem which web
> developers encounter and ask about on fora for suggestions of what to do
> about it.

Is there contention over whether caching is an important aspect of web 
architecture? It is not uncommon for persisted data to be rendered by 
web applications into multiple formats.

> Or that developers have resorted to tracking these sorts of
> dependencies on the server-side -- perhaps evidenced by the existence of
> libraries for doing this.

Databases and server-side MVC frameworks?

http://couchdb.apache.org

http://guides.rubyonrails.org/getting_started.html#the-mvc-architecture

> Or a list of sites which publish the same
> content in multiple formats, with separate MIME types, and where caches
> often have an up-to-date version of one format but incorrectly are still
> caching an old version of another.
>   

A better approach would be to evaluate it against the most efficient 
caching mechanisms that /are/ currently viable, and judge whether it 
provides a significant increase in efficiency. Please could you 
suggest which of these mechanisms you think would be a good choice to 
evaluate against?

>> Is this even necessary if we are in agreement that the caching use
>> case makes sense, and has significant value?
>>     
>
> There are many problems which it would be nice for HTML to solve.  It
> can't solve all of them.  Bigger problems are more worth solving.
>
> "Significant value" is contentious; you can claim something is
> significant and somebody else can claim it's insignificant.  Whereas if
> you provide some data to back your claim, its significance is more a
> matter of fact than opinion.
>
>   

Again, I hadn't realised there was contention over the importance of 
caching!

I linked to the part of the HTTP protocol where the mechanism is 
defined and provided for. I gave a practical example of resource 
identification (single resource, multiple representations) that would 
allow such a mechanism to operate. There seem to be no other caching 
approaches that are as efficient as this - if you don't agree that this 
is the case, then please could you provide details of the alternative.
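
To illustrate the invalidation argument, here is a toy model in Python 
of an intermediary cache. It is a sketch under stated assumptions, not 
a description of any real cache: the class name, the URIs and the 
bodies are invented, and it models only the rule that an unsafe 
request (POST/PUT/DELETE) passing through invalidates what is stored 
for that request URI.

```python
class VariantCache:
    """Toy shared cache: one entry per URI, one stored body per media type."""

    UNSAFE_METHODS = {"POST", "PUT", "DELETE"}

    def __init__(self):
        self._store = {}  # uri -> {media_type: body}

    def store(self, uri, media_type, body):
        """Remember one representation (variant) of the resource at uri."""
        self._store.setdefault(uri, {})[media_type] = body

    def lookup(self, uri, media_type):
        """Return the cached body for this variant, or None on a miss."""
        return self._store.get(uri, {}).get(media_type)

    def observe(self, method, uri):
        """On an unsafe request, drop every stored variant of that URI."""
        if method.upper() in self.UNSAFE_METHODS:
            self._store.pop(uri, None)


# Case 1: one resource, two representations selected by Accept.
negotiated = VariantCache()
negotiated.store("/blog", "text/html", "<html>old</html>")
negotiated.store("/blog", "application/atom+xml", "<feed>old</feed>")
negotiated.observe("POST", "/blog")  # a new post is published

# Case 2: the same content split across two URIs.
separate = VariantCache()
separate.store("/blog.html", "text/html", "<html>old</html>")
separate.store("/blog.atom", "application/atom+xml", "<feed>old</feed>")
separate.observe("POST", "/blog.html")  # the cache cannot know /blog.atom is related
```

In the first case both the page and the feed are gone from the cache 
after the POST. In the second, the feed at /blog.atom is still served 
from cache until it expires or is revalidated by some other means.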

>>> Unfortunately HTML5 can't cater for every valid requirement, so
>>> generally doesn't add features that would be useful to only a very small
>>> number of authors (for example HTML5 doesn't add a <ship> element,
>>> despite some authors having a very valid requirement to distinguish
>>> names of ships on their pages; mentioning ships simply isn't common
>>> enough).
>>>       
>> I understand the point you are making
>> but don't feel that is a sensible or helpful comparison for this case.
>>     
>
> Yeah, I don't think it's a particularly good example either, but since
> you understood me anyway (thanks!) I don't have to think of a better
> one!
>   

I'm not the only person reading this - let's try and keep this as 
constructive as possible please.

Cheers,
Mike
Received on Monday, 19 October 2009 10:20:43 UTC