Re: Review of Content Transformation Guidelines Last Call? from Jo Rabin on 2008-09-05 (public-bpwg-comments@w3.org from July to September 2008)

From: Jo Rabin <jrabin@mtld.mobi>
Date: Fri, 05 Sep 2008 15:12:30 +0100
To: Mark Baker <distobj@acm.org>
CC: public-bpwg-comments <public-bpwg-comments@w3.org>
Message-ID: <48C13E4E.9010101@mtld.mobi>
Mark

Thanks very much for your further comments. Some further responses inline.

thanks again
Jo

On 04/09/2008 21:24, Mark Baker wrote:
> Hi Jo,
> 
> On Thu, Sep 4, 2008 at 12:01 PM, Jo Rabin <jrabin@mtld.mobi> wrote:
>> Hi Mark
>>
... snip ...
> 
>>> 4.1.1 Applicable HTTP Methods
>>>
>>> "Proxies should not intervene in methods other than GET, POST, HEAD and
>>> PUT."
>>>
>>> I can't think of any good reason for that.  If a request using an
>>> extension method wants to avoid transformation, it can always include
>>> the no-transform directive.
>>>
>> I think the reason is that transforming proxies would not know what to do
>> with other methods so saying that they leave them alone seems to bring some
>> clarity.
> 
> In five years time, there might easily be another method in common
> use.  I presume you want these guidelines to be relevant then?
> 
My personal hope is that in 5 years' time there will be new mechanisms 
available that make these guidelines redundant. The main reason for 
advancing them today is that we have a mess today. In that time frame 
there might be systematic ways of harmonizing the interactions of 
transforming proxies, requesting and identifying content as having 
specific representation types (desktop, handheld etc.), denoting what 
types of transformation are permissible or not (rather than just saying 
it is, or it isn't) and so on.
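
For concreteness, the rule under discussion in 4.1.1, combined with the 
no-transform escape hatch Mark mentions, could be sketched roughly as 
follows. This is only an illustration: the helper names are hypothetical, 
and the header dictionary is assumed to use canonical header names.

```python
# Methods in which the draft says a transforming proxy may intervene (4.1.1).
INTERVENABLE_METHODS = {"GET", "POST", "HEAD", "PUT"}


def has_no_transform(headers):
    """Return True if the Cache-Control header lists the no-transform directive."""
    value = headers.get("Cache-Control", "")
    directives = [d.strip().lower() for d in value.split(",")]
    return "no-transform" in directives


def may_intervene(method, headers):
    """Hypothetical helper: may the proxy transform this request at all?"""
    if method not in INTERVENABLE_METHODS:
        return False  # leave extension methods alone, as the draft proposes
    return not has_no_transform(headers)
```

Under this sketch a PATCH request is always passed through untouched, 
whereas Mark's alternative would drop the method check and rely on 
has_no_transform alone.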

>>> 4.1.3 Treatment of Requesters that are not Web browsers
>>>
>>> "Proxies must act as though a no-transform directive is present (see
>>> 4.1.2 no-transform directive in Request) unless they are able
>>> positively to determine that the user agent is a Web browser"
>>>
>>> That seems both vague and arbitrary.  What is a Web browser?  What's
>>> the objective that this guideline is trying to meet?
>> We need a better way of expressing this. The idea is to prevent intervention
>> in traffic that happens to use HTTP like XMLHttpRequest or applications that
>> have nothing to do with Web browsing that find it convenient to use HTTP -
>> that's quite common in the mobile space.
> 
> I don't know what you mean by "Web browsing" there.  Do you consider,
> say, GMail's use of XHR "Web browsing"?
> 
> Regardless, I think it's a fool's errand to try to distinguish XHR
> HTTP traffic from non-XHR HTTP traffic.  They're all just HTTP
> messages, and that's all a proxy ever sees.

"Fools rush in where angels fear to tread", as they say. We will have to 
review whether this section should be removed, I think.

> 
>>> "The theoretical idempotency of GET requests is not always respected
>>> by servers. In order, as far as possible, to avoid mis-operation of
>>> such content, proxies should avoid issuing duplicate requests and
>>> specifically should not issue duplicate requests for comparison
>>> purposes."
>>>
>>> First of all, do you mean "safe" or "idempotent"?  That you refer only
>>> to GET suggests safety, but the second sentence suggests you are
>>> referring to idempotency.  So please straighten that out.  Oh, and
>>> there's nothing "theoretical" about GET's safety or idempotency; it's
>>> by definition, in fact.
>> Well, yes. We have struggled with this. Often, though, it's not side-effect
>> free. Even if that just means that it makes it more difficult to track
>> statistics.
> 
> I think those problems are very familiar to Web developers.  I don't
> think it needs to be called out.

You'd hope so, wouldn't you? In practice, though, the introduction of 
transforming proxies that do double dipping seems to have happened in 
ignorance of (or despite knowledge of) such things. So we do feel it is 
worth calling out as a response to the situation we are reacting to.

> 
>> Among many, one of the practical difficulties that has been
>> pointed out is, for example, linking from an email - where the method used
>> will be GET irrespective of the purpose of the link.
> 
> I don't understand; "purpose of the link"?
Well, a link on "Register for this Event" in an email might very well 
cause an automatic registration, and a link on an ad will generate a 
click-through. Each of these different cases, today, uses a GET method. 
The convenience of hyperlinks of this kind in email bodies is probably 
sufficient, and common enough, to think that for most Web developers 
by-definition safety and idempotency are a far-away country of which they 
know little.
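
A toy model of the concern, purely as a sketch: the server class, path 
and function names below are invented for illustration and do not 
describe any real deployment. A GET on the registration link has a side 
effect, and a proxy that requests the resource twice for comparison 
purposes triggers that side effect twice.

```python
class RegistrationServer:
    """Toy origin server whose GET handler has a side effect."""

    def __init__(self):
        self.registrations = 0

    def get(self, path, user_agent):
        if path == "/register":
            self.registrations += 1  # side effect on GET, as in the email-link case
        return f"page for {user_agent}"


def fetch_once(server, path, user_agent):
    """Proxy behaviour the guideline asks for: a single request."""
    return server.get(path, user_agent)


def fetch_with_comparison(server, path, original_ua, altered_ua):
    """Proxy that 'double dips': requests twice and compares the results."""
    first = server.get(path, original_ua)
    second = server.get(path, altered_ua)
    return first if first == second else second
```

With fetch_once the user is registered once; with fetch_with_comparison 
the same click registers them twice, whatever the proxy then does with 
the two responses.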

> 
>>> Secondly, if the server changes something important because it
>>> received a GET request, then that's its problem.  Likewise, if it
>>> changes something non-idempotently because it received a PUT request,
>>> that's also something it has to deal with.  In both cases though, the
>>> request itself is idempotent (and safe with GET), so I see no merit to
>>> that advice that you offer ... unless of course the problem you refer
>>> to is pervasive which clearly isn't the case.
>>>
>> All we are saying, I think, is that there is content in the wild today that
>> mis-operates as a result of some current practices of transforming proxies.
> 
> Life sucks 8-)  The wild is a big place, and there will always be
> misbehaving software.  Only if the problem behaviour is pervasive
> should we even consider accommodating it.

In the mobile world Transforming Proxies are becoming more and more 
common, to the point of being pervasive.

> 
>>> I don't understand the need for 4.1.5.2.  The second paragraph in
>>> particular seems overly specific, as proxies should obviously not be
>>> retrying POST requests unless an error - any error - was received.
>>> PUT messages can be retried because they're idempotent.
>>>
>> Call it "repetition for emphasis" if you like, but we are dealing in some
>> cases with "wild west" behavior - which is causing a lot of problems in the
>> field.
> 
> Then let's see those problems described in detail.  That would allow
> people to accommodate them as befits their needs. A blanket "don't do
> this" guideline only serves to cripple implementations going forward,
> and reduce the pressure on the producers of the broken software to fix
> it.
> 
I think we're on different tacks here. The transforming proxy software 
is not "broken" [in the sense in which I think you mean it]; it fulfils 
its operators' needs. The "don't do this" stuff is intended to limit the 
damage that is done to unsuspecting and possibly unbroken servers.

>>> From 4.1.5.4, "When requesting resources that form part of the
>>> representation of a resource (e.g. style sheets, images), proxies
>>> should make the request for such resources with the same headers as
>>> the request for the resource from which they are referenced.".  Why?
>>> There may be lots of reasons for using different headers on these
>>> requests.  For example, I'd expect the Accept header to be different
>>> for a stylesheet than for an image.  What are you trying to accomplish
>>> with this restriction?
>> The content provider needs to understand that the requests form part of the
>> same representation. If the user agent header changes then it's likely that
>> a content provider who has created specially crafted experiences for
>> different classes of device would serve inconsistent parts of that
>> experience.
> 
> Ok, but I'm sure agents aren't going to change headers for the heck of
> it.  They're going to change them when they feel it's important for
> them to communicate that information.  I really don't see what value
> this adds.

The user agents are not the ones changing the headers. It is the 
transforming proxies that are changing the headers.
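
A minimal sketch of the inconsistency, assuming a content provider that 
picks a variant from the User-Agent header alone. The selection rule, 
the User-Agent strings and the function names are all made up for the 
example.

```python
def select_variant(headers):
    """Toy content-provider logic: choose a variant from the User-Agent."""
    user_agent = headers.get("User-Agent", "")
    return "handheld" if "Mobile" in user_agent else "desktop"


def fetch_page_and_parts(page_headers, part_headers, parts):
    """Return the variant served for the page and for each linked resource."""
    served = {"page": select_variant(page_headers)}
    for name in parts:
        served[name] = select_variant(part_headers)
    return served


def is_consistent(served):
    """True if every part came from the same variant as the page."""
    return len(set(served.values())) == 1
```

If the proxy sends the handset's User-Agent for the page but an altered, 
desktop-style one for the style sheet, is_consistent returns False: the 
page and its style sheet come from different experiences.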

> 
>>> 4.1.5.5 defines a protocol.  This should be in an Internet Draft, not
>>> in a guidelines document.
>> We face a chicken and egg situation I believe. I think we have an urgent
>> problem of needing to be able to represent original headers but only a "de
>> facto" way of doing that at present. We point out in the scope for further
>> work the need to put many things on a more established footing. If we can't
>> recommend putting original headers on the request in any way then I am not
>> sure what progress can be made to solving that urgent problem.
> 
> You can do it but it doesn't belong in a guidelines document, it
> belongs in an Internet Draft - probably experimental if you say it's
> de-facto (I've personally never seen it used) - vetted through the
> IETF process.  W3C WGs have crafted a number of I-Ds in the past.
> 
> It seems you're fine, because your charter says;
> 
> "There is no intent for the MWBP Working Group to develop new
> technology, such as markup languages. However if, during its work, the
> need for new technologies is identified, the group may raise
> requirements with other W3C groups or groups within other standards
> organisations."

Indeed, I expect we will, but first we need to establish a consensus of 
what we think is desirable and ameliorate a situation that will rapidly 
get worse if we don't do something now.
> 
>>> 4.2.2 "Servers must include a Cache-Control: no-transform directive if
>>> one is received in the HTTP request."  Why?  What does the
>>> transformability of a request body have to do with the
>>> transformability of the associated response body?
>>>
>> It's a simple way for preserving the integrity of things like XMLHttpRequest
>> originated requests and their responses.
> 
> XHR responses are sent by the server that served the script, so it's
> completely within the application developer's power to preserve
> integrity across requests and responses if they so desire.
> 
> Outside the context of XHR, that requirement unnecessarily restricts
> the kinds of transformations which proxies can make.

I'm not sure I understand your point. We are talking here specifically 
about the case where transforming proxies are deployed by third parties 
and carry out unspecified (as far as the content provider is concerned) 
transformations of their content. So the specific point is to restrict 
the kinds of transformations and to provide a way for a user agent to 
feel comfortable that the integrity of both the request and the response 
is preserved. It's not our intention to restrict the way that content 
providers might carry out transformation in their own proxies (which 
from our point of view we regard as part of the server) nor is it our 
intention to restrict the way that it is carried out as part of the user 
agent function.
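
As a rough sketch of what 4.2.2 asks of a server (the function names are 
hypothetical, and headers are assumed to be a plain dictionary with 
canonical names), the directive is simply echoed from the request to the 
response:

```python
def has_no_transform(headers):
    """Return True if the Cache-Control header lists the no-transform directive."""
    value = headers.get("Cache-Control", "")
    return "no-transform" in [d.strip().lower() for d in value.split(",")]


def echo_no_transform(request_headers, response_headers):
    """Return response headers with no-transform added if the request had it."""
    out = dict(response_headers)  # do not mutate the caller's dictionary
    if has_no_transform(request_headers) and not has_no_transform(out):
        existing = out.get("Cache-Control")
        out["Cache-Control"] = (
            f"{existing}, no-transform" if existing else "no-transform"
        )
    return out
```

Any directives the server already set, such as max-age, are kept and 
no-transform is appended to them.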
> 
>>> 4.3.2 "If the response includes a Warning: 214 Transformation Applied
>>> HTTP header, proxies must not apply further transformation. "  Why?
>>> The transformation indicated by the warning may have been the result
>>> of a server-side transformation which a client-side proxy may deem
>>> suboptimal, and so want to retransform.  I see no problem with that.
>>>
>> Well the specific case you point to is out of scope of this document, which
>> refers neither to server side nor to client side adaptation.
> 
> I meant a server-side proxy; a surrogate.

As above.
> 
>> There is an issue with multiple proxies that in the present draft we did not
>> feel we could address completely. It's relatively common for there to be an
>> operator provided proxy and a search engine provided proxy in the path of
>> the request and response. Having a second proxy re-transform what it
>> believes to be a desktop experience, but which actually is a handset
>> oriented experience is a recipe for muddle and dysfunction, I think.
> 
> I'm sure there will be cases where the result would be less than
> ideal, but you can't rule out a perfectly valid form of HTTP message
> exchange just because that possibility exists.
> 
We are ruling out a behavior that is known to be harmful in the context 
of mobile web access and yes, in that context it is theoretically 
possible that something useful could come out of multiple 
transformations, but that's not the point. If that is what you want to 
do then you won't conform to these guidelines, I guess.
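
For illustration only, the check that 4.3.2 implies for a downstream 
proxy might look like the sketch below. The function name is 
hypothetical, and splitting on commas is a simplification: a real parser 
would have to cope with commas inside the quoted warning text.

```python
def already_transformed(response_headers):
    """Return True if a Warning header carries warn-code 214."""
    warning = response_headers.get("Warning", "")
    for value in warning.split(","):
        parts = value.strip().split(None, 1)  # warn-code, then the rest
        if parts and parts[0] == "214":
            return True
    return False
```

A conforming proxy would skip its own transformation whenever this 
returns True for the response it received.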

Jo
Received on Friday, 5 September 2008 14:13:36 UTC