Re: Review of Content Transformation Guidelines Last Call? from Mark Baker on 2008-09-05 (public-bpwg-comments@w3.org from July to September 2008)

From: Mark Baker <distobj@acm.org>
Date: Fri, 5 Sep 2008 11:00:51 -0400
To: "Jo Rabin" <jrabin@mtld.mobi>
Cc: public-bpwg-comments <public-bpwg-comments@w3.org>
Message-ID: <e9dffd640809050800r44a014efh6953255619a3f2a3@mail.gmail.com>
Jo,

On Fri, Sep 5, 2008 at 10:12 AM, Jo Rabin <jrabin@mtld.mobi> wrote:
> Mark
>
> Thanks very much for your further comments. Some further in line.
>
> thanks again
> Jo
>
> On 04/09/2008 21:24, Mark Baker wrote:
>>
>> Hi Jo,
>>
>> On Thu, Sep 4, 2008 at 12:01 PM, Jo Rabin <jrabin@mtld.mobi> wrote:
>>>
>>> Hi Mark
>>>
> ... snip ...
>>
>>>> 4.1.1 Applicable HTTP Methods
>>>>
>>>> "Proxies should not intervene in methods other than GET, POST, HEAD and
>>>> PUT."
>>>>
>>>> I can't think of any good reason for that.  If a request using an
>>>> extension method wants to avoid transformation, it can always include
>>>> the no-transform directive.
>>>>
>>> I think the reason is that transforming proxies would not know what to do
>>> with other methods so saying that they leave them alone seems to bring
>>> some clarity.
>>
>> In five years time, there might easily be another method in common
>> use.  I presume you want these guidelines to be relevant then?
>>
> My personal hope is that in 5 years time there will be new mechanisms
> available that makes these guidelines redundant. The main reason for
> advancing them today is that we have a mess today. In that time frame there
> might be systematic ways of harmonizing the interactions of transforming
> proxies, requesting and identifying content as having specific
> representation types (desktop, handheld etc.), denoting what types of
> transformation are permissible or not (rather than just saying it is, or it
> isn't) and so on.

I understand, and hope that too, but it remains possible that in that
timeframe, a) a new method is in common use, and b) these guidelines
are still required.  For that reason alone, I think constraining the
list of methods is a bad idea.
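
A minimal sketch of that position, assuming a proxy that sees request headers as a plain dict. The function names are hypothetical and not taken from the guidelines. The decision keys off the existing no-transform directive rather than a fixed list of methods:

```python
def has_no_transform(headers):
    """Return True if a Cache-Control header carries the no-transform directive."""
    for name, value in headers.items():
        if name.lower() != "cache-control":
            continue
        # Cache-Control is a comma-separated list of directives.
        directives = [d.strip().lower() for d in value.split(",")]
        if "no-transform" in directives:
            return True
    return False


def may_transform(method, headers):
    """Hypothetical policy: any method may be transformed unless it opts out."""
    # The method name is deliberately ignored, so an extension method that
    # becomes common later is covered without revising a method list.
    return not has_no_transform(headers)
```

With this, a request using some future method FOO is handled the same way as GET: it is left alone only if it says so.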

>>>> "The theoretical idempotency of GET requests is not always respected
>>>> by servers. In order, as far as possible, to avoid mis-operation of
>>>> such content, proxies should avoid issuing duplicate requests and
>>>> specifically should not issue duplicate requests for comparison
>>>> purposes."
>>>>
>>>> First of all, do you mean "safe" or "idempotent"?  That you refer only
>>>> to GET suggests safety, but the second sentence suggests you are
>>>> referring to idempotency.  So please straighten that out.  Oh, and
>>>> there's nothing "theoretical" about GET's safety or idempotency; it's
>>>> by definition, in fact.
>>>
>>> Well, yes. We have struggled with this. Often, though, it's not
>>> side-effect free. Even if that just means that it makes it more difficult
>>> to track statistics.
>>
>> I think those problems are very familiar to Web developers.  I don't
>> think it needs to be called out.
>
> You'd hope so, wouldn't you? In practice, though, the introduction of
> transforming proxies that do double dipping seems in ignorance of (or
> despite knowledge of) such things. So we do feel it worth calling out as a
> response to the situation we are reacting to.

Now I'm confused.  Proxies should be able to retry certain requests as
I described.  The problem isn't with them, it's with the dumb
applications which change state on GET (for example).  Don't cripple
proxies because of a few bad apps.
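
As a rough sketch of which requests a proxy could retry, assuming the method classification in RFC 2616 section 9.1 (the names below are hypothetical):

```python
# Safe methods per RFC 2616 section 9.1.1.
SAFE_METHODS = frozenset({"GET", "HEAD"})

# Idempotent methods per RFC 2616 section 9.1.2 (safe methods included).
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE", "OPTIONS", "TRACE"}


def may_retry(method):
    """Hypothetical policy: a request may be reissued if its method is idempotent."""
    # POST is absent on purpose; repeating it may repeat its side effects.
    return method.upper() in IDEMPOTENT_METHODS
```

An application that changes state on GET breaks this assumption, but that is a defect in the application rather than in the policy.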

>>> Among many, one of the practical difficulties that has been
>>> pointed out is, for example, linking from an email - where the method
>>> used will be GET irrespective of the purpose of the link.
>>
>> I don't understand; "purpose of the link"?
>
> Well a link on "Register for this Event" in an email might very well cause
> an automatic registration, a link on an ad will generate a click-through.

Ah, right.

> Each of these different cases, today, uses a GET method. The convenience of
> hyperlinks of this kind in email bodies is probably sufficient, and common
> enough, to think that for most Web developers by-definition safety and
> idempotency is a far-away country of which they know little.

That's not uncommon, I agree.  But I've also seen many cases where the
link brings up a page which correctly asks the user to confirm and
uses a POST form.  Besides, these change-state-on-GET links are nearly
never exposed to the wider Web - or if they are, the problem is very
quickly fixed as soon as all the search engine spiders hit them - so I
see little value in crippling transforming proxies.

! uncommon != common 8-)

>>>> Secondly, if the server changes something important because it
>>>> received a GET request, then that's its problem.  Likewise, if it
>>>> changes something non-idempotently because it received a PUT request,
>>>> that's also something it has to deal with.  In both cases though, the
>>>> request itself is idempotent (and safe with GET), so I see no merit to
>>>> that advice that you offer ... unless of course the problem you refer
>>>> to is pervasive which clearly isn't the case.
>>>>
>>> All we are saying, I think, is that there is content in the wild today
>>> that mis-operates as a result of some current practices of transforming
>>> proxies.
>>
>> Life sucks 8-)  The wild is a big place, and there will always be
>> misbehaving software.  Only if the problem behaviour is pervasive
>> should we even consider accommodating it.
>
> In the mobile world Transforming Proxies are becoming more and more common,
> to the point of being pervasive.

But the problem of changing-state-on-GET links isn't pervasive.

>>>> I don't understand the need for 4.1.5.2.  The second paragraph in
>>>> particular seems overly specific, as proxies should obviously not be
>>>> retrying POST requests unless an error - any error - was received.
>>>> PUT messages can be retried because they're idempotent.
>>>>
>>> Call it "repetition for emphasis" if you like, but we are dealing in some
>>> cases with "wild west" behavior - which is causing a lot of problems in
>>> the field.
>>
>> Then let's see those problems described in detail.  That would allow
>> people to accommodate them as befits their needs. A blanket "don't do
>> this" guideline only serves to cripple implementations going forward,
>> and reduce the pressure on the producers of the broken software to fix
>> it.
>>
> I think we're on different tacks here. The transforming proxy software is
> not "broken" [in the sense in which I think you mean it] it fulfils its
> operators' needs. The "don't do this" stuff is intended to limit the damage
> that is done to unsuspecting and possibly unbroken servers.

I meant that the pressure should be put on application developers to
fix their broken apps.  Of course, there's already lots of pressure
from the spiders.

>>>> From 4.1.5.4, "When requesting resources that form part of the
>>>> representation of a resource (e.g. style sheets, images), proxies
>>>> should  make the request for such resources with the same headers as
>>>> the request for the resource from which they are referenced.".  Why?
>>>> There may be lots of reasons for using different headers on these
>>>> requests.  For example, I'd expect the Accept header to be different
>>>> for a stylesheet than for an image.  What are you trying to accomplish
>>>> with this restriction?
>>>
>>> The content provider needs to understand that the requests form part of
>>> the same representation. If the user agent header changes then it's likely
>>> that a content provider who has created specially crafted experiences for
>>> different classes of device would serve inconsistent parts of that
>>> experience.
>>
>> Ok, but I'm sure agents aren't going to change headers for the heck of
>> it.  They're going to change them when they feel it's important for
>> them to communicate that information.  I really don't see what value
>> this adds.
>
> The user agents are not the ones changing the headers. It is the
> transforming proxies that are changing the headers.

I understand.  But what that advice seems to be saying is that if a UA
retrieves some HTML that references a stylesheet, the proxy has to use
the same headers on the second request that the UA used on the first,
regardless of whether the UA felt it necessary to change them for the
second request.

e.g. if the UA sent these messages

GET /index.html HTTP/1.1
Host: example.com
Accept: text/html, application/xhtml+xml

followed by

GET /style.css HTTP/1.1
Host: example.com
Accept: text/css, text/xsl, application/xslt+xml

then that text prescribes that the second message should really be:

GET /style.css HTTP/1.1
Host: example.com
Accept: text/html, application/xhtml+xml

I'm sure that was unintentional so some rewording seems in order.
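
One possible reading of the intent, sketched under the assumption that only device-identifying headers need to stay consistent across the page and its subresources. The function name and the choice of User-Agent as the only such header are assumptions, not text from the guidelines:

```python
# Assumption: only these headers must match the request for the referencing page.
DEVICE_HEADERS = ("User-Agent",)


def headers_for_subrequest(page_request_headers, ua_subrequest_headers):
    """Build the headers a proxy sends for a stylesheet or image request.

    Starts from what the UA actually sent for the subresource (so Accept and
    similar per-request headers survive), then carries over only the
    device-identifying headers used on the request for the page.
    Header names are assumed to use consistent casing in both dicts.
    """
    merged = dict(ua_subrequest_headers)
    for name in DEVICE_HEADERS:
        if name in page_request_headers:
            merged[name] = page_request_headers[name]
    return merged
```

Under this reading, the request for /style.css keeps its text/css Accept header while still presenting the same User-Agent as the request for /index.html.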

>>
>>>> 4.2.2 "Servers must include a Cache-Control: no-transform directive if
>>>> one is received in the HTTP request."  Why?  What does the
>>>> transformability of a request body have to do with the
>>>> transformability of the associated response body?
>>>>
>>> It's a simple way for preserving the integrity of things like
>>> XMLHttpRequest originated requests and their responses.
>>
>> XHR responses are sent by the server that served the script, so it's
>> completely within the application developer's power to preserve
>> integrity across requests and responses if they so desire.
>>
>> Outside the context of XHR, that requirement unnecessarily restricts
>> the kinds of transformations which proxies can make.
>
> I'm not sure I understand your point. We are talking here specifically about
> the case where transforming proxies are deployed by third parties and carry
> out unspecified (as far as the content provider is concerned)
> transformations of their content. So the specific point is to restrict the
> kinds of transformations and to provide a way for a user agent to feel
> comfortable that the integrity of both the request and the response are
> restricted. It's not our intention to restrict the way that content
> providers might carry out transformation in their own proxies (which from
> our point of view we regard as part of the server) nor is it our intention
> to restrict the way that it is carried out as part of the user agent
> function.

Ok, so it seems you want a way to enable the user agent to request
that responses not be transformed.  That requires a new protocol
element (e.g. new header, new cache-control directive).

>>>> 4.3.2 "If the response includes a Warning: 214 Transformation Applied
>>>> HTTP header, proxies must not apply further transformation. "  Why?
>>>> The transformation indicated by the warning may have been the result
>>>> of a server-side transformation which a client-side proxy may deem
>>>> suboptimal, and so want to retransform.  I see no problem with that.
>>>>
>>> Well the specific case you point to is out of scope of this document,
>>> which refers neither to server side nor to client side adaptation.
>>
>> I meant a server-side proxy; a surrogate.
>
> As above.

Then ditto: if you want UAs to be able to request that content not be
retransformed, define a new HTTP extension.

>>
>>> There is an issue with multiple proxies that in the present draft we did
>>> not feel we could address completely. It's relatively common for there to
>>> be an operator provided proxy and a search engine provided proxy in the
>>> path of the request and response. Having a second proxy re-transform what it
>>> believes to be a desktop experience, but which actually is a handset
>>> oriented experience is a recipe for muddle and dysfunction, I think.
>>
>> I'm sure there will be cases where the result would be less than
>> ideal, but you can't rule out a perfectly valid form of HTTP message
>> exchange just because that possibility exists.
>>
> We are ruling out a behavior that is known to be harmful in the context of
> mobile web access

Can you point me to some evidence for that claim please?

Mark.
Received on Friday, 5 September 2008 15:01:29 UTC