Re: Review of Content Transformation Guidelines Last Call? from Jo Rabin on 2008-09-04 (public-bpwg-comments@w3.org from July to September 2008)

From: Jo Rabin <jrabin@mtld.mobi>
Date: Thu, 04 Sep 2008 17:01:34 +0100
To: Mark Baker <distobj@acm.org>
CC: public-bpwg-comments <public-bpwg-comments@w3.org>
Message-ID: <48C0065E.5000305@mtld.mobi>
Hi Mark

Thanks for your comments. The group is now looking at the comments in 
detail, and I think it would help us if we could clarify/expand on some 
of your points below.

Thanks in advance
Jo

On 05/08/2008 22:37, Mark Baker wrote:
> Here's my comments.  In summary, the group really needs to decide
> whether this is a guidelines document, or a protocol.  It can't be
> both.  A lot of work remains.
> 

This is a key point. Our view was that this is a guidelines document, 
and we have deliberately tried to steer clear of defining a protocol or 
(as Mark Nottingham suggested later in his remarks) a profile.

It would help us if you would clarify in what way you think the document 
crosses the boundary between a protocol and guidelines for the use of 
HTTP for a particular class of application.

> 4.1.1 Applicable HTTP Methods
> 
> "Proxies should not intervene in methods other than GET, POST, HEAD and PUT."
> 
> I can't think of any good reason for that.  If a request using an
> extension method wants to avoid transformation, it can always include
> the no-transform directive.
> 
I think the reason is that transforming proxies would not know what to 
do with other methods, so saying that they should leave them alone seems 
to bring some clarity.
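
As a rough illustration only (this is not text from the draft), the
guideline amounts to a simple allow-list check. The names
TRANSFORMABLE_METHODS and may_intervene are hypothetical.

```python
# Hypothetical allow-list, taken from the guideline text quoted above (4.1.1).
TRANSFORMABLE_METHODS = frozenset({"GET", "POST", "HEAD", "PUT"})


def may_intervene(method: str) -> bool:
    """Return True if a transforming proxy may intervene in this method."""
    # HTTP method names are case-sensitive, so the comparison is exact.
    return method in TRANSFORMABLE_METHODS
```

Any extension method, for example PROPFIND, falls outside the list and
would be passed through untouched.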

> 4.1.3 Treatment of Requesters that are not Web browsers
> 
> "Proxies must act as though a no-transform directive is present (see
> 4.1.2 no-transform directive in Request) unless they are able
> positively to determine that the user agent is a Web browser"
> 
> That seems both vague and arbitrary.  What is a Web browser?  What's
> the objective that this guideline is trying to meet?
We need a better way of expressing this. The idea is to prevent 
intervention in traffic that merely happens to use HTTP, such as 
XMLHttpRequest calls or applications that have nothing to do with Web 
browsing but find HTTP convenient; that is quite common in the mobile 
space.
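
A minimal sketch of the intended decision, under stated assumptions:
header names are already lower-cased, and is_web_browser is a
hypothetical caller-supplied test, since the draft does not say how a
proxy should "positively determine" that the user agent is a browser.

```python
from typing import Callable, Mapping


def has_no_transform(headers: Mapping[str, str]) -> bool:
    """True if the Cache-Control header carries a no-transform directive."""
    value = headers.get("cache-control", "")
    return any(part.strip().lower() == "no-transform" for part in value.split(","))


def treat_as_no_transform(
    headers: Mapping[str, str],
    is_web_browser: Callable[[Mapping[str, str]], bool],  # hypothetical test
) -> bool:
    """Act as though no-transform is present unless the requester is
    positively identified as a Web browser (guideline 4.1.3)."""
    return has_no_transform(headers) or not is_web_browser(headers)
```

The default is deliberately conservative: if the test cannot say "yes",
the request is left alone.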

> 
> Aside: the use of RFC 2119 keywords here seems quite out of place.
> These are guidelines after all, no?  A "guideline" that uses "MUST" is
> more like a protocol.

Well, I think that just because they are guidelines doesn't mean the 
concept of conformance doesn't exist. But admittedly this may be at the 
heart of the headline point you make above.


> 
> 4.1.5 Alteration of HTTP Header Values
> 
> RFC 2616 already says a lot about this. See sec 13.5.2 for example.

That section of RFC 2616 is referenced at the start of section 2 but I 
think the reference would benefit from being repeated here. Aside from 
that, are there other references we should quote?

> 
> "The theoretical idempotency of GET requests is not always respected
> by servers. In order, as far as possible, to avoid mis-operation of
> such content, proxies should avoid issuing duplicate requests and
> specifically should not issue duplicate requests for comparison
> purposes."
> 
> First of all, do you mean "safe" or "idempotent"?  That you refer only
> to GET suggests safety, but the second sentence suggests you are
> referring to idempotency.  So please straighten that out.  Oh, and
> there's nothing "theoretical" about GET's safety or idempotency; it's
> by definition, in fact.

Well, yes. We have struggled with this. Often, though, GET is not 
side-effect free in practice, even if that just means that it becomes 
more difficult to track statistics. One of the many practical 
difficulties that have been pointed out is, for example, linking from an 
email, where the method used will be GET irrespective of the purpose of 
the link.
> 
> Secondly, if the server changes something important because it
> received a GET request, then that's its problem.  Likewise, if it
> changes something non-idempotently because it received a PUT request,
> that's also something it has to deal with.  In both cases though, the
> request itself is idempotent (and safe with GET), so I see no merit to
> that advice that you offer ... unless of course the problem you refer
> to is pervasive which clearly isn't the case.
> 
All we are saying, I think, is that there is content in the wild today 
that mis-operates as a result of some current practices of transforming 
proxies.

> I also wonder if most of 4.1.5 shouldn't just defer to 2616.  As is,
> large chunks of this section (as well as others) specify a protocol
> which is a subset of HTTP 1.1.  (see also the RFC 2119 comment above)
> 
We've tried to avoid repeating RFC 2616, but will review this; see also 
the next comment.

> I don't understand the need for 4.1.5.2.  The second paragraph in
> particular seems overly specific, as proxies should obviously not be
> retrying POST requests unless an error - any error - was received.
> PUT messages can be retried because they're idempotent.
> 
Call it "repetition for emphasis" if you like, but we are dealing in 
some cases with "wild west" behavior - which is causing a lot of 
problems in the field.

> The rest of the 4.1.5.* sections all seem to be basically "Here's some
> things that some proxies do".  By listing them, are you saying these
> are good and useful things, i.e. best practices?  If so, perhaps that
> should be made explicit.

We are trying to make the situation more tractable and more predictable 
for content providers. I agree that here and in other places we should 
be clearer that we do not recommend this but if it's going to happen 
then there are bounds on the way it should happen.

In essence, if transforming proxy vendors see it as part of their 
value-add to offer such features they are unlikely to be dissuaded from 
offering them. It's preferable, in my view, that they offer them in 
known ways so people understand what is going on than that there is a 
complete free for all.

> 
> From 4.1.5.4, "When requesting resources that form part of the
> representation of a resource (e.g. style sheets, images), proxies
> should make the request for such resources with the same headers as
> the request for the resource from which they are referenced.".  Why?
> There may be lots of reasons for using different headers on these
> requests.  For example, I'd expect the Accept header to be different
> for a stylesheet than for an image.  What are you trying to accomplish
> with this restriction?

The content provider needs to understand that the requests form part of 
the same representation. If the user agent header changes then it's 
likely that a content provider who has created specially crafted 
experiences for different classes of device would serve inconsistent 
parts of that experience.

[Incidentally, we did some tests for what actual user agents do in 
respect of the Accept header when retrieving stylesheets and images]

> 
> 4.1.5.5 defines a protocol.  This should be in an Internet Draft, not
> in a guidelines document.

We face a chicken-and-egg situation, I believe. We have an urgent 
problem of needing to be able to represent original headers, but only a 
"de facto" way of doing that at present. We point out in the scope for 
further work the need to put many things on a more established footing. 
If we can't recommend putting original headers on the request in any way 
then I am not sure what progress can be made towards solving that urgent 
problem.

> 
> 4.2.2 "Servers must include a Cache-Control: no-transform directive if
> one is received in the HTTP request."  Why?  What does the
> transformability of a request body have to do with the
> transformability of the associated response body?
> 
It's a simple way of preserving the integrity of things like 
XMLHttpRequest-originated requests and their responses.
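
To make the behaviour concrete, here is a sketch (not taken from the
draft) of a server echoing the directive into its response. The
function name echo_no_transform is hypothetical, and the parsing
assumes simple comma-separated directives with no quoted commas.

```python
def _directives(value: str) -> list:
    """Split a Cache-Control value into its non-empty directives."""
    return [part.strip() for part in value.split(",") if part.strip()]


def echo_no_transform(request_cc: str, response_cc: str) -> str:
    """Return the response Cache-Control value, adding no-transform
    if the request carried it (guideline 4.2.2)."""
    requested = [d.lower() for d in _directives(request_cc)]
    response = _directives(response_cc)
    already_set = "no-transform" in [d.lower() for d in response]
    if "no-transform" in requested and not already_set:
        response.append("no-transform")
    return ", ".join(response)
```

For example, a request sent with "no-transform" and a response that
would otherwise carry "max-age=60" ends up with
"max-age=60, no-transform".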

> 4.3.2 "If the response includes a Warning: 214 Transformation Applied
> HTTP header, proxies must not apply further transformation. "  Why?
> The transformation indicated by the warning may have been the result
> of a server-side transformation which a client-side proxy may deem
> suboptimal, and so want to retransform.  I see no problem with that.
> 
Well, the specific case you point to is out of scope of this document, 
which refers neither to server side nor to client side adaptation.

There is an issue with multiple proxies that in the present draft we did 
not feel we could address completely. It's relatively common for there 
to be an operator provided proxy and a search engine provided proxy in 
the path of the request and response. Having a second proxy re-transform 
what it believes to be a desktop experience, but which is actually a 
handset-oriented experience, is a recipe for muddle and dysfunction, I 
think.
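
For illustration, a downstream proxy could detect an earlier
transformation with a check along these lines. This is a sketch only:
already_transformed is a hypothetical name, and the pattern reads
warn-codes naively, so a comma followed by three digits inside a quoted
warn-text would be misread.

```python
import re

# Matches the three-digit warn-code at the start of each comma-separated
# Warning value (simplified; does not parse quoted strings).
_WARN_CODE = re.compile(r'(?:^|,)\s*(\d{3})(?=\s)')


def already_transformed(warning_header: str) -> bool:
    """True if the Warning header value includes warn-code 214
    (Transformation applied)."""
    return "214" in _WARN_CODE.findall(warning_header)
```

A proxy following 4.3.2 would skip its own transformation whenever this
returns True for the response it receives.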

Regards
Jo
Received on Thursday, 4 September 2008 16:02:31 UTC