
Re: Review of Content Transformation Guidelines Last Call?

From: Mark Baker <distobj@acm.org>
Date: Thu, 4 Sep 2008 16:24:05 -0400
Message-ID: <e9dffd640809041324u2447ff70t36c4203a5a2235a8@mail.gmail.com>
To: "Jo Rabin" <jrabin@mtld.mobi>
Cc: public-bpwg-comments <public-bpwg-comments@w3.org>

Hi Jo,

On Thu, Sep 4, 2008 at 12:01 PM, Jo Rabin <jrabin@mtld.mobi> wrote:
> Hi Mark
>
> Thanks for your comments. The group is now looking at the comments in
> detail, and I think it would help us if we could clarify/expand on some of
> your points below.

I'd be happy to ...

>
> Thanks in advance
> Jo
>
> On 05/08/2008 22:37, Mark Baker wrote:
>>
>> Here are my comments.  In summary, the group really needs to decide
>> whether this is a guidelines document or a protocol.  It can't be
>> both.  A lot of work remains.
>>
>
> This is a key point. Our view was that this is a guidelines document, and we
> have deliberately tried to steer clear of defining a protocol or (as Mark
> Nottingham suggested later in his remarks) a profile.
>
> It would help us if you would clarify in what way you think the document
> crosses the boundary between a protocol and guidelines for the use of HTTP
> for a particular class of application.

The most obvious example is 4.1.5.5, because the document defines an
HTTP extension.  Other less obvious examples include 4.3.2 where the
document, in effect, redefines (by restriction) the meaning of the 214
warning, and 4.2.2 where the protocol that a server can use is
restricted (aka profiled).

>
>> 4.1.1 Applicable HTTP Methods
>>
>> "Proxies should not intervene in methods other than GET, POST, HEAD and
>> PUT."
>>
>> I can't think of any good reason for that.  If a request using an
>> extension method wants to avoid transformation, it can always include
>> the no-transform directive.
>>
> I think the reason is that transforming proxies would not know what to do
> with other methods so saying that they leave them alone seems to bring some
> clarity.

In five years' time, there might easily be another method in common
use.  I presume you want these guidelines to be relevant then?
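To make the point concrete, here is a minimal sketch (in Python, with a hypothetical extension method name) of why checking the no-transform directive is more future-proof than checking the method against a fixed list:

```python
# Sketch: a request using an extension method ("REPORT" here is just an
# illustration) can opt out of transformation with the standard
# Cache-Control: no-transform directive.  A proxy that checks the
# directive, rather than the method name, stays correct even for
# methods that don't exist yet.

def wants_no_transform(headers: dict) -> bool:
    """Return True if the request carries a no-transform directive."""
    cc = headers.get("Cache-Control", "")
    return "no-transform" in (d.strip() for d in cc.split(","))

request_headers = {
    "Method": "REPORT",                 # not in the guideline's GET/POST/HEAD/PUT list
    "Cache-Control": "no-transform",    # the opt-out travels with the request
}

print(wants_no_transform(request_headers))  # True
```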

>> 4.1.3 Treatment of Requesters that are not Web browsers
>>
>> "Proxies must act as though a no-transform directive is present (see
>> 4.1.2 no-transform directive in Request) unless they are able
>> positively to determine that the user agent is a Web browser"
>>
>> That seems both vague and arbitrary.  What is a Web browser?  What's
>> the objective that this guideline is trying to meet?
>
> We need a better way of expressing this. The idea is to prevent intervention
> in traffic that happens to use HTTP like XMLHttpRequest or applications that
> have nothing to do with Web browsing that find it convenient to use HTTP -
> that's quite common in the mobile space.

I don't know what you mean by "Web browsing" there.  Do you consider,
say, GMail's use of XHR "Web browsing"?

Regardless, I think it's a fool's errand to try to distinguish XHR
HTTP traffic from non-XHR HTTP traffic.  They're all just HTTP
messages, and that's all a proxy ever sees.

>> "The theoretical idempotency of GET requests is not always respected
>> by servers. In order, as far as possible, to avoid mis-operation of
>> such content, proxies should avoid issuing duplicate requests and
>> specifically should not issue duplicate requests for comparison
>> purposes."
>>
>> First of all, do you mean "safe" or "idempotent"?  That you refer only
>> to GET suggests safety, but the second sentence suggests you are
>> referring to idempotency.  So please straighten that out.  Oh, and
>> there's nothing "theoretical" about GET's safety or idempotency; it's
>> by definition, in fact.
>
> Well, yes. We have struggled with this. Often, though, it's not side-effect
> free. Even if that just means that it makes it more difficult to track
> statistics.

I think those problems are very familiar to Web developers.  I don't
think it needs to be called out.
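For reference, the safe/idempotent distinction raised above can be summarised in a few lines (a minimal sketch following the RFC 2616 definitions; GET and HEAD are safe, and the safe methods are also idempotent):

```python
# RFC 2616 §9.1: "safe" methods are not expected to have side effects;
# "idempotent" methods have the same effect whether issued once or N times.
SAFE = {"GET", "HEAD"}
IDEMPOTENT = SAFE | {"PUT", "DELETE"}  # safe methods are idempotent too

print("GET" in SAFE, "GET" in IDEMPOTENT)   # True True
print("PUT" in SAFE, "PUT" in IDEMPOTENT)   # False True
print("POST" in IDEMPOTENT)                 # False
```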

> Among many, one of the practical difficulties that has been
> pointed out is, for example, linking from an email - where the method used
> will be GET irrespective of the purpose of the link.

I don't understand; "purpose of the link"?

>>
>> Secondly, if the server changes something important because it
>> received a GET request, then that's its problem.  Likewise, if it
>> changes something non-idempotently because it received a PUT request,
>> that's also something it has to deal with.  In both cases though, the
>> request itself is idempotent (and safe with GET), so I see no merit to
>> that advice that you offer ... unless of course the problem you refer
>> to is pervasive, which clearly isn't the case.
>>
> All we are saying, I think, is that there is content in the wild today that
> mis-operates as a result of some current practices of transforming proxies.

Life sucks 8-)  The wild is a big place, and there will always be
misbehaving software.  Only if the problem behaviour is pervasive
should we even consider accommodating it.

>> I don't understand the need for 4.1.5.2.  The second paragraph in
>> particular seems overly specific, as proxies should obviously not be
>> retrying POST requests unless an error - any error - was received.
>> PUT messages can be retried because they're idempotent.
>>
> Call it "repetition for emphasis" if you like, but we are dealing in some
> cases with "wild west" behavior - which is causing a lot of problems in the
> field.

Then let's see those problems described in detail.  That would allow
people to accommodate them as befits their needs. A blanket "don't do
this" guideline only serves to cripple implementations going forward,
and reduce the pressure on the producers of the broken software to fix
it.
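The retry rule being discussed reduces to a one-line check (a sketch, not the document's normative wording): idempotent methods can be repeated safely after a failure, while POST cannot.

```python
# Idempotent requests can be reissued without changing the outcome,
# so an automatic retry after, say, a dropped connection is safe.
# POST is not idempotent: a retry risks duplicating the action.
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def may_retry(method: str) -> bool:
    return method.upper() in IDEMPOTENT

print(may_retry("PUT"))   # True
print(may_retry("POST"))  # False
```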

>> From 4.1.5.4, "When requesting resources that form part of the
>> representation of a resource (e.g. style sheets, images), proxies
>> should make the request for such resources with the same headers as
>> the request for the resource from which they are referenced.".  Why?
>> There may be lots of reasons for using different headers on these
>> requests.  For example, I'd expect the Accept header to be different
>> for a stylesheet than for an image.  What are you trying to accomplish
>> with this restriction?
>
> The content provider needs to understand that the requests form part of the
> same representation. If the user agent header changes then it's likely that
> a content provider who has created specially crafted experiences for
> different classes of device would serve inconsistent parts of that
> experience.

Ok, but I'm sure agents aren't going to change headers for the heck of
it.  They're going to change them when they feel it's important for
them to communicate that information.  I really don't see what value
this adds.
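The Accept-header example can be sketched as follows (hypothetical header values; the point is that a user agent legitimately varies Accept per subresource type while identity headers like User-Agent stay stable):

```python
# Sketch: per-subresource Accept headers on top of a stable base.
BASE = {"User-Agent": "ExampleBrowser/1.0 (handset)"}  # hypothetical UA string

def headers_for(kind: str) -> dict:
    accept = {
        "document":   "text/html,application/xhtml+xml",
        "stylesheet": "text/css,*/*;q=0.1",
        "image":      "image/png,image/*;q=0.8,*/*;q=0.5",
    }[kind]
    return {**BASE, "Accept": accept}

# Accept differs per resource type...
print(headers_for("stylesheet")["Accept"])
print(headers_for("image")["Accept"])
# ...while the identity the content provider keys on does not.
print(headers_for("image")["User-Agent"] == headers_for("stylesheet")["User-Agent"])  # True
```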

>> 4.1.5.5 defines a protocol.  This should be in an Internet Draft, not
>> in a guidelines document.
>
> We face a chicken-and-egg situation, I believe. I think we have an urgent
> problem of needing to be able to represent original headers but only a "de
> facto" way of doing that at present. We point out in the scope for further
> work the need to put many things on a more established footing. If we can't
> recommend putting original headers on the request in any way then I am not
> sure what progress can be made to solving that urgent problem.

You can do it, but it doesn't belong in a guidelines document; it
belongs in an Internet Draft - probably experimental if you say it's
de facto (I've personally never seen it used) - vetted through the
IETF process.  W3C WGs have crafted a number of I-Ds in the past.

It seems you're fine, because your charter says:

"There is no intent for the MWBP Working Group to develop new
technology, such as markup languages. However if, during its work, the
need for new technologies is identified, the group may raise
requirements with other W3C groups or groups within other standards
organisations."

>> 4.2.2 "Servers must include a Cache-Control: no-transform directive if
>> one is received in the HTTP request."  Why?  What does the
>> transformability of a request body have to do with the
>> transformability of the associated response body?
>>
> It's a simple way of preserving the integrity of things like XMLHttpRequest
> originated requests and their responses.

XHR responses are sent by the server that served the script, so it's
completely within the application developer's power to preserve
integrity across requests and responses if they so desire.

Outside the context of XHR, that requirement unnecessarily restricts
the kinds of transformations which proxies can make.
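For clarity, the 4.2.2 requirement under discussion amounts to an echo rule on the server side (a minimal sketch; function and parameter names are illustrative, not from the document):

```python
# Sketch of 4.2.2: if the request carried Cache-Control: no-transform,
# echo the directive on the response so downstream proxies leave the
# response body alone as well.
def response_headers(request_headers: dict, base_headers: dict) -> dict:
    out = dict(base_headers)
    if "no-transform" in request_headers.get("Cache-Control", ""):
        cc = out.get("Cache-Control", "")
        out["Cache-Control"] = (cc + ", " if cc else "") + "no-transform"
    return out

print(response_headers({"Cache-Control": "no-transform"}, {}))
print(response_headers({}, {"Cache-Control": "max-age=60"}))
```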

>> 4.3.2 "If the response includes a Warning: 214 Transformation Applied
>> HTTP header, proxies must not apply further transformation. "  Why?
>> The transformation indicated by the warning may have been the result
>> of a server-side transformation which a client-side proxy may deem
>> suboptimal, and so want to retransform.  I see no problem with that.
>>
> Well the specific case you point to is out of scope of this document, which
> refers neither to server side nor to client side adaptation.

I meant a server-side proxy; a surrogate.

> There is an issue with multiple proxies that in the present draft we did not
> feel we could address completely. It's relatively common for there to be an
> operator provided proxy and a search engine provided proxy in the path of
> the request and response. Having a second proxy re-transform what it
> believes to be a desktop experience, but which actually is a handset
> oriented experience is a recipe for muddle and dysfunction, I think.

I'm sure there will be cases where the result would be less than
ideal, but you can't rule out a perfectly valid form of HTTP message
exchange just because that possibility exists.
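The 4.3.2 check itself is straightforward; what is at issue above is whether a second proxy must always honour it. A minimal sketch of the detection (assuming the simple one-warning-per-comma-separated-value case):

```python
# Sketch: detect Warning: 214 ("Transformation applied") on a response.
# 4.3.2 says a proxy seeing this must not transform further; the
# discussion above questions whether that should be absolute.
def already_transformed(response_headers: dict) -> bool:
    warnings = response_headers.get("Warning", "")
    return any(w.strip().startswith("214") for w in warnings.split(","))

print(already_transformed({"Warning": '214 proxy.example "Transformation applied"'}))  # True
print(already_transformed({"Warning": '110 - "Response is stale"'}))                   # False
```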

>
> Regards
> Jo

Cheers,

Mark.
Received on Thursday, 4 September 2008 20:24:42 UTC
