Re: Feedback on content transformation guidelines

Thanks very much for your comments, Mark, which we have now started to 
address in the Working Group. It would be helpful if we could understand 
a little more about your points below.

Thanks
Jo

On 29/08/2008 05:18, Mark Nottingham wrote:
> 
> My comments below. I agree with Mark Baker's comments, and have tried 
> not to repeat them here, although a few may have slipped through.

Please see my comments/questions to Mark at [1]

[1] 
http://lists.w3.org/Archives/Public/public-bpwg-comments/2008JulSep/0139.html

> 
> * Section 2.1 - "Alteration of HTTP requests and responses is not 
> prohibited by HTTP other than in the circumstances referred to in 
> [RFC2616 HTTP] Section 13.5.2."  This isn't true; section 14.9.5 needs 
> to be referenced here as well.

On re-reading I note that 14.9.5 treats the entity body's not being 
changeable as an implication of the headers in 13.5.2 not being 
changeable, so it seems a reference to 14.9.5 would indeed give more 
complete chapter and verse on the subject.

FWIW, the point being made here is that in various heated debates about 
content transformation it is often claimed that changing headers is a 
violation of HTTP, whereas, perhaps regrettably, that claim is not borne 
out by a reading of RFC 2616.
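
To make the constraint concrete, here is a minimal sketch of the 
13.5.2/14.9.5 rule (Python purely for illustration; the function name is 
ours, but the header names are the ones RFC 2616 section 13.5.2 lists as 
non-modifiable in the presence of no-transform):

```python
# Per RFC 2616 13.5.2, a non-transparent proxy must not modify these
# entity headers (and, per 14.9.5, the entity body) when the message
# carries a "Cache-Control: no-transform" directive.
NON_MODIFIABLE_WITH_NO_TRANSFORM = {
    "content-encoding", "content-range", "content-type",
}

def may_modify_header(headers, name):
    """headers: dict of lower-cased header name -> value."""
    cache_control = headers.get("cache-control", "")
    no_transform = "no-transform" in [
        d.strip() for d in cache_control.split(",")
    ]
    if no_transform and name.lower() in NON_MODIFIABLE_WITH_NO_TRANSFORM:
        return False
    # Absent no-transform, RFC 2616 imposes no comparable restriction,
    # which is precisely the regrettable point above.
    return True
```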

> 
> * Section 3.4 / 3.5 "A [Content|Transformation] Deployment conforms to 
> these guidelines if it follows the statements..."  What does "follows" 
> mean here -- if they conform to all MUST level requirements? SHOULD and 
> MUST?
> 
I agree. The language needs tightening.

> * Section 4.1.2 "If the request contains a Cache-Control: no-transform 
> directive proxies must forward the request unaltered to the server, 
> other than to comply with transparent HTTP behaviour and as noted 
> below."  I'm not sure what this sentence means.

At the moment it says "If the request contains a Cache-Control: 
no-transform directive proxies must forward the request unaltered to the 
server, other than to comply with transparent HTTP behavior and as noted 
below (see 4.1.6 Additional HTTP Headers)." but I think it should say 
"If the request contains a Cache-Control: no-transform directive proxies 
must forward the request unaltered to the server, other than to comply 
with transparent HTTP behavior and as noted below under 4.1.6 Additional 
HTTP Headers."

Would it make sense to you with that change? Our understanding is that 
transparency is defined by RFC 2616 per the quotation and reference in 
section 2.1 of our document.
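
In sketch form, the behavior we intend is roughly the following 
(illustrative names only; the 4.1.6 additions and normal hop-by-hop 
handling are elided):

```python
def forward_request(headers):
    """Return the request headers a conforming proxy sends upstream.

    headers: dict of lower-cased request header name -> value.
    If the client sent Cache-Control: no-transform, everything is
    forwarded unaltered, apart from transparent-HTTP handling and
    the additions noted in section 4.1.6.
    """
    if "no-transform" in headers.get("cache-control", ""):
        return dict(headers)  # forward an unaltered copy
    # Otherwise the proxy may alter headers, subject to the guidelines.
    altered = dict(headers)
    altered["user-agent"] = "ExampleTransformingProxy/1.0"  # illustrative
    return altered
```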

> 
> * Section 4.1.3 "Proxies must act as though a no-transform directive is 
> present (see 4.1.2 no-transform directive in Request) unless they are 
> able positively to determine that the user agent is a Web browser."  How 
> do they positively" determine this? Using heuristics is far from a 
> guaranteed mechanism. Moreover, what is the reasoning behind this? If 
> the intent is to only allow transformation of content intended for 
> presentation to humans, it would be better to say that. In any case, 
> putting a MUST-level requirement on this seems strange.
> 
We need to reconsider this language and the intention of this section. 
My perspective is that HTTP is used for lots of different applications, 
and at present some transforming proxies make no attempt to distinguish 
"browsing" (i.e. use of a Web browser for real-time perception by 
humans) from any other application. That has the effect of making a 
total mess of those other applications, as they don't work on 
transformed content. It is indeed a vexed question how one might 
determine the difference, and we'd be very grateful for any suggestions.

> * Section 4.1.4 "Proxies should follow standard HTTP procedures in 
> respect to caching..."  This seems a strange way to phrase it, and I 
> don't think it's useful to use RF2616 language here.

Well, all we are saying is that transforming proxies are no different 
from any other proxies in the way that they treat caching, except for 
the pagination issue. It would be useful if you'd suggest wording that 
avoids the issues you see with the current wording.

> 
> * Section 4.1.5 Bullet points one and 3 are get-out-of-jail-free cards 
> for non-transparent proxies to ignore no-transform and do other 
> anti-social things. They should either be tightened up considerably, or 
> removed.

I'm not sure why you think that. The intention in bullet 1 is that if 
the request is rejected with a 406 status then the proxy MAY try again 
with different User-Agent (or other) headers. If a subsequent (200) 
response contains no-transform, I don't see where the 
get-out-of-jail-free card comes from, as this clause says nothing about 
disobeying it.
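
A sketch of the retry behavior bullet 1 intends (the `fetch` callable is 
a stand-in for whatever HTTP client the proxy uses; names are ours):

```python
def request_with_retry(fetch, url, original_headers, alternate_headers):
    """Try the original headers first; only if the server rejects
    them with 406 retry with altered ones. A no-transform directive
    in the eventual 200 response still binds the proxy as usual.
    """
    status, headers, body = fetch(url, original_headers)
    if status == 406:
        status, headers, body = fetch(url, alternate_headers)
    return status, headers, body
```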

What bullet 3 is supposed to mean is that, in order to avoid 
inconsistent representation states (as discussed in response to Mark 
Baker's comment on 4.1.5.4), if you start with a particular set of 
headers, then using a different set for the style sheets and images 
associated with a resource is likely to end in a total mess when the 
content provider is trying to provide a range of representations of the 
same resource that work well on different devices. Since that is 
something we encourage, it's a real problem.

> 
> * Section 4.1.5 What is a "restructured desktop experience"?

See 2.2 bullet 2 for a definition of restructuring; a reference would be 
appropriate. The point here is that transforming proxy vendors claim it 
is reasonable for the user to select a proxy-"restructured" experience 
of a Web site (derived from its desktop representation) rather than the 
customised handheld experience created by the Web site itself. The logic 
of this argument is that it is at least consistent with the ideals of 
user choice, and with the principle behind CSS of allowing users to 
select their own style sheet.

> 
> * Section 4.1.5 "proxies should use heuristics including comparisons of 
> domain name to assess whether resources form part of the same "Web 
> site."  I don't think the W3C should be encouraging vendors to implement 
> yet more undefined heuristics for this task; there are several 
> approaches already in use (e.g., in cookies, HTTP, security context, 
> etc.); please pick one and refer to it specifically.

"I don't disagree". But it's also true that it is a vexed question as to 
what constitutes a Web site, which is the unit we are really interested 
in. We are not encouraging them to do so. They have to, in order to be 
at all effective in what they set out to do. All we are doing is 
pointing out that having set themselves the hard task of being an 
effective transforming proxy, one of the barriers they need to deal with 
is coming up with effective answers to this essentially unanswerable 
question.

> 
> * Section 4.1.5.1 Proxies (and other clients) are allowed to and do 
> reissue requests; by disallowing it, you're profiling HTTP, not 
> providing guidelines.

No doubt it is allowed in principle. But it causes serious operational 
difficulties when proxies duplicate every request, behavior we see "in 
the wild" and for which we believe there is no real justification.

> 
> * Section 4.1.5.2 Again, not specifying the heuristics is going to lead 
> to differences in behaviour, which will cause content authors to have to 
> account for this as well.

Well, content authors have the option of offering a 406 status response, 
which leaves little room for doubt. What we are trying to deal with here 
is a conforming transforming proxy facing a site that knows nothing of 
this and simply returns 200 with "your browser is not recognised" or 
similar.

> 
> * Section 4.1.5.2 "A proxy must not re-issue a POST/PUT request..." Is 
> this specific to POST and PUT, or all requests with bodies, or...?

We limit ourselves to considering GET HEAD POST and PUT, per 4.1.1.

> 
> * Section 4.1.5.4 Use of the term 'representation' is confusing here; 
> please pick another one.
> 
Can you suggest something? As far as I know the term "representation" 
finds a certain favor among those who refer especially to "the 
representation of a URI", and hence seems particularly apposite in this 
context. I don't have any other ready vocabulary for expressing the idea 
of "the bits and bytes that result from dereferencing a URI in the 
context of a particular combination of request headers and other 
contextual information", which is what we are trying to say here.

> * Section 4.1.5.4 Using the same headers is often not a good idea. More 
> specific, per-header advice would be more helpful.

As I mention in the earlier response to Mark Baker, the reason for this 
is to make sure that, as far as possible, the same representation (sic) 
is referred to.
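
In sketch form (illustrative names), the guidance amounts to reusing one 
header set verbatim across a page and its associated resources:

```python
def fetch_with_consistent_headers(fetch, page_url, resource_urls, headers):
    """Issue the page request and every sub-resource request with
    the SAME header set, so the server keeps serving from one
    consistent variant of the resource rather than mixing
    representations intended for different devices.
    """
    results = {page_url: fetch(page_url, headers)}
    for url in resource_urls:
        # Identical headers, deliberately not altered per request.
        results[url] = fetch(url, headers)
    return results
```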

> 
> * Section 4.1.5.5 This is specifying new protocol elements; this is 
> becoming a protocol, not guidelines.

Again, per my response to Mark Baker, this is recommending by way of 
Guidelines that the original headers are exposed in a "de facto" way 
(rather than not at all, for example).

> 
> * Section 4.1.6.1 When a proxy inserts the URI to make a claim of 
> conformance, exactly what are they claiming -- all must-level 
> requirements are met? Should-level? What is the use case for this 
> information?

To inform the server that if it behaves in the manner specified in this 
document it can expect the outcomes specified. Otherwise it may decide 
to take different, more drastic action if it suspects that its content 
won't be treated in the manner specified.

> 
> * Section 4.2.1 Requiring servers to respond with 406 is profiling HTTP; 
> HTTP currently allows the server to send a 'default' representation even 
> when the headers say that the client doesn't prefer it.

I'm not sure what the distinction is between profiling HTTP and offering 
a recommendation. Sending a 406 is a clear signal as to your intentions, 
whereas not doing so leaves your content open to unwanted 
transformation.
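
On the server side the recommendation amounts to no more than this 
sketch (the user-agent test stands in for whatever logic the content 
provider already applies; names are illustrative):

```python
def respond(request_headers, is_supported_user_agent):
    """If the provider cannot usefully serve this user agent, a 406
    says so unambiguously; a 200 carrying a "your browser is not
    recognised" page invites the transforming proxy to guess.
    """
    ua = request_headers.get("user-agent", "")
    if not is_supported_user_agent(ua):
        return 406, {}, b"Not Acceptable"
    return 200, {}, b"<html>...</html>"
```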

> 
> * Section 4.2.2 "Servers must include a Cache-Control: no-transform 
> directive if one is received in the HTTP request." Why?

Per response to Mark Baker. For XMLHttpRequest etc.
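
In sketch form, the recommendation is simply to echo the directive back 
(illustrative names, simple header-dict model):

```python
def response_headers(request_headers, base_response_headers):
    """Echo Cache-Control: no-transform into the response when the
    request carried it (e.g. requests made via XMLHttpRequest), so
    the response stays protected from transformation on the way back.
    """
    out = dict(base_response_headers)
    if "no-transform" in request_headers.get("cache-control", ""):
        cc = out.get("cache-control", "")
        if "no-transform" not in cc:
            out["cache-control"] = (cc + ", " if cc else "") + "no-transform"
    return out
```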

> 
> * Section 4.2.3.1 "Serves may base their actions on knowledge... but 
> should not choose an Internet content type for a response based on an 
> assumption or heuristics about behaiour of any intermediaries." Why not?

Because this leads to a spiral of second-, third- and further guessing. 
Say what your content is, and use the recommendations defined here to 
control what happens to it. Don't say "it's X" if it isn't, just because 
you know the client will tolerate the misstatement and you hope certain 
things about proxy behavior.

> 
> * Section 4.3.2 Why can't proxies transform something that has already 
> been transformed?
> 
Again, per the comment to Mark Baker: dealing with multiple proxies is 
beyond our scope. There needs to be a way of stopping a cycle of despair 
while we wait for, we hope, better mechanisms for inter-proxy 
communication.

> * Section 4.3.3 Sniffing content for error messages is dangerous, and 
> also unlikely to work. E.g., will you sniff for all languages and all 
> possible phrases? How will you avoid false positives? Remove this 
> section and require content providers to get it right. People may still 
> do this in their products, but there's no reason to codify it.

Well, it's not recommended; proxies may do this, and doing so 
successfully is something they might offer as their USP. It's worth 
mentioning that this is common behavior.

> 
> * Section 4.3.4 What's the purpose behind this behaviour?
> 
There are circumstances in which altered request headers have been 
offered, but in which it is inappropriate for the proxy to serve the 
content, because the server would have given a different response had it 
received the original headers.

Hope this helps, and we'd be grateful for any further comments from you.
Jo

Received on Thursday, 4 September 2008 20:40:56 UTC