Re: [ACTION-603] Conversation with Yves, our HTTP expert, about CT and Cache-Control extensions from Francois Daoust on 2008-02-07 (public-bpwg-ct@w3.org from February 2008)

From: Francois Daoust <fd@w3.org>
Date: Thu, 07 Feb 2008 12:42:11 +0100
CC: public-bpwg-ct <public-bpwg-ct@w3.org>
Message-ID: <47AAEE93.6040701@w3.org>
Guys,

Sorry for the lengthy email, it started as a small reply to the thread, 
but grew out of control :(

The point is: I think the guidelines mostly cover the aspects that need 
to be covered, but we have to be more practical, in short, linking the 
directives to real mechanisms.


I agree (I guess) with Bryan: business imperatives and practical 
use-cases such as breaking large pages into small pieces should be taken 
into account by our guidelines. They are in scope for the document: the 
former because if we ignore them our guidelines won't be followed, the latter 
because we need to be clear as to what the CT-proxy can do to content.

I agree with Aaron: no matter what you call it, I think of the CT-proxy 
more as a remote add-on to the user's browser than as a proxy/gateway 
between the CP and the user.

As such:
1/ the fine-grained control of the CT-proxy behavior should be on the 
user's side. The CT-proxy should follow the CP's wishes whenever 
possible, but if the user wants to override them (see my third point 
below for restriction), he should be able to do so. When I use the 
User-Agent switcher extension in Firefox, servers are just unaware of it 
and can't do anything to prevent me from doing it.

2/ the control definitely needs to be on the user's side and not imposed 
by the carrier. That is, the user must have the right not to use it. I 
would prefer opt-in because most mobile phone users don't understand 
what CT is (likewise, most users don't use the User-Agent switcher 
extension and/or Lynx) but I guess this guideline wouldn't be respected, 
so opt-out is OK.

3/ if we go for opt-out (even with opt-in actually), the common case 
will be users who have never heard of CT. I would thus restrict the 
cases where the user may override the CP's wishes to exceptional cases, 
in other words cases when the CT-proxy detects the content would crash 
the user's browser, and that's about it.

4/ since it's a REMOTE add-on, there is a limit to what the CT-proxy 
should be allowed to do when security comes into play. I'm really worried 
about the possibility of rewriting HTTPS links (which indeed looks like 
re-creating the "WAP gap" Jo mentioned). I would say the only exception 
to the rule would be if we're using WAP1, but the exception cannot be 
waived in that case anyway...

5/ If we don't view the CT-proxy as an add-on but really as a forwarding 
agent and stick to the vocabulary elements that exist in HTTP today, 
then our CT-proxy is a gateway, as noted in the CT landscape document:
http://www.w3.org/TR/2007/WD-ct-landscape-20071025/#terminologyNote
But it's even more than that in my mind, as you can't expect to have 
much control over a gateway. I doubt using extensions and/or new headers 
to manage CT options will ever be validated by anyone. I may be wrong 
though.



Below is my view of what we should put in the guidelines. It does not 
cover all the points already in the guidelines but I didn't find any 
contradiction with remaining ones.


Define CT-proxy
---------------
We should copy the definition of the landscape document, and explain 
that from a practical point of view, the CT-proxy is a user browser's 
add-on (rephrased in an appropriate way, that is)


Between the user and the CT-proxy
---------------------------------

Here is a list of CT-options, extracted from current draft:
- allow-compress: default yes
- allow-recode: default yes
- allow-restructure: default yes
- preferred-medium: default handheld
- reload-untransformed(*): default yes
- allow-dangerous-fixes: default yes
- silent-mode: default no
- allow-https-rewrite: default no [and I strongly favor its pure removal...]
+ the possibility to define specific settings on a domain/page basis.

+ other settings? From a very practical point of view, do these settings 
cover transformations that are actually being done by existing 
CT-proxies? Breaking pages into small pieces?
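
As an illustration only, the option list above, with its defaults and the per-domain overrides, could be held in a structure like the following Python sketch. The class and field names (CTOptions, UserSettings, for_domain) are hypothetical; they simply mirror the option names in the list and do not come from the draft or from any existing CT-proxy.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class CTOptions:
    """CT-proxy settings; defaults follow the option list above."""
    allow_compress: bool = True
    allow_recode: bool = True
    allow_restructure: bool = True
    preferred_medium: str = "handheld"
    reload_untransformed: bool = True
    allow_dangerous_fixes: bool = True
    silent_mode: bool = False
    allow_https_rewrite: bool = False


@dataclass
class UserSettings:
    """Global options plus optional per-domain overrides (hypothetical layout)."""
    default: CTOptions = field(default_factory=CTOptions)
    # Maps a domain name to the option values that differ for that domain.
    per_domain: Dict[str, Dict[str, object]] = field(default_factory=dict)

    def for_domain(self, domain: str) -> CTOptions:
        """Return the options that apply to one domain."""
        overrides = self.per_domain.get(domain.lower(), {})
        return replace(self.default, **overrides)


settings = UserSettings(per_domain={"example.org": {"allow_restructure": False}})
print(settings.for_domain("example.org").allow_restructure)  # False
print(settings.for_domain("example.com").allow_restructure)  # True
```

Keeping the overrides as a sparse mapping means a new option only needs a new field with a default, which leaves the door open to further settings.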


[(*): about reload-untransformed, actually I have the feeling the option 
would be more useful if it told the Proxy to change the "caching" 
setting of images when they are sent back to the user. This way, the 
user's browser could use its own cache to store the images, and we would 
save the costly HTTP round-trip to the CT-proxy. Well, that's another 
story.]

As our main target is legacy browsers, the only way for the Proxy and 
the user to interact is via a web page. This means the user's 
preferences, if kept, will have to be kept CT-proxy side. If the 
CT-proxy can't store these preferences, it sounds rather clumsy to 
present this amount of options to the user each time he requests a page, 
and wants to change the CT-proxy's behavior. In other words, if the 
CT-proxy can't store the user's preferences or if the user is anonymous, 
the CT-proxy should work with a mere on/off switch, and should probably 
be off... I suppose most CT-proxies have access to a userID 
provided by the carrier and can store the user's preferences. Public 
CT-proxies can't do that, but it's probably OK not to focus on them for 
the time being.
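
A rough sketch of that fallback, in Python: preferences are looked up by the carrier-provided userID, and an anonymous user only gets the on/off switch, set to off. The names (DEFAULTS, effective_prefs, the "transform" key) are made up for the example, and treating an identified user with no saved preferences as "on with defaults" is an assumption based on the opt-out model discussed earlier.

```python
from typing import Dict, Optional

# Assumed defaults for an identified user who has not changed anything.
DEFAULTS: Dict[str, bool] = {"transform": True, "allow_restructure": True}


def effective_prefs(user_id: Optional[str],
                    store: Dict[str, Dict[str, bool]]) -> Dict[str, bool]:
    """Return the preferences the CT-proxy should apply to one request.

    `store` stands in for whatever CT-proxy-side storage exists; here it is
    a plain dict mapping a user ID to the options that user has changed.
    """
    if user_id is None:
        # Anonymous user: nothing can be stored, so only an on/off switch
        # remains, and it is off.
        return {"transform": False}
    merged = dict(DEFAULTS)
    merged.update(store.get(user_id, {}))
    return merged


store = {"user-42": {"allow_restructure": False}}
print(effective_prefs("user-42", store))
print(effective_prefs(None, store))
```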

So, no, between the user and the CT-proxy, I don't think there's any 
need for new HTTP headers and/or extensions (besides, not all options 
fit as Cache-Control extensions, and having the options scattered all 
over the place is not a good idea). The CT-proxy should be configurable 
the same way any add-on would be: through settings.

The guidelines should list these options and leave the door open to new 
ones.


Between the CP and the CT-proxy
-------------------------------
We can't stick to the HTTP RFC, because it's not enough.

I still doubt there will be many cases when the CP wants a finer control 
over the CT-proxy than just switching it off. Cache-Control: 
no-transform to prevent CT seems enough for me. If we really want a 
finer control, HTTP Cache-Control extensions are required.
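
For reference, checking a response for the no-transform directive is simple. The Python sketch below is a minimal version; the function name is hypothetical, and it splits on commas naively, so a quoted directive value that itself contains a comma would not be handled correctly.

```python
def forbids_transformation(cache_control: str) -> bool:
    """Return True if a Cache-Control header value contains no-transform.

    Simplification: directives are split on commas without honouring
    quoted strings.
    """
    for directive in cache_control.split(","):
        # Directive names are case-insensitive; drop any "=value" part.
        name = directive.split("=", 1)[0].strip().lower()
        if name == "no-transform":
            return True
    return False


print(forbids_transformation("max-age=3600, no-transform"))  # True
print(forbids_transformation("public, max-age=60"))          # False
```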

There is however a problem with the User-Agent (and the preferred medium 
directive, but this option doesn't strike me as deeply needed either). 
The CT-proxy will change the User-Agent header. I wish it didn't but I 
guess there's no way to make do without it, is there?

Recommending the use of the HTTP Vary: User-Agent header is indeed a 
good idea but it cannot cover all the cases:
a/ the CT-Proxy would have to issue a second request on receipt of 
such a response, which makes it rather hard to count statistics from the 
CP's point of view. HTTP never guarantees that statistics can be counted 
but from a business point of view, that's fairly unacceptable.

b/ in POST requests, a second request cannot be issued, because the data 
would already have been posted to the CP. Such content negotiation 
doesn't work in POST requests, the only workaround being to send a HEAD 
request to start with and the POST request afterward, but that doesn't 
sound realistic. [There's another way around using an HTTP 303 (See 
Other) response, but again, that's not implemented anywhere].

c/ the above b/ actually extends to some GET requests, because, even 
though GET requests should never be used to post data to a server, they 
are in practice (confirmation URIs that work once for instance).
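
Points a/ to c/ can be put as a small decision function. This is only a Python sketch under assumptions: the function names are hypothetical, and the has_side_effects flag stands for knowledge the CT-proxy generally does not have (it cannot tell a one-shot confirmation URI from an ordinary GET), which is exactly the weakness described in c/.

```python
def varies_on_user_agent(vary: str) -> bool:
    """Return True if a Vary header value lists User-Agent."""
    fields = {f.strip().lower() for f in vary.split(",")}
    return "user-agent" in fields


def may_reissue(method: str, vary: str, has_side_effects: bool = False) -> bool:
    """Decide whether a request may be repeated with the original User-Agent.

    - The response must say it varies on User-Agent, otherwise a second
      request is pointless.
    - POST is never repeated: the data has already been sent to the CP (b/).
    - A GET known to have side effects is not repeated either (c/); the
      flag is a hint supplied by the caller, not something detected here.
    """
    if not varies_on_user_agent(vary):
        return False
    if method.upper() not in ("GET", "HEAD"):
        return False
    return not has_side_effects


print(may_reissue("GET", "Accept-Encoding, User-Agent"))  # True
print(may_reissue("POST", "User-Agent"))                  # False
```

Even when the function returns True, the second request still shows up in the CP's logs, which is the statistics problem in a/.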

The original user-agent needs to be available to the CP (because there 
would be no way to access mobile content otherwise...). And so, if the 
CT-proxy changes the User-Agent header, a new HTTP header is needed.

That being said, an (X-)Original-User-Agent header for instance has... 0 
chances of ever being accepted as a new HTTP header, because its purpose 
is limited to a "bad practice".

What might be accepted is a more generic header, something like:

(X-)Headers-Modified: User-Agent=[original UA], Accept=[original Accept]

... listing the headers and the original values before modification by a 
gateway (in practice our CT-proxy). The header is generic, it can be 
extended, and thus might be of some use in other cases than our bad CT use.
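
To make the idea concrete, here is a Python sketch of how a gateway could record the original values when it rewrites request headers. Everything in it is an assumption: the X-Headers-Modified header does not exist, the function names are invented, and wrapping each value in double quotes is my own addition, because User-Agent and Accept values routinely contain commas and would otherwise be ambiguous in a comma-separated list.

```python
from typing import Dict


def build_headers_modified(originals: Dict[str, str]) -> str:
    """Format original header values as Name="value", Name="value"."""
    parts = []
    for name, value in originals.items():
        # Escape backslashes and double quotes inside the quoted value.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{name}="{escaped}"')
    return ", ".join(parts)


def rewrite_request_headers(headers: Dict[str, str],
                            replacements: Dict[str, str]) -> Dict[str, str]:
    """Apply replacements and record the values they overwrite.

    Simplification: header names are matched exactly, although HTTP
    header names are case-insensitive.
    """
    originals = {
        name: headers[name]
        for name, new_value in replacements.items()
        if name in headers and headers[name] != new_value
    }
    rewritten = dict(headers)
    rewritten.update(replacements)
    if originals:
        rewritten["X-Headers-Modified"] = build_headers_modified(originals)
    return rewritten


request = {"User-Agent": "Nokia6230/2.0", "Accept": "text/vnd.wap.wml"}
sent = rewrite_request_headers(
    request, {"User-Agent": "Mozilla/5.0", "Accept": "text/html, */*"}
)
print(sent["X-Headers-Modified"])
# User-Agent="Nokia6230/2.0", Accept="text/vnd.wap.wml"
```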

I prefixed headers with a (X-) because there will be no way to actually 
have a Headers-Modified header accepted before proving that it's being 
used. In practice, that means our guidelines could suggest the use of 
the X-Headers-Modified header, and when CT-proxies start using it, we 
would be able to make the case for a real header. Unfortunately, this has to be 
a two-step process, but I guess it fits our charter in the sense that we 
would not recommend new technology, but suggest a new mechanism.

The same discussion would also apply I guess with Cache-Control 
extensions, but we should really be sure each one of them will be used 
in practice.


Summary
-------
Guidelines based on existing technologies.

One suggestion: the use of a new HTTP header, experimental at first, 
generic enough to be of some use to other Web areas in the future.


Some questions
--------------
I/ does this view match that of the CT-proxy vendors? Any other 
operations that need to be addressed? Any other X- headers that are 
being used in the already existing CT-world? Would switching to only one 
be a solution that may be implemented?
II/ Heiko, you mentioned the need for a mobileok tag in headers for HEAD 
requests, is that really needed?
III/ what claims from CPs should we take into account?
IV/ Am I missing something?


Hope that helps, apologies otherwise ;)
François.




Sullivan, Bryan wrote:
> Jo,
> To clarify my comments, I was not referring to the question of whether 
> the CT proxy must *always* comply to CP directives. The group does need 
> to resolve that question. For me it's OK if we leave non-compliance to 
> whatever we decide as a CT proxy service provider decision (it will be, 
> anyway). I think most service providers would choose non-compliance if 
> it was necessary due to business imperatives.
>  
> I was speaking to the different question of what type of CT functions 
> are in scope. If a CP does not say "no-transform", I believe a CT proxy 
> should have the option of breaking large pages into small pieces that 
> are served locally, including emulation of Javascript etc. Do you agree 
> to that?
>  
> 
> Best regards,
> 
> Bryan Sullivan | AT&T
> 
> ------------------------------------------------------------------------
> *From:* Jo Rabin [mailto:jrabin@mtld.mobi]
> *Sent:* Wednesday, February 06, 2008 3:13 PM
> *To:* Sullivan, Bryan; public-bpwg-ct
> *Subject:* RE: [ACTION-603] Conversation with Yves, our HTTP expert, 
> about CT and Cache-Control extensions
> 
> I am sad to think that actually we have now run out of road on options 
> to create a more complete solution. We have known for some time that new 
> vocabulary is needed. We thought we might be able to do that while 
> remaining in scope of our charter. We can write a note describing the 
> characteristics of such a solution. We can’t as the BPWG write an RFC 
> describing how that solution is implemented.
> 
>  
> 
> I suggest that on the call tomorrow we discuss this and decide what kind 
> of Rec we want to produce bearing in mind that we are using only the 
> vocabulary elements that exist in HTTP today.  To my mind this also, 
> btw, resolves in favor of the CP when it comes to the question of “whose 
> preference counts”. As the CP has no graduation of vocabulary available 
> to them other than “leave it alone”.
> 
>  
> 
> Jo
> 
>  
> 
>  
> 
> ------------------------------------------------------------------------
> 
> *From:* public-bpwg-ct-request@w3.org 
> [mailto:public-bpwg-ct-request@w3.org] *On Behalf Of *Sullivan, Bryan
> *Sent:* 06 February 2008 22:41
> *To:* public-bpwg-ct
> *Subject:* RE: [ACTION-603] Conversation with Yves, our HTTP expert, 
> about CT and Cache-Control extensions
> 
>  
> 
> Jo,
> 
> The notion of a "gap" is only relevant to end-to-end security, thus for 
> non-secure page access is a non-issue.
> 
>  
> 
> For non-secure pages, whether we call the function one of a "gateway" or 
> "proxy", the question is whether W3C wants to address recommendations 
> for this degree of content transformation (e.g. breaking a big page up 
> into smaller pages served locally, emulating scripting, etc). For AT&T, 
> that is an important use-case and we support it being in scope for the 
> CT guidelines.
> 
>  
> 
> Best regards,
> 
> Bryan Sullivan | AT&T
> 
> ------------------------------------------------------------------------
> 
> *From:* Jo Rabin [mailto:jrabin@mtld.mobi]
> *Sent:* Wednesday, February 06, 2008 10:56 AM
> *To:* Aaron Kemp
> *Cc:* Sullivan, Bryan; public-bpwg-ct
> *Subject:* RE: [ACTION-603] Conversation with Yves, our HTTP expert, 
> about CT and Cache-Control extensions
> 
> Well, looks like we are on course to disagree again :(
> 
>  
> 
> I am worried about the idea of a Transforming proxy being regarded as a 
> gateway precisely because of that kind of issue. (Not to mention 
> reintroducing the WAP Gap and so on)
> 
>  
> 
> Jo
> 
>  
> 
> ------------------------------------------------------------------------
> 
> *From:* Aaron Kemp [mailto:kemp@google.com]
> *Sent:* 06 February 2008 18:51
> *To:* Jo Rabin
> *Cc:* Sullivan, Bryan; public-bpwg-ct
> *Subject:* Re: [ACTION-603] Conversation with Yves, our HTTP expert, 
> about CT and Cache-Control extensions
> 
>  
> 
> On Feb 6, 2008 1:47 PM, Jo Rabin <jrabin@mtld.mobi> wrote:
> 
>     I think the point is that no-transform is not a new lock.
> 
> Your previous comment was about adding finer grained bits to 
> no-transform (which would be new).
> 
> No-transform is only applicable if we treat these things as proxies 
> anyway -- I can argue they are more like user agents of their own, or 
> user agent extensions, which makes the no-transform not applicable.  
> It's more like a text mode browser (which won't adhere to the no-transform).
> 
> Aaron
>
Received on Thursday, 7 February 2008 11:42:20 UTC