RE: [ACTION-603] Conversation with Yves, our HTTP expert, about CT and Cache-Control extensions

Belatedly, but possibly in time for consideration on this afternoon's call:


In general I agree that we could drop signalling between the client/user and the CT proxy and say something mysterious along the lines that the proxy SHOULD offer the user control and options. We might still want to discuss what those options are.

I think we should clarify that the client may send a no-transform directive. This would at least be helpful if any browser creators wanted to use it.
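
To make that concrete, here is a rough sketch of both ends: a client adding the directive to its request headers, and a proxy checking for it. The function names are made up for this note, and headers are modelled as a plain dict; Cache-Control: no-transform itself is the existing HTTP/1.1 (RFC 2616) directive.

```python
def client_headers(user_agent, allow_transform=True):
    """Build request headers; optionally ask intermediaries not to transform."""
    headers = {"User-Agent": user_agent}
    if not allow_transform:
        # Existing HTTP/1.1 directive, valid on requests as well as responses.
        headers["Cache-Control"] = "no-transform"
    return headers


def forbids_transform(headers):
    """Return True if the Cache-Control header carries no-transform."""
    value = headers.get("Cache-Control", "")
    # Directives are comma-separated and case-insensitive.
    directives = {d.strip().lower() for d in value.split(",")}
    return "no-transform" in directives
```

A proxy would run the same check on the CP's response headers before touching the body.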


> Between the CP and the CT-proxy
> -------------------------------
> We can't stick to the HTTP RFC, because it's not enough.
> 
> I still doubt there will be many cases when the CP wants a finer control 
> over the CT-proxy than just switching it off. Cache-Control: 
> no-transform to prevent CT seems enough for me. If we really want a 
> finer control, HTTP Cache-Control extensions are required.

I think finer control is needed, but, as you say, I don't think we can provide it.

> 
> There is however a problem with the User-Agent (and the preferred medium 
> directive, but this option as well doesn't strike me as deeply needed). 
> The CT-proxy will change the User-Agent header. I wish it didn't but I 
> guess there's no way to make do without it, is there?

Well, I think we should re-examine why it does this, and wonder which way round it should do it, if it does. If it is tasting content then should it taste as the original UA first? If it modifies the UA then it should add a component to the UA string of the form <Vendor>/CT-Proxy or something.
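
Something like the following is what I have in mind for the extra component. It is only a sketch: the "<Vendor>/CT-Proxy" token form is my suggestion above, not anything agreed, and the function name is invented for illustration.

```python
def tag_user_agent(original_ua, vendor):
    """Append a '<vendor>/CT-Proxy' component to a User-Agent string.

    The token format is hypothetical; it is not defined by any spec.
    """
    token = f"{vendor}/CT-Proxy"
    # Do not add the token twice if the request passes through again.
    if token in original_ua.split():
        return original_ua
    return f"{original_ua} {token}".strip()
```

The original UA string is kept intact at the front, so a CP matching on a prefix of the string would still recognise the device.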

I think there are other options we should look at again. For instance, if the CT-Proxy has a priori knowledge of the site's operation, from previous experience, white-lists and so on, it should use that. Likewise, we can revisit the idea of having a robots.txt-like indication (though we'd probably be politely asked to do it in POWDER). That is, if you don't know anything about a site, go and look for a label somewhere that tells you useful things. The question is mainly where you should go. POWDER will have it that the label is linked to the retrieved resource, and what we are trying to avoid is double dipping of content. Maybe a special HEAD request will reveal this info.



> 
> Recommending the use of the HTTP Vary: User-Agent header is indeed a 
> good idea but it cannot cover all the cases:
> a/ the CT-Proxy would have to re-issue a second request on receipt of 
> such a request, which makes it rather hard to count statistics from the 
> CP's point of view. HTTP never ensures that statistics may be counted 
> but from a business point of view, that's fairly unacceptable.
> 
> b/ in POST requests, a second request cannot be issued, because that 
> would mean already posted data to the CP. Such content negotiation 
> doesn't work in POST requests, the only workaround being to send a HEAD 
> request to start with and the POST request afterward, but that doesn't 
> sound realistic. [There's another way around using a HTTP 303 (see other 
> location) response, but again, that's not implemented anywhere].

The labelling approach may help with this, i.e. a request is made somewhere else to tell you what to expect.
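
For reference, the retry decision described in a/ and b/ above could be sketched like this. The names are invented, headers are a plain dict, and the rule (only re-issue safe methods, only when the response varies on User-Agent) is my reading of the quoted text rather than agreed behaviour.

```python
SAFE_METHODS = {"GET", "HEAD"}


def varies_on_user_agent(response_headers):
    """True if the Vary response header lists User-Agent (or '*')."""
    vary = response_headers.get("Vary", "")
    fields = {f.strip().lower() for f in vary.split(",") if f.strip()}
    return "*" in fields or "user-agent" in fields


def may_reissue(method, response_headers):
    """Decide whether the proxy might repeat the request with another UA.

    POST is never repeated, since the data would be posted twice.
    Caveat: some GET URIs have side effects in practice, so even a True
    result here does not guarantee that repeating the request is harmless.
    """
    return method.upper() in SAFE_METHODS and varies_on_user_agent(response_headers)
```

Note that this does nothing for the statistics problem in a/: the CP still sees two requests.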

> 
> c/ the above b/ actually extends to some GET requests, because, even 
> though GET requests should never be used to post data to a server, they 
> are in practice (confirmation URIs that work once for instance).
> 
> The original user-agent needs to be available to the CP (because there 
> would be no way to access mobile content otherwise...). And so, if the 
> CT-proxy changes the User-Agent header, a new HTTP header is needed.
> 
> That being said, an (X-)Original-User-Agent header for instance has... 0 
> chances of ever being accepted as a new HTTP header, because its purpose 
> is limited to a "bad practice".
> 
> What might be accepted is a more generic header, something like:
> 
> (X-)Headers-Modified: User-Agent=[original UA], Accept=[original Accept]
> 
> ... listing the headers and the original values before modification by a 
> gateway (in practice our CT-proxy). The header is generic, it can be 
> extended, and thus might be of some use in other cases than our bad CT use.

I am very reluctant about this, but it may be the way to go.
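
If we did go this way, we would have to pin down the syntax. A minimal sketch follows, with one big assumption: the quoted proposal does not say how values are delimited, and real User-Agent and Accept values contain commas, so I assume here that each original value is carried as an HTTP quoted-string. The function names are invented for illustration.

```python
import re


def _quote(value):
    """Wrap a value as a quoted-string, escaping backslash and double quote."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_headers_modified(originals):
    """Serialise {header name: original value} as name="value" pairs."""
    return ", ".join(f"{name}={_quote(value)}" for name, value in originals.items())


# One name="value" pair, followed by a comma or the end of the string.
_PAIR = re.compile(r'\s*([^=,\s]+)="((?:[^"\\]|\\.)*)"\s*(?:,|$)')


def parse_headers_modified(value):
    """Parse the header value back into {header name: original value}."""
    result = {}
    pos = 0
    while pos < len(value):
        match = _PAIR.match(value, pos)
        if match is None:
            raise ValueError(f"malformed header value at offset {pos}")
        # Undo the backslash escaping applied by _quote().
        result[match.group(1)] = re.sub(r"\\(.)", r"\1", match.group(2))
        pos = match.end()
    return result
```

A CP would then read the original User-Agent from the parsed result and ignore any names it does not understand, which is what would keep the header generic.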

> 
> I prefixed headers with a (X-) because there will be no way to actually 
> have a Headers-Modified header accepted before proving that it's being 
> used. In practice, that means our guidelines could suggest the use of 
> the X-Headers-Modified header, and when CT-proxies start using it, we 
> would be able to claim for a real header. Unfortunately, this has to be 
> a two-step process, but I guess it fits our charter in the sense that we 
> would not recommend new technology, but suggest a new mechanism.
> 
> The same discussion would also apply I guess with Cache-Control 
> extensions, but we should really be sure each one of them will be used 
> in practice.
> 
> 
> Summary
> -------
> Guidelines based on existing technologies.
> 
> One suggestion: the use of a new HTTP header, experimental at first, 
> generic enough to be of some use to other WEB areas in the future.
>

I'd add that I agree we need to add something about not automatically rewriting HTTPS links. The nature of the warning is up to the provider, but it should at least make it clear to users that their details become insecure if they accept the transform option, and that the information may be unusable if they don't. But before offering this draconian option the provider needs to be sure that the content isn't actually suitable. As pointed out, you'd hope that financial institutions etc. would be among the early adopters of providing a made-for-mobile presence anyway.

Also, on the "ignore no-transform" option, I would like to see a note saying that the user should be alerted and given the choice.

I'd also like to see some discussion of heuristics applied to content (like link rel="handheld") to help the CT proxy infer things.
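
As an example of the kind of heuristic I mean, a proxy could scan the markup for a handheld link before deciding to transform. This is a sketch only: the class and function names are made up, and accepting both rel="handheld" and rel="alternate" with media="handheld" is my assumption about what authors actually write.

```python
from html.parser import HTMLParser


class HandheldLinkFinder(HTMLParser):
    """Collect href values of <link> elements that point at handheld content."""

    def __init__(self):
        super().__init__()
        self.handheld_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rel = attributes.get("rel", "").lower().split()
        media = [m.strip() for m in attributes.get("media", "").lower().split(",")]
        if "handheld" in rel or ("alternate" in rel and "handheld" in media):
            self.handheld_links.append(attributes.get("href", ""))


def find_handheld_links(html):
    """Return the hrefs of all handheld links found in an HTML document."""
    finder = HandheldLinkFinder()
    finder.feed(html)
    return finder.handheld_links
```

A non-empty result would suggest the CP already has a mobile presentation, which is a reason for the proxy to leave the content alone or to follow the link.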

And the whole Pandora's box of sites that redirect rather than adapt for different experiences still needs to be dealt with.

I'd also like us to say something about testing. At present there is no way a CP can realistically test what is going to happen to their content. (I think most of the other issues on Jo's CT Shopping List [1] are nearly dealt with)

[1] http://lists.w3.org/Archives/Public/public-bpwg-ct/2007Oct/0012.html


Jo

> -----Original Message-----
> From: public-bpwg-ct-request@w3.org [mailto:public-bpwg-ct-request@w3.org]
> On Behalf Of Francois Daoust
> Sent: 07 February 2008 17:14
> To: Robert Finean
> Cc: public-bpwg-ct@w3.org
> Subject: Re: [ACTION-603] Conversation with Yves, our HTTP expert, about
> CT and Cache-Control extensions
> 
> 
> Thanks for your comments!
> 
> 
> Re "2/", I hadn't thought about these cases, but there doesn't seem to
> be a real problem here. The wording should perhaps be updated, but even
> in the value-added services you mention, the user was "in control" in
> the sense that he agreed he would use them to get free or cheaper
> browsing... Same thing when I use an email or a browser desktop client
> that displays ads. I guess there could be lots of discussion on whether
> that's a good practice or not, but that's beyond the scope of our CT
> guidelines anyway since the working group "neither approves nor
> disapproves of Content Transformation", as stated in the Introduction of
> the document.
> 
> Magnus or Bryan, can you think of other situations where the CT-proxy
> escapes the user's control?
> 
> 
> Re "4/", I may be the only one willing to ban this completely. If
> everyone else agrees, I'll surrender... provided the user really is in
> charge there, no matter what the "global" terms & conditions may say.
> That's "just" a side guideline though, having it or not should not
> prevent us from grounding the guidelines in the real world.
> 
> 
> Re "5/", Yes, that's the reason why I suggested later in the email
> trying to promote one (or a restricted set of) more generic HTTP
> header(s). If we can prove that using it solves most of our issues, and
> show around that we agreed on something implemented for real, we'll have
> much more traction to have it accepted by the community. If we can't
> find any "generic" solution, it's going to be harder. (I realize it may
> look like trying to wrap something ugly in a shiny golden piece of paper
> though).
> 
> 
> François.
> 
> 
> 
> Robert Finean wrote:
> > +1 to Bryan's comments, particularly re point 4 - yes there should be
> clear indication to the end-user about CT inside HTTPS and a great deal of
> care implementing it but no there should not be a ban on it.
> >
> > Thanks,
> >
> > Robert
> >
> > -----Original Message-----
> > From: public-bpwg-ct-request@w3.org [mailto:public-bpwg-ct-
> request@w3.org] On Behalf Of Sullivan, Bryan
> > Sent: Thu 07 February 2008 16:02
> > To: public-bpwg-ct@w3.org
> > Subject: RE: [ACTION-603] Conversation with Yves, our HTTP expert, about
> CT and Cache-Control extensions
> >
> >
> > Francois,
> > Just a couple of comments:
> >
> > Re "2/ the control definitely needs to be on the user's side and not
> imposed by the carrier. That is, the user must have the right not to use
> it. ": within the simple objectives of the CT guidelines (usability,
> interoperability, efficiency), I can agree with you. Beyond those (e.g. in
> respect to value-added services such as content insertion and content
> filtering, which may be performed by the same proxy), the CT guidelines
> are not responsible for limiting proxy behavior (I don't call it a "CT
> proxy" there because those other value-adds are outside the "CT role" of
> the proxy as a "CT proxy"). Thus if an Operator offers a free browsing
> service that requires a header/footer be inserted on each page (and the
> user agrees to that service by buying it or by simply using the proxy),
> the user *may* not have the right to disable proxy behavior per the terms
> and conditions (T&C's: you hear us speak of them often) of service. But
> > lack of "control privilege" in that case is not an aspect of the CT proxy
> and does not affect proxy compliance to the CT guidelines.
> >
> > Re "4/ ...I'm really worried about the possibility to rewrite HTTPS
> links (which indeed looks like re-creating the "WAP gap" Jo mentioned)": I
> understand the reluctance but BPWG should realize that it's leaving a very
> significant use case off the table in that case. Users will not be able to
> access many secure services not specifically tailored to the mobile web.
> But I will note that AT&T, as a key contributor to the standardization of
> WAP2 (and the 1st US Operator to deploy it), did so specifically to help
> close the WAP gap and increase up-take of secure services. So we aren't
> really concerned about browser compatibility in secure services. We expect
> that providers of secure services are mobile-aware already, and are fine
> with TLS tunneling through the CT proxy.
> >
> > Re "5/ ...I doubt using extensions and/or new headers to manage CT
> options will ever be validated by someone.": I think these may make sense
> on the CP side (but *not* on the client side), and we can craft test
> server code to validate them pretty easily.
> 
> >
> > I will add further comments to your details later.
> >
> > Best regards,
> > Bryan Sullivan | AT&T
> > -----Original Message-----
> > From: public-bpwg-ct-request@w3.org [mailto:public-bpwg-ct-
> request@w3.org] On Behalf Of Francois Daoust
> > Sent: Thursday, February 07, 2008 3:42 AM
> > To: public-bpwg-ct@w3.org
> > Cc: public-bpwg-ct
> > Subject: Re: [ACTION-603] Conversation with Yves, our HTTP expert, about
> CT and Cache-Control extensions
> >
> >
> > Guys,
> >
> > Sorry for the lengthy email, it started as a small reply to the thread,
> but grew out of control :(
> >
> > The point is: I think the guidelines mostly cover the aspects that need
> to be covered, but we have to be more practical, in short, linking the
> directives to real mechanisms.
> >
> >
> > I agree (I guess) with Bryan: business imperatives and practical use-
> cases such as breaking large pages into small pieces should be taken into
> account by our guidelines. They are in scope with the document, the former
> because if we don't our guidelines won't be followed, the latter because
> we need to be clear as to what the CT-proxy can do to content.
> >
> > I agree with Aaron: no matter how you call it, I think of the CT-proxy
> more as a remote add-on to the user's browser than as a proxy/gateway
> between the CP and the user.
> >
> > As such:
> > 1/ the fine-grained control of the CT-proxy behavior should be on the
> user's side. The CT-proxy should follow the CP's wishes whenever possible,
> but if the user wants to override them (see my third point below for
> restriction), he should be able to do so. When I use the User-Agent
> switcher extension in Firefox, servers are just unaware of it and can't do
> anything to prevent me from doing it.
> >
> > 2/ the control definitely needs to be on the user's side and not imposed
> by the carrier. That is, the user must have the right not to use it. I
> would prefer opt-in because most mobile phone users don't understand what
> CT is (identically, most users don't use the User-Agent switcher extension
> and/or Lynx) but I guess this guideline wouldn't be respected, so opt-out
> is OK.
> >
> > 3/ if we go for opt-out (even with opt-in actually), the common case
> will be users who have never heard of CT. I would thus restrict the cases
> where the user may override the CP's wishes to exceptional cases, in other
> words cases when the CT-proxy detects the content would crash the user's
> browser, and that's about it.
> >
> > 4/ since it's a REMOTE add-on, there is a limit to what the CT-proxy
> should be allowed to do when security comes in line. I'm really worried
> about the possibility to rewrite HTTPS links (which indeed looks like re-
> creating the "WAP gap" Jo mentioned). I would say the only exception to
> the rule would be if we're using WAP1, but the exception cannot be waived
> in that case anyway...
> >
> > 5/ If we don't view the CT-proxy as an add-on but really as a forwarding
> agent and stick to the vocabulary elements that exist in HTTP today, then
> our CT-proxy is a gateway, as noted in the CT landscape document:
> > http://www.w3.org/TR/2007/WD-ct-landscape-20071025/#terminologyNote
> > But it's even more than that in my mind, as you can't expect to have
> much control on a gateway. I doubt using extensions and/or new headers to
> manage CT options will ever be validated by someone. I may be wrong
> though.
> >
> >
> >
> > Below is my view of what we should put in the guidelines. It does not
> cover all the points already in the guidelines but I didn't find any
> contradiction with remaining ones.
> >
> >
> > Define CT-proxy
> > ---------------
> > We should copy the definition of the landscape document, and explain
> that from a practical point of view, the CT-proxy is a user browser's add-
> on (rephrased in an appropriate way, that is)
> >
> >
> > Between the user and the CT-proxy
> > ---------------------------------
> >
> > Here is a list of CT-options, extracted from current draft:
> > - allow-compress: default yes
> > - allow-recode: default yes
> > - allow-restructure: default yes
> > - preferred-medium: default handheld
> > - reload-untransformed(*): default yes
> > - allow-dangerous-fixes: default yes
> > - silent-mode: default no
> > - allow-https-rewrite: default no [and I strongly favor its pure
> removal...]
> > + the possibility to define specific settings on a domain/page basis.
> >
> > + other settings? From a very practical point of view, do these settings
> > cover transformations that are actually being done by existing CT-
> proxies? Breaking pages into small pieces?
> >
> >
> > [(*): about reload-untransformed, actually I have the feeling the option
> > would be more useful if it told the Proxy to change the "caching"
> > setting of images when they are sent back to the user. This way, the
> > user's browser could use its own cache to store the images, and we would
> > gain the costly HTTP round-trip to the CT-proxy. Well, that's another
> > story.]
> >
> > As our main target is legacy browsers, the only way for the Proxy and
> > the user to interact is via a web page. This means the user's
> > preferences, if kept, will have to be kept CT-proxy side. If the
> > CT-proxy can't store these preferences, it sounds rather clumsy to
> > present this amount of options to the user each time he requests a page,
> > and wants to change the CT-proxy's behavior. In other words, if the
> > CT-proxy can't store the user's preferences or if the user is anonymous,
> > the CT-proxy should work with a mere on/off switch, and should probably
> > better be off... I suppose most CT-proxy have access to a userID
> > provided by the carrier and can store the user's preferences. Public
> > CT-proxies can't do that, but it's probably OK not to focus on them for
> > the time being.
> >
> > So, no, between the user and the CT-proxy, I don't think there's any
> > need for new HTTP headers and/or extensions (besides, all options don't
> > fit as Cache-Control extensions, and having the options scattered all
> > over the place is not a good idea). The CT-proxy should be configurable
> > the same way any add-on would be: through settings.
> >
> > The guidelines should list these options and leave the door open to new
> > ones.
> >
> >
> > Between the CP and the CT-proxy
> > -------------------------------
> > We can't stick to the HTTP RFC, because it's not enough.
> >
> > I still doubt there will be many cases when the CP wants a finer control
> > over the CT-proxy than just switching it off. Cache-Control:
> > no-transform to prevent CT seems enough for me. If we really want a
> > finer control, HTTP Cache-Control extensions are required.
> >
> > There is however a problem with the User-Agent (and the preferred medium
> > directive, but this option as well doesn't strike me as deeply needed).
> > The CT-proxy will change the User-Agent header. I wish it didn't but I
> > guess there's no way to make do without it, is there?
> >
> > Recommending the use of the HTTP Vary: User-Agent header is indeed a
> > good idea but it cannot cover all the cases:
> > a/ the CT-Proxy would have to re-issue a second request on receipt of
> > such a request, which makes it rather hard to count statistics from the
> > CP's point of view. HTTP never ensures that statistics may be counted
> > but from a business point of view, that's fairly unacceptable.
> >
> > b/ in POST requests, a second request cannot be issued, because that
> > would mean already posted data to the CP. Such content negotiation
> > doesn't work in POST requests, the only workaround being to send a HEAD
> > request to start with and the POST request afterward, but that doesn't
> > sound realistic. [There's another way around using a HTTP 303 (see other
> > location) response, but again, that's not implemented anywhere].
> >
> > c/ the above b/ actually extends to some GET requests, because, even
> > though GET requests should never be used to post data to a server, they
> > are in practice (confirmation URIs that work once for instance).
> >
> > The original user-agent needs to be available to the CP (because there
> > would be no way to access mobile content otherwise...). And so, if the
> > CT-proxy changes the User-Agent header, a new HTTP header is needed.
> >
> > That being said, an (X-)Original-User-Agent header for instance has... 0
> > chances of ever being accepted as a new HTTP header, because its purpose
> > is limited to a "bad practice".
> >
> > What might be accepted is a more generic header, something like:
> >
> > (X-)Headers-Modified: User-Agent=[original UA], Accept=[original Accept]
> >
> > ... listing the headers and the original values before modification by a
> > gateway (in practice our CT-proxy). The header is generic, it can be
> > extended, and thus might be of some use in other cases than our bad CT
> use.
> >
> > I prefixed headers with a (X-) because there will be no way to actually
> > have a Headers-Modified header accepted before proving that it's being
> > used. In practice, that means our guidelines could suggest the use of
> > the X-Headers-Modified header, and when CT-proxies start using it, we
> > would be able to claim for a real header. Unfortunately, this has to be
> > a two-step process, but I guess it fits our charter in the sense that we
> > would not recommend new technology, but suggest a new mechanism.
> >
> > The same discussion would also apply I guess with Cache-Control
> > extensions, but we should really be sure each one of them will be used
> > in practice.
> >
> >
> > Summary
> > -------
> > Guidelines based on existing technologies.
> >
> > One suggestion: the use of a new HTTP header, experimental at first,
> > generic enough to be of some use to other WEB areas in the future.
> >
> >
> > Some questions
> > --------------
> > I/ does this view match that of the CT-proxy vendors? Any other
> > operations that need to be addressed? Any other X- headers that are
> > being used in the already existing CT-world? Would switching to only one
> > be a solution that may be implemented?
> > II/ Heiko, you mentioned the need for a mobileok tag in headers for HEAD
> > requests, is that really needed?
> > III/ what claims from CP should we take into account?
> > IV/ Am I missing something?
> >
> >
> > Hope that helps, apologies otherwise ;)
> > François.
> >
> >
> >
> >
> > Sullivan, Bryan wrote:
> >> Jo,
> >> To clarify my comments, I was not referring to the question of whether
> >> the CT proxy must *always* comply to CP directives. The group does need
> >> to resolve that question. For me it's OK if we leave non-compliance to
> >> whatever we decide as a CT proxy service provider decision (it will be,
> >> anyway). I think most service providers would choose non-compliance if
> >> it was necessary due to business imperatives.
> >>
> >> I was speaking to the different question of what type of CT functions
> >> are in scope. If a CP does not say "no-transform", I believe a CT proxy
> >> should have the option of breaking large pages into small pieces that
> >> are served locally, including emulation of Javascript etc. Do you agree
> >> to that?
> >>
> >>
> >> Best regards,
> >>
> >> Bryan Sullivan | AT&T
> >>
> >> -----------------------------------------------------------------------
> -
> >> *From:* Jo Rabin [mailto:jrabin@mtld.mobi]
> >> *Sent:* Wednesday, February 06, 2008 3:13 PM
> >> *To:* Sullivan, Bryan; public-bpwg-ct
> >> *Subject:* RE: [ACTION-603] Conversation with Yves, our HTTP expert,
> >> about CT and Cache-Control extensions
> >>
> >> I am sad to think that actually we have now run out of road on options
> >> to create a more complete solution. We have known for some time that
> new
> >> vocabulary is needed. We thought we might be able to do that while
> >> remaining in scope of our charter. We can write a note describing the
> >> characteristics of such a solution. We can't as the BPWG write an RFC
> >> describing how that solution is implemented.
> >>
> >>
> >>
> >> I suggest that on the call tomorrow we discuss this and decide what
> kind
> >> of Rec we want to produce bearing in mind that we are using only the
> >> vocabulary elements that exist in HTTP today.  To my mind this also,
> >> btw, resolves in favor of the CP when it comes to the question of
> "whose
> >> preference counts". As the CP has no graduation of vocabulary available
> >> to them other than "leave it alone".
> >>
> >>
> >>
> >> Jo
> >>
> >>
> >>
> >>
> >>
> >> -----------------------------------------------------------------------
> -
> >>
> >> *From:* public-bpwg-ct-request@w3.org
> >> [mailto:public-bpwg-ct-request@w3.org] *On Behalf Of *Sullivan, Bryan
> >> *Sent:* 06 February 2008 22:41
> >> *To:* public-bpwg-ct
> >> *Subject:* RE: [ACTION-603] Conversation with Yves, our HTTP expert,
> >> about CT and Cache-Control extensions
> >>
> >>
> >>
> >> Jo,
> >>
> >> The notion of a "gap" is only relevant to end-to-end security, thus for
> >> non-secure page access is a non-issue.
> >>
> >>
> >>
> >> For non-secure pages, whether we call the function one of a "gateway"
> or
> >> "proxy", the question is whether W3C wants to address recommendations
> >> for this degree of content transformation (e.g. breaking a big page up
> >> into smaller pages served locally, emulating scripting, etc). For AT&T,
> >> that is an important use-case and we support it being in scope for the
> >> CT guidelines.
> >>
> >>
> >>
> >> Best regards,
> >>
> >> Bryan Sullivan | AT&T
> >>
> >> -----------------------------------------------------------------------
> -
> >>
> >> *From:* Jo Rabin [mailto:jrabin@mtld.mobi]
> >> *Sent:* Wednesday, February 06, 2008 10:56 AM
> >> *To:* Aaron Kemp
> >> *Cc:* Sullivan, Bryan; public-bpwg-ct
> >> *Subject:* RE: [ACTION-603] Conversation with Yves, our HTTP expert,
> >> about CT and Cache-Control extensions
> >>
> >> Well, looks like we are on course to disagree again :(
> >>
> >>
> >>
> >> I am worried about the idea of a Transforming proxy being regarded as a
> >> gateway precisely because of that kind of issue. (Not to mention
> >> reintroducing the WAP Gap and so on)
> >>
> >>
> >>
> >> Jo
> >>
> >>
> >>
> >> -----------------------------------------------------------------------
> -
> >>
> >> *From:* Aaron Kemp [mailto:kemp@google.com]
> >> *Sent:* 06 February 2008 18:51
> >> *To:* Jo Rabin
> >> *Cc:* Sullivan, Bryan; public-bpwg-ct
> >> *Subject:* Re: [ACTION-603] Conversation with Yves, our HTTP expert,
> >> about CT and Cache-Control extensions
> >>
> >>
> >>
> >> On Feb 6, 2008 1:47 PM, Jo Rabin <jrabin@mtld.mobi
> >> <mailto:jrabin@mtld.mobi>> wrote:
> >>
> >>     I think the point is that no-transform is not a new lock.
> >>
> >> Your previous comment was about adding finer grained bits to
> >> no-transform (which would be new).
> >>
> >> No-transform is only applicable if we treat these things as proxies
> >> anyway -- I can argue they are more like user agents of their own, or
> >> user agent extensions, which makes the no-transform not applicable.
> >> It's more like a text mode browser (which won't adhere to the no-
> transform).
> >>
> >> Aaron
> >>
> >
> >
> >
> >

Received on Tuesday, 19 February 2008 13:46:39 UTC