ACTION-902 Summarise and prepare proposed resolutions on HTTPS link rewriting.

Hello Everyone

This has been stewing for a little while. Hope that it is not over-cooked.

Jo

Caveat: this is meant to be a summary, so I have not included every 
citation and I won't quote every possibly relevant point of view. If 
you feel that an omitted point of view materially alters the outcome, 
you should of course make your view known.

1. Summary of Correspondence

There has been considerable correspondence on this list over many 
months. The starting point was Eduardo's note [1] of 11 Nov and the 
thread that followed.

[1] http://lists.w3.org/Archives/Public/public-bpwg-ct/2008Nov/0025.html

On the 14th November Francois summarised some correspondence under the 
title "Thoughts on HTTPS Link Rewriting" [2].

[2] http://lists.w3.org/Archives/Public/public-bpwg-ct/2008Nov/0063.html

Eduardo sent a further note on 17th Nov, "[HTTPS] Thoughts on HTTPS 
Links rewriting" [3] and a thread followed.

[3] http://lists.w3.org/Archives/Public/public-bpwg-ct/2008Nov/0065.html

Under ACTION-893, Rob Finean posted under the subject "Start putting 
together a set of guidelines that could help address the security issues 
triggered by links rewriting" [4], and a lengthy thread followed, which 
continued into February.

[4] http://lists.w3.org/Archives/Public/public-bpwg/2009Jan/0023.html

I'll draw your attention to a couple of posts on that thread (but 
encourage you to read all of them, as part of your review). On 20th Jan 
Eduardo posted an important note [5] summarising some earlier points.

[5] http://lists.w3.org/Archives/Public/public-bpwg/2009Jan/0045.html

As part of that thread I posted some commentary on the impact of IETF 
OPES on this subject [6]. It should be read together with my "Response 
to LC-2097 on OPES" [7], which was adopted as a resolution on 2nd Dec 
[8]; the later commentary should be taken as re-opening that question.

[6] http://lists.w3.org/Archives/Public/public-bpwg/2009Feb/0021.html
[7] http://lists.w3.org/Archives/Public/public-bpwg-ct/2008Nov/0045.html
[8] http://lists.w3.org/Archives/Public/public-bpwg-ct/2008Dec/0026.html

On the 28th Jan Francois, under a separate topic "CT: mandating respect 
of some heuristics" [9], introduced the idea that even though some 
content might heuristically have been determined to be "no-transform", 
it would still be acceptable to re-write links, since otherwise 
non-inline proxies would "lose the session".

[9] http://lists.w3.org/Archives/Public/public-bpwg/2009Jan/0080.html

2. Summary of Issues

Like most of the questions relating to Content Transformation, if you 
pull on one thread from the "ball of wool" you end up getting more than 
that thread. To my mind the three main issues that we need to address 
together are:

a) Does link re-writing count as transformation?

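If it does, then "non-inline" network proxies won't work wherever 
transformation is not permitted, since they rely on rewriting links to 
keep subsequent requests flowing through them. By an inline proxy we 
mean one that traffic passes through irrespective of where it is 
addressed, from the user agent's point of view; a non-inline proxy only 
sees requests that are addressed to it.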
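To make the inline/non-inline distinction concrete, here is a minimal 
Python sketch of the kind of link rewriting a non-inline proxy performs. 
The proxy address `http://proxy.example.net/fetch?u=` and the `u` query 
parameter are hypothetical, not taken from any deployed product. Each 
link is rewritten to point at the proxy, with the original URL carried 
as a parameter; a link left untouched would send the user agent's next 
request straight to the origin server, bypassing the proxy.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical entry point of a non-inline proxy (assumption, not a real service).
PROXY_BASE = "http://proxy.example.net/fetch?u="


def rewrite_link(target: str) -> str:
    """Point a link at the proxy, carrying the original URL percent-encoded."""
    return PROXY_BASE + quote(target, safe="")


def recover_target(rewritten: str) -> str:
    """On the follow-up request, the proxy recovers the original URL from the query."""
    return parse_qs(urlsplit(rewritten).query)["u"][0]


if __name__ == "__main__":
    link = "http://example.com/page?a=1&b=2"
    proxied = rewrite_link(link)
    print(proxied)
    print(recover_target(proxied) == link)
```

Encoding the whole target URL with `safe=""` keeps its own query string 
intact inside the proxy's query string; a real proxy would also have to 
handle relative links, forms and script-generated URLs.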

b) What provisos need to attach to link rewriting?

Questions have been raised, for example by Thomas Roessler, about link 
rewriting of any kind (not just HTTPS link rewriting) in relation to 
"same domain" rules for scripts and cookies. Thomas makes the point in 
respect of Web applications, but I think there are more general 
concerns too.
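One way to see the "same domain" concern is to compare the origins 
(scheme, host, port) a browser sees before and after rewriting. In this 
minimal Python sketch the proxy address and both sites are hypothetical. 
Two unrelated sites have distinct origins before rewriting; afterwards 
both appear to the browser as the single origin of the proxy, so their 
scripts are no longer kept apart by the browser's same-origin rules, and 
any cookies the browser accepts via the proxy are scoped to the proxy's 
host rather than to each original site.

```python
from urllib.parse import quote, urlsplit

# Hypothetical non-inline proxy entry point (assumption for illustration).
PROXY_BASE = "http://proxy.example.net/fetch?u="
DEFAULT_PORTS = {"http": 80, "https": 443}


def rewrite_link(target: str) -> str:
    """Route a link through the proxy by embedding the original URL."""
    return PROXY_BASE + quote(target, safe="")


def origin(url: str) -> tuple:
    """Origin as used by browser same-origin checks: (scheme, host, port)."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS[parts.scheme])


# Two unrelated (hypothetical) sites.
SITES = ["http://mail.example.com/inbox", "http://untrusted.example.org/page"]

if __name__ == "__main__":
    print({origin(u) for u in SITES})                # two distinct origins
    print({origin(rewrite_link(u)) for u in SITES})  # collapsed to the proxy's origin
```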

c) Under what circumstances is intercepting HTTPS acceptable?

[We don't really mean "intercepting HTTPS connections", we mean 
rewriting links that have HTTPS as the protocol]
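What rewriting an HTTPS link does to the user's connection can be 
sketched in a few lines of Python. The proxy addresses and the bank URL 
are hypothetical. For a given link, the function reports whether the 
user agent's own connection is encrypted and which host it connects to 
(and, for HTTPS, whose certificate it checks). After rewriting, the user 
agent either talks to the proxy in the clear or, with an HTTPS proxy 
address, authenticates the proxy rather than the bank; in both cases the 
proxy sees the traffic in plaintext and opens its own connection onwards.

```python
from urllib.parse import quote, urlsplit


def rewrite_link(target: str, proxy_base: str) -> str:
    """Route a link through a (hypothetical) proxy entry point."""
    return proxy_base + quote(target, safe="")


def first_hop(url: str) -> tuple:
    """(is the UA's connection encrypted?, host the UA connects to)."""
    parts = urlsplit(url)
    return (parts.scheme == "https", parts.hostname)


if __name__ == "__main__":
    link = "https://bank.example/login"
    print(first_hop(link))  # UA encrypts to, and authenticates, bank.example
    print(first_hop(rewrite_link(link, "http://proxy.example.net/fetch?u=")))   # cleartext to proxy
    print(first_hop(rewrite_link(link, "https://proxy.example.net/fetch?u=")))  # TLS ends at proxy
```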

Expressed opinions vary, ranging from "none" to "at the user's choice 
and where interrupting the security of HTTPS serves the user's expedient 
balance of need and assessment of risk." The latter covers situations 
ranging from using HTTPS to log in to services where the user doesn't 
think their confidentiality would be significantly compromised (e.g. 
WebMail), to situations like being in a taxi on the way to the airport 
with an urgent need to do something that carries more exposure.

3. Analysis

Caveat: In the following we consider only network-deployed proxies, 
not those that form part of the user agent, like Opera Mini, Skyfire 
etc., or those that form part of the content provider's server 
arrangements. These are and have always been out of scope. There is 
another class of transforming proxy, one that the user has chosen to 
mediate and transform, like Google; these are also out of scope but 
possibly require more discussion.

a) Does link re-writing count as transformation?

The suggestion behind not considering it to be transformation is that 
link rewriting is somehow more benign than other forms of 
transformation. However, the reality is that it is a form of 
restructuring and hence at least less benign than recoding or 
compression. Indeed, as pointed out in discussion, link rewriting 
(leaving aside the consideration of HTTPS) has security implications. 
Although it may be possible to address these security implications, 
doing so falls into the internal operation of the proxy (which is 
specifically out of scope) and the physical environment in which proxies 
operate, which is also out of scope (though this has not been stated 
explicitly before).

Link rewriting is transformation and should not be subject to fewer 
restrictions than any other transformation.

b) What provisos need to attach to link rewriting?

There is no specific mention in the document of the attendant security 
risks. There should be. It is an open question whether link rewriting 
is in fact required at all. As Eduardo has pointed out, in-network 
proxies that are configured inline do not need to do it.

There is therefore the opportunity for us to say that it is not allowed. 
This would limit the ways in which conforming proxies could be deployed 
in-network, but would not prevent the provision of transformation services.

My preferred option is that we say that a conforming CT proxy cannot 
re-write links. If it is not ruled out, then there is an open question 
as to whether specific concerns should be called out normatively, in a 
way that can be measured for conformance. If it is ruled out, the 
question of intercepting HTTPS becomes somewhat moot.

It's worth noting under this section that Google and other "user 
chosen" transforming proxies will not work if they can't rewrite links. 
From our perspective they are not transforming proxies but gateways, so 
they are not in scope of the document. This is possibly a shade of 
distinction that the man on the Bondi tram simply would not appreciate, 
so we should perhaps revisit this scoping issue.

Consequently there is room for discussion as to whether such proxies are 
in or out of scope and what we would need to say differently for the two 
types of proxy.

Either way, it would be worthwhile adding a non-normative note on the 
security issues discussed above.

c) Under what circumstances is intercepting HTTPS acceptable?

Content Providers (CPs) that use HTTPS may not realise that they are 
thereby prohibiting transformation which they might otherwise want or 
not mind. HTTP's semantics cannot express the notion of "ideally 
I'd like this to be HTTPS, but where that conflicts with my other desire 
for, or at least accession to transformation I'll take the 
transformation (please)". Add to this that no-transform does not work 
as an expression of preference in this case, as Eduardo has pointed out 
on several occasions, because the link is not necessarily contained in 
a document that is under the control of the target of the https link.
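Eduardo's point about no-transform can be illustrated with a sketch of 
the decision a link-rewriting proxy faces. The function names and 
example URLs are hypothetical, and the Cache-Control parsing is 
deliberately simplified. When the proxy processes a page, the only 
Cache-Control header it has is the one on that page, so the decision 
about an HTTPS link inside it rests on the referring site's signal; the 
target site's own response, which might carry no-transform, has not yet 
been fetched.

```python
from urllib.parse import urlsplit


def cache_control_directives(value: str) -> set:
    """Directive names from a Cache-Control value, lower-cased, arguments dropped.

    Simplified: does not handle quoted-string arguments containing commas.
    """
    return {d.split("=", 1)[0].strip().lower() for d in value.split(",") if d.strip()}


def https_links_to_rewrite(referring_cache_control: str, links: list) -> list:
    """HTTPS links a proxy would rewrite in a page, judged only on that page's header."""
    if "no-transform" in cache_control_directives(referring_cache_control):
        return []
    return [link for link in links if urlsplit(link).scheme == "https"]


if __name__ == "__main__":
    # A (hypothetical) blog page without no-transform links to a bank's login page.
    links = ["https://bank.example/login", "http://news.example/story"]
    print(https_links_to_rewrite("max-age=60", links))
```

The bank's link is selected for rewriting purely because the blog did 
not send no-transform; nothing the bank says can influence that step.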

Consequently, in the general case a transforming proxy cannot tell 
whether a server doesn't mind interception of its links or is simply 
unaware that it may happen. Even if a more expressive vocabulary were 
to exist, the base case, which is today's environment, would still have 
to be addressed for backwards compatibility.

It is said that transforming proxies avoid intercepting HTTPS links for 
financial institutions. It would seem that operators of transforming 
proxies are aware of liabilities that they do not wish to assume.

In the general case the level of risk cannot be assessed and the server 
is not aware of what is going on, so intercepting HTTPS cannot be 
considered acceptable unless there is consent from the Content Provider 
to do so, as well as consent from the user.

Such CP consent would have to be expressed outside of HTTP signalling, 
and hence it could not be the subject of a conformance claim, since it 
would not be open to inspection by a third party.

I'd like us to write that intercepting HTTPS links is not acceptable 
unless there is specific consent from the Content Provider to do so, 
backed by consent from the user. If that is the case, then the Content 
Provider would also have to consent to rewriting of non-https links, 
which modifies the previous section.

4. Conclusions and Proposed Resolutions

a. PROPOSED RESOLUTION: Link rewriting is a form of transformation and 
at a minimum is subject to the same limitations as other forms of 
transformation.

b. PROPOSED RESOLUTION: In-network proxies MUST NOT rewrite links 
without explicit prior agreement from the Content Provider.

c. PROPOSED RESOLUTION: Interception of HTTPS is not permissible without 
explicit prior agreement from the Content Provider and consent from 
the user on a case by case basis.
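As an illustration only, the three proposed resolutions could be read 
as a single check that a conforming in-network proxy makes before 
rewriting a link. This Python sketch is one possible reading, not 
proposed text: the `Consent` fields are hypothetical stand-ins for 
agreements that, as noted above, would be made outside HTTP signalling; 
it assumes one Content Provider agreement covers both ordinary and 
HTTPS links; and it uses no-transform as the example of a limitation 
that applies to all transformation.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Consent:
    # Hypothetical flags; the agreements themselves would be made outside HTTP.
    cp_prior_agreement: bool      # explicit prior agreement from the Content Provider
    user_consent_this_case: bool  # the user's consent for this particular case


def may_rewrite_link(link: str, consent: Consent, no_transform: bool) -> bool:
    # Resolution a: link rewriting is transformation, so no-transform rules it out.
    if no_transform:
        return False
    # Resolution b: no link rewriting without explicit prior CP agreement.
    if not consent.cp_prior_agreement:
        return False
    # Resolution c: HTTPS links additionally need the user's consent, case by case.
    if urlsplit(link).scheme == "https":
        return consent.user_consent_this_case
    return True
```

For example, `may_rewrite_link("https://bank.example/login", 
Consent(True, False), False)` returns `False`: the Content Provider has 
agreed, but the user has not consented in this case.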

Received on Monday, 9 March 2009 16:37:54 UTC