- From: Mark Watson <watsonm@netflix.com>
- Date: Mon, 15 Dec 2014 08:39:36 -0800
- To: Mark Nottingham <mnot@mnot.net>
- Cc: Noah Mendelsohn <nrm@arcanedomain.com>, "www-tag@w3.org List" <www-tag@w3.org>
- Message-ID: <CAEnTvdCcCA0PtOjikhj1jFbYc6vqR-t+7xSTe=7qKTvCNmdjqQ@mail.gmail.com>
On Mon, Dec 8, 2014 at 7:43 PM, Mark Nottingham <mnot@mnot.net> wrote:

> Hi Noah,
>
> > On 9 Dec 2014, at 11:57 am, Noah Mendelsohn <nrm@arcanedomain.com> wrote:
> >
> > I'm really delighted to see you undertaking this: a very important topic and just the sort of thing the TAG should be doing IMO.
>
> Thanks, I agree (obviously).
>
> > I didn't see an indication of where comments should go, so I'll make two here:
>
> On-list or in the repo's issues list are the natural places, I think.
>
> > I. Caching and proxies
> >
> > I would love to see a really balanced analysis of whatever you discover to be the key tradeoffs involving caching. E.g. where exactly will caching capability likely be lost, and in which such places will the loss be painful? Will the continued need for caching lead to changes in deployment of keys, certs and endpoints, if those are the right terms? In other words, when will the need for caching result in a cache node acting as a decrypting "man in the middle" when it might not otherwise? How about things like deep packet inspection (which seems to have some clearly laudable uses, e.g. for routing incoming traffic, and some more controversial uses)?
> >
> > So many HTTP features and so much of the Web's early deployment focused on making proxies and caching effective. No doubt that's become somewhat less important as links have gotten cheaper and faster, but it would be great to see a balanced exploration of the tradeoffs as they stand. If the result of that analysis is that HTTPS is mostly practical and desirable, then all the better.
>
> Very much agreed. There's a lot of data here, and I was reluctant to overload the document with too much detail (yet). It might end up in a separate document.

It would be good to have some clearer discussion of caching in the main document. Presently there is a reference to "content optimization", but it's not very clear whether this includes transparent caching. I think the impact of HTTPS on ISP transparent caching should be clearly acknowledged, and the TAG should explain their rationale for accepting this as a consequence of the proposed transition.

> Some points that I find interesting, off the top of my head (apologies for the dump):
>
> * It's long been observed that many aspects of shared Web caching roughly follow a Zipf curve; there are a comparatively VERY small number of popular cacheable responses creating the bulk of traffic, followed by a very long tail. In the past ~two years, much of the "head" has already been encrypted, with things like Facebook, Twitter, Google, Yahoo!, etc. taking the lead. Anecdotal evidence suggests that shared cache hit rates have fallen at least partially as a result of this (other possible factors: more dynamic sites, decreasing trust in caches), since they're left with just "tail." If we assume that those sites aren't going to be going back to unencrypted connections (i.e., they're a dead loss), we're left with the remaining sites, many of which don't get great service from shared caching anyway (due to where they are on the curve).
>
> So, one question to ask is whether encrypting the tail is going to be any worse than what we've already seen in the head, from the standpoint of getting value out of shared proxy caching. My suspicion is "not even close."
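To make the Zipf intuition above concrete, here is a toy sketch (illustrative only; the catalogue size, exponent and cut-offs are made-up numbers, not data from anyone on this thread) that ranks a catalogue of objects by Zipf popularity and asks how much request volume a shared forward cache can still see once the most popular objects move to HTTPS and bypass it:

    # Toy model (illustrative only; all numbers are made up): request
    # popularity for a catalogue of n objects follows Zipf's law with
    # exponent s, and the most popular `encrypted_head` objects have moved
    # to HTTPS, so a shared forward cache never sees them.

    def zipf_weights(n, s=1.0):
        """Unnormalised Zipf popularity for ranks 1..n."""
        return [1.0 / (rank ** s) for rank in range(1, n + 1)]

    def cacheable_share(n=1_000_000, s=1.0, encrypted_head=0):
        """Fraction of all requests still visible to a shared cache."""
        weights = zipf_weights(n, s)
        return sum(weights[encrypted_head:]) / sum(weights)

    if __name__ == "__main__":
        for head in (0, 100, 10_000):
            share = cacheable_share(encrypted_head=head)
            print(f"top {head:>6} objects encrypted -> "
                  f"{share:.1%} of requests still reach the cache")

In this toy model, with an exponent near 1 and a million objects, the top 10,000 (1% of the catalogue) already account for roughly two-thirds of requests, so encrypting the remaining tail costs the shared cache comparatively little; that is the arithmetic behind "not even close."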
> > * Much of that "head" encrypted traffic is still being cached, but by > reverse proxies (CDNs, "HTTP accelerators" and the like) rather than > traditional "forward" proxies. This trend has been going on for a much > longer time; content providers want to maintain control of their content, > and want repeatable performance; an intermediary deployed by them (or on > their behalf) does that, while an intermediary deployed on behalf of the > network acts on behalf of the network (sometimes doing things like caching > longer than the freshness lifetime, changing responses, etc.). > We have frequently observed transparent prox y caches which do not respect the HTTP specification , sometimes even modifying message bodies. Some of these things are deliberate, some are bugs or mis-configuration but either way, these things cause service problems, customer service calls etc. and are very hard to debug: site owners are left having to reverse engineer a black box in the ISPs network with only whatever diagnostics their clients return to them. ...Mark > In other words, I strongly suspect that the apparent loss of shared cache > efficiency in proxies is more than made up for by shared cache efficiency > in gateways (aka "reverse proxies" of various sorts) -- if you're just > worried about load on the origin server, its Internet connectivity and the > backhaul to wherever the reverse proxy is. > > * A major caveat here is locality to the end user. In the general case, a > forward proxy will be closer to the end user than a reverse proxy (although > there's a lot of variance on both sides), meaning it's saving stress on the > user's provider network more often. On the other hand, hit rates in the > former are usually top out at about 30%, whereas the latter see upwards of > 95% (or even 99%) in many cases. > > * Another caveat is locality in space+time; e.g., when everyone in an > office visits a Web page, or downloads some software (again, assuming that > the content is actually cacheable). However, in many cases this traffic > isn't served out of a proxy cache today (because one isn't deployed, or the > response isn't cacheable, or...). > > * After noticing the above, a natural thought is to consider schemes where > data is encrypted / signed and cached, perhaps discovered through some p2p > scheme. However, these invariably leak data about what's being browsed, and > are therefore probably a non-starter; this sort of approach has roughly the > same properties as SRI used for caching, in that you maintain integrity and > authentication, but lose confidentiality (unless you go down the route of > something like < > https://en.wikipedia.org/wiki/Private_information_retrieval>, but AFAIK > that's not anywhere near ready for production). > > It's attractive to consider introducing these with very limited scope > (e.g., explicit buy-in to shared caching on the origin side as well as the > client), but it makes things considerably more complex to do so (both > because you need something like markup support, as well as making the > security model more complex for the user). My gut feeling is that it'll be > difficult to get real value / network effects here. Would still love to see > an attempt. > > * The example of a village with poor access (e.g., in Africa) has > regularly been brought up in the IETF as an example of a population who > want shared caching, rather than encryption. 
> * The example of a village with poor access (e.g., in Africa) has regularly been brought up in the IETF as an example of a population who want shared caching, rather than encryption. The (very strong) response from folks who have actually worked with and surveyed such people has just as regularly been that many of these people value security and privacy more.
>
> * DPI and other proxy-ish (not cache) use cases are a completely different thing -- what you're really asking about is the value of intermediation, not just shared caching. One place to start here: <http://tools.ietf.org/html/draft-hildebrand-middlebox-erosion-01>. Note that the primary author is a member of the IAB, FWIW.
>
> * That leads pretty naturally to a discussion of the priority of constituencies, as defined by HTML5 <http://www.w3.org/TR/html-design-principles/#priority-of-constituencies> -- it'd be interesting to apply here and maybe make it a wider discussion among the W3C (we've already started putting our foot into this water in the IETF: <http://tools.ietf.org/html/draft-nottingham-stakeholder-rights-00>).
>
> * Finally, with all of that said - networks definitely have a role to play, and there has been a fair amount of discussion in the IETF and elsewhere as to how they can manage their costs and meet reasonable goals without impinging upon security. This discussion is very much in its infancy, and there are many tricky problems (e.g., setting sane defaults, security user experience (or the lack thereof)). There are a number of ways that such efforts might get traction, but I'm really reluctant to include anything along these lines in the finding, both because we've already seen a number of false starts, and because the process is turning out to be (surprise) quite political.
>
> > II. Privacy
> >
> > I also have the vague impression that there is a loss of privacy that indirectly results from the reduced practicality of proxies, but I'm not sure that intuition is correct. If there are privacy issues with the HTTPS transition, that would be worth exploring too.
>
> Love to hear more if you can triangulate.
>
> > Thank you. Good luck with this!
>
> Thanks!
>
> > Noah
> >
> > On 12/8/2014 6:28 PM, Mark Nottingham wrote:
> >> We've started work on a new Finding, to a) serve as a Web version of the IAB statement, and b) support the work on Secure Origins in WebAppSec.
> >>
> >> See: <https://w3ctag.github.io/web-https/>
> >>
> >> Repo w/ issues list at <https://github.com/w3ctag/web-https>.
> >>
> >> Cheers,
>
> --
> Mark Nottingham   https://www.mnot.net/
Received on Monday, 15 December 2014 16:40:04 UTC