RE: HTTP2 Expression of Interest

Hi Willy,

Thank you for your reply.

I should note that Facebook's EOI was written by Brian Pane and me; Brian
is the lead engineer on our SPDY efforts and has been doing a lot of work to
prepare our HTTP stack for multiplexed operation. Brian has thought more about
some of your protocol questions than I have, so I think he'll step in and
offer his thoughts.

I have some thoughts on the encryption issue, which I'll share below. These
thoughts aren't the official position of Facebook; they are my own opinion.

Hopefully, though, my reply will answer most of the questions that have been
raised in this thread.

> I don't want to start the encryption debate in this thread, but since you
> have a fairly balanced approach, I'd like to note that at the moment,
> almost 100% of user information theft happens on encryption-protected
> services, whether it is bank account credentials or webmail credentials or
> other information.  The issue always comes from malware running on the PC,
> infecting the browser and stealing the information at the human interface.
> However, users feel safer because they see the SSL lock. And it's not
> always the browser, as there was a report of stolen webmail information in
> TLS traffic in a certain country when a CA was broken and new certs for a
> number of large sites were issued.

Here are my thoughts on mandating transport encryption:

  * The SSL/TLS/CA ecosystem is flawed, but it's the most widely deployed
    system we have for securing web traffic. We shouldn't let the flaws in
    the current system stop us from advocating for more user privacy; in
    fact, greater pressure on the CA system from a crypto mandate would
    probably lead to reforms there.

  * SSL/TLS doesn't do a lot to address targeted attacks against one user
    (e.g. malware, spear-phishing, etc) but it helps guard against
    surveillance and censorship of large user populations. While there are
    weaknesses with the CA system that make it possible for governments and
    organizations to issue rogue certs in targeted cases, it is very difficult
    to deploy rogue certs globally for all web traffic for all users. Put
    simply, the more widely SSL/TLS is used, the greater the chance that
    users will have privacy in their online communications.
    
  * SSL/TLS stops "helpful" transparent proxies from intercepting your
    unencrypted traffic and doing things with it that you didn't ask them
    to. Encryption keeps these proxies honest: all they can do is choose
    whether or not to forward your connection.

  * Symmetric crypto costs are modest; I think Akamai quoted a 10-20%
    overhead in their response. I think the costs aren't a big deal for
    major sites; if you are large enough to care about performance, you are
    large enough to support session resumption, which cuts out the CPU cost
    of most handshakes (see the sketch after this list). Rather, it's a much
    more interesting question for the very small operators and very small
    embedded devices. For example, if I have a thermostat in my fridge that
    wants to report temperature and power usage information somewhere
    central, it might be onerous to require it to speak crypto in order to
    talk to a web server today. I just think that it won't be onerous
    tomorrow.

  * The monetary cost of certs is not an issue. You can get free (or cheap) DV
    certs, so hobbyist sites and non-profits would not be locked out from the
    web due to lack of access to cheap certs, and even if they were, I expect
    the market will produce a CA whose costs will meet that user demand. For
    EV cert prices, I expect that the market will continue to optimize those;
    there has been a steady decline in EV cert prices over the years. Either
    way, if you're large enough to require an EV cert, you have other
    infrastructure costs to bear as well (power, rent, hardware, network
    connectivity, domain registration, etc).
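
To make the session resumption point concrete, here's a minimal client-side
sketch using Python's ssl module (the hostname is a placeholder, and I'm
pinning TLS 1.2 because its session objects are the simplest to reuse by
hand):

    import socket
    import ssl

    HOST = "www.example.com"  # placeholder; any TLS 1.2 server with
                              # session caching or tickets will do
    ctx = ssl.create_default_context()
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2

    # First connection: full handshake (cert exchange, key agreement).
    with ctx.wrap_socket(socket.create_connection((HOST, 443)),
                         server_hostname=HOST) as first:
        session = first.session  # save the negotiated session

    # Second connection: abbreviated handshake that reuses the saved
    # session and skips the expensive asymmetric crypto.
    with ctx.wrap_socket(socket.create_connection((HOST, 443)),
                         server_hostname=HOST, session=session) as second:
        print("resumed:", second.session_reused)

The second handshake costs a fraction of the first, which is why a high
resumption hit rate matters so much at scale.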

The Internet has always been about the (mostly) free expression of ideas, both
in terms of monetary cost and personal freedom in what you can say. I actually
think it's pretty amazing that we have the Internet at all; if you look at all
the civilizations that have existed in human history, not many of them would
have built something that offered so much freedom for people to communicate
with each other and publish their ideas and beliefs. However, we now live in a
time where a sizable fraction of humanity uses the Web to communicate daily,
and that makes those people a very lucrative target.

I also think it is useful to understand why people object to the idea of
mandated transport encryption. Critiques include the extra resource usage,
the high cost of certs, the extra round trips, and the broken CA system (to
name a few), while others object to the idea of a mandate itself, not
wanting protocols to push political agendas.

I think all those positions can be valid. I just happen to think that even
given all the above, it is still better to mandate encryption and give better
privacy to Internet users than it is to punt the ball down the field another
twenty years.

> Also, you said that it could make things harder for you, but did you
> evaluate only the front access or also the protocol used between your load
> balancers and backend servers ? I'm asking because there is a difference
> between mandating the use of encryption in browsers and designing the
> protocol based on this. For instance, WebSocket has the masking flag
> mandatory on upstream traffic but the protocol supports not having it
> between servers.

Regarding load balancers and resource usage, we looked at three cases:

  * Traffic between LB and user
  * Traffic between LB and LB
  * Traffic between LB and web server

For the LB<->user case, you can remove a lot of handshakes with session
resumption. We are seeing an 80% hit rate for our session caching
deployment, which supports server-side session caching and client-side
session tickets. Plus, I would expect that multiplexed protocols hold onto
their client sockets longer, since there's a higher probability of reusing
a socket later once you've gone from N connections to a given domain down
to just one. So you end up mostly just paying the symmetric cipher cost.
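
As a back-of-envelope illustration of that 80% hit rate (the per-handshake
CPU costs below are made-up placeholders, not measurements):

    # Illustrative amortization of handshake CPU under session resumption.
    full_cost_ms    = 1.0   # full TLS handshake (assumed)
    resumed_cost_ms = 0.1   # abbreviated/resumed handshake (assumed)
    hit_rate        = 0.80  # the session cache hit rate we observe

    expected_ms = hit_rate * resumed_cost_ms + (1 - hit_rate) * full_cost_ms
    print(f"expected handshake cost: {expected_ms:.2f} ms")  # 0.28 ms
    print(f"savings vs. no resumption: {1 - expected_ms/full_cost_ms:.0%}")
    # -> 72%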

The LB<->LB case happens when you terminate a user on an edge node close to
them (say London) and tunnel their request to a remote datacenter. This
speeds up the TCP and SSL handshakes, since those happen over a low-RTT
link. If you advertise HTTPS capability to your users, it's obvious that
you want to speak HTTPS between the LB in London and the end user in
Liverpool. It is perhaps less obvious that you also need to encrypt the
link between your edge LB in London and your datacenter in the US, since
that traffic will travel over circuits leased from major carriers (unless
you lay your own transoceanic fiber, and even then, there are techniques
for tapping undersea cables).
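
To see why edge termination helps, assume (illustratively) a 10 ms RTT from
the user to the London LB versus 90 ms from the user straight to a US
datacenter, with one round trip for TCP and two for a full TLS handshake:

    # Illustrative handshake latency; the RTTs are placeholders, not
    # measurements.
    edge_rtt_ms    = 10      # user <-> London edge LB (assumed)
    direct_rtt_ms  = 90      # user <-> US datacenter (assumed)
    handshake_rtts = 1 + 2   # 1 RTT TCP + 2 RTTs full TLS

    print("edge termination:  ", handshake_rtts * edge_rtt_ms, "ms")    # 30
    print("direct termination:", handshake_rtts * direct_rtt_ms, "ms")  # 270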

For the LB<->LB case, you tend to use persistent connections, so the
handshake cost is low. This is especially true with multiplexed protocols,
since you can fit so much traffic on those sockets; much more than you're
ever going to fit on a single LB<->user connection. So on these connections
you're mainly paying the cost of the block-level cipher rather than the
more costly handshake.
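
The amortization argument is easy to see with illustrative numbers (again,
placeholders rather than measurements):

    # One handshake spread over the lifetime of a persistent, multiplexed
    # LB<->LB connection.
    handshake_cost_ms = 1.0      # one full TLS handshake (assumed)
    requests_per_conn = 100_000  # requests over the connection's life (assumed)

    per_request_us = handshake_cost_ms / requests_per_conn * 1000
    print(f"handshake cost per request: {per_request_us:.2f} us")  # 0.01 us
    # Effectively noise next to the symmetric cipher cost of the payload.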

Once you get inside the datacenter, encryption is less important, since you
don't have to worry as much about third parties intercepting that traffic.
Still, one would imagine that the datacenter load balancer would keep
persistent connections to the web servers it balances load across, and
would enjoy similar handshake amortization to the other two cases.

So I actually think that the sites with the largest request loads have the
hardest time arguing against TLS on resource-usage grounds, especially when
you look at the low cost of running TLS on commodity hardware and the
expected gains in hardware power over the coming years.

I'm more concerned about very small devices that want to use HTTP to report
usage statistics (thermostats, pressure monitors, industrial sensors, etc).
They might not be able to afford the hardware power to perform TLS. I think
any personal communication devices (feature phones, smart phones, tablets,
laptops, desktops, etc) will always have enough CPU to handle crypto, or will
have onboard ASICs that they can offload that crypto to.

> Basically, since all sensible sites already make use of TLS, I don't think
> we can make them safer by mandating use of TLS for them. However, mandating
> use of TLS will make it harder to work on the backend; it will very often
> be a counterproductive effort which increases costs a lot (cert managing,
> troubleshooting, etc) with no added benefit.

Requiring TLS definitely makes backend work more difficult, but I think the
tools will come. A side effect would probably be that web servers and load
balancers would get better instrumentation. I think Varnish has a nice
implementation with its shared-memory ring buffer that's used to log
events; you can attach tools to that region and read the events in real
time as the proxy operates. The model is quite good; a toy sketch of it
follows below.
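
As a toy illustration of that model (in-process only, and purely my own
sketch; the real Varnish log lives in a shared memory segment that separate
tool processes map in):

    from collections import deque

    class RingLog:
        """Bounded log: the writer never blocks on slow or absent readers;
        when full, the oldest records are simply overwritten."""

        def __init__(self, capacity):
            self.records = deque(maxlen=capacity)
            self.seq = 0  # monotonically increasing sequence number

        def write(self, event):
            self.seq += 1
            self.records.append((self.seq, event))

        def read_since(self, last_seq):
            # Readers poll with the last sequence number they saw; a gap
            # in the numbering means the writer lapped them.
            return [r for r in self.records if r[0] > last_seq]

    log = RingLog(capacity=4)
    for i in range(6):
        log.write(f"ReqStart client=10.0.0.{i}")
    print(log.read_since(0))  # only the 4 newest records; seq 1-2 lapped

The key property is that logging stays cheap and non-blocking on the hot
path, while any number of observers can attach and read after the fact.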

Regarding added benefit, it's true that many major sites are forcing HTTPS
already. However, many are not, and there's also a very long tail of
unencrypted sites out there. I argue that the added benefit is quite large.

> I think that what you're describing here precisely is what WebSocket offers,
> but I may be wrong, depending on your precise use-cases. It implicitly
> offers server push in the sense you're describing it (push of any data, not
> HTTP objects), and automatically offers the no-buffering flag because when
> HTTP gateways switch to WebSocket, they know this is interactive traffic and
> stop buffering. I think your description confirms the need to unify the
> transport layer to support both HTTP and WS at the same time in the same
> connection.

Brian Pane has thoughts on WebSocket and SPDY, and I think he can better
comment here.

Regards,

Doug
