HTTP2 Expression of Interest : haproxy from Willy Tarreau on 2012-07-13 (ietf-http-wg@w3.org from July to September 2012)

From: Willy Tarreau <w@1wt.eu>
Date: Fri, 13 Jul 2012 13:00:43 +0200
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20120713110043.GA13480@1wt.eu>
Hi,

as the project maintainer of the haproxy load balancer[1], I'm facing a
number of issues in HTTP/1.1 that I would like to see addressed in HTTP/2.

A little background first. In HTTP terminology, Haproxy is a gateway, and
its role could more precisely be described as an "HTTP router", as PHK calls
this type of device. It is most commonly placed at the front of hosting
sites where it serves two main purposes : split incoming traffic based on
a number of criteria (most commonly Host, part of URI, Cookie or Upgrade
header), and balance the load among a server farm using cookies when
stickiness is required. As such, it commonly is the first entry point of
high traffic sites.

On this type of device, admins have to finely balance performance and
features, and sometimes they adapt their web architecture to maximize the
capacity per CPU (for instance, some decide to use different Host values
for different farms and only process the first request of each connection).

Admins generally prefer not to deploy too many load balancers because the
ability to smooth the load and to maintain QoS drops as the number of LBs
increases. Also, with today's cloud-based hosting infrastructures everywhere,
CPU usage and machines have an operating cost that needs to be kept low.

Connection rates of 20,000 to 40,000 per second per core are fairly common,
and users facing DDoS have reached up to 160k connections per second per
node in order to sustain attacks of 10 million connections per second[2].

At these rates, the load balancer still has to strictly parse incoming
traffic and route it to the proper destination (or drop it), and the
choice must be made very quickly with the lowest possible per-request
and per-connection overhead.

Here comes the issue with HTTP/1.1. Parsing HTTP/1.1 is expensive, very
expensive. Most of the time is spent parsing useless and redundant
information. Another significant share of CPU cycles is spent updating
the Connection header field and dealing with session-related information
such as looking up a stickiness cookie. Ensuring request integrity is
very expensive too. For instance, checking the uniqueness of the
Content-Length field is expensive as it implies scanning all header fields
for it.
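
To illustrate the Content-Length point, here is a minimal Python sketch of
such a check. The function name and the choice to reject any repeated field
(even with an identical value) are assumptions made for the example, not a
description of what haproxy does.

```python
def content_length(headers):
    """Return the single Content-Length value, or None if absent.

    `headers` is a list of (name, value) string pairs in arrival order.
    Every field has to be visited, because a second Content-Length may
    appear anywhere in the header block.
    """
    found = None
    for name, value in headers:
        if name.lower() != "content-length":
            continue
        value = value.strip()
        if not (value.isascii() and value.isdigit()):
            raise ValueError("invalid Content-Length")
        if found is not None:
            # Assumption: any repeat is treated as an error.
            raise ValueError("duplicate Content-Length")
        found = int(value)
    return found
```

Even when the first field is the one we want, the loop cannot stop early:
the absence of a duplicate is only known once the last field has been seen.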

Load balancers such as haproxy would benefit a lot from a better layered
HTTP model which distinguishes user session, connection and transaction.
Such products also need a much lower request processing cost and connection
processing cost, since DDoS attacks often consist of one request per
connection. For normal traffic, it is absolutely critical to be able to
forward data with the least possible overhead : for instance, having to
forward images chunked at 4kB definitely kills performance on today's
10 Gbps gateways, as the cost of processing a chunk is much higher than the
cost of forwarding the 4kB it contains.
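
As a rough illustration of that per-chunk cost, the Python sketch below
encodes a body with HTTP/1.1 chunked transfer coding and then counts how many
chunk-size lines a forwarder has to locate and parse. The function names are
hypothetical, and chunk extensions and trailers are deliberately ignored.

```python
def chunk_body(body: bytes, chunk_size: int = 4096) -> bytes:
    """Encode `body` using chunked transfer coding (no extensions)."""
    out = bytearray()
    for off in range(0, len(body), chunk_size):
        piece = body[off:off + chunk_size]
        out += b"%x\r\n" % len(piece)   # hexadecimal size line
        out += piece + b"\r\n"          # chunk data and its trailing CRLF
    out += b"0\r\n\r\n"                 # last chunk, empty trailer
    return bytes(out)


def count_size_lines(stream: bytes) -> int:
    """Walk a chunked stream and count the size lines that must be parsed."""
    pos = 0
    count = 0
    while True:
        eol = stream.index(b"\r\n", pos)
        size = int(stream[pos:eol], 16)
        count += 1
        if size == 0:
            return count
        pos = eol + 2 + size + 2        # skip size line, data, CRLF
```

For a 1 MiB image sent in 4kB chunks, the walk has to stop and parse a size
line every 4096 bytes, whereas a body announced with Content-Length can be
forwarded as one opaque run of bytes.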

Now concerning the proposals.


draft-mbelshe-httpbis-spdy-00
-----------------------------

SPDY is a very interesting work which has gained a lot of experience in
the field. It is undoubtedly the most advanced proposal. It is aimed at
improving end user experience with few software changes on the server side.
As I once told Mike, I view it as a much improved transport layer for HTTP.
It correctly does what it aims to do and as such it's a success. It
addresses some of the parsing issues HTTP/1.1 has. By prefixing header names
and values with their sizes, it saves the recipient some parsing. It will
not save all of it for SPDY->1.1 gateways, which will still need to check
names and values for validity in order not to transport dangerous
character sequences such as CRLF + name + colon in values, for instance.
Still, it helps in most situations.
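
The difference can be sketched in Python as follows. The layout used here (a
32-bit big-endian length before each name and each value) is a simplified
assumption for the example and not the exact SPDY wire format; all function
names are hypothetical.

```python
import struct


def pack_pairs(pairs):
    """Serialize (name, value) byte pairs with a 4-byte length prefix each."""
    out = bytearray()
    for name, value in pairs:
        for item in (name, value):
            out += struct.pack(">I", len(item)) + item
    return bytes(out)


def read_pairs(block: bytes):
    """Extract pairs by jumping from length to length, without scanning."""
    pairs = []
    pos = 0
    while pos < len(block):
        fields = []
        for _ in range(2):
            if pos + 4 > len(block):
                raise ValueError("truncated length")
            (size,) = struct.unpack_from(">I", block, pos)
            pos += 4
            if pos + size > len(block):
                raise ValueError("truncated field")
            fields.append(block[pos:pos + size])
            pos += size
        pairs.append((fields[0], fields[1]))
    return pairs


def safe_for_http11(name: bytes, value: bytes) -> bool:
    """A gateway re-emitting "name: value CRLF" still scans every byte."""
    if not name or any(c in b":\r\n\0 \t" for c in name):
        return False
    return not any(c in b"\r\n\0" for c in value)
```

Extraction is a matter of a few pointer jumps, but a value such as
b"x\r\nSet-Cookie: a=b" is only caught by the byte-by-byte check, which is
the part a SPDY->1.1 gateway cannot skip.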

The few issues that I am having with SPDY have already been explained at
great length here on the list and are simple :
  - no plan to reuse existing deployed infrastructure on port 80
  - extra processing cost due to header compression without addressing the
    HTTP/1.1 issues (which is normal in order to transport 1.1).

For haproxy, I'm very concerned about the compression overhead. It requires
memory copies and cache-inefficient lookups, and still maintains expensive
header validation costs. For haproxy, checking a Host header or a cookie
value in a compressed stream requires significant additional work which
significantly reduces performance. Adding a Set-Cookie header into a
response or remapping a URI and Location header will require even more
processing. And dealing with DDoSes is awkward with compressed traffic, as
compression shifts the client/server strength ratio in the attacker's favor
by one or two orders of magnitude, which is critical in DDoS fighting.

We must absolutely do something about the transported protocol. At the
moment it is HTTP/1.1, but it does not seem reasonable to still require
the whole spec we already have plus a new one to ensure safe
implementations.

Some users have already asked if haproxy will implement SPDY, and I replied
that due to the lack of time, it will not implement a draft at this point
in time and would rather implement HTTP/2 once we get an idea about it.

If in the end SPDY becomes HTTP/2, haproxy would implement it without
outgoing header compression in order to limit the performance losses.

Last, the SPDY draft suggests use of TLS without apparently mandating it
(since the framing may rely on anything). While it is important to be able
to benefit from TLS, it is equally important not to make it a necessity.
Many places which use haproxy distribute content which does not affect
user privacy and which need to be delivered quickly with low processing
overhead. Mandating use of TLS would also further deteriorate the current
state of affairs about certificate management and would discriminate between
the two HTTP worlds (1 and 2) on corporate proxies which need to analyse
requests and contents. However we need a way to let the user choose between
accepted corporate filtering and total privacy (see below).


draft-montenegro-httpbis-speed-mobility-02
------------------------------------------

This draft addresses a concern I've had for the last few months : HTTP and
WebSocket are used together and each of them defines its own framing, its
own compression and its own multiplexing. I think that the idea of relying on a
single transport layer for both to unify efforts and reduce code duplication
is a good one. I personally am concerned about the 3 layers it involves to
process HTTP/2 :
  - HTTP/1.1 (for upgrade)
  - WebSocket handshake
  - then HTTP/2

However this can probably be addressed with some work. I think that this
draft reminds us that we absolutely need to find a common transport layer
for both HTTP and WebSocket, and that the work on the transport layer and
the work on header encoding may be two independent parallel tasks.


draft-tarreau-httpbis-network-friendly-00
-----------------------------------------

Being one of the authors of this draft, my opinion is certainly viewed as
biased. However, my goal with this draft was not to compete with other ones
but to try to address specific points which were not addressed in the other
ones. It should be seen as a parallel work aiming at improving the situation
for high speed HTTP routers without trading off the end-user benefits that
SPDY has proven to provide.

This draft does not pretend to offer a complete solution (eg: flow control
is not addressed despite being absolutely necessary if multiplexing is
involved),
but it seeks optimal framing and encoding for very low processing overhead,
making it easier to perform in hardware if needed, and discusses a method
to reuse existing HTTP with multiplexing and without the initial round trip
that is known to kill performance in mobile environments.

The main focus consists in using somewhat stronger typing for header
entities (eg: fields transporting a date are not necessarily encoded the
same way as cookies). A number of points still need to be addressed there,
such as how to efficiently distinguish a list of values from a value
containing a comma, etc. I'm personally convinced that stronger typing will
implicitly remove some of the dangerous and complex ambiguities we have with
HTTP/1.1.
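
The comma ambiguity is easy to reproduce. In the Python sketch below, the
type tags ("date", "list", "text") and the per-field tables are purely
illustrative assumptions and do not reflect the encoding defined in the
draft.

```python
from email.utils import parsedate_to_datetime

# Hypothetical tables saying how each field should be interpreted.
DATE_FIELDS = {"date", "expires", "last-modified"}
LIST_FIELDS = {"accept-encoding", "connection"}


def naive_split(value):
    """What an untyped parser does: treat every comma as a separator."""
    return [item.strip() for item in value.split(",")]


def typed_field(name, value):
    """Attach a type so later hops never have to guess about commas."""
    key = name.lower()
    if key in DATE_FIELDS:
        # A date travels as an integer number of seconds since the epoch.
        return ("date", int(parsedate_to_datetime(value).timestamp()))
    if key in LIST_FIELDS:
        return ("list", naive_split(value))
    return ("text", value)
```

With no type information, naive_split("Thu, 01 Dec 1994 16:00:00 GMT")
yields two items, "Thu" and "01 Dec 1994 16:00:00 GMT", exactly as it would
for a genuine two-element list such as "gzip, deflate". Once the field is
typed, the date is a single integer and the question does not arise.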

Amos Jeffries is currently working on a library which will hopefully make
it possible to have a real world implementation soon. I hope that some of
the concepts raised in this draft will be discussed if any of the other
drafts is retained.


Remaining issues not addressed in proposals
-------------------------------------------

Despite our disgust for this fact, HTTP has become a de-facto standard
transport protocol for many purposes. WebSocket is proof of this : it was
born to address the dirty bidirectional mechanisms that were appearing
everywhere. A large number of tools are able to use the HTTP CONNECT
method over a proxy to reach a point on the net (for VPNs, SSH, etc...).
HTTP has brought what TCP lacks : user authentication and bouncing over
proxies even in non-routable environments.

I would very much welcome a solution which either maintains the existing
Upgrade mechanism and clarifies it, or provides some sort of sub-protocol
knowledge (where HTTP/1.1 might very well be one of them).

The second issue is the lack of transparency for end users concerning
corporate proxies. Corporate filtering proxies are a necessity. They stop
a lot of malware and are used (eg in schools) to protect visitors from
unexpected contents. They are also used to avoid too much distraction at
some places, and to enforce law where needed.

The problem is that right now the migration to HTTPS for many sites has
caused increased need for HTTPS content analysis, and a large number of
products are now used to spoof certificates and control everything. This
is not acceptable (technically speaking, and with respect to the user's
privacy). We absolutely need the new HTTP standard to make it possible
for end users to choose if their contents may be analysed by the proxy
or not. We have already suggested the notion of "GET https://" for proxy
requests as opposed to "CONNECT", and I think that having this or an
equivalent in the new standard will solve the issue. Corporate proxies
will then be able to analyse contents passing over "GET https://" and
will apply per-site policy for CONNECT (eg: OK for banks, NOK for
webmails).
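
As a sketch of how such a proxy could tell the two cases apart, here is a
small Python function. "GET https://" is only the suggestion made above, not
an existing standard, and the function name, return values and allow-list
are hypothetical.

```python
def proxy_decision(method, target, connect_allowed):
    """Decide what a filtering proxy does with a request.

    `target` is "host:port" for CONNECT, an absolute URI otherwise.
    `connect_allowed` is the set of hosts allowed to be tunnelled.
    """
    if method == "CONNECT":
        # End-to-end privacy requested: the proxy cannot see the contents,
        # so it can only apply a per-site policy.
        host = target.rsplit(":", 1)[0]
        return "tunnel" if host in connect_allowed else "deny"
    if target.startswith("https://"):
        # The user accepted analysis: the proxy fetches over TLS itself.
        return "inspect"
    return "forward"
```

With connect_allowed = {"bank.example"}, a CONNECT to bank.example:443 is
tunnelled untouched, a CONNECT to a webmail host is refused, and a
"GET https://..." request is analysed with the user's knowledge instead of
through a spoofed certificate.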

Another point concerns protocol limits. I'm used to seeing developers do
anything with HTTP, especially on the backend side. For instance, passing a
complete file in a header field and then complaining that Apache or Haproxy
in the middle of the chain has blocked the request when the file is too large!
At least we should suggest some "common use" limits on the total header
size, the number of fields and a field size.
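
Such limits could be expressed and enforced along these lines. The numeric
values below are placeholders chosen for the example, not a recommendation,
and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeaderLimits:
    # Illustrative values only; picking real ones is the open question.
    max_total_bytes: int = 16384
    max_fields: int = 100
    max_field_bytes: int = 8192


def check_headers(headers, limits=HeaderLimits()):
    """Return None if `headers` fits within `limits`, else a reason string.

    `headers` is a list of (name, value) string pairs.
    """
    if len(headers) > limits.max_fields:
        return "too many fields"
    total = 0
    for name, value in headers:
        size = len(name) + len(value)
        if size > limits.max_field_bytes:
            return "field too large: " + name
        total += size
        if total > limits.max_total_bytes:
            return "header block too large"
    return None
```

If every component in a chain advertised or at least documented the same
three numbers, the developer passing a file in a header field would get a
predictable error from the first hop rather than a surprise from one in the
middle.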


Authentication
--------------

I have not reviewed the proposed authentication schemes yet. I think that
it is a subject which should be dealt with once we clearly define session
management and how to protect privacy of exchanges between users and
proxies. Right now the state of affairs is terrible when it comes to
proxy auth since most (browser,proxy) combinations do not allow any form
of protection for the user's credentials, resulting in the ugly tricks
already described on the WG which deteriorate the browsing experience
and break access to the Web for most components (including software
upgrades).

The logout feature is still missing everywhere and would probably be
addressed once we correctly define how to manage a session, by simply
destroying the session and all associated contents.

As a last point, I'd like us not to conflate authentication with
authorization. A few components need to validate authentication, but
past the one which validates authentication, only authorization is
commonly needed. This probably means that a user identifier still needs
to be passed along the chain to trusted parties.


Regards,
Willy

[1] http://haproxy.1wt.eu/
[2] http://www.mail-archive.com/haproxy@formilux.org/msg05448.html
Received on Friday, 13 July 2012 11:01:11 UTC