- From: Willy Tarreau <w@1wt.eu>
- Date: Fri, 13 Jul 2012 13:00:43 +0200
- To: HTTP Working Group <ietf-http-wg@w3.org>
Hi,

As the project maintainer of the haproxy load balancer [1], I'm facing a number of issues in HTTP/1.1 that I would like to see addressed in HTTP/2.

A little background first. In HTTP terminology, haproxy is a gateway, and its role could more precisely be described as an "HTTP router", as PHK calls this type of device. It is most commonly placed at the front of hosting sites, where it serves two main purposes: splitting incoming traffic based on a number of criteria (most commonly the Host, part of the URI, a Cookie or the Upgrade header), and balancing the load among a server farm, using cookies when stickiness is required. As such, it is commonly the first entry point of high-traffic sites.

On this type of device, admins have to finely balance performance and features, and sometimes they adapt their web architecture to maximize the capacity per CPU (for instance, some decide to use different Host values for different farms and only process the first request of each connection). Admins generally prefer not to deploy too many load balancers, because the ability to smooth the load and to maintain QoS drops as the number of LBs increases. Also, with today's cloud-based hosting infrastructures everywhere, CPU usage and machines have an operating cost that needs to be kept low. Connection rates of 20,000 to 40,000 per second and per core are fairly common, and users facing DDoS have reached up to 160k connections per second per node in order to sustain attacks of 10 million connections per second [2]. At these rates, the load balancer still has to strictly parse incoming traffic and route it to the proper destination (or drop it), and the choice must be made very quickly, with the lowest possible per-request and per-connection overhead.

Here comes the issue with HTTP/1.1. Parsing HTTP/1.1 is expensive, very expensive. Most of the time is spent parsing useless and redundant information. Another significant share of CPU cycles is spent updating the Connection header field and dealing with session-related information such as looking up a stickiness cookie. Ensuring request integrity is very expensive too: for instance, checking the uniqueness of the Content-Length field is expensive, as it implies checking every header field for it.

Load balancers such as haproxy would benefit a lot from a better layered HTTP model which distinguishes user session, connection and transaction. Such products also need a much lower request processing cost and connection processing cost, since DDoS attacks often consist of one request per connection. For normal traffic, it is absolutely critical to be able to forward data with the least possible overhead: for instance, having to forward images chunked at 4kB definitely kills performance on today's 10 Gbps gateways, as the cost of processing a chunk is much higher than the cost of forwarding the 4kB it contains.

Now concerning the proposals.

draft-mbelshe-httpbis-spdy-00
-----------------------------

SPDY is a very interesting piece of work which has gained a lot of experience in the field. It is undoubtedly the most advanced proposal. It is aimed at improving the end-user experience with few software changes on the server side. As I once told Mike, I view it as a much improved transport layer for HTTP. It correctly does what it aims to do, and as such it's a success.

It addresses some of the parsing issues HTTP/1.1 has: by prefixing header names and values with their sizes, it spares the recipient some parsing.
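As a rough illustration of why this matters (a Python sketch with a simplified length-prefixed layout and made-up buffers, not the exact SPDY framing), compare scanning a text header line for delimiters with reading size-prefixed name/value pairs:

    import struct

    def parse_text_header(buf, pos):
        """HTTP/1.1 style: scan for ':' then CRLF, then trim and case-fold."""
        colon = buf.index(b':', pos)
        eol = buf.index(b'\r\n', colon)
        name = buf[pos:colon].strip().lower()
        value = buf[colon + 1:eol].strip()
        return name, value, eol + 2

    def parse_prefixed_header(buf, pos):
        """Length-prefixed style: read the sizes, no scanning or trimming."""
        (nlen,) = struct.unpack_from('>H', buf, pos)
        name = buf[pos + 2:pos + 2 + nlen]
        pos += 2 + nlen
        (vlen,) = struct.unpack_from('>H', buf, pos)
        value = buf[pos + 2:pos + 2 + vlen]
        return name, value, pos + 2 + vlen

    text = b"Host: example.org\r\n"
    prefixed = struct.pack('>H', 4) + b'host' + struct.pack('>H', 11) + b'example.org'
    print(parse_text_header(text, 0))
    print(parse_prefixed_header(prefixed, 0))

The second form lets a router jump from one field to the next without inspecting every byte, which is the property that matters at the rates mentioned above.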
That saving does not fully extend to SPDY->1.1 gateways, which will still need to check the validity of names and values in order not to transport dangerous character sequences (such as CRLF + name + colon inside a value). Still, it helps in most situations. The few issues that I am having with SPDY have already been explained at great length here on the list and are simple:

- no plan to reuse the existing deployed infrastructure on port 80;
- extra processing cost due to header compression, without addressing the HTTP/1.1 issues (which is normal, in order to transport 1.1).

For haproxy, I'm much concerned about the compression overhead. It requires memory copies and cache-inefficient lookups, and it still leaves the expensive header validation costs in place. Checking a Host header or a cookie value in a compressed stream requires significant additional work, which noticeably reduces performance. Adding a Set-Cookie header to a response, or remapping a URI and a Location header, will require even more processing. And dealing with DDoSes is awkward with compressed traffic, as compression improves the client/server strength ratio by one or two orders of magnitude in the attacker's favour, which is critical in DDoS fighting.

We must absolutely do something about the transported protocol. At the moment it is HTTP/1.1, but it does not seem reasonable to still require the whole spec we already have, plus a new one, in order to ensure safe implementations. Some users have already asked whether haproxy will implement SPDY; I replied that, due to lack of time, it will not implement a draft at this point and would rather implement HTTP/2 once we get an idea of what it will be. If in the end SPDY becomes HTTP/2, haproxy would implement it without outgoing header compression, in order to limit the performance losses.

Last, the SPDY draft suggests the use of TLS without apparently mandating it (since the framing may rely on anything). While it is important to be able to benefit from TLS, it is equally important not to make it a necessity. Many places which use haproxy distribute content which does not affect user privacy and which needs to be delivered quickly with low processing overhead. Mandating the use of TLS would also further deteriorate the current state of affairs about certificate management, and would discriminate between the two HTTP worlds (1 and 2) on corporate proxies which need to analyse requests and contents. However, we need a way to let the user choose between accepted corporate filtering and total privacy (see below).

draft-montenegro-httpbis-speed-mobility-02
------------------------------------------

This draft addresses a concern I've had for the last few months: HTTP and WebSocket are used together, and each of them defines its own framing, its own compression and its own multiplexing. I think that the idea of relying on a single transport layer for both, to unify efforts and reduce code duplication, is a good one. I personally am concerned about the three layers involved in processing HTTP/2:

- HTTP/1.1 (for the upgrade)
- the WebSocket handshake
- then HTTP/2

However, this can probably be addressed with some work. I think that this draft reminds us that we absolutely need to find a common transport layer for both HTTP and WebSocket, and that the work on the transport layer and the work on header encoding may be two independent, parallel tasks.

draft-tarreau-httpbis-network-friendly-00
-----------------------------------------

Being one of the authors of this draft, my opinion is certainly viewed as biased.
However, my goal with this draft was not to compete with the other ones but to try to address specific points which were not addressed in them. It should be seen as parallel work aiming at improving the situation for high-speed HTTP routers without trading off the end-user benefits that SPDY has proven to provide. This draft does not pretend to offer a complete solution (e.g. flow control is not addressed, despite being absolutely necessary if multiplexing is involved), but it seeks optimal framing and encoding for very low processing overhead, making it easier to implement in hardware if needed, and it discusses a method to reuse existing HTTP with multiplexing and without the initial round trip that is known to kill performance in mobile environments.

The main focus is on using somewhat stronger typing for header entities (e.g. fields transporting a date are not necessarily encoded the same way as cookies). A number of points still need to be addressed there, such as how to efficiently distinguish between a list of values and a single value containing a comma, etc. I'm personally convinced that stronger typing will implicitly remove some of the dangerous and complex ambiguities we have with HTTP/1.1. Amos Jeffries is currently working on a library which will hopefully make it possible to have a real-world implementation soon. I hope that some of the concepts raised in this draft will be discussed if any of the other drafts is retained.

Remaining issues not addressed in proposals
-------------------------------------------

Much to our disgust, HTTP has become a de-facto standard transport protocol for many purposes. WebSocket is proof of this: it was born to address the dirty bidirectional mechanisms that were appearing everywhere. A wide range of tools are capable of using the HTTP CONNECT method over a proxy to reach a point on the net (for VPNs, SSH, etc.). HTTP has brought what TCP lacks: user authentication and bouncing over proxies, even in non-routable environments. I would very much welcome a solution which either maintains the existing Upgrade mechanism and clarifies it, or provides some sort of sub-protocol awareness (where HTTP/1.1 might very well be one such sub-protocol).

The second issue is the lack of transparency for end users concerning corporate proxies. Corporate filtering proxies are a necessity: they stop a lot of malware and are used (e.g. in schools) to protect visitors from unexpected contents. They are also used to avoid too much distraction in some places, and to enforce the law where needed. The problem is that right now the migration of many sites to HTTPS has caused an increased need for HTTPS content analysis, and a large number of products are now used to spoof certificates and control everything. This is not acceptable, technically speaking and with respect to the user's privacy. We absolutely need the new HTTP standard to make it possible for end users to choose whether their contents may be analysed by the proxy or not. We have already suggested the notion of "GET https://" for proxy requests, as opposed to "CONNECT", and I think that having this or an equivalent in the new standard will solve the issue. Corporate proxies will then be able to analyse contents passing over "GET https://" and will apply a per-site policy for CONNECT (e.g. OK for banks, not OK for webmail). The difference between the two request forms is sketched below.
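A rough sketch of the two forms (Python, with a made-up host name; an illustration of the idea only, not a wire-format proposal):

    # CONNECT: the proxy only sees the host name and then relays opaque TLS bytes.
    connect_form = (
        b"CONNECT bank.example:443 HTTP/1.1\r\n"
        b"Host: bank.example:443\r\n"
        b"\r\n"
    )

    # "GET https://": the client sends an absolute URI to the proxy, which can
    # therefore see and filter the request, with the user's explicit consent.
    get_https_form = (
        b"GET https://bank.example/accounts HTTP/1.1\r\n"
        b"Host: bank.example\r\n"
        b"\r\n"
    )

    print(connect_form.decode())
    print(get_https_form.decode())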
Another point concerns protocol limits. I'm used to seeing developers do just about anything with HTTP, especially on the backend side: for instance, passing a complete file in a header field and then complaining that Apache or haproxy in the middle of the chain blocked the request because the file was too large! At the very least we should suggest some "common use" limits on the total header size, the number of fields and the size of a single field.

Authentication
--------------

I have not reviewed the proposed authentication schemes yet. I think it is a subject which should be dealt with once we clearly define session management and how to protect the privacy of exchanges between users and proxies. Right now the state of affairs is terrible when it comes to proxy auth, since most (browser, proxy) combinations do not allow any form of protection for the user's credentials, resulting in the ugly tricks already described on the WG list which deteriorate the browsing experience and break Web access for many components (including software upgrades). The logout feature is still missing everywhere and would probably be addressed once we correctly define how to manage a session, simply by destroying the session and all associated contents.

As a last point, I'd like us not to conflate authentication with authorization. A few components need to validate authentication, but past the one which validates it, only authorization is commonly needed. This probably means that a user identifier still needs to be passed along the chain to trusted parties.
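To make that last point concrete, here is a toy Python sketch of the split I have in mind (the X-Authenticated-User header name and the credential store are invented for the example, not taken from any draft): the first trusted component validates the credentials once, strips them, and only passes a user identifier for downstream components to base their authorization decisions on.

    import base64

    # Invented credential store; a real deployment would query LDAP, a DB, etc.
    USERS = {"alice": "secret"}

    def authenticate_at_edge(headers):
        """Validate 'Authorization: Basic ...' once, at the first trusted hop."""
        auth = headers.get("authorization", "")
        if not auth.lower().startswith("basic "):
            return None
        user, _, pwd = base64.b64decode(auth[6:]).decode().partition(":")
        return user if USERS.get(user) == pwd else None

    def forward_headers(headers):
        """Downstream hops get an identifier only, never the credentials."""
        user = authenticate_at_edge(headers)
        if user is None:
            raise PermissionError("a 401/407 response would be emitted here")
        fwd = {k: v for k, v in headers.items() if k != "authorization"}
        fwd["x-authenticated-user"] = user  # identity only, for authorization
        return fwd

    creds = base64.b64encode(b"alice:secret").decode()
    print(forward_headers({"host": "intranet.example",
                           "authorization": "Basic " + creds}))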
Regards,
Willy

[1] http://haproxy.1wt.eu/
[2] http://www.mail-archive.com/haproxy@formilux.org/msg05448.html