- From: Amos Jeffries <squid3@treenet.co.nz>
- Date: Sat, 31 May 2014 23:29:16 +1200
- To: Chris Drechsler <chris.drechsler@etit.tu-chemnitz.de>, ietf-http-wg@w3.org
On 20/05/2014 11:25 p.m., Chris Drechsler wrote:
> Hi Amos,
>
> thank you for your comments! They are very useful for me! Below are my
> answers:
>
> On 19.05.2014 16:34, Amos Jeffries wrote:
>> On 19/05/2014 7:13 p.m., Chris Drechsler wrote:
>>> Dear editors of [Part6],
>>> dear working group members,
>>>
>>> I've written a draft about an improved caching mechanism in HTTP,
>>> especially for shared caches (to improve caching efficiency and reduce
>>> costly interdomain traffic). It can deal with personalization (e.g.
>>> cookies, session IDs in query strings) and varying URLs due to load
>>> balancing or the use of CDNs. The caching mechanism ensures that all
>>> headers (request and response messages) are exchanged between origin
>>> server and client even if the real content is coming from a cache.
>>>
>>> The draft is available under the following URL:
>>>
>>> http://tools.ietf.org/id/draft-drechsler-httpbis-improved-caching-00.txt
>>>
>>> I kindly request your comments - thank you!
>>
>>
>> * The introduction states several abuses and deliberate non-use of
>> HTTP/1.1 features as the reasons for this proposal.
>
> In a way I agree: varying URLs due to load balancing/CDNs or the use of
> session IDs in query strings are some kind of abuse. I think
> personalization via cookies is not. On the other hand they are widely
> used in the Internet and ISPs have to deal with them. And caching
> especially suffers from it.

I was speaking there of the non-use of conditional requests, and of the
abuse of the 200 status code to perform operations for which other
status codes (3xx) were explicitly created. As others have repeated,
content providers which are already using URLs and cookies "incorrectly"
despite the available HTTP conditional request and caching features are
not going to change for this.

>
>>
>> Which would not usually be bad, but it is actually simpler for the few
>> problematic systems to start using existing DNS and HTTP features
>> properly than it is for the entire existing software environment to be
>> re-implemented to support this proposed mechanism.
>
> How would you properly solve it via DNS and HTTP features to ensure that
> a representation from URL A is the same as from URL B? I think it is
> very difficult because you don't know how all kinds of CDNs work
> internally.

Correct use of DNS is to have a single domain name authoritative for
each object. One uniform DNS name may point at several servers or load
balancers or CDN gateways, causing a single URL to be fetched from any
one of multiple sources in a load-balanced fashion while simultaneously
benefiting from existing HTTP caching.

How CDNs operate internally is irrelevant. There is a limited set of
APIs which the HTTP-friendly CDNs all present for public access:

 1) URL hostname with multiple A/AAAA records routed to one of many
    separate servers.

 2) URL hostname with a single A/AAAA record routed to one of multiple
    backend servers.

The ones picking the third option of domain sharding are operating
"incorrectly". Getting them to change into one of the above API forms is
easier (being a DNS config file and possibly a script change) than
getting them to implement whole new HTTP service stacks.

Note that under your scheme the cache has to download a copy from each
different URL it sees *in full* before it can confirm that the assigned
hash is correct, and only then discard one. So there are no bandwidth
savings for the domain-sharding CDNs, just a benefit of on-disk storage
space in the cache.
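As a rough illustration of that last point (a minimal Python sketch with
made-up names, not anything taken from the draft): a digest such as the
draft's Cache-NT value can only be verified once the whole body has
arrived, so the duplicate copy cannot be cut short part-way through:

    import hashlib

    def verify_cache_nt(body_chunks, advertised_digest):
        """True only if the SHA-256 over the *complete* body matches."""
        h = hashlib.sha256()
        for chunk in body_chunks:   # every byte must arrive before we know
            h.update(chunk)
        return h.hexdigest() == advertised_digest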
I propose that a completely internal SHA-256 de-duplication of cached
contents (or ZFS/XFS backend storage?) on the cache machine itself would
perform equally well for disk savings and has no need of any HTTP-level
changes.

>>
>> * Section 2.1 appears to be proposing a new header. But what does "NT"
>> mean?
>> The use of this header seems similar to an extension of the RFC 3230
>> message digest mechanism.
>
> I looked for a header field name different from Cache-Control and named
> it that way - NT means new technology. And right, it is similar to RFC
> 3230 but there is no extra RTT due to negotiation of the hash algorithm.
>

There does not have to be any lost RTT at all. Simply define that
SHA-256 is the algorithm for your header (Content-SHA256: ??) and keep
the same generation criteria you have now. Any future algorithm changes
will need to work out how to integrate (or make another header name),
but that is not a burden on you.

>>
>> * Section 2.2 things get really weird:
>> - 2.2.1 is requiring mandatory disabling of all conditional request
>> mechanisms in HTTPbis part4.
>
> Maybe my formulation is not clear. If Cache-Control is used by the
> content producer then [Part6] caching should be applied and conditional
> requests make sense and should not be disabled.
>
> If the content producer uses CDNs and/or personalization and wants to
> benefit from caching then he can make use of the new caching mechanism
> (and would omit the Cache-Control header). Then there is no need for
> conditional requests because the cache system will abort the transfer
> if the hash values match.

The need for conditional requests is to target specific resource
representations (via ETag) and to prevent unwanted content bytes being
sent (via the 304 status) and connection closures (via 304 again).

>
> It's always controlled by the content producer.
>
>> - 2.2.2 is requiring mandatory TCP connection termination after every
>> duplicate response. Effectively removing all the benefits HTTP/1.1
>> pipelining and persistent connections bring. It does this in order to
>> replicate the 304 conditional response using just a 200 status code +
>> TCP termination.
>
> You are right. In reality persistent connections and pipelining are not
> so widely used (see [1]). Pipelining especially is only used in roughly
> 6% of all connections, and when it is used then for relatively small
> contents. The caching mechanism in my draft should be applied for large
> contents (significantly larger than 20KB) like bigger images or videos
> for example.

Is that 6% measured for general traffic or in relation to traffic going
through a cache? For server connections or client connections? Because
web servers tend to benefit from quickly closing persistent connections
and proxy caches from keeping them open.

>
>> - 2.2.2 also places several impossible requirements on intermediaries:
>> 1) to re-open a just closed TCP connection, presumably without delays
>> which would violate TCP TIME_WAIT requirements.
>
> I've written re-open but that's not the right word - I meant open a new
> connection (with a new port) - I will fix it in the draft. And you are
> absolutely right: re-opening the same TCP connection would violate TCP
> TIME_WAIT requirements.

Either way this restricts the cache to operating with 64K sockets per
15-minute period (the TIME_WAIT on the closed sockets). With 2-4 sockets
per transaction that becomes a hard limit of between 270 and 540
requests per second.
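As a rough back-of-the-envelope check on figures of that order (a sketch
only, assuming a 60-second TIME_WAIT and the full 64K port range; the
actual TIME_WAIT duration and usable ephemeral port range vary by
operating system):

    # ephemeral-port exhaustion estimate; all inputs here are assumptions
    EPHEMERAL_PORTS = 64 * 1024     # upper bound; real ranges are smaller
    TIME_WAIT_SECONDS = 60          # assumed; some stacks hold 1-4 minutes
    for sockets_per_txn in (2, 4):
        max_rps = EPHEMERAL_PORTS / (TIME_WAIT_SECONDS * sockets_per_txn)
        print(f"{sockets_per_txn} sockets/transaction -> ~{max_rps:.0f} req/s")
    # prints roughly 546 and 273 requests per second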
Individual proxy caches today need to service up to 500K requests per
second, and these numbers are only going upward as end-users get better
access technology.

>
>> 2) to decode the Content-Encoding of responses to compare SHA-256
>> values. Even if the response was a 206 status with a byte range from
>> within an encoding such as gzip.
>
> No, you cannot compare the hash value until you have the full (range)
> of the transferred representation. So the cache system would store the
> byte range from the 206 response and would wait until other responses
> bring the missing parts.

This is a complete waste of bandwidth. The reason for the range being
requested in the first place is likely to be that the rest is not going
to be requested, and/or that the content is so large that it's not worth
getting in full.

> After that the cache can compute the hash value
> and if it is correct (in comparison to the Cache-NT fields of the
> former responses) then the cache would store it and would use it for
> following requests. The computation should/can be done during low load
> (not in the busy hour).

There is no guaranteed period of low load. Caches are used, for example,
by Wikipedia; the throughput there is measured in whole TB per second,
minute after minute, all day long. That is typical of a caching system.

By the time any slack period is encountered it is far too late to do
checksum validation of responses. Popular objects will have already been
requested a great many times and potentially invalid content delivered
to a wide variety of clients. This is enough of a problem to make the
delaying of hash calculation a major security vulnerability.

You have a nasty loop situation here: the proposed mechanism causes a
problem, and the workarounds only cause worse problems.

>>
>> In summary, the proposed feature completely disables the two most
>> valuable cache features of HTTP/1.1 and replaces them with an
>> equivalent process requiring mandatory use of the worst behaviours
>> from HTTP/1.0.
>
> In a way I agree. For HTTP/1.0 and HTTP/1.1 my draft is a workaround.

It is an outright regression for the majority of services which properly
use the HTTP protocol caching features.

> What I would need there is something like a STOP-sending-this-response
> request. In general the new caching mechanism is more useful for larger
> representations and they are seldom pipelined.

Do you have any numbers to support that claim? There is no technical
reason in working HTTP for large objects to be treated any differently
from small objects.

>
> What about HTTP/2? As each HTTP request-response exchange is assigned
> to a single stream, no side effects will arise - in my opinion. What do
> you think, can you see side effects?

All the side effects of disabling conditional requests and responses
remain. As does the wasted bandwidth from several RTTs' worth of traffic
being emitted before transaction termination. As does the cache
corruption and DoS possibilities implicit in the lack of checksum
verification under load. All of which is caused directly by this scheme
avoiding the existing HTTP mechanisms.
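To make the contrast concrete, this is roughly what the revalidation
being thrown away looks like from a cache's point of view (a minimal
Python sketch; the origin name and ETag value are hypothetical). A 304
answer carries no body at all, so no content bytes are wasted and the
persistent connection stays open for reuse:

    import http.client

    # Revalidate a cached copy using the ETag stored with it.
    conn = http.client.HTTPConnection("origin.example")   # hypothetical host
    conn.request("GET", "/big/video.mp4",
                 headers={"If-None-Match": '"abc123"'})   # made-up ETag
    resp = conn.getresponse()
    if resp.status == 304:
        resp.read()         # empty body: the cached representation is valid
        # ... serve the bytes already on disk, reuse the connection ...
    else:
        body = resp.read()  # 200: a new representation replaces the old one
    conn.close()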
Amos

Received on Saturday, 31 May 2014 11:29:51 UTC