- From: Mark Nottingham <mnot@mnot.net>
- Date: Wed, 24 Aug 2016 16:50:46 +1000
- To: Kazuho Oku <kazuhooku@gmail.com>
- Cc: Martin Thomson <martin.thomson@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Sorry for the delay; been travelling, then stuck, then sick, then catching up.

> On 14 Jul 2016, at 5:24 PM, Kazuho Oku <kazuhooku@gmail.com> wrote:
>
> Hi,
>
> Thank you for your comments.
>
> The comments below are mine, and Mark might have different opinions.
>
> 2016-07-13 11:18 GMT+09:00 Martin Thomson <martin.thomson@gmail.com>:
>> As I've said before, this is really interesting work, I'm very much
>> interested in seeing this progress. However, I found a lot of issues
>> with the current draft.
>>
>> The latest version seems to be a bit of a regression. In particular,
>> the addition of all the flags makes it a lot more complicated, and I'm
>> already concerned about managing complexity here, especially since
>> this is an optimization.
>>
>> The draft doesn't actually say where this frame should be sent - on a
>> stream that carries a request, or on stream 0.
>
> In section 2.1 the draft states: A CACHE_DIGEST frame can be sent from
> a client to a server on any stream in the “open” state. My
> understanding is that it would be enough to indicate that the frame
> should be sent on a stream that carries a request, as well as when it
> should be sent.

Right after that, it says:

    ... and conveys a digest of the contents of the client’s cache for
    associated stream.

That probably should say "contents of the client's cache for the
*origin* of the associated stream."

The other obvious design would be to put them on stream 0 and then have
an explicit Origin field. Do we anticipate C_D being sent before a
stream is opened for a given origin?

>> This is important
>> because there are several mentions of origin. For instance, the new
>> RESET flag talks about clearing digests for the "applicable origin".
>> That establishes a large point of confusion about the scope that a
>> digest is assumed to apply to; by their nature, this isn't necessarily
>> fatal, until you want to talk about RESET and COMPLETE.
>>
>> To up-level a bit on this general issue, I'd like to see a better
>> formulated description of the information that clients and servers are
>> expected to maintain. There seem to be multiple objects that are
>> stored, but I'm not sure what scope they are maintained in; is the
>> scope an origin?
>
> Yes.

+1. We should rewrite to clarify this. See also
<https://github.com/httpwg/http-extensions/issues/216>.

>> Assuming a particular scope, are there two objects, or four? That is,
>> there could be four stores:
>>
>> 1. assumed fresh by URL
>> 2. assumed fresh by URL and etag
>> 3. assumed stale by URL
>> 4. assumed stale by URL and etag
>>
>> Or are 1+2 and 3+4 combined? The definition of RESET implies that all
>> four stores are cleared. The definition of COMPLETE implies that only
>> 1+2 or 3+4 are affected.
>
> There are four objects, which are grouped into two.
>
> Your reading is correct that the RESET flag clears all of them, and
> that the COMPLETE flag applies to either 1+2 or 3+4.

+1

>> The draft doesn't talk about URL normalization. That is a risk to the
>> feasibility of this; fail to do something sensible here and you could
>> get a lot of spurious misses. Given that it is just an optimization,
>> we don't need 100% agreement for this to work, but saying something is
>> probably wise. We can probably get away with making some lame
>> recommendations about how to encode a URL. Here's a rough cut of
>> something that didn't make the draft deadline this time around:
>> https://martinthomson.github.io/http-miser/#rfc.section.2.1
>
> Thank you for the suggestion.
>
> I have mixed feelings about this; in section 2.2.1 the current draft
> says "Effective Request URI of a cached response" should be used.
>
> So the cache digest would work without URL normalization if both of
> the following conditions are met:
>
> * the client caches a response WITHOUT normalizing the request URI
>   into some other form
> * the server looks up the cache digest using the URI that the client
>   would send
>
> For example, if an HTML page with a script tag specifying
> /%7Efoo/script.js is served to the client, then the draft expects the
> client to use that value (including %7E) as the key, and the server to
> test the digest using the exact same form.
>
> The pro of this approach is that it would be easier to implement. The
> con is that it would be fragile due to the lack of normalization.
>
> And I agree with you that, in case we go without normalization, we
> should warn users that the paths must be identical in terms of octets.

My inclination would be to do no more normalisation than caches are
normally doing, at least to start with.

>> I don't see any value in COMPLETE. Even if we accept that there is
>> only one connection from this client to this server, the benefit in
>> knowing that the digest is complete is marginal at best. Is there
>> just one extra resource missing, or thousands? As such, it changes
>> the probability by some unknown quantity, which isn't actionable.
>
> I do find value in COMPLETE.
>
> For a server whose primary goal is to minimize bandwidth consumption
> and whose secondary goal is to minimize latency, it is wise to push
> responses that are known NOT to be cached by the client.
>
> That's what the COMPLETE flag can be used for. Without the flag, a
> server can only tell whether a response is already cached or _might_
> be cached.
>
>> Can a frame with the RESET flag include a digest as well?
>
> Yes. That is the intention of the draft.
>
>> N and P could fit into a single octet. Since you are into the flags
>> on the frame anyway, reduce N and P to 4 bits apiece and use flags to
>> fill the upper bits as needed. But I don't think that they will be
>> needed.
>> At the point that you have more than 2^16 entries in the
>> digest, you are probably not going to want to use this. Even with a
>> tiny P=3 - which is too high a false positive probability to be useful
>> - with N=2^16 you still need 32K to send the digest. You could safely
>> increase the floor for P before you might need or want higher bits
>> (and make the minimum higher than 2^0, which is probably too high a
>> false-positive probability in any case).
>
> I would argue that P=1 would still be useful in some cases. For
> example, if 10 resources are missing on the client side, it would mean
> that a server can expect to detect 5 of them as missing and push them
> when P=1 is used.
>
> And considering the fact that we would nevertheless have a
> read-n-bits operation while decoding the Golomb-encoded values, I do
> not see a strong reason to squash N and P into a single octet.

+1

>> Is the calculation of N really round(log2(urls.length))? I thought
>> that you would want to use ceil() instead. Is the word "up" missing
>> from step 1 in Section 2.1.1?
>
> The draft has intentionally been written to use round.
>
> The numbers that matter when using Golomb-coded sets are:
>
> P: the divisor that splits each encoded value into a unary-coded
>    quotient and a binary-coded remainder
> N*P: the range of the encoded values
>
> For efficiency, both P and N*P must be powers of 2.
>
> To encode efficiently, the real probability should be close to the
> value of P. And that in turn means that N*P should be
> round_to_power_of_two(urls.length * P) rather than
> round_up_to_power_of_two(urls.length * P).
>
>> The draft never actually mentions that it uses [Rice-]Golomb coding
>> until the appendix. Including a reference to the Rice paper would
>> help people implementing this understand where this comes from, as
>> well as leading them to being able to find the relevant research.
>> (nit: Spelling Golomb correctly would help here.)
>
> I agree. Thank you for noticing that!
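To make the role of P and N*P concrete, here is a toy Golomb-coded set. This is an illustrative sketch only, not the draft's algorithm: the SHA-256-based key derivation, the hash truncation, and the bit layout are all assumptions made for demonstration.

```python
# Toy Golomb-coded set. NOT the cache-digest draft's exact algorithm;
# the hashing scheme and bit layout here are illustrative assumptions.
import hashlib

def _hash(url: str, n: int, p: int) -> int:
    # Reduce the URL into the range [0, N*P); both N and P are powers
    # of two, as the discussion above requires.
    h = int.from_bytes(hashlib.sha256(url.encode()).digest()[:8], "big")
    return h % (n * p)

def gcs_encode(urls, log2_n: int, log2_p: int):
    """Encode a set of URLs as a list of bits.

    log2_n plays the role of the draft's N field and log2_p of its P
    field: deltas between sorted hashes are split by the divisor
    2**log2_p into a unary quotient and a log2_p-bit binary remainder.
    """
    n, p = 1 << log2_n, 1 << log2_p
    bits, prev = [], 0
    for h in sorted(_hash(u, n, p) for u in urls):
        q, r = divmod(h - prev, p)
        prev = h
        bits += [1] * q + [0]                          # unary quotient
        bits += [(r >> i) & 1 for i in range(log2_p - 1, -1, -1)]
    return bits

def gcs_contains(bits, url, log2_n: int, log2_p: int) -> bool:
    """Membership test; false positives occur with probability ~1/P."""
    n, p = 1 << log2_n, 1 << log2_p
    target = _hash(url, n, p)
    i, value = 0, 0
    while i < len(bits):
        q = 0
        while bits[i]:                                 # unary quotient
            q, i = q + 1, i + 1
        i += 1                                         # skip terminator
        r = 0
        for _ in range(log2_p):                        # fixed-width remainder
            r, i = (r << 1) | bits[i], i + 1
        value += q * p + r
        if value == target:
            return True
    return False
```

A member URL always tests positive, while a non-member tests positive with probability about 1/P. Each entry costs on the order of log2(P) + 2 bits on average, which is roughly where Martin's ~32K figure for N=2^16 entries at P=3 comes from.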
+1, see <https://github.com/httpwg/http-extensions/issues/230>.

> --
> Kazuho Oku

--
Mark Nottingham   https://www.mnot.net/
Received on Wednesday, 24 August 2016 06:51:16 UTC