Re: draft-ietf-httpbis-cache-digest-00 comments

Sorry for the delay, been travelling, then stuck, then sick, then catching up.


> On 14 Jul 2016, at 5:24 PM, Kazuho Oku <kazuhooku@gmail.com> wrote:
> 
> Hi,
> 
> Thank you for your comments.
> 
> The comments below are mine, and Mark might have different opinions.
> 
> 2016-07-13 11:18 GMT+09:00 Martin Thomson <martin.thomson@gmail.com>:
>> As I've said before, this is really interesting work, I'm very much
>> interested in seeing this progress.  However, I found a lot of issues
>> with the current draft.
>> 
>> The latest version seems to be a bit of a regression.  In particular,
>> the addition of all the flags makes it a lot more complicated, and I'm
>> already concerned about managing complexity here, especially since
>> this is an optimization.
>> 
>> The draft doesn't actually say where this frame should be sent - on a
>> stream that carries a request, or on stream 0.
> 
> In section 2.1 the draft states: A CACHE_DIGEST frame can be sent from
> a client to a server on any stream in the “open” state. My
> understanding is that it would be enough to indicate that the frame
> should be sent on a stream that carries a request as well as when it
> should be sent.

Right after that, it says:

> ... and conveys a digest of the contents of the client’s cache for associated stream.

That probably should say "contents of the client's cache for the *origin* of the associated stream."

The other obvious design would be to put them on stream 0 and then have an explicit Origin field.

Do we anticipate C_D being sent before a stream is opened for a given origin? 
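
To make the two designs concrete, here's a rough sketch (purely illustrative; none of these field names are in the draft):

    # (a) As in -00: the frame rides on an open request stream, and the digest
    #     is scoped to the origin of that stream's request.
    frame_on_request_stream = {
        "stream_id": 1,            # any stream in the "open" state
        "flags": [],               # RESET, COMPLETE, etc.
        "payload": b"<digest>",    # origin is implied by the stream's request
    }

    # (b) Alternative: the frame goes on stream 0 and names the origin
    #     explicitly, so it could be sent before any request to that origin.
    frame_on_stream_zero = {
        "stream_id": 0,
        "flags": [],
        "payload": {
            "origin": "https://example.com",
            "digest": b"<digest>",
        },
    }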


>> This is important
>> because there are several mentions of origin.  For instance, the new
>> RESET flag talks about clearing digests for the "applicable origin".
>> That establishes a large point of confusion about the scope that a
>> digest is assumed to apply to; by their nature, this isn't necessarily
>> fatal, until you want to talk about RESET and COMPLETE.
>> 
>> To up-level a bit on this general issue, I'd like to see a better
>> formulated description of the information that clients and servers are
>> expected to maintain.  There seem to be multiple objects that are
>> stored, but I'm not sure what scope they are maintained in; is the
>> scope an origin?
> 
> Yes.

+1. We should rewrite to clarify this. See also <https://github.com/httpwg/http-extensions/issues/216>.


>> Assuming a particular scope, are there two objects, or four?  That is,
>> there could be four stores:
>> 
>> 1. assumed fresh by URL
>> 2. assumed fresh by URL and etag
>> 3. assumed stale by URL
>> 4. assumed stale by URL and etag
>> 
>> Or are 1+2 and 3+4 combined?  The definition of RESET implies that all
>> four stores are cleared.  The definition of COMPLETE implies that only
>> 1+2 or 3+4 are affected.
> 
> There are four objects, which are grouped into two.
> 
> Your reading is correct that RESET flag clears all of them, and that
> the COMPLETE flag implies to either 1+2 or 3+4.

+1
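
For the record, a rough sketch of the state a server would keep per origin under this reading (the names are mine, not the draft's):

    class OriginDigestState:
        """Per-origin state implied by the draft; attribute names are illustrative."""

        def __init__(self):
            # 1+2: responses the client asserts are fresh
            self.fresh_by_url = set()
            self.fresh_by_url_and_etag = set()
            # 3+4: responses the client holds but believes are stale
            self.stale_by_url = set()
            self.stale_by_url_and_etag = set()
            # COMPLETE is tracked per group
            self.fresh_complete = False
            self.stale_complete = False

        def reset(self):
            # RESET clears all four stores (and the COMPLETE markers with them)
            self.__init__()

        def mark_complete(self, stale):
            # COMPLETE asserts that one group (1+2 or 3+4) is now exhaustive
            if stale:
                self.stale_complete = True
            else:
                self.fresh_complete = True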


>> The draft doesn't talk about URL normalization.  That is a risk to the
>> feasibility of this; fail to do something sensible here and you could
>> get a lot of spurious misses.  Given that it is just an optimization,
>> we don't need 100% agreement for this to work, but saying something is
>> probably wise.  We can probably get away with making some lame
>> recommendations about how to encode a URL.  Here's a rough cut of
>> something that didn't make the draft deadline this time around:
>> https://martinthomson.github.io/http-miser/#rfc.section.2.1
> 
> Thank you for the suggestion.
> 
> I have a mixed feeling about this; in section 2.2.1 the current draft
> says "Effective Request URI of a cached response" should be used.
> 
> So the cache digest would work without URL normalization if both of
> the following conditions are met:
> * the client caches a response WITHOUT normalizing the request URI into some other form
> * the server looks up the cache digest using the URI exactly as the client would send it
> 
> For example, if an HTML document with a script tag specifying /%7Efoo/script.js
> is served to the client, then the draft expects the client to use that
> value (including %7E) as the key, and the server to test the digest
> using the exact same form.
> 
> The pros of this approach would be that it would be easier to
> implement. The con is that it would be fragile due to the lack of
> normalization.
> 
> And I agree with you that in case we go without normalization we
> should warn users that the paths must be identical in terms of
> octets.

My inclination would be to do no more normalisation than caches are normally doing, at least to start with.
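
I.e. something as simple as the following, with the Effective Request URI used octet-for-octet as the digest key (sketch only):

    def digest_key(effective_request_uri):
        # No percent-decoding or path canonicalisation beyond what the cache
        # already does; "/%7Efoo/script.js" and "/~foo/script.js" are distinct
        # keys, and both ends have to use the same octets.
        return effective_request_uri.encode("ascii")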


>> I don't see any value in COMPLETE.  Even if we accept that there is
>> only one connection from this client to this server, the benefit in
>> knowing that the digest is complete is marginal at best.  Is there
>> just one extra resource missing, or thousands?  As such, it changes
>> the probability by some unknown quantity, which isn't actionable.
> 
> I do find value in COMPLETE.
> 
> For a server with the primary goal to minimize B/W consumption and the
> second goal to minimize latency, it is wise to push responses that are
> known NOT to be cached by a client.
> 
> That's what the COMPLETE flag can be used for. Without the flag, a
> server can only tell if a response is already cached or _might_ be
> cached.
> 
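
The decision Kazuho describes would look roughly like this (sketch only; might_contain() stands in for whatever query the server runs against the received digest):

    def should_push(url, digest, complete):
        if digest.might_contain(url):
            return False   # already cached, modulo the false-positive rate
        if complete:
            return True    # known not to be cached, so pushing can't waste bandwidth
        return False       # absent but digest incomplete: it only *might* be missing
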
>> Can a frame with the RESET flag include a digest as well?
> 
> Yes. That is the intention of the draft.
> 
>> N and P could fit into a single octet.  Since you are into the flags
>> on the frame anyway, reduce N and P to 4 bits apiece and use flags to
>> fill the upper bits as needed.  But I don't think that they will be
>> needed.  At the point that you have more than 2^16 entries in the
>> digest, you are probably not going to want to use this.  Even with a
>> tiny P=3 - which is too high a false positive probability to be useful
>> - with N=2^16 you still need 32K to send the digest.  You could safely
>> increase the floor for P before you might need or want higher bits
>> (and make the minimum higher than 2^0, which is probably too high a
>> false-positive probability in any case).
> 
> I would argue that P=1 would still be useful in some cases. For
> example if 10 resources are missing on the client side, it would mean
> that a server can detect 5 of them missing and push them in case P=1
> is used.
> 
> And considering the fact that we would nevertheless need a read-n-bits
> operation while decoding the Golomb-encoded values, I do not see a
> strong reason to squash N and P into a single octet.

+1
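
As a back-of-the-envelope check of Martin's figures (treating P as the number of remainder bits, and assuming each Golomb-coded entry costs roughly P+1 bits: P for the binary remainder plus a short unary quotient; the real average is a bit higher):

    import math

    def approx_gcs_size_bytes(num_entries, p_bits):
        # Lower-bound estimate only.
        return math.ceil(num_entries * (p_bits + 1) / 8)

    print(approx_gcs_size_bytes(2**16, 3))   # 32768, i.e. the "32K" quoted above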

>> Is the calculation of N really round(log2(urls.length))?  I thought
>> that you would want to use ceil() instead.  Is the word "up" missing
>> from step 1 in Section 2.1.1?
> 
> The draft has intentionally been written to use round().
> 
> The numbers that matter when using Golomb-coded sets are:
> 
>    P: the divisor used to split each value into a unary-encoded
>       quotient and a binary-encoded remainder
>    N*P: range of the encoded values
> 
> For efficiency, both P and N*P must be powers of 2.
> 
> To encode effectively, the real probability should be near to the
> value of P. And that in turn means that N*P should be
> round_to_power_of_two(urls.length * P) rather than
> round_up_to_power_of_two(urls.length * P).
> 
>> The draft never actually mentions that it uses [Rice-]Golomb-coding
>> until the appendix.  Including a reference to the Rice paper would
>> help people implementing this understand where this comes from, as
>> well as leading them to being able to find the relevant research.
>> (nit: Spelling Golomb correctly would help here.)
> 
> I agree. Thank you for noticing that!

+1, see <https://github.com/httpwg/http-extensions/issues/230>
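
For anyone following along, a simplified sketch of the coding being discussed; it is not the draft's exact algorithm (the hash and framing details here are placeholders), just an illustration of round() for N and of Golomb/Rice-coding the sorted deltas:

    import hashlib
    import math

    def encode_gcs(urls, p_exp=6):
        # N is 2^round(log2(count)), per Kazuho's explanation above; P is 2^p_exp.
        # Assumes urls is non-empty.
        n_exp = round(math.log2(len(urls)))       # round(), not ceil()
        n, p = 1 << n_exp, 1 << p_exp
        # Hash each key into the range [0, N*P) and sort.
        values = sorted(
            int.from_bytes(hashlib.sha256(u.encode()).digest()[:8], "big") % (n * p)
            for u in urls)
        bits, prev = [], 0
        for v in values:
            delta, prev = v - prev, v
            q, r = delta >> p_exp, delta & (p - 1)
            bits.extend([1] * q + [0])                                 # unary quotient
            bits.extend((r >> i) & 1 for i in reversed(range(p_exp)))  # P-bit remainder
        return n_exp, p_exp, bits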

> 
> -- 
> Kazuho Oku

--
Mark Nottingham   https://www.mnot.net/
