Re: Submitted new I-D: Cache Digests for HTTP/2 from Alex Rousskov on 2016-01-08 (ietf-http-wg@w3.org from January to March 2016)

From: Alex Rousskov <rousskov@measurement-factory.com>
Date: Fri, 8 Jan 2016 11:33:37 -0700
To: ietf-http-wg@w3.org
Cc: Kazuho Oku <kazuhooku@gmail.com>
Message-ID: <56900101.1050506@measurement-factory.com>
On 01/08/2016 12:17 AM, Kazuho Oku wrote:

> Yesterday, Mark and I have submitted a new draft named "Cache Digests
> for HTTP/2."
> https://datatracker.ietf.org/doc/draft-kazuho-h2-cache-digest/
> 
> The draft proposes a new HTTP/2 frame named CACHE_DIGEST that conveys
> client's cache state so that a server can determine what should be
> pushed to the client.

> Please let us know how you think about the proposal.


If possible, I recommend removing Draft language that makes (or appears
to make) your feature specific to optimizing push traffic to user
agents. Cache digests are useful for many things. Optimizing push
traffic to user agents is just one use case. For example, Squid proxies
already use Cache Digests (based on Bloom Filters) to optimize
cache-to-cache communication in caching hierarchies [1,2].

  [1] http://www.squid-cache.org/CacheDigest/cache-digest-v5.txt
  [2] http://wiki.squid-cache.org/SquidFaq/CacheDigests

I suspect it is possible to define the new CACHE_DIGEST frame without
adding artificial restrictions on its use. Let the agents sending and
receiving that frame decide what use is appropriate between them while
following some general guidelines.


Since there are already two cache digest formats (based on Bloom
filters and based on Golomb-coded sets), we should expect a third one.
Have you considered allocating the first few response octets to specify
the digest format?
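
As a rough illustration of the format-octet idea, here is a minimal
Python sketch. The format codes and the function name are made up for
this example; neither the Draft nor Squid defines them. A recipient that
does not recognize the format simply ignores the digest.

```python
# Hypothetical format registry: these code points are illustrative only.
KNOWN_FORMATS = {
    0x01: "bloom-filter",
    0x02: "golomb-coded-set",
}

def split_digest_payload(payload: bytes):
    """Return (format_name, digest_bytes), or (None, b"") if the format is unknown."""
    if not payload:
        raise ValueError("empty digest payload")
    format_name = KNOWN_FORMATS.get(payload[0])
    if format_name is None:
        # Unknown format: ignore the digest rather than fail the connection.
        return None, b""
    return format_name, payload[1:]
```

With a leading format octet, adding a third format later would not
require a new frame type.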


> A CACHE_DIGEST frame can be sent from a client to a server on any
>    stream in the "open" state, and conveys a digest of the contents of
>    the cache associated with that stream

Perhaps I am missing some important HTTP/2-derived limits here, but the
"cache associated with a stream" sounds too vague because HTTP caches
are often not associated with specific streams. Did you mean something
like "the cache portion containing shared-origin URIs?"


>    servers ought not
>    expect frequent updates; instead, if they wish to continue to utilise
>    the digest, they will need update it with responses sent to that
>    client on the connection.

Perhaps I am missing some important HTTP/2 caveats here, but how would
an origin server identify "that client" when the "connection" is coming
from a proxy and multiplexes responses to many user agents?



>        1.  Convert "URL" to an ASCII string by percent-encoding as
>            appropriate [RFC3986].

There are many ways to percent-encode the same URI. This step must
define a single way of doing so. Besides case-insensitive parts and the
decision of what characters to [un]escape, please do not forget about
trailing slashes, URI fragments, and other optional parts. This is
critical for interoperation!
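
To show the kind of decisions involved, here is one possible
canonicalization sketched in Python. The specific rules (lowercase
scheme and host, drop default ports and fragments, decode escaped
unreserved characters, uppercase the remaining escapes, treat an empty
path as "/") are my assumptions for illustration, not what the Draft
specifies. The sketch also drops userinfo and does not handle IPv6
literals.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# RFC 3986 "unreserved" characters never need percent-encoding.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
DEFAULT_PORTS = {"http": 80, "https": 443}

def _normalize_escapes(text):
    """Decode escaped unreserved characters; uppercase all other escapes."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else "%" + match.group(1).upper()
    return PCT_ESCAPE.sub(fix, text)

def canonical_url(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host                      # default port is omitted
    else:
        netloc = f"{host}:{port}"
    path = _normalize_escapes(parts.path) or "/"   # empty path becomes "/"
    query = _normalize_escapes(parts.query)
    return urlunsplit((scheme, netloc, path, query, ""))  # fragment dropped
```

Under these assumed rules, "HTTP://Example.COM:80/%7euser/a%2fb#frag"
and "http://example.com/~user/a%2Fb" produce the same string and hence
the same digest entry. Two implementations that pick different rules
would disagree about what is cached.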


>    MUST choose a parameter, "P",
>    that indicates the probability of a false positive it is willing to
>    tolerate

For clarity, please detail what you mean by a "false positive" in this
context. It may also be useful to mention whether the digesting
algorithm may create false negatives.
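
One reading of these terms, sketched in Python: a false positive is an
uncached URI whose truncated hash happens to match a digest entry, and a
false negative would be a cached URI that the digest fails to match. The
sketch assumes the digest is a set of SHA-256 hashes truncated to
log2(N*P) bits and that P=16 means "about 1 in 16"; the names and
parameter values are mine.

```python
import hashlib

def truncated_hash(url, bits):
    """Top `bits` bits of SHA-256(url), as an integer."""
    full = int.from_bytes(hashlib.sha256(url.encode("ascii")).digest(), "big")
    return full >> (256 - bits)

def build_digest(urls, bits):
    return {truncated_hash(url, bits) for url in urls}

N_BITS, P_BITS = 8, 4            # N = 256 entries, P = 16
HASH_BITS = N_BITS + P_BITS      # hashes fall into N*P = 4096 slots

cached = [f"https://example.com/cached/{i}" for i in range(1 << N_BITS)]
digest = build_digest(cached, HASH_BITS)

# Every cached URL matches its own hash, so there are no false negatives.
false_negatives = sum(
    truncated_hash(url, HASH_BITS) not in digest for url in cached
)

# Uncached URLs match only by hash collision: roughly N / (N*P) = 1/P.
probes = [f"https://example.com/other/{i}" for i in range(20000)]
false_positives = sum(
    truncated_hash(url, HASH_BITS) in digest for url in probes
)
fp_rate = false_positives / len(probes)
```

In this model the false-positive rate should land near 1/16, while
false negatives arise only when the digest is stale, not from the
encoding itself.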


> 7.  Write log base 2 of "N" and "P" to "digest" as octets.

The wording is ambiguous: Store log2(N) and then store log2(P)? Store
log2(N&P)? Store log2(N) and then store P? I suspect it is the last one
and recommend splitting step #7 into two steps, one step per number.

BTW, why not store the actual value of N?


> 7.  Write log base 2 of "N" and "P" to "digest" as octets.
...
> 8.  Write "R" to "digest" as binary, using log2(P) bits.

It is not clear how a number should be written/encoded. Different
programming languages and different systems store/represent numbers
differently, so I would expect the Draft to specify encoding precisely.
Sorry if I missed that detail.
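
To make the ambiguity concrete, here is a small Python sketch of one
possible convention: each number is written most-significant bit first,
and the final octet is padded with zero bits. This convention is an
assumption chosen for the example; the Draft would need to state its
own.

```python
class BitWriter:
    """Packs fixed-width unsigned integers MSB-first into octets."""

    def __init__(self):
        self.bits = []

    def write(self, value, width):
        if value < 0 or value >> width:
            raise ValueError(f"{value} does not fit in {width} bits")
        # Emit the most significant bit first.
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self):
        # Pad the last octet with zero bits on the right.
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for start in range(0, len(padded), 8):
            octet = 0
            for bit in padded[start:start + 8]:
                octet = (octet << 1) | bit
            out.append(octet)
        return bytes(out)

writer = BitWriter()
writer.write(5, 3)   # bits 101
writer.write(3, 5)   # bits 00011
packed = writer.to_bytes()
```

An implementation that wrote the least significant bit first, or padded
on the left, would produce different octets for the same two numbers.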


The draft appears to be missing a section documenting how the digest
recipient can test whether the digest contains a given URI.
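
For reference, such a test could look roughly like the Python sketch
below. It uses a simplified Golomb-Rice coding of my own choosing (delta
between sorted hashes, quotient as zero bits terminated by a one bit,
remainder in log2(P) bits) and keeps the digest as a plain list of bits
without octet padding. It is not the Draft's exact algorithm.

```python
import hashlib

def url_hash(url, bits):
    """Top `bits` bits of SHA-256(url), as an integer."""
    full = int.from_bytes(hashlib.sha256(url.encode("ascii")).digest(), "big")
    return full >> (256 - bits)

def encode(urls, n_bits, p_bits):
    """Golomb-Rice code the sorted, de-duplicated hashes as a list of bits."""
    values = sorted({url_hash(url, n_bits + p_bits) for url in urls})
    bits, prev = [], -1
    for value in values:
        delta = value - prev - 1
        quotient = delta >> p_bits
        remainder = delta & ((1 << p_bits) - 1)
        bits += [0] * quotient + [1]               # unary quotient
        bits += [(remainder >> s) & 1 for s in range(p_bits - 1, -1, -1)]
        prev = value
    return bits

def decode(bits, p_bits):
    """Recover the sorted hash values from the bit list."""
    values, prev, i = [], -1, 0
    while i < len(bits):
        quotient = 0
        while bits[i] == 0:
            quotient += 1
            i += 1
        i += 1                                     # skip the terminating 1
        remainder = 0
        for _ in range(p_bits):
            remainder = (remainder << 1) | bits[i]
            i += 1
        prev = prev + 1 + ((quotient << p_bits) | remainder)
        values.append(prev)
    return values

def might_be_cached(url, bits, n_bits, p_bits):
    """True if the digest matches `url` (subject to false positives)."""
    return url_hash(url, n_bits + p_bits) in set(decode(bits, p_bits))
```

A real decoder would also need a rule for telling padding bits apart
from encoded entries, which is one more detail for the Draft to specify.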


Please consider an additional Security Consideration: Origin servers are
expected to store digests so that the stored digests can be consulted
when pushing traffic. Most origin servers will store digests in RAM. A
malicious client may send a huge digest as a form of a DoS attack on a
naive server that does not validate digest sizes. Malicious client(s)
may send many small digests as a form of a (D)DoS attack on a naive
server that does not control the total size of stored digests.
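
A minimal Python sketch of the two checks described above: a per-digest
size cap and a cap on the total bytes stored, with the oldest digests
evicted first. The class name, limits, and eviction policy are
hypothetical choices for illustration.

```python
from collections import OrderedDict

class DigestStore:
    """In-memory digest store with per-digest and total size limits."""

    def __init__(self, max_digest_bytes=4096, max_total_bytes=1 << 20):
        self.max_digest_bytes = min(max_digest_bytes, max_total_bytes)
        self.max_total_bytes = max_total_bytes
        self.total = 0
        self._digests = OrderedDict()   # connection id -> digest, oldest first

    def put(self, conn_id, digest):
        """Store a digest; return False if it is rejected as oversized."""
        if len(digest) > self.max_digest_bytes:
            return False                # guards against one huge digest
        old = self._digests.pop(conn_id, None)
        if old is not None:
            self.total -= len(old)
        # Guards against many small digests: evict oldest until it fits.
        while self._digests and self.total + len(digest) > self.max_total_bytes:
            _, evicted = self._digests.popitem(last=False)
            self.total -= len(evicted)
        self._digests[conn_id] = digest
        self.total += len(digest)
        return True

    def get(self, conn_id):
        return self._digests.get(conn_id)
```

Losing a digest to eviction only costs some redundant pushes, so
dropping old digests is a safe response to memory pressure.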


Thank you,

Alex.
Received on Friday, 8 January 2016 18:34:20 UTC