Re: querystring part of cache key from Jamie Lokier on 2009-05-22 (ietf-http-wg@w3.org from April to June 2009)

From: Jamie Lokier <jamie@shareable.org>
Date: Fri, 22 May 2009 18:21:39 +0100
To: David Morris <dwm@xpasc.com>
Cc: Adrien de Croy <adrien@qbik.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <20090522172139.GB10237@shareable.org>

David Morris wrote:
> >>Since URIs can be arbitrarily long, yet database fields aren't good with
> >>this, I'd presume it's common practise to look up based on some hash
> >>value.  Is this approach used?  Is there any industry-standard hashing
> >>method, e.g. MD5 of method+URI(normalised) + querystring ?
> >
> >I doubt it.  Why would you do that?  I don't think it's normal to use
> >a URI to select an application and pass the querystring verbatim to a
> >database, or at least it's not a good idea :-)
> 
> Why not? .. this is a caching related question where the URI is part of
> the cache key ... since I've not implemented such a cache, I can't speak 
> to what I have done, but a hash such as MD5 seems reasonable ... in 
> particular if followed by an exact match comparison with a value stored
> in a blob, etc.

Ah, your question was about how to implement a cache.

There are lots of ways.  Hashing the URI is one: the hash can then be
used to look up an entry in a big hash table, a file in a directory, or
a multi-level directory tree.
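
As a rough illustration of the hashed, multi-level directory approach, here
is a minimal Python sketch. The names cache_path, store and lookup, the
two-level layout, and the "key on the first line" file format are all
assumptions made for this example, not taken from any particular cache.
It also does the exact-match comparison David mentions, so a hash
collision is treated as a miss rather than a wrong hit.

```python
import hashlib
from pathlib import Path


def make_key(method: str, uri: str) -> bytes:
    # Assumption: the cache key is "METHOD SP URI" and contains no raw newline.
    return f"{method} {uri}".encode("utf-8")


def cache_path(root: Path, method: str, uri: str, levels: int = 2) -> Path:
    """Map a key to root/ab/cd/<full digest> (for levels=2)."""
    # MD5 is used only to spread entries evenly, not for security.
    digest = hashlib.md5(make_key(method, uri)).hexdigest()
    dirs = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return root.joinpath(*dirs, digest)


def store(root: Path, method: str, uri: str, body: bytes) -> None:
    path = cache_path(root, method, uri)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The full key goes on the first line so lookups can verify it.
    path.write_bytes(make_key(method, uri) + b"\n" + body)


def lookup(root: Path, method: str, uri: str):
    path = cache_path(root, method, uri)
    if not path.exists():
        return None
    stored_key, _, body = path.read_bytes().partition(b"\n")
    # Exact-match comparison guards against hash collisions.
    if stored_key != make_key(method, uri):
        return None
    return body
```

The URI length no longer matters to the file system here, because only
the fixed-length digest appears in the path.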

Or the cache could look the URI up in a database like DB or TDB.  There
are lots of key-value databases which are happy with arbitrary-length
key strings, or which have a fairly big limit and don't pad them.

Some key-value databases hash internally, and some of them use B-trees
or other data structures.  If it's on disk, a B-tree might be good
because it'll preserve locality among similar URIs.
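
To show what "locality among similar URIs" buys you, here is a small
Python sketch. It is not a B-tree: a sorted list with bisect stands in
for one, since both keep keys in order. The class name SortedIndex and
its methods are hypothetical. Because keys are ordered, all URIs sharing
a prefix sit next to each other and can be read in one contiguous scan,
which a hash-based layout cannot offer.

```python
import bisect


class SortedIndex:
    """Ordered key index; a stand-in for an on-disk B-tree."""

    def __init__(self):
        self._keys = []     # URIs kept in sorted order
        self._values = {}   # URI -> cached value

    def put(self, uri, value):
        if uri not in self._values:
            bisect.insort(self._keys, uri)
        self._values[uri] = value

    def get(self, uri):
        return self._values.get(uri)

    def with_prefix(self, prefix):
        """Return all stored URIs starting with prefix, in sorted order."""
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # sorted order: no later key can match either
            matches.append(key)
        return matches
```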

I've used a multi-level multi-key tree structure, in order to handle
ETags properly with different Vary headers on the same URI.
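
One possible shape for such a structure, sketched in Python. This is a
guess at the general idea, not the structure I actually used: the class
VaryCache, its three levels (URI, then the set of Vary header names,
then the request's values for those headers), and the assumption that
request header names arrive already lowercased are all made up for
illustration.

```python
class VaryCache:
    """URI -> Vary header names -> request header values -> (etag, body)."""

    def __init__(self):
        self._tree = {}

    def store(self, uri, request_headers, vary, etag, body):
        # Level 2 key: which request headers the response varies on.
        names = tuple(sorted(name.lower() for name in vary))
        # Level 3 key: the values this request had for those headers.
        values = tuple(request_headers.get(name, "") for name in names)
        by_names = self._tree.setdefault(uri, {})
        by_names.setdefault(names, {})[values] = (etag, body)

    def lookup(self, uri, request_headers):
        # Responses for one URI may have been stored with different Vary
        # lists, so every stored list has to be tried.
        for names, variants in self._tree.get(uri, {}).items():
            values = tuple(request_headers.get(name, "") for name in names)
            hit = variants.get(values)
            if hit is not None:
                return hit
        return None

    def etags(self, uri):
        """All stored ETags for a URI, e.g. to build If-None-Match."""
        return [etag
                for variants in self._tree.get(uri, {}).values()
                for etag, _body in variants.values()]
```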

Using an SQL database sounds like a way to make your cache
unnecessarily slow, and not a good fit for the problem.

Be careful when normalising that you don't convert %xx of any
"sensitive" characters, as it can change the meaning of the URI.
Since any escaped character could be meaningfully distinct from its
unescaped form to an application, that might mean not converting any
%xx at all.
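
A conservative normaliser along those lines might look like the Python
sketch below. The function name normalise and the choice to drop the
fragment are assumptions for this example. It only applies changes that
RFC 3986 treats as equivalent (lowercase scheme and host, uppercase hex
digits inside %xx) and never decodes an escape, so "%2F" and "/" remain
distinct keys.

```python
import re

# "scheme://" followed by the authority (everything up to /, ? or #).
_PREFIX = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*://)([^/?#]*)")
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def normalise(uri: str) -> str:
    """Normalise a URI for use as a cache key without decoding any %xx."""
    # Assumption: fragments are never part of the cache key.
    uri = uri.split("#", 1)[0]

    def lower_authority(match):
        # Userinfo may be case-sensitive, so only host[:port] is lowered.
        userinfo, at, hostport = match.group(2).rpartition("@")
        return match.group(1).lower() + userinfo + at + hostport.lower()

    uri = _PREFIX.sub(lower_authority, uri, count=1)
    # Change only the case of the hex digits; the escape itself is kept.
    return _ESCAPE.sub(lambda m: m.group(0).upper(), uri)
```

With this, "HTTP://Example.COM/a%2fb" and "http://example.com/a%2Fb"
share a key, while "http://example.com/a/b" keeps its own.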

-- Jamie

Received on Friday, 22 May 2009 17:22:16 UTC