Delta I-D update feedback

Just finally getting a chance to read through the updated draft.. this is
definitely looking good so far. I'll likely have additional feedback later
on but two items definitely caught by eye...

Within the compression algorithm, you show the call to
ComputeTrangsFromRawToggls but it does not appear that you describe what
exactly is going on in that step beyond a single comment. You should likely
expand that to say that the function reduces multiple numerically adjacent
toggls into trangs in an *attempt* to reduce the actual number of bits
serialized on the wire and simplify processing. I say "attempt" because
there is a notable (albeit minor in the big picture) exception and
optimization that is also worth mentioning -- Suppose you have N adjacent
toggls with a total serialized length of len(N) bytes. Let's also say that
those toggls can be represented by M trang operations with a serialized
length len(M), if len(N) <= len(M), the ComputeTrangsFromRawToggls
procedure can and should be skipped. A quick pseudocode comment ought to be
enough to call this out.

The second item has to do with the reindexing of the LRU at the end of the
ParseAndExecuteHeaderBlock procedure. It would be good to explain in more
detail what exactly is happening there and why, especially given that it
does have a fairly significant impact on overall performance.

As we've discussed offline, I've been experimenting with a couple of
alternative ways of handling the reindexing. I have yet to settle on any
one particular method but the one I've played with the most uses two
internal tables, one holding the key value pairs themselves and a second
that maps external id's to the internal storage keys.. that is,

  Pair<K,V>[] store;
  Integer[] index;

Where each value i in index is a pointer to a location in store.

After each compression/decompression, the order of index is recalculated by
sorting the keys of store by frequency of use over the lifetime of the
store, dropping all the items whose frequency falls below a given
threshold. In my current implementation, the sorting is currently done with
a simple efficient quicksort. Since most of the index items will already be
sorted, this runs very quickly and does not require us to touch the actual
key value pairs at all. With proper optimization, we ought to be able to
implement it so that len(index) is significantly less than len(store). When
reindexing is complete, the changed items are noted and the header group is
updated accordingly.

Obviously there's a tradeoff here in performance vs. used storage space. In
your current scheme, it will be common for key-value pairs to be duplicated
within the store -- that is, once ParseAndExecuteHeaderBlock is run, all of
the current Key-Values that are used are re-added to the store, regardless
of whether they already exist. Any already existing items do not disappear
until they happen to drop off the end. In my revised scheme, stored items
in the main lru cache are not duplicated, they just get reindexed with a
new external ID.

Now, I'm a long way off from settling on any particular scheme, and I'll
readily admit that there are issues with the method I describe here, so for
now just take this all as fodder for discussion.

Anyway, that's it for now. More feedback likely next week once I free up
some time to continue working on the java implementation.

- James

Received on Wednesday, 20 March 2013 00:28:47 UTC