Re: Cache-Cache and Binary Encoding

On 19/01/2013 8:58 a.m., James M Snell wrote:
> Just one more for the day... Looking at Cache-Control.. Currently the 
> cache-control header consists of a list of named directives that 
> optionally have associated values. The format is extensible which is 
> great, but makes things a bit more difficult to optimize. Let's look 
> at a few random examples...
>
> Cache-Control: public (6 bytes)
> Cache-Control: public, max-age=1600 (21 bytes)
> Cache-Control: no-store, no-transform, must-revalidate (39 bytes)
>
> Let's see if we can do better.
>
> First off, let's assume that Cache-Control on requests can have a 
> different encoding than Cache-Control on responses. For requests, 
> let's make it:
>
>  +----------+----------+---------------------+
>  | no-cache | no-store |   no-transform      |
>  +----------+-----+----+---------+-----------+
>  | only-if-cached |xxxx| max-age | max-stale |
>  +-----------+----+----+---------+-----------+
>  | min-fresh | num-ext | repeating ext block |
>  +-----------+-----+---+---------+-----------+
>
>  no-cache       = 1 bit
>  no-store       = 1 bit
>  no-transform   = 1 bit
>  only-if-cached = 1 bit
>  xxx            = 4 reserved bits
>

I looked at the encoding of these headers for the network-friendly-01 draft.

I suggest we take a closer look at what those flags *mean* and translate 
the meaning into bits not a 1:1 mapping of the flags. There are a couple 
like no-cache which become much clearer and some missing HTTP/1 flags 
which become visible when we do that.

storage: read-only, write-only, revalidate


HTTP/1          -> HTTP/2
must-revalidate -> store revalidate
no-cache        -> store revalidate
no-store        -> store read-only
only-if-cached  -> store read-only, revalidate
max-age=0       -> store write-only


We are missing a "store write-only + revalidate" option in HTTP/1. 
Meaning fetch this object as a cache MISS but allow it to be stored for 
future use. At face value "no-cache" leads people to assume that it 
means the store is write-only, but the specification details do not match 
that common assumption.

store write-only + revalidate is missing as a single HTTP/1 flag. It 
sort of makes sense as a client-driven force-refresh update which 
forcibly revalidates the client's copy and only updates cached data IF 
it matches that same copy. A sort of cross between must-revalidate and 
only-if-cached.
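
To make the table above concrete, here is a minimal Python sketch of the 
translation. The Store flag names and the translate() helper are my own 
hypothetical labels, not anything from a draft; only the mapping itself 
comes from the table.

```python
from enum import Flag

class Store(Flag):
    """Hypothetical names for the three storage meanings listed above."""
    NONE = 0
    READ_ONLY = 1
    WRITE_ONLY = 2
    REVALIDATE = 4

# HTTP/1 directive -> HTTP/2 storage meaning, copied from the table above.
HTTP1_TO_STORE = {
    "must-revalidate": Store.REVALIDATE,
    "no-cache": Store.REVALIDATE,
    "no-store": Store.READ_ONLY,
    "only-if-cached": Store.READ_ONLY | Store.REVALIDATE,
    "max-age=0": Store.WRITE_ONLY,
}

def translate(directives):
    """OR together the storage bits for a list of HTTP/1 directives.

    Directives not in the table are ignored in this sketch.
    """
    flags = Store.NONE
    for directive in directives:
        flags |= HTTP1_TO_STORE.get(directive.strip().lower(), Store.NONE)
    return flags

# The combination with no single HTTP/1 directive of its own.
MISSING = Store.WRITE_ONLY | Store.REVALIDATE
```

Checking MISSING against HTTP1_TO_STORE.values() shows that no single 
HTTP/1 directive maps to it, which is the gap described above.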



>  max-age        = uintvar
>  max-stale      = uintvar
>  min-fresh      = uintvar
>  num-ext        = 1 byte
>

These ones are much clearer, but min-fresh is still a little obtuse in 
its naming.

   time-since-creation: min-age, max-age, max-stale

We can also go a little further. min-age as a question only makes sense 
on requests, asking for an object of minimum age in response. On responses 
it makes sense to use it as an answer saying this object is already at 
least X old. Which allows us to drop the Age: header entirely and use 
the min-age Cache-Control bits to store the response's current age value.

NOTE to Phillip in response to
"Why do HTTP request messages have dates in them anyhow?
If they do not cause a state machine to behave differently then lets get 
rid of them."

All these age fields are timestamps relative to the Date: header on the 
response. Which allows us to store values of up to 1 year offset from 
the Date: epoch in 31 bits, with one bit as a valid/invalid value marker. 
Caches can opt to treat the sign bit of a signed 32-bit integer as that 
marker for easy coding.
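
A rough Python sketch of one such age field, for illustration only. The 
function names, the network byte order, and rejecting offsets above one 
year are assumptions of mine; the 31 value bits plus one marker bit 
follow the description above.

```python
import struct

ONE_YEAR = 365 * 24 * 60 * 60   # 31,536,000 seconds, well inside 31 bits
UNSET = 0x80000000              # top bit set marks the field invalid/unset

def encode_age(seconds):
    """Pack an offset from Date: into 4 bytes; None means unset."""
    if seconds is None:
        return struct.pack("!I", UNSET)
    if not 0 <= seconds <= ONE_YEAR:
        raise ValueError("offset outside the assumed 1-year range")
    return struct.pack("!I", seconds)

def decode_age(data):
    """Read the field as a signed 32-bit integer; negative means unset."""
    (value,) = struct.unpack("!i", data)
    return None if value < 0 else value
```

Reading the field as signed is the "easy coding" part: a single 
less-than-zero test tells a cache whether the value is present.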

PS. This is another reason I'm in favour of the 1-year default caching 
limitation. It would help us avoid wasting 4 bytes and doing 64-bit 
calculations on max-age values, which are usually short.


content-transform: yes, no

private : yes, no


>  repeating ext block =
>
>  +---------------------------+
>  |TYP|XXXXXX|len(key)|key|val|
>  +---------------------------+
>
>  TYP = 2 bit type code
>    00 = Boolean, no val
>    01 = Numeric, val is uintvar
>    10 = Text, val is encoded text
>    11 = Reserved
>  XXXXXX = Reserved Bits
>  if TYP is 00, then val is omitted. The idea is that this is a boolean 
> flag, like no-cache, no-store, etc. The key identifies the flag. Key 
> is a text label.
>  if TYP is 01, then val is uintvar.
>  if TYP is 10, then val is 2-byte length followed by encoded text
>
> So if we look at examples, then,
>
>   Cache-Control: no-cache  encodes as five-bytes
>   Cache-Control: only-if-cached, max-age=1600, encodes as seven-bytes
>
> Looking at the Cache-Control header for Responses we can do:
>
>  +--------+---------+----------+-------------+
>  | public | private | no-cache | no-transform|
>  +--------+-+-------+----------+-----------+-+
>  | no-store | must-revalidate  |proxy-reval|X|
>  +----------+----------+-------+-----------+-+
>  | max-age  | s-maxage | num-no-cache-headers|
>  +----------+-------+--+---------------------+
>  | no-cache-headers | num-private-headers    |
>  +------------------+------------------------+
>  |private-headers|num-ext|repeating ext block|
>  +------------------+------------------------+
>
> Same idea,
>
>   public               = 1 bit
>   private              = 1 bit

-1 bit. These are two sides of a single boolean. We can add that to 
HTTP/2 specification as a 1-bit flag to prevent future bungling and 
still be 1.1 compliant when it translates.

>   no-cache             = 1 bit
>   no-transform         = 1 bit
>   no-store             = 1 bit
>   must-revalidate      = 1 bit

-1 bit. parameterless no-cache and must-revalidate are semantically 
equivalent. The presence or absence of no-cache-headers below takes care 
of the parametered no-cache cases where the semantics differ.

>   proxy-reval          = 1 bit
>   X                    = reserved
>   max-age              = uintvar
>   s-maxage             = uintvar
>   num-no-cache-headers = 1-byte
>   no-cache-headers     = null-byte separated list of header names
>   num-private-headers  = 1-byte
>   private-headers      = null-byte separated list of header names

private-headers and no-cache-headers are a bit of an overlap. I 
personally would like to see them merged into one semantic field which 
can be handled the same by caches. But we shall have to investigate that 
first.


> Examples...
>
> Cache-Control: public (encodes as 6 bytes)
> Cache-Control: public, max-age=1600 (encodes as 6 bytes, saving 17 bytes)
>   Cache-Control: no-store, no-transform, must-revalidate (encodes as 6 
> bytes, saving 33 bytes)
>
> So looking at these examples, it is definitely possible to save a lot 
> of space but at the cost of quite a bit of encoding-complexity. I'm 
> sure we could possibly do better but this provides a good starting 
> point, and, it's bidirectionally compatible with 1.1. Whether or not 
> it's worth the effort is a different question entirely.
>
> - James

With these alterations I end up with:

storage controls: 1 byte

  +-+-+---+---+
  |P|T|RWV|rwv|
  +-+-+---+---+

P = private
T = no-transform
R = shared cache read-only
W = shared cache write-only
V = shared cache must-revalidate
r = private cache read-only
w = private cache write-only
v = private cache must-revalidate
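
As a sketch of how that byte could be packed, assuming the bits are laid 
out left to right as drawn with P as the most significant bit (the bit 
order and the helper names are my assumption, the letters are the ones 
listed above):

```python
# One letter per bit, most significant first (assumed order): P T R W V r w v
FIELDS = "PTRWVrwv"

def pack_storage(flags):
    """Build the storage-controls byte from a set of flag letters."""
    byte = 0
    for letter in FIELDS:
        byte = (byte << 1) | (1 if letter in flags else 0)
    return byte

def unpack_storage(byte):
    """Return the set of flag letters whose bits are set in the byte."""
    return {letter for i, letter in enumerate(FIELDS) if byte & (0x80 >> i)}
```

For example, pack_storage({"T", "V", "v"}) sets no-transform plus 
must-revalidate for both shared and private caches and gives 0x49.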

Caching heuristic age controls: 12 bytes

  +-+------+--------+--------+--------+
  |V|          min-age / Age:         |
  +-+------+--------+--------+--------+
  |V|             max-age             |
  +-+------+--------+--------+--------+
  |V|            max-stale            |
  +-+------+--------+--------+--------+

V = invalid/unset.

A static 13 bytes for cache controls no matter which are set.

OR, a specified order for optional blocks with a byte up front flagging 
which blocks are omitted => variable 2-13 bytes, with on average 6 bytes 
for just the store controls block and max-age block.
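
A minimal sketch of that variable-length option in Python. The block 
order, the bit assigned to each block in the leading byte, and the 
function names are all assumptions for illustration; a set bit means 
the block is omitted, as described above.

```python
import struct

# (name, struct format) in the fixed order; order and bit numbers are assumed.
BLOCKS = [
    ("store", "!B"),      # 1-byte storage controls
    ("min_age", "!I"),    # 4-byte age fields
    ("max_age", "!I"),
    ("max_stale", "!I"),
]

def encode_controls(values):
    """Emit the omitted-blocks byte followed by the blocks that are present."""
    omitted = 0
    body = b""
    for bit, (name, fmt) in enumerate(BLOCKS):
        if name in values:
            body += struct.pack(fmt, values[name])
        else:
            omitted |= 1 << bit
    return bytes([omitted]) + body

def decode_controls(data):
    """Walk the blocks in order, skipping any flagged as omitted."""
    omitted, offset, values = data[0], 1, {}
    for bit, (name, fmt) in enumerate(BLOCKS):
        if omitted & (1 << bit):
            continue
        (values[name],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
    return values
```

With only the store controls and max-age present, 
encode_controls({"store": 0x49, "max_age": 1600}) comes out at 6 bytes: 
one flag byte, one storage byte and four bytes of max-age.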


no-cache-headers and private-cache-headers, as I said, I'd like to see 
merged. If not they can be split into a separate header each with the 
same field-value format as Connection:.


Amos

Received on Saturday, 19 January 2013 08:55:33 UTC