- From: Roberto Peon <grmocg@gmail.com>
- Date: Thu, 4 Apr 2013 17:52:31 -0700
- To: Mark Nottingham <mnot@mnot.net>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAP+FsNegsAR16OaM5C-h6Xi67YSHh5uOWGHf2D1auQV7OxmYbA@mail.gmail.com>
It used the default streamifier. Happy to provide data with whatever streamifier... -=R On Apr 4, 2013 5:02 PM, "Mark Nottingham" <mnot@mnot.net> wrote: > How did you handle connection coalescing (the "streamifier" issue)? > > Cheers, > > > On 05/04/2013, at 10:55 AM, Roberto Peon <grmocg@gmail.com> wrote: > > > Here is some data based on an analysis of a perfect (i.e. resource > unconstrained) atom-based compressor. > > I've removed any serialization sizes from this-- this will hold true for > HeaderDiff or Delta or whatever, presupposing it does atom-based > compression. > > > > Feel free to jump to the bottom for take-aways if looking at the data is > boring :) > > > > The top most compressible headers are (key% is the percentage of the > compressible bytes where were from the key): > > |compressed > > key-name | bytes key% val% > > -----------------------+----------------------- > > user-agent | 2443472 9.37% 90.63% > > cookie | 1957141 2.88% 97.12% > > referer | 1340198 11.89% 88.11% > > via | 980712 4.53% 95.47% > > accept-charset | 700174 32.62% 67.38% > > accept-language | 685850 48.86% 51.14% > > accept-encoding | 676516 49.53% 50.47% > > cache-control | 571590 50.27% 49.73% > > date | 544049 16.31% 83.69% > > x-cache-lookup | 523453 37.32% 62.68% > > content-type | 521956 50.88% 49.12% > > :host | 506321 23.66% 76.34% > > accept | 416278 32.65% 67.35% > > x-cache | 414766 28.26% 71.74% > > last-modified | 398843 58.89% 41.11% > > proxy-connection | 389619 63.48% 36.52% > > server | 369522 32.27% 67.73% > > :path | 337031 35.55% 64.45% > > content-length | 313985 94.79% 5.21% > > expires | 305406 35.69% 64.31% > > :method | 239569 70.02% 29.98% > > :status | 237574 70.61% 29.39% > > p3p | 206589 2.93% 97.07% > > accept-ranges | 198471 73.02% 26.98% > > content-encoding | 124912 81.30% 18.70% > > vary | 110368 22.38% 77.62% > > age | 70211 66.89% 33.11% > > set-cookie | 66181 23.90% 76.10% > > etag | 62416 48.63% 51.37% > > x-powered-by | 54561 46.49% 53.51% > > x-content-type-options | 47171 77.61% 22.39% > > x-varnish-server | 33947 51.61% 48.39% > > x-varnish | 33376 89.04% 10.96% > > x-xss-protection | 28685 58.73% 41.27% > > x-cdn | 27428 23.06% 76.94% > > location | 26152 19.79% 80.21% > > transfer-encoding | 25147 72.94% 27.06% > > xcache | 24800 18.75% 81.25% > > ... > > > > The distribution of backreferences is falls off extremely quickly as > distance from the newest element increases: > > 1 35460 > > 2 20774 > > 3 16886 > > 4 12926 > > 5 9601 > > 6 6947 > > 7 5100 > > 8 4304 > > 9 3658 > > 10 3313 > > 11 2789 > > 12 2710 > > 13 2678 > > 14 2354 > > 15 2372 > > 16 2180 > > 17 2066 > > 18 2023 > > 19 1979 > > 20 1885 > > 21 1888 > > 22 1799 > > 23 1724 > > 24 1673 > > 25 1584 > > 26 1530 > > 27 1506 > > 28 1435 > > 29 1395 > > 30 1305 > > 31 1355 > > 32 1283 > > 33 1305 > > 34 1341 > > 35 1339 > > 36 1178 > > 37 1200 > > 38 1223 > > 39 1169 > > 40 1180 > > 41 1111 > > 42 1105 > > 43 1074 > > 44 1079 > > 45 1043 > > 46 1050 > > 47 1030 > > 48 972 > > 49 963 > > 50 942 > > 51 935 > > 52 916 > > ... > > > > I've attached a png of a graph of this (y axis=frequency, x-axis=dist > from newest element). > > The knee in the graph is very nice indeed. > > > > > > Take aways? > > • We need a better survey of headers from everywhere :) > > • Compression over our corpus should scale favorably with small > table (and state) size. > > • Encoding index as dist-from-newest really works well, and LRU > appears to be extremely effective as an expiration policy (the attached > graph looks good). > > • We're getting substantial compression from both key and value > backreferences/tokenization. > > • Algorithmically, there isn't a whole lot to do-- the devil is > really in the serialization details and the tradeoffs involved in > generating/parsing. There are obvious tweaks that compressors could do when > space constrained (e.g. looking at the first table, above, as the likely > benefit and making decisions based upon that), but the data which suggests > that the LRU is so effective also suggests that this benefit is likely > limited unless they can predict the future :) > > > > -=R > > > > > > > -- > Mark Nottingham http://www.mnot.net/ > > > > >
Received on Friday, 5 April 2013 00:53:03 UTC