Statistics on reusing request headers in persistent connections (repost)

   The benefits of reusing request headers in persistent
   ----------------------------------------------------
   HTTP connections: A statistical analysis.
   -----------------------------------------

                                           Oct 31, 1995
                                           Koen Holtman, koen@win.tue.nl

 1. INTRODUCTION
 ---------------

When sending HTTP requests over a persistent (keep-alive) HTTP
connection, it would be possible to re-use request headers from
earlier requests in subsequent requests.  For example, if the
User-Agent headers for requests n and n+1 are the same, there is no
need to send the header twice: a special request header (using fewer
bytes) could indicate that the User-Agent header is to be reused.
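
To make this concrete: the second request on a connection might then
look something like the message below.  (The `Reuse-Headers' name is
invented here for illustration only; it is not the syntax of any
actual proposal.)

  ---------------------------------------------------
  GET /blah/blebber/wuxta2.html HTTP/1.0
  Reuse-Headers: User-Agent, Accept

  ---------------------------------------------------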

Roy Fielding recently proposed a mechanism allowing such reuse.  The
question is whether designing and implementing such a mechanism would
be a good move.

For:  - less HTTP traffic
      - faster browsing response time

Against: - more software complexity
         - time spent in design and implementation cannot be
           used for making other improvements

I have compiled some statistics on the size of the gains.


 2. CONCLUSION
 -------------

My conclusion is that the gains are too small to bother about request
header reuse at this point:
      - HTTP traffic savings would be about 1.3%
      - speedup of browsing response time would be minimal:
        page+inline loading times would be noticeably faster in
        about 17% of all cases.

Much higher gain/effort ratios can be had by focusing on other
desirable features of future HTTP software, for example

 - (general) support for `Content-Encoding: gzip'
 - support for sending .jpg inlines instead of .gif inlines to all
   browsers that can handle .jpg
 - reducing the number of Accept headers generated by some browsers
   (my Mosaic for X browser sends 882 bytes of Accept headers, most of
   them for MIME types I can't even view!), maybe introducing a
   mechanism for reactive content negotiation at the same time.
 - proxies that change multiple Accept headers in a request into one
   big Accept header when relaying the request

I therefore propose to drop the subject of request header reuse on
http-wg.

Header reuse mechanisms would only get interesting again if we find
some good reason to make the average request message much larger (say
500 bytes) than it needs to be now (200 bytes).

(End of conclusions.)

Yes, you can stop reading now!

You can also page to Section 6, which contains some statistics about
the number of requests done over persistent connections.


 3. HOW LARGE DO REQUEST MESSAGES NEED TO BE?
 --------------------------------------------

 3.1 CURRENT ACCEPT HEADER PRACTICE
 -----------------------------------

I captured the request headers sent by the three browsers present on
my Linux box. 

A typical Mozilla/1.12 (X11) GET request message for a normal URL:

  ---------------------------------------------------
  GET /blah/blebber/blex.html HTTP/1.0
  User-Agent: Mozilla/1.12 (X11; I; Linux 1.2.9 i486)
  Referer: http://localhost/blah/blebber/wuxta.html
  Accept: */*
  Accept: image/gif
  Accept: image/x-xbitmap
  Accept: image/jpeg

  ---------------------------------------------------

When GETting URL contents for inline images, Mozilla omits the
`Accept: */*' header above.

Note that the four Accept headers above could be combined into a
single Accept header:

  Accept: */*, image/gif, image/x-xbitmap, image/jpeg

None of the three browsers on my Linux system does such combining,
though it would make the request message shorter (see also the table
below).  Is there some ancient HTTP server, not supporting
multi-element Accept headers, that they want to stay compatible with?
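
Such combining is easy to do in software.  Below is a sketch in
Python of the merging step (my own illustration, e.g. for a relaying
proxy as mentioned in Section 2; it assumes the request headers are
available as a list of (name, value) pairs):

  def merge_accept(headers):
      # Merge all Accept headers into one comma-separated Accept
      # header, leaving the other headers untouched.
      accepts = [v for (n, v) in headers if n.lower() == "accept"]
      merged  = [(n, v) for (n, v) in headers if n.lower() != "accept"]
      if accepts:
          merged.append(("Accept", ", ".join(accepts)))
      return merged

Applied to the four Mozilla headers above, this yields the single
Accept header just shown.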

Here is a table of typical GET request message sizes for the browsers
on my Linux system:

  -----------------------+---+---+-----+----
  Browser                 Len Acc (Ac1) Rest
  -----------------------+---+---+-----+----
  NCSA Mosaic for X/2.2   995 882 (299) 113
  Lynx/2.3 BETA           349 248 (100) 101
  Mozilla/1.12 (normal)   207  73  (36) 134
  Mozilla/1.12 (inline)   194  61  (34) 133
  -----------------------+---+---+-----+----

  Len  : #bytes in request message
  Acc  : #bytes in the Accept headers
  (Ac1): #bytes that would be in an equivalent single-line Accept header
  Rest : #bytes in non-Accept headers and first line of request


 3.2 LACK OF NEED FOR LARGE ACCEPT HEADERS
 -----------------------------------------

In current practice on the Web, 99% of all URLs (if not more) only
have one content variant, so the Accept headers contained in a request
are almost never used.  It is unlikely that this will change in the
future.

Thus, there is no good reason for tacking large Accept headers onto a
request, now or in the future.  An Accept header larger than

  Accept: */*, image/gif, image/x-xbitmap, image/jpeg

is wasteful; the small number of cases not covered by the header
above could be handled by reactive content negotiation (300 and 406
responses).  Note that if a browser discovers it is doing a lot of
reactive content negotiation with a site, it could dynamically make
its Accept headers to that site larger to reduce future reactive
negotiation.  So sending large Accept headers may sometimes be
efficient, but not by default.
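
A minimal sketch of such dynamic behaviour, in Python (entirely my
own illustration: the header contents, the threshold of 3, and the
function names are all made up):

  SHORT_ACCEPT = "*/*, image/gif, image/x-xbitmap, image/jpeg"
  LONG_ACCEPT  = SHORT_ACCEPT + ", image/png, video/mpeg"  # made-up extension

  negotiations = {}   # site -> number of 300/406 responses seen so far

  def record_response(site, status):
      # Count reactive negotiation responses per site.
      if status in (300, 406):
          negotiations[site] = negotiations.get(site, 0) + 1

  def accept_header_for(site):
      # Send the longer Accept header only to sites that keep forcing
      # reactive negotiation; everyone else gets the short default.
      if negotiations.get(site, 0) >= 3:
          return LONG_ACCEPT
      return SHORT_ACCEPT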

I expect this large-default-Accept-header problem to disappear with
browser upgrades in the near future, once a reactive negotiation
mechanism has been defined.


 4. STATISTICS
 -------------

To make the statistics below, I took a set of proxy<->server HTTP
transactions between the www.win.tue.nl proxy and off-campus servers
(18 days' worth of traffic, approximately 150 Mb in 14501 HTTP
transactions), and calculated what would happen if these
transactions were all done over persistent HTTP connections.

If a simulated persistent connection has been idle for 10 minutes, it
is closed.
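
As an illustration of this step, here is a sketch in Python of the
connection grouping (my reconstruction, not the script actually used;
it assumes the trace is a time-sorted list of (timestamp, server)
pairs, with timestamps in seconds):

  IDLE_TIMEOUT = 10 * 60   # close a simulated connection after 10 idle minutes

  def group_into_connections(log):
      # Returns a list of simulated persistent connections, each a
      # list of transaction timestamps to one server.
      open_conns = {}    # server -> (last timestamp, connection)
      connections = []
      for t, server in log:
          last = open_conns.get(server)
          if last is not None and t - last[0] <= IDLE_TIMEOUT:
              conn = last[1]            # reuse the open connection
          else:
              conn = []                 # open a new connection
              connections.append(conn)
          conn.append(t)
          open_conns[server] = (t, conn)
      return connections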


 4.1 HEADER SIZES
 ----------------

Working from the reasoning above, I take the following request
message, generated by Mozilla, as typical.

  ---------------------------------------------------
  GET /blah/blebber/blex.html HTTP/1.0
  User-Agent: Mozilla/1.12 (X11; I; Linux 1.2.9 i486)
  Referer: http://localhost/blah/blebber/wuxta.html
  Accept: */*
  Accept: image/gif
  Accept: image/x-xbitmap
  Accept: image/jpeg

  ---------------------------------------------------

Every header in this message could potentially be reused in future
requests.  Only the `GET' line will always be different.

I will use the following figures in the statistics below:

- Without header reuse, the average request size is 200 bytes

- With header reuse, the average request size is
    - 200 bytes for the first request over a persistent connection
    -  40 bytes for all subsequent requests over a persistent connection

- The average size of the response headers is always 180 bytes.
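
With these figures, the header traffic in Section 4.2.1 is a matter
of arithmetic.  A sketch in Python (the connection count of 1584 is
the sum of the connection table in Section 6):

  TRANSACTIONS = 14501    # HTTP transactions in the trace
  CONNECTIONS  = 1584     # simulated persistent connections (Section 6)
  Mb = 1024 * 1024

  def header_traffic(first_request, later_request, response_headers=180):
      # Returns (header bytes without reuse, header bytes with reuse),
      # both in Mb.
      without = TRANSACTIONS * (first_request + response_headers)
      with_reuse = (CONNECTIONS * first_request
                    + (TRANSACTIONS - CONNECTIONS) * later_request
                    + TRANSACTIONS * response_headers)
      return without / Mb, with_reuse / Mb

  print(header_traffic(200, 40))   # -> about (5.3, 3.3), as in 4.2.1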


 4.2 RESULTS
 -----------

 4.2.1 Size of HTTP traffic transmitted

                           in response  in
                           bodies       headers   total 
    ---------------------+------------+---------+----------------

    Without header reuse:     145 Mb     5.3 Mb   150.3 Mb (100.0%)
    With header reuse:        145 Mb     3.3 Mb   148.3 Mb ( 98.7%)

    Reuse saves:                         2.0 Mb            (  1.3%)

Compared to other possible savings, 1.3% is too little to care about.

But total traffic size is dominated by a small number of very large
responses: maybe we can get a noticeably faster response time on
small requests?

 4.2.2 Response time

I use the following approximations for getting response time results:

 - The sequence of requests done over each persistent HTTP connection
   is divided into `wait chains'.

 - Successive requests in a `wait chain' are no more than 20
   seconds apart.

 - The idea is that the user does not perceive the speedup of
   individual HTTP transactions in a `wait chain', but only the
   average transaction speedup for the whole `wait chain'.

 - We want to determine the percentage of wait chains that get
   noticeably faster after the introduction of header reuse.

 - We assume that for a wait chain to get noticeably faster, the
   HTTP traffic size generated in that wait chain must decrease
   by at least 10%.
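
A sketch in Python of the chain-splitting step (again my
reconstruction; it works on the per-connection timestamp lists
produced by the grouping sketch in Section 4):

  WAIT_GAP = 20   # seconds: maximum gap between requests in one wait chain

  def split_into_wait_chains(timestamps):
      # Split one connection's request timestamps into wait chains:
      # maximal runs of requests no more than WAIT_GAP seconds apart.
      chains = []
      for t in sorted(timestamps):
          if chains and t - chains[-1][-1] <= WAIT_GAP:
              chains[-1].append(t)
          else:
              chains.append([t])
      return chains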

Number of wait chains with a certain percentage of traffic decrease:

           decrease %   count    %
           -----------+--------+-----
                   0     1069    24%
                 1-4     1763    39%
                 5-9      926    21%
               10-19      396     9%
               20-49      278     6%
               50-         70     2%

Thus, request header reuse will lead to a noticeable speedup for
17% of all wait chains (9% + 6% + 2% in the table above).



 5. ALTERNATIVE 500 BYTE SCENARIO
 --------------------------------

The above statistics assume that 

- Without header reuse, the average request size is 200 bytes

- With header reuse, the average request size is
    - 200 bytes for the first request over a persistent connection
    -  40 bytes for all subsequent requests over a persistent connection

The reasons for these assumptions are given in Section 3.

One could imagine an alternative scenario, in which we have a good (or
bad) reason to make the requests much larger.  To see whether
introducing header reuse would be a good idea under such a scenario, I
computed the above statistics again with the following assumptions:

- Without header reuse, the average request size is 500 bytes

- With header reuse, the average request size is
    - 500 bytes for the first request over a persistent connection
    -  40 bytes for all subsequent requests over a persistent connection
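
In terms of the header_traffic() sketch from Section 4.1, this
scenario is just a different parameter choice:

  print(header_traffic(500, 40))   # -> about (9.4, 3.7), as in the table below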

This gets us:

 5.1.1 Size of HTTP traffic transmitted in 500 byte scenario

                           in response  in
                           bodies       headers   total 
    ---------------------+------------+---------+----------------
    Without header reuse:     145 Mb     9.4 Mb   154.4 Mb (100.0%)
    With header reuse:        145 Mb     3.7 Mb   148.7 Mb ( 96.3%)

    Reuse saves:                         5.7 Mb            (  3.7%)


 5.1.2 Response time in 500 byte scenario

Number of wait chains with a certain percentage of traffic decrease:

           decrease %   count    %
           -----------+-------+----
                   0     809    18%
                 1-4     884    20%
                 5-9     749    17%
               10-19     980    22%
               20-49     718    16%
               50-       362     8%

Thus, request header reuse will lead to a noticeable speedup for 46% of
all wait chains (22% + 16% + 8%).

I conclude that header reuse becomes moderately interesting _IF_ we
find a good reason to use request messages which contain a large
amount (>460 bytes, the difference between the 500 byte and 40 byte
request sizes above) of reusable headers.


 5.1.3 Comparison between Section 4 and 500 byte scenario
 ---------------------------------------------------------

Traffic generated:
                                    in response  in
                                    bodies       headers   
    -------------------------------+------------+---------

    Section 4 without header reuse:  145 Mb       5.3 Mb 
    Section 4 with header reuse:     145 Mb       3.3 Mb
    500 byte without header reuse:   145 Mb       9.4 Mb
    500 byte with header reuse:      145 Mb       3.7 Mb 


Number of wait chains with a certain percentage of traffic decrease,
when going from Section 4 _without_ reuse to 500 byte _with_ reuse:

           decrease %   count    %
           -----------+-------+----
               - -21     121     3%
           -20 - -11     189     4%
           -10 -  -6     198     4%
            -5 -  -1     442    10%
             0 -   4    2088    46%
             5 -   9     784    17%
            10 -  19     356     8%
            20 -         324     7%

(7% of wait chains get noticeably slower, 15% get noticeably faster)


 6. RANDOM STATISTICS
 --------------------

The statistics below are not very relevant for deciding about reuse,
but they are nice to have anyway.

Number of proxy<->server responses with a certain response body size:

    body size (bytes)     %   cumulative %
    ------------------+------+-------------
                0-99     4%     4%
             100-199     4%     8%
             200-499     8%    16%
             500-999     9%    25%
           1000-1999    19%    44%
           2000-4999    25%    69%
           5000-9999    16%    85%
         10000-19999     7%    92%
         20000-49999     6%    97%
         50000-99999     2%    99%
        100000-          1%   100%


Number of persistent proxy<->server connections over which a certain
number of HTTP transactions are made (the connections have an idle
timeout of 10 minutes):

- on average, one persistent connection gets 9.2 transactions
  (14501 transactions over the 1584 connections in the table below).

    # of transactions   count  %    cumulative %
    ------------------+-----------+-------------
                   1    415   26%     26%
                   2    214   14%     40%
                   3    169   11%     50%
                   4    118    7%     58%
                 5-6    148    9%     67%
                 7-9    134    8%     76%
               10-19    198   12%     88%
               20-49    139    9%     97%
               50-       49    3%    100%


(End of document.)
