Re: [AC] URI canonicalization problem with Access-Control-Policy-Path from Jonas Sicking on 2008-06-11 (public-webapps@w3.org from April to June 2008)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 11 Jun 2008 16:07:45 -0700
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: Anne van Kesteren <annevk@opera.com>, "WAF WG (public)" <public-appformats@w3.org>, public-webapps@w3.org
Message-ID: <48505AC1.2010804@sicking.cc>

Bjoern Hoehrmann wrote:
> * Anne van Kesteren wrote:
>> The feature avoids the overhead you get when you need to issue 10 POST  
>> requests to 10 distinct URIs on the same server scoped to some path.  
>> Without Access-Control-Policy-Path that requires 20 requests. With
>> Access-Control-Policy-Path it requires 12. So for the N requests you want
>> to make it roughly saves you N additional requests for larger values of
>> N.
> 
> Consider you have a simple echo protocol, a client sends a line to the
> server and the server returns the same line to the client; then they
> repeat until m bytes have been transferred. What does it matter how many
> lines these m bytes have been split into? It does not, and it does not
> matter much with HTTP either if you have persistent connections, it's
> just a bit more complicated to find the end of a message.
> 
> So let's look at the traffic instead. The minimal OPTIONS request and
> response look somewhat like this:
> 
>       +---------------------------+      +---------------------------+
>       | OPTIONS /example HTTP/1.1 |      | HTTP/1.1 200              |
>       | Host:example.org          | ===> | Access-Control:allow <*>  |
>       | Origin:example.net        | <=== | Content-Length:0          |
>       |                           |      |                           |
>       +---------------------------+      +---------------------------+
>                               --- 128 byte ---
> 
> So let's assume the average OPTIONS request generates three times as
> much (you may want to include a Date or User-Agent header, for example).
> The POST requests and responses are quite a bit longer, you would send
> various Accept headers, Content-Type, Set-Cookie, whatever, so let's
> assume an average of 7*128 bytes protocol overhead per transaction:
> 
>   80% ++-----------+-----------+------------+-----------+---··········
>       +            +           +            +OPTIONS exchange xxxxxx +
>   70% oo                            ········Protocol Overhead oooooo++
>   60% ++oo                   ·······           Message Bodies ······++
>       |   oo            ·····                                        |
>   50% ++    ooo     ····                                            ++
>   40% ++       oooo·                                                ++
>       |       ···  ooooo                                             |
>   30% xx    ···         ooooooo                                     ++
>       | xxxxx                  oooooooooooo                          |
>   20% ++ ·· xxxxxxxxxx                     ooooooooooooooooooooo    ++
>   10% ++··            xxxxxxxxxxxxxxxxxxxxx                    ooooooo
>       +·           +           +           xxxxxxxxxxxxxxxxxxxxxxxxxxx
>    0% ·+-----------+-----------+------------+-----------+-----------++
>       0           1000        2000         3000        4000         5000
>       Average size of message bodies in bytes (request plus response).
> 
> If your average request and response body are together less than about
> 2000 bytes, it would be silly to use many POSTs to different resources
> and care a lot about optimizing the OPTIONS overhead away, you would
> gain much more by using fewer POSTs which would then eliminate OPTIONS
> overhead as well. And if the average size exceeds about 3000 bytes, you
> would gain only very little, almost all resources are spent processing
> the POST requests.
> 
> (Calculating how much you might save in terms of "load" on the server
> is more difficult than this simple model, because you have to consider
> how the server handles concurrent and subsequent requests and consider
> the cost of creating e.g. new threads and new processes; that's very
> specific to the web server software, operating system, their
> configuration, and even the hardware they are running on, and you might
> pick slightly different numbers than I have; but the conclusion is similar
> in all reasonable cases).
> 
> Note that you can always avoid the overhead of cross site posts using
> cross document messaging and same origin posts, the target just needs
> to install a suitable web page that accepts and dispatches the requests.
> At increased risk of course, but also with increased flexibility and
> very likely better performance compared to cross site requests, whether
> utilizing Access-Control-Policy-Path or not.
> 
> Less is always less of course, but let's look at how many requests per
> page load is considered normal among the homepages of the Alexa 100
> sites (www-archive has the raw data including methodology):
> 
>       +---------------+---------------+---------------+---------------+
>    16 ***             +          Number of sites in this range ******++
>    14 *+*     *******                                                ++
>    12 *+*     *  *  *                                                ++
>    10 *+*     *  *  *****                                            ++
>     8 *+*******  *  *   *                                            ++
>     6 *+*  *  *  *  *   *                                            ++
>     4 *+*  *  *  *  *   *************                                ++
>     2 *+*  *  *  *  *   *  *  *  *  *****************                ++
>     0 *****************************************************************
>       +---------------+---------------+---------------+---------------+
>       0               50             100             150             200
>               Number of requests when loading the front page.
> 
> That is an average of about fifty requests and a median of about forty.
> For the nba.com website you have to wait for over 200 requests to fully
> load the page, and this isn't in response to any user-initiated data
> submission. I am afraid I do not see much need for this misdesigned
> optimization. Certainly not in the first version of this specification.

I agree that, from a load point of view, optimizing away the OPTIONS
requests if you have a large set of POSTs makes less sense than
optimizing to fewer POSTs.
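
For reference, the byte-share model behind the quoted chart can be reproduced in a few lines. This is only a sketch under the assumptions stated in the quoted message: an average OPTIONS exchange of 3*128 bytes, 7*128 bytes of POST protocol overhead per transaction, and one OPTIONS exchange per POST. The names OPTIONS_BYTES, OVERHEAD_BYTES and traffic_shares are made up for illustration.

```python
# Assumptions taken from the quoted message, not measured values.
OPTIONS_BYTES = 3 * 128   # assumed average OPTIONS exchange (request plus response)
OVERHEAD_BYTES = 7 * 128  # assumed POST protocol overhead per transaction


def traffic_shares(body_bytes):
    """Return (options, overhead, bodies) as fractions of one transaction.

    body_bytes is the average size of request body plus response body.
    Each POST is assumed to need its own OPTIONS exchange.
    """
    total = OPTIONS_BYTES + OVERHEAD_BYTES + body_bytes
    return (OPTIONS_BYTES / total, OVERHEAD_BYTES / total, body_bytes / total)


if __name__ == "__main__":
    for body in (0, 1000, 2000, 3000, 5000):
        options, overhead, bodies = traffic_shares(body)
        print(f"{body:>5} bytes: OPTIONS {options:.0%}, "
              f"overhead {overhead:.0%}, bodies {bodies:.0%}")
```

Under these numbers the OPTIONS share starts at 30% for empty bodies and falls to roughly 12% at 2000 bytes, which is in line with the curves in the chart.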

The reason this was brought up was that people wanted to support a
proper REST API for a forum, which apparently resulted in lots of POSTs
(or possibly other methods) to lots of different URIs.

And since we have a caching mechanism when doing multiple POSTs to the
same URI, there was concern that this would either discourage proper REST,
which requires POSTs to multiple URIs, or be a bad hit for the people
who did use proper REST. Or both.
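
The request counts quoted at the top of the thread (20 requests without Access-Control-Policy-Path, 12 with it, for 10 POSTs to 10 distinct URIs) can be written down as a rough sketch. The function names are hypothetical, and the fixed cost of 2 extra requests is simply read off the quoted "12" figure rather than derived from the spec.

```python
def requests_without_policy_path(n_posts):
    """One OPTIONS request plus one POST for every distinct URI."""
    return 2 * n_posts


def requests_with_policy_path(n_posts, fixed_preflights=2):
    """n POSTs plus a fixed number of OPTIONS requests.

    fixed_preflights=2 is an assumption inferred from the quoted
    figure of 12 requests for 10 POSTs.
    """
    return n_posts + fixed_preflights


if __name__ == "__main__":
    for n in (1, 10, 100):
        before = requests_without_policy_path(n)
        after = requests_with_policy_path(n)
        print(f"N={n}: {before} requests without, {after} with, "
              f"{before - after} saved")
```

With these assumptions the saving is N - 2 requests, which approaches N for larger values of N and is negative for a single POST.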

However, I would be perfectly happy with not addressing this use case 
for now. As you bring up, this is just the first version of the spec. 
Though I'm also ok with keeping it in with the "..\" amendment.

/ Jonas
Received on Wednesday, 11 June 2008 23:11:24 UTC