Re: [AC] URI canonicalization problem with Access-Control-Policy-Path from Bjoern Hoehrmann on 2008-06-11 (public-webapps@w3.org from April to June 2008)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 11 Jun 2008 05:24:59 +0200
To: "Anne van Kesteren" <annevk@opera.com>
Cc: "WAF WG (public)" <public-appformats@w3.org>, public-webapps@w3.org
Message-ID: <avau44tga0ahd18cimj0h8fnnb830hkurk@hive.bjoern.hoehrmann.de>
* Anne van Kesteren wrote:
>The feature avoids the overhead you get when you need to issue 10 POST
>requests to 10 distinct URIs on the same server scoped to some path.
>Without Access-Control-Policy-Path that requires 20 requests. With
>Access-Control-Policy-Path it requires 12. So for the N requests you want
>to make it saves you roughly N additional requests for larger values of
>N.
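
(As a quick check of the quoted arithmetic, here is a minimal Python
sketch. The function name `requests_needed` is hypothetical, and the
"two extra requests" with the header is inferred only from the quoted
12-for-10 figure, not taken from the specification.)

```python
def requests_needed(n_posts: int, policy_path: bool) -> int:
    """Total requests for n_posts cross-site POSTs to distinct URIs.

    Assumption: without the header every POST needs its own OPTIONS
    preflight; with it, the quoted figures imply two extra requests in
    total (12 requests for 10 POSTs).
    """
    if policy_path:
        return n_posts + 2
    return 2 * n_posts


for n in (10, 100):
    without = requests_needed(n, policy_path=False)
    with_header = requests_needed(n, policy_path=True)
    print(n, without, with_header, without - with_header)
```

Under that assumption the saving is N - 2 requests, which is the
"roughly N" of the quoted text.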

Suppose you have a simple echo protocol: a client sends a line to the
server and the server returns the same line to the client; then they
repeat until m bytes have been transferred. What does it matter how many
lines these m bytes have been split into? It does not, and it does not
matter much with HTTP either if you have persistent connections; it is
just a bit more complicated to find the end of a message.

So let's look at the traffic instead. The minimal OPTIONS request and
response look somewhat like this:

      +---------------------------+      +---------------------------+
      | OPTIONS /example HTTP/1.1 |      | HTTP/1.1 200              |
      | Host:example.org          | ===> | Access-Control:allow <*>  |
      | Origin:example.net        | <=== | Content-Length:0          |
      |                           |      |                           |
      +---------------------------+      +---------------------------+
                              --- 128 byte ---
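
The byte count can be checked with a few lines of Python. This is only
a sketch under stated assumptions: CRLF line endings, no space after
the header colons (as drawn above), and a status line with an empty
reason phrase, i.e. "200" followed by a single space.

```python
# Assumptions: CRLF line endings, headers written exactly as in the
# diagram, and an empty reason phrase after "HTTP/1.1 200 ".
request = (
    "OPTIONS /example HTTP/1.1\r\n"
    "Host:example.org\r\n"
    "Origin:example.net\r\n"
    "\r\n"
)
response = (
    "HTTP/1.1 200 \r\n"
    "Access-Control:allow <*>\r\n"
    "Content-Length:0\r\n"
    "\r\n"
)

request_bytes = len(request.encode("ascii"))
response_bytes = len(response.encode("ascii"))
print(request_bytes, response_bytes, request_bytes + response_bytes)
# prints: 67 61 128
```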

So let's assume the average OPTIONS exchange generates three times as
much (you may want to include a Date or User-Agent header, for example).
The POST requests and responses are quite a bit longer; you would send
various Accept headers, Content-Type, Set-Cookie, whatever, so let's
assume an average of 7*128 bytes protocol overhead per transaction:

  80% ++-----------+-----------+------------+-----------+---··········
      +            +           +            +OPTIONS exchange xxxxxx +
  70% oo                            ········Protocol Overhead oooooo++
  60% ++oo                   ·······           Message Bodies ······++
      |   oo            ·····                                        |
  50% ++    ooo     ····                                            ++
  40% ++       oooo·                                                ++
      |       ···  ooooo                                             |
  30% xx    ···         ooooooo                                     ++
      | xxxxx                  oooooooooooo                          |
  20% ++ ·· xxxxxxxxxx                     ooooooooooooooooooooo    ++
  10% ++··            xxxxxxxxxxxxxxxxxxxxx                    ooooooo
      +·           +           +           xxxxxxxxxxxxxxxxxxxxxxxxxxx
   0% ·+-----------+-----------+------------+-----------+-----------++
      0           1000        2000         3000        4000         5000
      Average size of message bodies in bytes (request plus response).

If your average request and response body are together less than about
2000 bytes, it would be silly to use many POSTs to different resources
and care a lot about optimizing the OPTIONS overhead away; you would
gain much more by using fewer POSTs, which would then eliminate OPTIONS
overhead as well. And if the average size exceeds about 3000 bytes, you
would gain only very little, since almost all resources are spent
processing the POST requests.
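
The model behind the plot can be sketched in Python as follows. It
assumes, as the figures above suggest, one OPTIONS exchange of 3*128
bytes and 7*128 bytes of POST protocol overhead per transaction; the
function name `traffic_shares` and the sampled body sizes are only
illustrative.

```python
OPTIONS_BYTES = 3 * 128    # assumed average OPTIONS exchange
OVERHEAD_BYTES = 7 * 128   # assumed POST header overhead per transaction


def traffic_shares(body_bytes: int) -> dict:
    """Share of total traffic per transaction for a given body size.

    body_bytes is the combined size of request and response body.
    """
    total = OPTIONS_BYTES + OVERHEAD_BYTES + body_bytes
    return {
        "options": OPTIONS_BYTES / total,
        "overhead": OVERHEAD_BYTES / total,
        "bodies": body_bytes / total,
    }


for body in (0, 1000, 2000, 3000, 5000):
    shares = traffic_shares(body)
    print(
        f"{body:5d} bytes: "
        f"OPTIONS {shares['options']:.1%}, "
        f"overhead {shares['overhead']:.1%}, "
        f"bodies {shares['bodies']:.1%}"
    )
```

With empty bodies the OPTIONS exchange accounts for 30% of the traffic
and the POST overhead for 70%, which matches the left edge of the plot.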

(Calculating how much you might save in terms of "load" on the server
is more difficult than this simple model, because you have to consider
how the server handles concurrent and subsequent requests and consider
the cost of creating e.g. new threads and new processes; that's very
specific to the web server software, operating system, their
configuration, and even the hardware they are running on, and you might
pick slightly different numbers than I have; but the conclusion is
similar in all reasonable cases).

Note that you can always avoid the overhead of cross site posts using
cross document messaging and same origin posts, the target just needs
to install a suitable web page that accepts and dispatches the requests.
At increased risk of course, but also with increased flexibility and
very likely better performance compared to cross site requests, whether
utilizing Access-Control-Policy-Path or not.

Less is always less of course, but let's look at how many requests per
page load are considered normal among the homepages of the Alexa 100
sites (www-archive has the raw data including methodology):

      +---------------+---------------+---------------+---------------+
   16 ***             +          Number of sites in this range ******++
   14 *+*     *******                                                ++
   12 *+*     *  *  *                                                ++
   10 *+*     *  *  *****                                            ++
    8 *+*******  *  *   *                                            ++
    6 *+*  *  *  *  *   *                                            ++
    4 *+*  *  *  *  *   *************                                ++
    2 *+*  *  *  *  *   *  *  *  *  *****************                ++
    0 *****************************************************************
      +---------------+---------------+---------------+---------------+
      0               50             100             150             200
              Number of requests when loading the front page.

That is an average of about fifty requests and a median of about forty.
For the nba.com website you have to wait for over 200 requests to fully
load the page, and this isn't in response to any user-initiated data
submission. I am afraid I do not see much need for this misdesigned
optimization. Certainly not in the first version of this specification.

>Ian was one of the persons who proposed this feature and he doesn't think  
>it's worthwhile to have it if it's scoped to the entire triple (just  
>allowing the / value for instance).

I believe Ian initially didn't think "preflight" requests were necessary
for POST requests to begin with, then he thought the additional requests
weren't much of an issue and required no optimization, then they became
"somewhat painful", and I think the latest is "high cost". It seems wise
then to use some reasoning instead of relying on his opinion.

If none of your scripts on the host is secured against malicious cross
site requests, you should not be using the specification's features at
all. If all your scripts are secured (I am /assuming/ that is possible),
then there would be no problem scoping it on the whole host (you'd have
to worry about denial of service perhaps, but you have that either way).

So Ian would seem to be saying this feature is only really useful if you
mix secured and unsecured scripts on the same host. Now clearly you have
your request savings whether or not you've secured your other scripts,
so either those savings are not what makes the feature worthwhile, or it
is not actually possible to secure those other scripts (and you cannot
simply put the cross site scripts on their own host).

Now I don't know which it is; perhaps the header is really only meant as
a placebo for people irrationally afraid of seeing many OPTIONS requests
in their server logs, or perhaps it is expected that the 'Origin' header
will be filtered out frequently in which case you probably cannot tell
same-origin and cross-site requests apart. There are many possibilities,
but right now Ian's stance, as you relay it anyway, seems rather silly.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Wednesday, 11 June 2008 03:25:42 UTC