[whatwg] Proposal for separating script downloads and execution

On 2/17/11 2:38 PM, Kyle Simpson wrote:
> I don't know of any browsers which are set to download more than 8
> parallel connections.

You don't need it, if the content is cached, right?

> I can't imagine that you'd have 10,000 separate
> downloads of the same resource. Depending on race conditions, you might
> get a few extra requests in before the item was cached (if it caches).
> But, once the item is in the cache, any additional identical requests
> for that element should be pulling from the cache (from a
> network-request-level, that is), right?

Yes, but they have to pull from the cache and then hold on to the data, 
because the cache might change later, no?  That's the whole point.  If 
you're forced to load at particular times, you can't fake it by actually 
not loading and then reading it from cache later, because later the data 
might be different and you'd have to go hit the network instead of using 
the cached version... and would run the wrong script.

This sort of thing should be detectable if you can tell whether the 
script is done with the "load the data but don't run it" bit.
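
(For reference, the pattern under discussion looks roughly like this, 
assuming IE's model where setting src starts the download even before 
the element is in the DOM; "foo.js" is just a stand-in URL:

    var script = document.createElement("script");
    script.onreadystatechange = function () {
      if (script.readyState === "loaded") {
        // Bytes are fetched (or served from cache) but not executed.
        // Execution only happens once the element is inserted:
        document.getElementsByTagName("head")[0].appendChild(script);
      }
    };
    script.src = "foo.js";  // in IE, the download starts here

The question above is whether "loaded" can be trusted to mean the bytes 
are actually pinned in memory rather than re-fetchable later.)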

> The question becomes, can the browser create a unique in-memory "cache"
> entry for each distinct script contents' processing, such that each
> script element has a pointer to its appropriate copy of the
> ready-to-execute script contents, without duplication?

Maybe or maybe not.  It really depends on how much control there is over 
the HTTP implementation.

> I can't imagine the browser would need separate copies for identical script contents,
> but perhaps I'm missing something that prevents it from doing the
> uniqueness caching.

The fact that the browser might not have a good way to tell that two 
scripts are referencing the same content, for one thing (short of 
loading both and then comparing them to each other).

And yes, you can build a very complicated system to make this all work 
somehow.  It's all software.  You could implement a whole separate HTTP 
stack just for this, even, if needed.  I wasn't saying this is 
_impossible_ to do, just that it's a significant amount of work to do 
this well.

> Even if they can't, what it means is, these edge cases with 10,000
> script requests might get exponentially bad. But it still seems like the
> normal majority cases will perform roughly the same, if not better. Or
> am I missing something?

Yes, the fact that exponential (heck, even polynomial nonlinear!) 
badness is Really Bad from users' and hence implementors' point of view.

If we can find a solution that makes the sane cases better without 
having failure points like this, that's vastly preferable to one that 
has such failure points.

>> Doing that while obeying HTTP semantics might be pretty difficult if
>> you don't have very low-level control over your network layer.
>
> I'm not sure what you mean by "HTTP semantics" if it isn't about
> caching.

It is.  Specifically when you can cache what.
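
For example, these are all standard HTTP/1.1 directives that constrain 
when a stored copy may be reused:

    Cache-Control: no-store     (never store the response at all)
    Cache-Control: no-cache     (revalidate with the server before reuse)
    Cache-Control: max-age=0    (response is stale immediately)

A preload mechanism that quietly holds on to fetched bytes for later 
execution has to decide what to do in each of these cases.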

> But I don't think there'd be any reason that this proposal
> would be suggesting any different handling of resources (from a HTTP
> semantics, network-request-layer perspective) than is already true of
> script tags.

Sure.  The difference is that it would require holding on to script 
text that could otherwise either not be loaded at all or be discarded 
once it has run.

> I wonder how IE is handling this, seemingly without too many issues
> (since they've been doing it forever).

Are you sure there are no issues?

> In other words... if I loop through and create 10,000 script elements
> (no DOM append), and that causes 10,000 (or so) requests for that
> resource... how is that different/worse than if I loop through and
> create 10,000 script elements that I append to the DOM? Won't they have
> roughly the same impact on HTTP-layer loading, caching, etc?

In the latter case the browser doesn't have to have the scripts all in 
memory at once; it can execute them as they load, then forget them.

More importantly, the scripts will execute 10,000 times, which you will 
definitely notice.
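
For concreteness, the two loops being compared are roughly these, with 
"same.js" as a stand-in URL:

    // Case 1: create but never insert; under IE-style preloading,
    // every fetched copy must be kept in memory, unexecuted
    for (var i = 0; i < 10000; i++) {
      var s = document.createElement("script");
      s.src = "same.js";
    }

    // Case 2: create and insert; each script can run as it arrives
    // and its source text can then be dropped
    for (var i = 0; i < 10000; i++) {
      var s = document.createElement("script");
      s.src = "same.js";
      document.getElementsByTagName("head")[0].appendChild(s);
    }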

>> Just to be clear, that process, on our end, was a huge
>> engineering-time sink. Several man-months were wasted on it. We would
>> very much like to avoid having to repeat that experience if at all
>> possible.
>
> It's a shame that it's being viewed as "wasted".

I can call it "spent" if you prefer, but the upshot is that people who 
could have been working on various standards implementations were instead 
hand-holding sites through the problem...

This wasn't specific to async="false", btw; it was the combination of 
all the script ordering changes.  Sorry if that wasn't clear.

>> Does IE obey HTTP semantics for the preloads? Has anyone done some
>> really careful testing of IE's actual behavior here?
>
> I've done a lot of careful testing of IE's actual behavior here. But I'm
> not sure I know exactly what HTTP semantics I should be looking for. If
> you would be able to share some specific questions to probe about IE's
> implementation/behavior, I'm more than happy to extend my existing tests
> to figure out the answers.

Some examples:

1)  If your script is no-cache, or max-age=0, does IE make a new
     request for it for every <script> element?
2)  If you create a bunch of <script> elements and set src on them all
     and the script returned is different on every GET, and then you run
     them, do you see all the different scripts running?  (See the
     sketch after this list.)
3)  If you do that experiment with 1,000 scripts all of which return
     the same 50KB of data and none of which you insert, do you see
     memory usage go up by 50MB?  Does this depend on whether the
     requests can be satisfied from cache or not?
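
For instance, test 2 might be probed with something like this, where 
counter.js is a hypothetical endpoint that returns a different payload 
(say, window.log.push(<N>)) on every GET:

    window.log = [];
    var scripts = [];
    for (var i = 0; i < 10; i++) {
      var s = document.createElement("script");
      s.src = "counter.js";  // preload presumably starts here
      scripts.push(s);
    }
    // Later, run them all and see whether each distinct response
    // actually executes, or whether one cached copy ran ten times:
    for (var i = 0; i < scripts.length; i++) {
      document.getElementsByTagName("head")[0].appendChild(scripts[i]);
    }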

-Boris

Received on Thursday, 17 February 2011 12:24:57 UTC