Re: [whatwg] Preloading and deferred loading of scripts and other resources from Ian Hickson on 2014-09-08 (public-whatwg-archive@w3.org from September 2014)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 8 Sep 2014 20:33:57 +0000 (UTC)
To: whatwg <whatwg@lists.whatwg.org>
Message-ID: <alpine.DEB.2.00.1409081816450.28228@ps20323.dreamhostps.com>
I got some feedback on my last e-mail to the effect that having the 
proposal sandwiched between the rationale and the examples of how it would 
be used made it hard to find, so I'm reproducing the proposal here 
(slightly updated based on feedback):

---------------------------------------------------------------------------
These "loadable" elements:

   <script>, <link>, <style>, <video>, <img>, <object>, <iframe>, <audio>

...get the following new attributes:

   needs=""          Gives a list of IDs of other elements that this
                     one needs, known as The Dependencies. Each
                     dependency is added to this element's
                     [[Dependencies]] in the ES6 loader.

   loadpolicy=""     The load policy. Consists of a space-separated
                     set of keywords, of which one may be from the
                     following list: block, async, optimistic,
                     when-needed, late-run, declare. The other allowed
                     keywords are precache, low-priority, and force.
                     (Maybe we disallow "block" and "force" since
                     they're for legacy only.) Different elements have
                     different defaults. "precache" isn't allowed if
                     the keywords "block" or "async" are specified,
                     since those always load immediately. The
                     keywords' meanings are as follows:

                      block - stop parsing until this resource is
                      applied

                      async - fetch now, apply asap

                      optimistic - fetch when needed, apply asap

                      when-needed - fetch when needed, apply when
                      needed

                      declare - fetch when needed, don't apply

                      precache - for "fetch when needed", consider
                      fetching earlier

                      low-priority - let other things go first

                      force - always fetch anew, don't de-dupe


   loadsettings=""   A JSON-encoded dictionary to pass to the Request
                     constructor. (Or some other syntax. Proposals
                     welcome. JSON isn't great in an attribute.)
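
To make the keyword rules above concrete, here is a rough sketch of a
validator for a loadpolicy="" value. It is only an illustration:
parseLoadPolicy() is a hypothetical helper, not part of the proposal,
and throwing on bad input is an assumption (a real attribute parser
would more likely ignore invalid keywords).

```javascript
// Keywords taken from the proposal text; at most one "primary" keyword is allowed.
const PRIMARY = ['block', 'async', 'optimistic', 'when-needed', 'late-run', 'declare'];
const MODIFIERS = ['precache', 'low-priority', 'force'];

// Hypothetical helper: parse a loadpolicy="" value into { primary, modifiers },
// throwing on combinations the proposal disallows.
function parseLoadPolicy(value) {
  const tokens = value.split(/\s+/).filter(Boolean);
  let primary = null; // null means "use the element's default"
  const modifiers = new Set();
  for (const token of tokens) {
    if (PRIMARY.includes(token)) {
      if (primary !== null && primary !== token) {
        throw new Error('only one of ' + PRIMARY.join(', ') + ' may be given');
      }
      primary = token;
    } else if (MODIFIERS.includes(token)) {
      modifiers.add(token);
    } else {
      throw new Error('unknown loadpolicy keyword: ' + token);
    }
  }
  // "precache" is not allowed with policies that always fetch immediately.
  if (modifiers.has('precache') && (primary === 'block' || primary === 'async')) {
    throw new Error('"precache" cannot be combined with "' + primary + '"');
  }
  return { primary, modifiers: Array.from(modifiers) };
}
```

For example, parseLoadPolicy('when-needed precache low-priority') is
accepted, while parseLoadPolicy('async precache') is rejected.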

...and API:

   .addDependency()  Passed a promise, makes this element depend on that
                     promise. Passed a "loadable" element, does the same
                     as if that element's ID was mentioned in needs="".

   .load()           Mark the element as needed, and apply or execute it 
                     as soon as possible. Returns the new .loaded 
                     promise (any earlier one is rejected).

   .ready            Promise indicating that calling load() will
                     immediately apply or execute the element.

   .loaded           Promise indicating that the element has applied or 
                     executed.

   .request          The current Request object, if a fetch has been 
                     started.

   .needs            reflects needs, maybe as a custom object, or 
                     otherwise as a DOMTokenList

   .loadPolicy       reflects loadpolicy, maybe as a custom object, or
                     otherwise as a DOMTokenList

   .loadSettings     reflects loadsettings, maybe as a custom object

These elements can be in six states. The first five are sequential;
elements try to go through them in turn:

   - idle (the initial state at creation time)
   - prefetching...
   - ready (matches the .ready promise)
   - loading...
   - loaded (matches the .loaded promise)

...and the sixth is "error", meaning something failed permanently.

Setting src="", or whatever causes the element's state to be reset,
immediately rejects the preexisting .loaded promise and creates a new
one, moving the element back to "idle".
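
The state progression can be modelled as a small state machine. This
is a minimal sketch under stated assumptions: LoadState is a
hypothetical class, not a browser API, and a generation counter stands
in for "the old .loaded promise is rejected and a new one created".

```javascript
// The five sequential states from the proposal, plus the terminal "error" state.
const SEQUENCE = ['idle', 'prefetching', 'ready', 'loading', 'loaded'];

// Hypothetical model of one loadable element's state.
class LoadState {
  constructor() {
    this.state = 'idle';
    this.generation = 0; // bumped on reset, standing in for a fresh .loaded promise
  }

  // Move to the next sequential state; "loaded" and "error" do not advance.
  advance() {
    const i = SEQUENCE.indexOf(this.state);
    if (i === -1 || i === SEQUENCE.length - 1) return this.state;
    this.state = SEQUENCE[i + 1];
    return this.state;
  }

  // Something failed permanently.
  fail() {
    this.state = 'error';
  }

  // Models setting src="": back to "idle", abandoning the previous load.
  reset() {
    this.state = 'idle';
    this.generation += 1;
  }
}
```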

When an element is created, it's added to the ES6 module registry.
(When one of these elements has its ID or URL changed, its entries in
the registry are updated.) The ES6 LoadModule() operation is called
for this module (that's how it is added to the registry).

Except if the load policy has the "force" flag, when the element is
added to the registry it's done in such a way as to rely on ES6
deduping.

An element can be needed. By default it's not, unless it has a
loadpolicy of "block" or "async". Upon creation, and when its needs=""
is changed while the element is still not ready, or when another
element's ID is changed and that matches an ID in an element's
needs="", the element's [[Dependencies]] list is updated accordingly.

When an element is marked as needed, all the things in its
[[Dependencies]] get marked as needed also.
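
As an illustration of that propagation, here is a sketch that marks an
element and everything it transitively depends on as needed. The Map
of plain objects is a hypothetical stand-in for the elements and their
[[Dependencies]] lists; the cycle guard and the handling of unknown
IDs are assumptions.

```javascript
// Hypothetical model: each node has a "needed" flag and a list of
// dependency ids (standing in for [[Dependencies]]).
function markNeeded(nodes, id, seen = new Set()) {
  if (seen.has(id)) return; // guard against dependency cycles
  seen.add(id);
  const node = nodes.get(id);
  if (!node) return; // unknown id: nothing to mark (an assumption)
  node.needed = true;
  for (const dep of node.needs) {
    markNeeded(nodes, dep, seen);
  }
}

const nodes = new Map([
  ['s1', { needed: false, needs: [] }],
  ['s2', { needed: false, needs: ['s1'] }],
  ['s3', { needed: false, needs: ['s1', 's2'] }],
  ['s4', { needed: false, needs: [] }],
]);
markNeeded(nodes, 's3'); // s1, s2 and s3 become needed; s4 is untouched
```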

An element in "idle" moves to "prefetching" if the loadpolicy is
optimistic and the browser has nothing better to do, or the loadpolicy
has "precache" declared and the browser has nothing to do, or the
element is marked as "needed" somehow.

An element's "fetch" hook blocks until the element reaches
"prefetching". Once it does, if this is something to download, it
creates a Request object from the loadsettings="" attribute and the
appropriate URL. For inline scripts and styles, the body comes from
the element. Once the fetch hook finishes, an event is to be fired at
the element. Once all its dependencies are ready, another event is to
be fired at the element.

When a script is needed, once it's ready, it is EnsureEvaluated() (see
the ES6 spec for details).

When scripts run, if they throw an uncaught exception then they go to
the "error" state and that prevents any dependencies from resolving.
If something is being loaded and it depends on something that's
reached "error", it aborts loading. Something that depends on a
resource in the "error" state won't load, it'll just transition to
"error" straight away. (Or should it just wait, so you can remove the
dependency and unblock it?)
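
A sketch of the first behaviour (dependents go to "error" straight
away, rather than waiting) might look like the following. The graph
shape and propagateErrors() are hypothetical, for illustration only.

```javascript
// Hypothetical model: nodes map id -> { state, needs }. A node with a
// dependency in "error" moves to "error" itself; repeat until stable.
function propagateErrors(nodes) {
  let changed = true;
  while (changed) {
    changed = false;
    for (const node of nodes.values()) {
      if (node.state === 'error') continue;
      const blocked = node.needs.some((id) => {
        const dep = nodes.get(id);
        return dep !== undefined && dep.state === 'error';
      });
      if (blocked) {
        node.state = 'error';
        changed = true;
      }
    }
  }
}

const graph = new Map([
  ['lib', { state: 'error', needs: [] }],
  ['plugin', { state: 'ready', needs: ['lib'] }],
  ['app', { state: 'idle', needs: ['plugin'] }],
  ['ads', { state: 'loaded', needs: [] }],
]);
propagateErrors(graph); // plugin and app end up in "error"; ads is unaffected
```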

Changing the "loadpolicy" doesn't reset the element's state, it just
causes it to resume from its current state with the new policy. If the
new policy is irrelevant (because it applies to a state earlier than
the current one), then nothing interesting happens. For instance,
moving from "block" to "declare" after the file has already been
executed does nothing.

For style elements (and most elements, in fact), the "execute"
callback does nothing. The StyleSheet object is created earlier. The
style sheet is applied once it is both needed and ready (in place of
EnsureEvaluated()). While it is marked with a loadpolicy of "declare",
subsequent fetches of the same URL in a style sheet context in the
same Loader will use the same file, not refetch it.

Certain loadable elements can also be told to execute by the browser
even when they are awaiting being needed. Notably, <video>, <audio>,
<img>, <object>, and <iframe> will self-need themselves if they come
into view.

The <link rel=preload> feature is given an attribute, kind="", which
it can use to determine how to parse the file (image, style sheet,
HTML import, JS script, JS module, ...).

Documents add things like synchronous scripts, style sheets, deferred
scripts, images, etc, to their dependency list while they are loading.

When a Document loads, all the <script defer> elements are told they
are needed.

Elements that aren't one of these "loadable" elements that end up
being imported as ES6 modules act as follows:

 - fetch: blocks until the ID is visible
 - instantiate: returns the element

By default, the priority of loads is based on the load policy and
whether the element is needed or not. UAs can use the dependency tree
to manage the priority in a more fine-grained fashion. All things
being equal, resources referenced earlier in the document are more
important than resources referenced later.
---------------------------------------------------------------------------

For examples and use cases see:
   http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Aug/0177.html

I'm mainly looking for whether this is something vendors are interested in 
or not. Should I spec this? Will you implement it? What changes would it 
need before you'd want to implement it?


On Sat, 23 Aug 2014, Kyle Simpson wrote:
> 
> 1. A "hand-authorable markup only" (aka "zero script loaders") approach 
> is, and always will be, limited. Limited to what? To the initial page 
> load.

There are a number of efforts that go beyond the initial page load. For 
example, Ilya's work on rel=prerender, and indeed the very proposal on 
this thread, which is about managing resource loads throughout the page's 
lifetime, even resources not loaded during the initial page load.


> The hand-authored-markup-only solutions being proposed largely don't 
> care at all about the script loaders use cases

I'm pretty sure all the use cases you have given were explicitly addressed 
by the proposal on this thread. Can you elaborate on which use cases you 
have that weren't addressed? A big part of the focus, especially the part 
of this relating to integrating with the ES6 module loader mechanism, is 
explicitly about making it possible for scripts to manipulate this whole 
mechanism.


> var s1 = document.createElement("script");
> s1.id = "some-id-1";
> s1.src = "http://some.url.1";
> 
> var s2 = document.createElement("script");
> s2.id = "some-id-2";
> s2.src = "http://some.url.2";
> s2.needs = s1.id;
> 
> var s3 = document.createElement("script");
> s3.src = "http://some.url.3";
> s3.needs = s1.id + "," + s2.id;
> 
> document.head.appendChild(s3); // append in reverse order to make sure `needs` is registered early enough
> document.head.appendChild(s2);
> document.head.appendChild(s1);

I don't understand why these have to be added in reverse order. Can you 
elaborate?

Here's how I'd imagine the above being done in a scripted loader:

   // utility function; relies on the proposed .loadPolicy and .needs
   var globalScriptID = 0;
   function addScript(src, needs) {
     needs = needs || []; // scripts with no dependencies pass nothing
     var e = document.createElement('script');
     e.src = src;
     e.id = 'script' + ++globalScriptID;
     e.loadPolicy = 'when-needed';
     e.needs = needs.map(function (a) { return a.id }).join(' ');
     document.body.appendChild(e);
     return e;
   }

   var s1 = addScript('http://some.url.1');
   var s2 = addScript('http://some.url.2', [s1]);
   var s3 = addScript('http://some.url.3', [s1, s2]);

Is this not satisfactory? It seems like it would also work fine for the 
case of non-hand-authoring.


> But, as the chains get longer (more scripts) and the dependencies get 
> more complex, this becomes increasingly difficult (in fact, approaching 
> impossible/impractical) to actually generate via a generalized script 
> loader.

I don't really understand why. If you have a tree, you just have the 
script map that tree to the equivalent HTML. Just walking the tree should 
be sufficient as far as I can tell. Can you elaborate on why this wouldn't 
work for this case?


> Why? Because the script loader has to know the entire list of scripts 
> (IOW it needs to make its own internal queue/s) in this group, before it 
> can reason about how to generate the ID's and wire up the attributes.

Why? You should be able to do it in a streaming fashion, as far as I can 
tell.


> By contrast, most good script loaders right now that take advantage of 
> the `async=false` internal browser loading queue can start loading in 
> 1->2->3 order in a streaming fashion, not having to wait for all 3 to 
> start. Why? Because the `async=false` loading queue that the browser 
> provides implicitly handles the ordering.

That will still work.


> The result? Current script loaders can load 1. Then later 2. Then later 
> 3. And regardless of how long or short "later" is, or of how quickly 
> things load, or if all 3 are indeed loaded "at the same time" -- in all 
> these variations, the queue the browser provides just makes the 
> loading/ordering work.

Sure. That would still work. The proposed feature would only add the 
ability to have the browser manage this so that it's even simpler.


> 2. There is, intuitively, some threshold of complexity of markup beyond 
> which most hand-authors will not go.
> 
> They may author:
> 
> <script src="http://some.url.1" async id="s1">
> <script src="http://some.url.2" async needs="s1">
> 
> But, most probably would never author:
> 
> <script src="http://some.url.1" async id="s1">
> <script src="http://some.url.2" async id="s2" needs="s1">
> <script src="http://some.url.3" async id="s3">
> <script src="http://some.url.4" async id="s4" needs="s2, s3">
> <script src="http://some.url.5" async id="s5" needs="s2">
> <script src="http://some.url.6" async needs="s3, s5">
> …
> 
> No matter how good we are at inventing a markup syntax that handles all 
> the script loading use-cases, only a simple subset of that will ever be 
> hand-authored.
> 
> The rest? YOU GUESSED IT!!! Will be handled by a "script loader". And by 
> that, I don't just mean a script in the runtime, I mean some tool in the 
> build process on the server that's generating markup that has to go 
> through some manifest of dependencies and construct a tree and figure 
> out all that interconnected markup and inject it.
> 
> As soon as we admit that such tools will exist, we're back to my (1) 
> above, which is that "hand-authorable markup-only" is a false premise.

As far as I can tell, the markup above would work fine for this script 
loader case. Can you elaborate on why a build script couldn't generate 
this markup?
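
To illustrate, a build step along these lines could emit that markup
from a dependency manifest. This is only a sketch: generateMarkup(),
the manifest shape, and the position-based IDs are assumptions, the
needs="" attribute is the proposed one (space-separated here), and no
HTML escaping of URLs is attempted.

```javascript
// Hypothetical build-step helper: turn a dependency manifest into <script>
// markup using the proposed needs="" attribute.
function generateMarkup(manifest) {
  // Assign an ID to every script based on its position in the manifest.
  const ids = new Map();
  manifest.forEach((entry, i) => ids.set(entry.src, 's' + (i + 1)));

  return manifest.map((entry) => {
    // Translate each dependency URL into the ID assigned above.
    const needs = (entry.needs || []).map((src) => {
      if (!ids.has(src)) throw new Error('unknown dependency: ' + src);
      return ids.get(src);
    });
    let tag = '<script src="' + entry.src + '" async id="' + ids.get(entry.src) + '"';
    if (needs.length > 0) tag += ' needs="' + needs.join(' ') + '"';
    return tag + '></script>';
  }).join('\n');
}

const markup = generateMarkup([
  { src: 'a.js' },
  { src: 'b.js', needs: ['a.js'] },
  { src: 'c.js', needs: ['a.js', 'b.js'] },
]);
```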



On Sat, 23 Aug 2014, Ben Maurer wrote:
> 
> (1) Dependencies in this model seem to be strict execution dependencies. 
> It's possible some use cases might want to use dependencies to describe 
> loading priority.
> 
> As an example, imagine a Facebook page with 3 JS files:
> Feed.js -- Needed to render the user's feed
> Chat.js -- Displays the user's friends who are online
> PhotoViewer.js -- When the user clicks a photo, this file creates a photo
> viewer allowing the user to browse an album. Until the user clicks
> something, this file isn't needed.
> 
> Roughly, the behavior we want here is: load Feed, then Chat, then
> PhotoViewer. But if the user clicks a photo move PhotoViewer to the front
> of the line.
> 
> How do I express the relative ordering of Feed vs Chat vs PhotoViewer?

The easiest way to address this is probably to just say that in the 
absence of any other factors, the tree order is the final tie-breaker. 
Then you would list them in that order (feed, chat, photoviewer), marked 
with "precache". If one is needed, it'll be bumped to the front of the 
line, otherwise, it'll just be in the given order.
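
As a rough model of that ordering (not a description of any browser's
scheduler), the comparison could be sketched like this. fetchOrder()
and the "order" and "needed" fields are hypothetical names.

```javascript
// Hypothetical scheduler model: needed resources go first; otherwise tree
// (document) order decides. "order" is the element's position in the document.
function fetchOrder(resources) {
  return resources
    .slice() // do not mutate the caller's array
    .sort((a, b) => {
      if (a.needed !== b.needed) return a.needed ? -1 : 1;
      return a.order - b.order;
    })
    .map((r) => r.name);
}

const queue = [
  { name: 'Feed.js', order: 0, needed: false },
  { name: 'Chat.js', order: 1, needed: false },
  { name: 'PhotoViewer.js', order: 2, needed: false },
];
const initialOrder = fetchOrder(queue);

// The user clicks a photo, so PhotoViewer.js becomes needed.
queue[2].needed = true;
const afterClick = fetchOrder(queue);
```

Here initialOrder follows the document order, and afterClick has
PhotoViewer.js moved to the front.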



> (2) Can inline scripts have needs? This seems like something potentially 
> useful (eg, you include a script tag for Google maps then you have an 
> inline script to create a map. You want the inline script to only 
> execute once maps is loaded)

Yes, this isn't limited to external resources.


> (3) How does loading interact with the cache? For example, if I have a 
> <link load-policy="declare"> and the resource is in the browser's cache, 
> would the ready promise be fulfilled? This could be very useful for 
> applications -- sometimes there is functionality you are willing to 
> execute immediately if the resource is in the user's cache.

HTTP caching semantics would be honoured (or not) as they are now.


On Tue, 26 Aug 2014, Ilya Grigorik wrote:
> 
> The first thing that strikes me about this entire topic is its scope, 
> and I'm wondering if we should take a step back and (a) extract some 
> lower level primitives, (b) leave the rest to libraries and web 
> developers to experiment with, and (c) if (b) leads to some good 
> patterns, then codify them at a later date... Instead of trying to 
> tackle all of this at once.

That's exactly what I tried to do here.


> In particular, it seems like we might be coupling two topics:
> (1) a flexible declarative mechanism to fetch arbitrary resources
> (2) some set of mechanisms to express dependencies, execution order, etc.
> 
> If we do our job right with (1), I think (2) can (should?) be deferred 
> to developers and library writers.

(1) is what we did in the past, and now we're doing (2) given the 
experience that we've gotten over the years.


> Specifically, for (1):
> - We need a way to initiate arbitrary downloads that doc + preload 
> parsers will understand and can process
>
> - We need a way to communicate type, prioritization, MQ, and other custom
> fetch information for each download
>
> - We need a way to listen on download progress of these resources to build
> custom processing logic
>
> - By default there is no UA processing on the response, this mechanism
> simply fetches bytes

Isn't this just XHR?

 
> I really like your proposal for "as="... Concretely it could look something
> like this:
> 
> <link rel="preload"
>         href="/some/asset.css"
>         as="stylesheet"    (used to initialize default priority, headers,
> etc)
>         load-settings="{}"  (JSON payload with custom headers, priority
> [2], etc)
>         media= "..."          (relevant media queries..)
>         probability=""        ([0.0-1.0] used for speculative fetches [3])
> >

I don't understand why this would be better than:

   <link rel=stylesheet loadpolicy="declare" href="/some/asset.css"
         loadsettings="..." media="...">


> The combination of all of the above allows me to fetch any content-type, 
> specify custom priorities and headers (or use a default set via 'as'), 
> apply MQ's, etc. Given all that, assuming I can extract a Promise/Fetch 
> object (or some such) out of it, I can then track the download progress 
> and apply any arbitrary logic for how and when it should be processed. 

You can do all this with the proposed markup, except that the proposed 
markup avoids having to categorise all resource load types, and the fetch 
for a particular type is kept associated with its core element (e.g. 
script loads are still associated with <script>), so that you can 
trivially apply the resource (remove "declare") when needed.


> For example:
> 
> - I can execute a script immediately by waiting for the download to finish
> and inject the script tag referencing same URL

Why is this better than just telling the browser to run the script and 
letting it manage the load?


> - I can setup a callback that waits for any other arbitrary resource to
> finish before I execute it...
> - I can defer execution until a particular action occurs.

The proposal I put forward lets you do these too.


> - I can prefetch arbitrary resources for later use

Sure. You can do this today.


> (note: the script example is completely arbitrary.. the entire point is 
> that this mechanism is independent of content-type)

It's not actually independent, since you have to give the "as" attribute 
to tell the user agent how to preparse it, and then when you do execute it 
you still have to know how to invoke it (and we have to hope the browser 
associates the two loads together rather than starting a new one).


> In other words, it seems like you could build most (all?) of the doc'ed 
> use cases in client-land... I can implement needs, loadPolicy, 
> addDependency on my own. Which, in my books, is a much better outcome 
> anyway because it will allow more and much more rapid experimentation.

You can do all those today. The whole point here is to take the most 
common things that people do, and build it into the browsers so that you 
don't have to write script and so that the browsers can do a better job.


On Mon, 25 Aug 2014, Simon Pieters wrote:
> On Sat, 23 Aug 2014 02:44:23 +0200, Ian Hickson <ian@hixie.ch> wrote:
> > On Wed, 12 Mar 2014, Boris Zbarsky wrote:
> > > 
> > > I realize no one would write actual code like this; the real-life 
> > > use case I'm worried about would be more like this:
> > > 
> > >  // img is already loaded sometimes
> > >  // Would like to observe a new load
> > >  var promise1 = img.loaded(); // oops! This will be pre-resolved if
> > >                               // we were already loaded, but otherwise
> > >                               // will resolve with the new load we're
> > >                               // about to start.
> > >  img.src = bar;
> > 
> > promise1 would be rejected as soon as you set 'src' if it hadn't 
> > loaded yet.
> 
> The old image doesn't stop loading immediately when setting 'src' if its 
> dimensions are known. In that case it only stops loading if the new 
> image gets known dimensions before the old one finishes loading.

Ah, right.

IMHO it's probably best if the API pretends that only the most recently 
set src="" is actually loading, hiding the weirdness around <img>'s 
two-track thing. But if you think we should expose both loads, we can 
certainly try. What would the API look like?


> > Yeah. I think it makes sense to expose a Request object once one is 
> > underway, and a RequestInit object (probably in the form of a 
> > JSON-encoded content attribute?) to configure it, at least for the 
> > main resources.
> > 
> > I'm not sure how to handle elements with multiple resources, e.g. 
> > <video poster> or the new <picture> stuff.
> 
> So currently <video> and <img> have an attribute to configure the 
> request, namely crossorigin="". It doesn't apply to poster, but <video 
> crossorigin> applies to <video src>, <source src> *and* <track src>. 
> (You can't paint the poster on a canvas anyway so it doesn't matter 
> much.) For <img crossorigin> it applies to the URL that gets loaded, 
> whether that is from src, srcset or <source srcset>.
> 
> integrity would need to be able to apply to each individual URL somehow 
> (probably with a new srcset descriptor for <img>).
> 
> Is crossorigin's coarseness OK or do we need something per URL?

If we're exposing integrity in loadsettings="", then presumably we need to 
be finer-grained.


On Tue, 26 Aug 2014, Smylers wrote:
> > 
> > > [Use-case G:] A website knows there's a piece of Javascript code 
> > > that the user might need if they click on a part of the page. The 
> > > developer would like to have the user download it, but not at the 
> > > expense of other resources.
> > 
> >    <script src="button-reaction.js" id="reaction"
> >            load-policy="when-needed precache low-priority">
> >     // button-reaction.js defines react()
> >    </script>
> >    <button type=button 
> >            onclick="document.scripts.reaction.load().then(
> >                     function() { react(); })"> Part of the Page </button>
> 
> What does low-priority add in case G? How does that differ from case H, 
> where "when-needed precache" is sufficient to avoid delaying other 
> things from loading?

low-priority just makes it happen after other things that are otherwise at 
the same level. Think of it as the opposite of CSS "!important".


> > > [Use-case H:] A website is prefetching photos in a photo album and
> > > would like to make sure these images are lower priority than images
> > > the user is actually viewing.
> > 
> >    <img src="photo1.jpg" alt="..." load-policy="when-needed precache">
> >    <img src="photo2.jpg" alt="..." load-policy="when-needed precache">
> >    <img src="photo3.jpg" alt="..." load-policy="when-needed precache">
> >    <img src="photo4.jpg" alt="..." load-policy="when-needed precache">
> >    <img src="photo5.jpg" alt="..." load-policy="when-needed precache">
> > 
> > As they come into view, they'll become needed automatically. When they
> > are not needed, they get precached if that wouldn't get in the way of
> > other things getting loaded.

The key difference between G and H is that in H I'm assuming that the 
images are the only resources in contention, so all that's needed is a way 
to load the images when they're needed and precache them when nothing is 
needed. In G, I'm assuming there's lots of other stuff going on and this 
one shouldn't even be downloaded at the expense of other things being 
precached.


On Thu, 28 Aug 2014, Yoav Weiss wrote:
> > On Wed, 4 Sep 2013, William Chan (陈智昌) wrote:
> > >
> > > * Given current browser heuristics for resource prioritization based 
> > > on resource type, all <script> resources will have the same 
> > > priority. Within HTTP/1.X, that means you'll get some amount of 
> > > parallelization based on the connection per host limit and what 
> > > origins the script resources are hosted, and then get FIFO. New 
> > > additions like lazyload attributes (and perhaps leveraging the defer 
> > > attribute) may affect this. With HTTP/2, there is a very high 
> > > (effectively infinite) parallelization limit. With prioritization, 
> > > there's no contention across priority levels. But since script 
> > > resources today generally all have the same priority, they will all 
> > > contend and most naive servers are going to round robin the response 
> > > bytes, which is the worst thing you could do with script resources, 
> > > since current JS VMs do not incrementally process script resources, 
> > > but process them as a whole. So round-robining all the response 
> > > bytes will just push out start time of JS processing for all 
> > > scripts, which is rather terrible.
> >
> > I'm not sure what to do about this exactly.
> 
> Wouldn't that be something that is best handled as part of HTTP? e.g. 
> sending a flag with the request indicating whether the resource can be 
> progressively decoded or not?

That's an option, indeed.


> Wouldn't the "needs" attribute enable the browser to create a dependency
> tree that would allow for finer grained priorities?

Yes, that is true.


> >    load-policy=""      The load policy. Consists of a space-separated
> >                        set of keywords, of which one may be from the
> >                        following list: block, async, optimistic,
> >                        when-needed, late-run, declare. The other
> >                        allowed keywords are precache, low-priority,
> >                        and force. (Maybe we disallow "block" and
> >                        "force" since they're for legacy only.)
> >                        Different elements have different defaults.
> >                        "precache" isn't allowed if the keywords
> >                        "block" or "async" are specified, since those
> >                        always load immediately.
> 
> Can you perhaps expand on what each of these would mean?

Sure; see the updated proposal at the top of this e-mail.


> > > [Use-case P:] download dynamic page components (e.g. maps) only on 
> > > larger devices.
> >
> > Long term, we could add a media="" attribute to <script> to make this 
> > easier. Short term, you can do it with scripts by checking the width 
> > of the device and calling load() on the script if you want it.
>
> Wouldn't that still download the resource, and just avoid the 
> parsing/execution part?

Depends on the load policy. If it's "when-needed", then no.
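
The short-term approach could be sketched roughly as follows.
loadIfWide() and the threshold are hypothetical, and load() is the
proposed method, so this would not run against any shipping browser
API. The script argument is anything exposing load().

```javascript
// Hypothetical helper for use-case P. "script" is any object exposing the
// proposed load() method; "minWidth" is an arbitrary illustrative threshold.
function loadIfWide(script, deviceWidth, minWidth) {
  if (deviceWidth >= minWidth) {
    return script.load(); // proposed API: marks the element as needed
  }
  // Not needed: with a plain "when-needed" policy nothing is fetched.
  return null;
}
```

In a page this might be called with the maps <script> element and the
device width; on a narrow device load() is never called.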


On Thu, 28 Aug 2014, Boris Zbarsky wrote:
> 
> Firefox is moving to a setup where we will progressively parse scripts, 
> and possibly progressively bytecode-compile, and possibly progressively 
> native-code-compile in the cases when we do AOT compilation, but 
> obviously not progressively execute.

Can you elaborate on how this will work with the ES6 module loader?


On Thu, 28 Aug 2014, Yoav Weiss wrote:
> 
> Maybe the flag should indicate a "please send this progressively" hint to
> the server. Then the browser is free to send whatever hint to the server
> that gives it best performance.
> If receiving multiple resources progressively provides better performance
> than having the server send them one after the other, then the hint
> should be sent.

This seems reasonable. I recommend raising this in the HTTP/2 working 
group.


On Fri, 29 Aug 2014, bizzbyster@gmail.com wrote:
>
> We need to bite the bullet and add a priority attribute and an 
> expected-size attribute.
> 
> Priority: 
> The web server will often have information that allows it to know better 
> than the UA about the priority of objects. Basing this on type is not 
> super useful when we have only a few types and we have lots of objects. 
> The UA can ignore it but a priority field allows the web server to give 
> the UA as much information as it has about how to download the objects 
> to optimize load time. Why not just make it like probability and allow 
> the web server to specify a value between 0.0 and 1.0, with 1.0 being a 
> top priority object?

This seems like a perfect candidate for whatever we put in 
loadsettings="", right?


> Expected-size: 
> I’ve argued this previously 
> (https://github.com/igrigorik/resource-hints/issues/12) and Ilya agrees 
> it’s a nice to have. Along with the probability attribute that is in 
> Ilya’s latest draft, this provides a simple way to threshold which 
> objects to prefetch at the UA.

This seems similarly something to put in loadsettings="".
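
Purely as an illustration of how such hints might be thresholded if
they lived in a JSON loadsettings="" value: the field names
"probability" and "expectedSize", the defaults, and both thresholds
below are assumptions, not part of any proposal text.

```javascript
// Hypothetical: decide whether to prefetch based on hints carried in a
// JSON-encoded loadsettings="" value. Field names are assumptions.
function shouldPrefetch(loadsettings, minProbability, maxBytes) {
  const settings = JSON.parse(loadsettings || '{}');
  // Missing hints are treated pessimistically: unlikely and arbitrarily large.
  const probability =
    typeof settings.probability === 'number' ? settings.probability : 0;
  const expectedSize =
    typeof settings.expectedSize === 'number' ? settings.expectedSize : Infinity;
  return probability >= minProbability && expectedSize <= maxBytes;
}
```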

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 8 September 2014 20:34:24 UTC