[whatwg] script-related feedback from Ian Hickson on 2012-05-07 (public-whatwg-archive@w3.org from May 2012)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 7 May 2012 21:40:51 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.1205072040580.17060@ps20323.dreamhostps.com>
On Fri, 5 Aug 2011, Den.Molib wrote:
>
> I noticed that the html spec doesn't state what should be done with a 
> script tag with a non-empty src attribute *and* content inside.

Can you elaborate on precisely what it is you mean? I couldn't find 
anything where this was left undefined. For example, the <script> 
processing algorithms seem unambiguous on this point (the contents get 
ignored), as do the parsing rules. The conformance requirements seem to 
cover that case too.


> Per section 4.3.1.3 [1] we just know that 'If a |script| element's |src| 
> attribute is specified, then the contents of the |script| element, if 
> any, must correspond to putting the contents of the element in 
> JavaScript comments'. I think that such content would be ignored: since 
> it has a src attribute, it would be marked as 'from an external file' and 'The
> contents of that file are the script source', but it could also be 
> considered to be external and inline at the same time.

The contents are ignored in this case, yes. The spec says "For the 
purposes of these steps, the script is considered to be from an external 
file if, while the prepare a script algorithm above was running for this 
script, the script element had a src attribute specified".


> I realised this when looking at the recommended code for embedding 
> Google+ [2] (choose a language other than US English). It looks like
>
> > <script type="text/javascript" src="https://apis.google.com/js/plusone.js">
> >   {lang: 'en-GB'}
> > </script>
> 
> The script [3] is too minified to follow, but it looks like they are using 
> the same tag for including the script and embedding parameters, even 
> though they are disobeying a "must" by doing so. Maybe they have a strong 
> reason and it's the spec that should allow such use. Either way, a 
> clarification in the specification would be welcome.

What Google are doing here is non-conforming (precisely because it is so 
confusing -- it looks like it's an inline script, but isn't).


On Sat, 10 Sep 2011, Kyle Simpson wrote:
> 
> So, can I clarify something? You have moved `onreadystatechange` and 
> `readyState` off of the <script> element entirely, and onto the HTML 
> element? If we have multiple scripts loading at the same time, how do 
> you get notified of the different states of each script element, when 
> there's only one property and one event handler?

As soon as the element is inserted, it begins loading. So you know whether 
it's loading by whether or not you've inserted it. Once it's loaded, it 
will run as soon as it can modulo the loading constraints (like defer, 
etc), at which point beforescriptexecute fires. After it's run, the 
afterscriptexecute and load events fire.


> In regards to all the concern about double-firing of load detection 
> logic, IE9 added `onload` event firing alongside their existing script 
> element's `onreadystatechange` firing. That's been around now for 6 
> months (not to mention the year-long platform-preview stage where 
> content was tested in IE9 relentlessly).
> 
> AFAIK, there've been no major compat problems with that. Why? Because 
> most script loaders were already aware of a case (in Opera) where the 
> load handler might be fired twice, and so were already doing the 
> filtering with the "loaded" flag. LABjs has done exactly that for over 2 
> years now, as have almost all other script loaders since. This is hardly 
> something new.
> 
> So, I'm not sure why we're rushing to fear these problems. A few years 
> ago, maybe this was an issue, but I don't see how there's real evidence 
> of current problems. Most script loaders are already immune to this 
> problem.

Script loader libraries are not the only consumer here.
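The "loaded" flag filtering Kyle mentions looks roughly like the following minimal sketch. It is not code from LABjs or any specific loader, and `onScriptLoadedOnce` is a hypothetical name; treating `"loaded"` or `"complete"` as the ready states follows common loader practice for older IE. Code that installs both handlers without such a flag runs its callback twice in a browser that fires both events, which is exactly the compatibility risk under discussion.

```javascript
// Hypothetical helper: listen for both `load` and `readystatechange`,
// but invoke the callback at most once. `script` is anything with settable
// onload/onreadystatechange properties and a readyState string (a real
// <script> element in a browser, or a plain object stand-in).
function onScriptLoadedOnce(script, callback) {
  let loaded = false; // the filtering flag

  function done() {
    if (loaded) return; // second notification for the same load: ignore it
    loaded = true;
    callback(script);
  }

  script.onload = done;
  script.onreadystatechange = function () {
    // Older IE signalled readiness through readyState instead of `load`.
    if (script.readyState === "loaded" || script.readyState === "complete") {
      done();
    }
  };
}
```

In a browser, `script` would be an element from `document.createElement("script")`, with the handlers installed before it is inserted.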


On Sat, 10 Sep 2011, Boris Zbarsky wrote:
> 
> Opera pointed to a specific script loader in the Facebook API that is 
> not thus immune, as well as one in popcornjs.
> 
> Given an existence proof like that, "most" doesn't really cut it for me, 
> unfortunately.
> 
> Or put another way, I would not be willing to implement readyState on 
> scripts in Gecko as things stand, without much stronger data 
> supporting the fact that scripts no longer listen for both load and 
> readystatechange.

That's pretty much the strongest argument one can make. :-)


On Tue, 20 Sep 2011, Simon Pieters wrote:
>
> We're implementing window.onerror in Opera. In order to not expose the 
> URL of redirects in cross-origin resources with window.onerror, errors 
> from cross-origin scripts are masked in Gecko and WebKit, i.e. instead 
> of invoking window.onerror with a useful error message, a URL and the 
> line number, it's invoked with "Script error.", "", 0.
> 
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=14177 
> https://bugzilla.mozilla.org/show_bug.cgi?id=568564
> 
> This makes window.onerror rather useless for cross-origin scripts. 
> However, it is still possible to tell if the user is logged in or not if 
> a site serves a script for a particular URL when the user is logged in 
> and redirects to the home page or so when the user is not logged in. We 
> have found a bank site where this is possible. There are other ways to 
> tell if the user is logged in, however it seems we should try to keep 
> them to a minimum. Therefore we suggest that window.onerror should not 
> be invoked at all for errors in cross-origin scripts.

On Wed, 21 Sep 2011, Bjoern Hoehrmann wrote:
> 
> I note there are at least two other ways to minimize the disclosure 
> problem here: limiting this to the "cookie domain", and basing the 
> decision on the media type of responses. The problem itself is due to a 
> bug on the bank's site, and there are quite likely many more ways to 
> check whether the script loaded (checking for global variables it sets, 
> markup it might add, event listeners it might register, and so on).
> 
> Either would disclose more, but taking away the ability to issue alerts 
> when there are too many scripting errors (for example, when a new browser 
> update that you did not catch in advance is pushed to users and turns out 
> to be incompatible with your script) is a bit of a problem. The only 
> remaining option would be having people add "script_xy_loaded_okay" data 
> to the scripting environment, which might be a new source of leaks when 
> it is used incorrectly. This is true even though the rule that you do 
> not get errors from "cross-origin" loads is certainly the simplest.

On Tue, 20 Sep 2011, Boris Zbarsky wrote:
> On 9/20/11 5:40 PM, Simon Pieters wrote:
> > However, it is still possible to tell if the user is logged in or not 
> > if a site serves a script for a particular URL when the user is logged 
> > in and redirects to the home page or so when the user is not logged 
> > in.
> 
> Can't you tell this from the load event for the <script> tag, without 
> involving the error event in any way?
> 
> I'd love it if we could close this hole up, but the ship has long 
> sailed.  :(
> 
> > There are other ways to tell if the user is logged in, however it 
> > seems we should try to keep them to a minimum.
> 
> I'm not sure that onerror and onload are really different ways to tell 
> here.
> 
> Unless the proposal is that in this case onload fire instead of onerror 
> for the script that ends up as an HTML document?

On Thu, 22 Sep 2011, Simon Pieters wrote:
> 
> I was talking about window.onerror. <script onerror> per spec fires for 
> empty src="", unresolvable URL and network errors (DNS or 404). If we 
> want to make onload always fire for cross-origin, it would make sense 
> for <script onerror> to not fire for network errors. (Opera doesn't fire 
> error on script, assuming my testing isn't bogus this time.)
> 
> I don't know if it's worth it to try to plug this hole this way, 
> however. We won't be able to plug it everywhere, e.g. <img> will expose 
> if an image is loaded. So masking onload/onerror for script just makes 
> the feature less useful without solving the problem. Maybe we should 
> instead focus on implementing the From-Origin header and try to get 
> sites to use that.

On Fri, 23 Sep 2011, Simon Pieters wrote:
> 
> It was pointed out to me that the following site expects an error event 
> for a cross-origin script (which returns 404):
> 
> http://www.alvoradafm.com.br/Player/player.html
> 
> which tries to load http://lp.longtailvideo.com/5/%20gapro/%20gapro.js

Mostly on the strength of Bjoern's comments, I haven't changed anything 
here. It's not clear to me how not firing window.onerror would really help 
in this situation.

It's not the fact that there's an error that we're hiding by obfuscating 
window.onerror in cross-origin scripting cases, it's information about the 
file's contents that wouldn't otherwise be obtainable (like line numbers) 
that we're trying to hide.
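To make the distinction concrete, the masking Simon describes can be modelled as a function from an error to the arguments passed to window.onerror. This is a hypothetical sketch, not browser source: `reportedErrorArgs` and the plain-string origin comparison are assumptions. It shows that the occurrence of an error is still reported for a cross-origin script, while the message, URL and line number are replaced with "Script error.", "" and 0.

```javascript
// Hypothetical model of window.onerror argument masking.
// `error` has { message, url, line }; origins are compared as plain strings
// (an assumption; real browsers compare scheme, host and port).
function reportedErrorArgs(error, scriptOrigin, documentOrigin) {
  if (scriptOrigin !== documentOrigin) {
    // The fact that an error happened is still exposed; details that could
    // reveal the file's contents (message, URL, line number) are not.
    return ["Script error.", "", 0];
  }
  return [error.message, error.url, error.line];
}
```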


On Thu, 27 Oct 2011, David Flanagan wrote:
> §4.3.1 The Script Element says:
> > 
> > When a |script| element that is not marked as being "parser-inserted" 
> > experiences one of the events listed in the following list, the user 
> > agent must synchronously prepare the |script| element:
> > 
> >   * The |script| element gets inserted into a document.
> >   * The |script| element is in a |Document| and its child nodes are changed.
> >   * The |script| element is in a |Document| and has a |src| attribute 
> >     set where previously the element had no such attribute.
>
> Bullet point 2 seems ambiguous to me.  Does it mean only that the list 
> of children changes, or does it mean that any change to any child node 
> also causes the script to be prepared?  In particular, if a script with 
> no src attribute whose only child is an empty text node is inserted into 
> the document, the prepare() algorithm will abort before the 
> already_started flag is set.  Later, if I do 
> script.firstChild.insertData(jscode) does that trigger script execution?
> 
> I haven't tried it out yet to see what browsers do, but I think that the 
> spec should be clarified to make it explicit.

On Fri, 28 Oct 2011, David Flanagan wrote:
> 
> First of all, the following code obviously runs the specified code and 
> displays an alert:
> 
>     var s0 = document.createElement("script");
>     document.head.appendChild(s0);
>     var t0 = document.createTextNode("alert('added a text node child');");
>     s0.appendChild(t0);
> 
> All browsers do that correctly.  The case I'm interested in is this one:
> 
>     var s1 = document.createElement("script");
>     var t1 = document.createTextNode("");
>     s1.appendChild(t1);
>     document.head.appendChild(s1);
>     t1.appendData("alert('changed text node data');");
> 
> Firefox runs this script and Chrome, Safari and Opera do not. (I don't 
> have a Windows installation, so I haven't tested IE.)
> 
> Step 4 of the "prepare a script" algorithm says: " If the element has no 
> |src| attribute, and its child nodes, if any, consist only of comment 
> nodes and empty text nodes, then the user agent must abort these steps 
> at this point. The script is not executed."  So when the script is added 
> to the document, it has only an empty text node, and it does not 
> execute, and (this is the important part) it does not get its already 
> started flag set.  So it should still be runnable.
> 
> One thing that is supposed to trigger script execution is "the script 
> element is in a Document and its child nodes are changed".  My original 
> point in this post was that "child nodes are changed" isn't specific 
> enough.  The most obvious interpretation to me would be "a child is 
> inserted or deleted". Firefox has a more sophisticated interpretation 
> that seems to boil down to "when the value of the text IDL attribute 
> changes".  Is Firefox correct here?
> 
> We're not done yet, though.  If I comment out the appendData() call in 
> the code above and replace it with this line:
> 
>     s1.appendChild(document.createTextNode("alert('then added a new text node');"));
> 
> Firefox now runs this new script.  But Chrome, Safari and Opera still 
> don't run it.  So the issue here isn't that the other browsers differ 
> from Firefox on the interpretation of "child nodes are changed".  
> Apparently the other browsers are marking the script with the empty text 
> node as already started, and aren't allowing it to run when a change 
> happens later.  And this isn't just limited to the empty text node case.  
> If I change that empty text node into a <div> element, or to a comment, 
> Firefox still (correctly) runs a script inserted later, and the other 
> browsers still (incorrectly) fail to run it.
> 
> Frankly, from an implementation standpoint, having to do what the spec 
> says (and what Firefox does) seems unnecessarily complex.  One way to 
> simplify things and to bring Chrome, Safari and Opera into compliance 
> would be to change step 4 of the prepare a script algorithm so that it 
> only aborts if the script tag has no children at all.  If it has 
> children then the already_started flag would be set, and the script 
> would never run again even if those children do not define any script 
> content.
> 
> Making this change would also simplify that second trigger for preparing 
> the script.  Instead of a vague "its child nodes are changed", the spec 
> could just say "a child is inserted".

On Fri, 28 Oct 2011, Bjoern Hoehrmann wrote:
>
> [IE9 acts like Firefox for the above tests]

On Fri, 28 Oct 2011, David Flanagan wrote:
>
> Thanks, Bjoern. That makes it a lot harder for me to argue that the spec 
> should change to match Chrome, Safari and Opera... But can we at least 
> change "when child nodes change" to something like "when the text IDL 
> attribute changes from the empty string to a non-empty string"?

Done.
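The trigger as worded in David's request (prepare when the text IDL attribute goes from the empty string to a non-empty string), together with the "already started" flag, can be modelled without a DOM. This is a sketch under stated assumptions: `ScriptModel` and its methods are hypothetical, the element is assumed to have no src attribute, and its children are limited to text and comment nodes.

```javascript
// Hypothetical model of an inline script element (no DOM involved).
class ScriptModel {
  constructor() {
    this.children = []; // { type: "text" | "comment", data: string }
    this.inDocument = false;
    this.alreadyStarted = false;
    this.executed = []; // source text of each execution, for inspection
  }

  // Concatenation of the text children's data (comments are skipped).
  get text() {
    return this.children
      .filter((node) => node.type === "text")
      .map((node) => node.data)
      .join("");
  }

  prepare() {
    if (this.alreadyStarted) return;
    // No src and no script text: abort WITHOUT setting the flag,
    // so a later change can still run the script.
    if (this.text === "") return;
    this.alreadyStarted = true;
    this.executed.push(this.text); // stands in for running the script
  }

  // Apply a mutation, then prepare if the element is in a document and
  // its text went from "" to non-empty.
  mutate(change) {
    const before = this.text;
    change();
    if (this.inDocument && before === "" && this.text !== "") this.prepare();
  }

  insertIntoDocument() {
    this.inDocument = true;
    this.prepare();
  }
}

// David's s1 case: insert with an empty text child, then append data.
const s1 = new ScriptModel();
const t1 = { type: "text", data: "" };
s1.children.push(t1);
s1.insertIntoDocument(); // aborts: text is empty, flag stays unset
s1.mutate(() => {
  t1.data += "alert('changed text node data');";
});
// s1.executed now holds that single source string.
```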


On Fri, 28 Oct 2011, David Flanagan wrote:
>
> Here's another ambiguity about the "child nodes are changed" trigger for 
> executing a script element. What is the correct behavior for the 
> following code?
> 
> <script>
> window.onload = test;
> 
> function test() {
>     var s = document.createElement("script");
>     document.head.appendChild(s);
> 
>     var f = document.createDocumentFragment();
>     f.appendChild(document.createTextNode("alert(document.scripts[1].text);"));
>     f.appendChild(document.createTextNode("alert(2);"));
>     f.appendChild(document.createTextNode("alert(3);"));
> 
>     s.appendChild(f);
> 
>     alert(s.text);
> }
> </script>
> 
> In Firefox, the code in all three text nodes runs, so there are 4 alerts in
> total, and the first and the fourth display the same text: the concatenation
> of the three text nodes.
> 
> In Chrome, Safari and Opera (I can't test on IE), only the first text 
> node is run as a script. There are two alerts.  The first displays the 
> content of the first text node, and the second alert displays the 
> concatenation of all three text nodes.
> 
> I would guess that Firefox's behavior is correct here, because DOM4 
> specifies the algorithm for DocumentFragment insertion without using 
> recursion. But it's not really specified clearly there either.  Does the 
> HTML spec need a clarifying note on this point?  (I also plan to raise 
> this issue on the www-dom mailing list)

On Fri, 28 Oct 2011, Bjoern Hoehrmann wrote:
> 
> IE9 Standards mode is the same [as Firefox].

I've clarified the text to hopefully be compatible with IE and Firefox.
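The DocumentFragment difference comes down to when the trigger is checked: after each inserted text node, or once after the whole fragment has been inserted. In this hypothetical sketch (`insertFragmentTexts` is an illustrative name, not a spec algorithm), the script element starts out empty and already in the document, and it runs at most once because of its "already started" flag.

```javascript
// Hypothetical comparison of the two insertion strategies observed above.
function insertFragmentTexts(fragmentTexts, batched) {
  const children = [];
  const executed = [];
  let alreadyStarted = false;
  const text = () => children.join("");

  // Runs the script the first time it has non-empty text.
  function maybeRun() {
    if (!alreadyStarted && text() !== "") {
      alreadyStarted = true;
      executed.push(text());
    }
  }

  if (batched) {
    // Insert every node first, then check once (Firefox and IE9 above).
    children.push(...fragmentTexts);
    maybeRun();
  } else {
    // Check after each node (Chrome, Safari and Opera above).
    for (const t of fragmentTexts) {
      children.push(t);
      maybeRun();
    }
  }
  return { executed, finalText: text() };
}
```

With the three text nodes from David's test, the batched strategy runs their concatenation once, while the per-node strategy runs only the first node; in both cases the element's final text is the full concatenation.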


On Fri, 2 Dec 2011, Jonas Sicking wrote:
> 
> Currently HTML5 defines that a <script src="..."> element that is 
> inserted into the DOM should always execute if the load succeeds, even 
> if the element is removed from the Document before it is executed. See
> 
> http://www.whatwg.org/specs/web-apps/current-work/multipage/scripting-1.html#script-processing-src-prepare
> 
> This makes a lot of sense to me since otherwise we'll introduce a race 
> condition where if the load happens quickly enough the script will 
> execute despite being later removed. In other words, a piece of DOM 
> which is removed may or may not cancel any <script>s inside it.
> 
> In webkit things are even worse. It appears that if you insert a 
> <script> in the DOM and *immediately* remove it, before returning to the 
> event loop, it still sometimes executes. I.e. webkit appears to always 
> be exhibiting racy behavior.
> 
> Gecko currently follows the spec, but is the only browser that does so. 
> We are not aware of any sites that break because of this.
> 
> The main use case for wanting to support scripts getting [aborted] 
> appears to be wanting to abort JSONP loads. Potentially to issue it with 
> new parameters. This is a decent use case, but given the raciness 
> described above in webkit, it doesn't seem like a reliable technique in 
> existing browsers.

Isn't this use case adequately handled by XMLHttpRequest?


> So the questions are:
> 
> 1. Should we keep the spec as it currently stands?
> 2. Are browsers willing to follow the specced behavior?
> 3. Do we want to support the use-case of aborting JSONP loads?
> 4. If we do, should we use the "existing" technique even though it'll 
> fail intermittently in existing browsers and comes with other risks. Or 
> should we define a new API for this use case (which could be feature 
> detected).

On Fri, 2 Dec 2011, Tab Atkins Jr. wrote:
> 
> If it's unreliable *and* no sites appear to break with the proper 
> behavior, we shouldn't care about this use-case, since cross-domain XHR 
> solves it properly.

On Sat, 3 Dec 2011, Yehuda Katz wrote:
> 
> Cross-domain XHR *can* solve this use case, but the fact is that CORS is 
> harder to implement than JSONP, and so we continue to have a large number 
> of web APIs that support JSONP but not CORS. Unfortunately, I do not 
> foresee this changing in the near future.

On Sat, 3 Dec 2011, Jonas Sicking wrote:
> 
> I think we can solve this in 3 ways:
> 
> 1. Keep spec as it is. Pages can simply ignore the JSONP callback when
> it happens.
> Disadvantages:
> Additional bandwidth.

Note that this would likely happen anyway even if the script load is 
aborted.

> More complexity for the web page.
> 
> 2. Make removing scripts cancel any execution
> Disadvantages:
> Pages will have to deal with the fact that removing scripts can still
> cause the callback to happen if the load just finished. So this means the
> same amount of complexity as alternative 1 for page authors who don't
> want buggy pages.
> Since many pages likely won't properly handle the callback happening
> anyway, this will likely cause pages to be buggy in contemporary browsers.
> 
> 3. Add a new API to reliably cancel a script load
> Disadvantages:
> New API for pages to learn.
> 
> I'm personally leaning towards 3 or 1. If we go with 3, pages can always 
> call the API and remove the script in order to get buggy "working" 
> behavior in contemporary browsers.

On Sat, 3 Dec 2011, Yehuda Katz wrote:
> 
> 4. Add a new API (or customize XHR) to explicitly support JSONP 
> requests, and allow those requests to be cancelled.

My preference would be 1 then 4 then 3 then 2.
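Option 1 (ignoring a callback that arrives after the page has lost interest) is usually done by leaving the global callback in place but making it a no-op for cancelled requests. A minimal sketch follows; `registry` stands in for `window`, and `startJsonpRequest` and the `jsonpCallback<N>` naming are hypothetical. The callback is deliberately not deleted on cancel, because a response still in flight would then call an undefined function.

```javascript
// Hypothetical JSONP bookkeeping where cancelling just ignores the response.
let nextId = 0;

function startJsonpRequest(registry, onData) {
  const name = "jsonpCallback" + nextId++;
  let cancelled = false;

  // The JSONP response would call registry[name](data).
  registry[name] = function (data) {
    delete registry[name]; // one-shot: clean up the global once it has fired
    if (!cancelled) onData(data); // a cancelled request's data is dropped
  };

  return {
    name, // would be placed in the script URL, e.g. ?callback=<name>
    cancel() {
      cancelled = true; // keep registry[name] so a late response is harmless
    },
  };
}
```

The bandwidth is still spent, as noted earlier, but the page's state is no longer affected by the late response.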


On Sat, 3 Dec 2011, Jonas Sicking wrote:
> 
> It will be sort of a weird API since the security model will be sort of 
> strange. Traditionally we say that you can't load data cross site, but 
> that you can execute scripts cross site. Here we want something sort of 
> in between.
> 
> It could have significant advantages if it makes it easier for sites to 
> do cross-site loading of data without exposing themselves to XSS risks.

Why is XMLHttpRequest with CORS complicated here? It seems like it'd be as 
easy as JSONP -- it's only one header, right?


On Wed, 7 Dec 2011, Adam van den Hoven wrote:
> 
> If we went for a hybrid approach, namely that XHR has a cancellable way 
> to call and execute some arbitrary JavaScript and sandbox the execution 
> so that "this" is something explicitly provided to the XHR, would we not 
> suddenly have a rather secure way to load any javascript in general (and 
> probably make things like lab.js and yepnope easier to write)? Now I can 
> load some javascript (say from some ad server) without giving it access 
> to the window object and the global scope, if I don't want to. Wouldn't 
> this address some of the security issues that Doug Crockford has brought 
> up in the past?

That would certainly be interesting. I'm not really sure how to approach 
it though.


On Wed, 7 Dec 2011, Jonas Sicking wrote:
> 
> Ideally we wouldn't execute anything. We'd just parse the JSON literal 
> and hand that back. That is what'll give us safety.
> 
> To make a concrete, but hideous, example:
> 
> We could add xhr.responseType = "jsonp".
> 
> When this is set, the XHR object will look for contents of the following 
> form:
> 
> <js identifier> '(' <js-literal> ')'
> 
> followed by an optional ';'
> 
> When the contents follow that syntax, the XHR object parses the 
> js-literal and sets its .response property to the result.
> 
> Other than that the XHR object works just as it currently does. I.e. it 
> fires progress events, load events and readystatechange events as 
> normal.
> 
> This way no JS execution happens, and no global names need to be set up. 
> The <js identifier> part is simply ignored other than to check that it's 
> a valid js identifier.
> 
> I believe we can come up with something better than this, but it's a 
> demonstration of what's technically feasible.

That seems fine to me, but I'll leave that up to Anne to pick up if he 
wants to spec it. (I've left out other XHR feedback along these lines. I 
recommend discussing XHR on public-webapps at w3.org.)


On Fri, 2 Dec 2011, Glenn Maynard wrote:
> 
> This use case should be covered by this:
> 
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-September/033132.html
> 
> I don't actually see the mentioned readyState attribute in 
> HTMLScriptElement, though.

We've taken this feature out due to compat issues.


> I think that if aborting script loads is worth supporting (which seems 
> reasonable), it should be supported in a way that doesn't cause race 
> conditions (such as the above, or any of the other proposals for delayed 
> script execution).

Yes, of course. (Not sure it's worth supporting though.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 7 May 2012 14:40:51 UTC