Re: [whatwg] Navigation and history traversal issues from Justin Lebar on 2012-09-19 (public-whatwg-archive@w3.org from September 2012)

From: Justin Lebar <justin.lebar@gmail.com>
Date: Tue, 18 Sep 2012 21:32:54 -0400
To: Ian Hickson <ian@hixie.ch>
Cc: WHAT Working Group <whatwg@whatwg.org>
Message-ID: <CAFWcpZ7HSSqkvMCJ2bjTUqG2NnQ0hqcwR+NC_bFQsv5MBU1XCQ@mail.gmail.com>
This is all great; thanks for the quick turnaround!

> I've also made back()/forward()/go() not work during the document's unload
> handler, since that could be used for griefing. I'm tempted to disable it
> entirely for all docs a la alert(), but I've no idea if that's Web-
> compatible and I suspect not.

I don't know what you mean by the last sentence here.  In my tests, IE
and Opera do not support cross-origin back/forward/go, if that's what
you mean.  I don't see any good reason for us to support that in
Firefox, either, if we could get away with removing it.

-Justin

On Tue, Sep 18, 2012 at 8:18 PM, Ian Hickson <ian@hixie.ch> wrote:
> On Tue, 12 Jun 2012, James Graham wrote:
>>
>> In particular, what stops such navigations from re-triggering the unload
>> handler, and thus starting yet another navigation?
>
> I've updated the spec to have guards in place for 'pagehide' and 'unload'.
>
> (Not yet 'beforeunload'. Should we do that too?)
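A minimal sketch of the re-entrancy guard described above, written as a toy JavaScript model. The `Doc` and `Browser` classes and their members are hypothetical names for illustration, not spec text or any browser's code. It assumes the "carry out the navigation" behaviour reported for Opera and for Gecko's same-origin case: a navigation requested from inside the unload handler replaces the target, but unload is not fired a second time. The cross-origin cancellation case is not modelled.

```javascript
// Toy model (hypothetical; not spec text or browser code): a document
// whose unload handler may itself request another navigation.
class Doc {
  constructor(url, onUnload) {
    this.url = url;
    this.onUnload = onUnload; // handler run when the document is unloaded
    this.unloadCount = 0;     // how many times unload has fired
    this.unloading = false;   // guard flag: true while the handler runs
  }
}

class Browser {
  constructor(doc) {
    this.doc = doc;
    this.target = null;       // URL we will end up navigating to
  }

  navigate(url) {
    this.target = url;        // the latest request wins (same-origin case)
    if (this.doc.unloading) {
      // Guard: a navigation started inside the unload handler must not
      // fire unload again; just record the new target and return.
      return;
    }
    const old = this.doc;
    old.unloading = true;
    old.unloadCount++;
    try {
      old.onUnload(this);     // may call navigate() re-entrantly
    } finally {
      old.unloading = false;
    }
    this.doc = new Doc(this.target, () => {});
  }
}

// a.html's unload handler tries to redirect to c.html.
const start = new Doc("a.html", (browser) => browser.navigate("c.html"));
const browser = new Browser(start);
browser.navigate("b.html");
console.log(start.unloadCount, browser.doc.url); // 1 c.html
```

Without the `unloading` check, the handler's call to `navigate()` would fire unload again and recurse indefinitely, which is the loop the question above is about.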
>
>
>> It looks like the spec tries to make a distinction between navigations
>> that are cross-origin and those that are not (step 4 in the "navigating
>> across documents" algorithm); I'm not sure why this inconsistency is
>> desirable rather than always using the cross-origin approach.
>>
>> Based on some tests ([1]-[5]), it seems that WebKit always cancels a
>> navigation started in the unload handler, Opera always carries out
>> such a navigation, and Gecko follows WebKit in the cross-origin case
>> and Opera in the same-origin case. In all cases the unload handler is
>> only called once.
>>
>> [1] http://hoppipolla.co.uk/tests/navigation/003.html
>> [2] http://hoppipolla.co.uk/tests/navigation/004.html
>> [3] http://hoppipolla.co.uk/tests/navigation/005.html
>> [4] http://hoppipolla.co.uk/tests/navigation/006.html
>> [5] http://hoppipolla.co.uk/tests/navigation/007.html
>
> On Tue, 12 Jun 2012, Boris Zbarsky wrote:
>>
>> For what it's worth, we initially tried to do what you say WebKit does
>> but ran into web compat issues.  See
>> https://bugzilla.mozilla.org/show_bug.cgi?id=371360 for the original bug
>> where we blocked all navigation during unload and
>> https://bugzilla.mozilla.org/show_bug.cgi?id=409888 for the bug where we
>> changed to the current behavior.  I believe the spec says what it says
>> based on our implementation experience here...
>
> Yeah, the spec's behaviour is intentional here. The error in the spec was
> just that it still fired unload again. I've fixed that.
>
>
> On Wed, 13 Jun 2012, James Graham wrote:
>>
>> That seems to be true. On the other hand it appears that Gecko will
>> still respect navigation from unload even if the unload was triggered by
>> explicit user interaction (e.g. by editing the address bar), as long as
>> all the origins match, so you can end up at a different page to the one
>> you expected. That is very surprising behaviour (although I see that you
>> can argue that it is possible in other ways).
>
> When it's same origin, you really have no way to know what's going on. The
> page could trivially pushState() a continuously changing URL, for example,
> and could serve random files from the server for any URL.
>
>
> On Thu, 14 Jun 2012, James Graham wrote:
>> On 06/13/2012 11:18 PM, Ian Hickson wrote:
>> > On Fri, 20 Apr 2012, Henri Sivonen wrote:
>> > > >
>> > > > * Should window.stop() really not abort the parser like the spec
>> > > > seems to suggest?
>> > >
>> > > Looks like Opera is alone with the non-aborting behavior. The spec
>> > > is wrong.
>> >
>> > Can you elaborate on this? How can you tell?
>>
>> I presume the test case is something like
>>
>> <!doctype html>
>> Before stop
>> <script>
>> window.stop()
>> </script>
>> After stop
>>
>> Only Opera displays "After stop" here. We are planning to change this
>> behaviour, so that window.stop() is much more like the "abort the
>> document" algorithm (I haven't yet closely studied how this interacts
>> with readyState and the other things that Henri has been looking at).
>
> The spec now clearly requires the parser-stopping behaviour.
>
> See also this bug where I'm tracking an issue with the word "cancel":
>    https://www.w3.org/Bugs/Public/show_bug.cgi?id=16801
>
>
> On Fri, 15 Jun 2012, James Graham wrote:
>>
>> FWIW I think the conceptually simplest solution here is for aborting the
>> document to go through "The End", so that defer scripts are run,
>> DOMContentLoaded and load events fire, and the readyState changes in the
>> normal way. This isn't quite like the behaviour of Gecko or WebKit
>> today, but is spec-wise easy to understand, and hopefully no one is
>> relying too much on specific behaviour of window.stop().
>
> Aborting a document happens for many reasons other than stop(). For
> example, document.open(), navigation, the user hitting "STOP", going
> back() in history, etc. In particular, "The End" can block on network, so
> we definitely don't want to require that UAs do that when you close a tab,
> for example.
>
>
> On Wed, 15 Aug 2012, Glenn Maynard wrote:
>>
>> Should this alert on initial load?
>>
>> <!doctype html><body onpopstate="alert('xxx')">
>>
>> [1] says "After creating the Document object, but before any script
>> execution, certainly before the parser stops, the user agent must update
>> the session history with the new page."  That invokes [2] "update the
>> session history with the new page", which invokes [3] "Traverse the
>> history to the new entry", which fires popstate in step 14.
>>
>> However, "After creating the Document object, but before any script
>> execution" seems like it could happen before or after the <body> element
>> has been parsed, so the alert may or may not happen.
>
> Yeah, this is an oversight as specced. Fixed.
>
>
> On Sun, 16 Sep 2012, Justin Lebar wrote:
>>
>> Suppose an attack page evil.html controls a separate frame F (e.g.
>> evil.html frames F, evil.html opened F as a popup window, or vice
>> versa).
>>
>> We discovered that if evil.html causes F to
>>
>>   1. load a.html
>>   2. start loading b.html
>>   3. load a.html#h
>>
>> then step (3) cannot cancel the load of b.html.  That is, the final
>> session history from this sequence must be either
>>
>>   a.html  <-- oldest
>>   a.html#h
>>   b.html  <-- current
>>
>> or
>>
>>   a.html <-- oldest
>>   b.html <-- current.
>>
>> All browsers I tested gave one of the above two results.
>>
>> Doing anything else breaks the web (we shipped this in Firefox Nightly
>> and people were unable to log into ingdirect.com, for example).  I
>> didn't investigate too thoroughly, but I believe what happens is, some
>> sites use a link with href "#" and then navigate themselves in the
>> link's onclick handler, without cancelling the click event.  In that
>> case, we do precisely steps 1-3 above.
>>
>> As I read the spec, browsers are supposed to cancel the load of b.html
>> in step 3 above.  In the navigation algorithm [1], step 6 explicitly
>> cancels the load of b.html, because the load of b.html has not matured.
>> So if I understand correctly, the spec is dictating behavior that we
>> know won't work and that no browser implements.
>>
>> The presence of steps 6 and 8 in the algorithm suggests that the spec is
>> already trying to walk this line, so maybe I misunderstand what's going
>> on, either in my tests or in the spec.
>
> The existing text in step 4 of the spec is attempting to prevent a page
> from letting you click on a link to <a href="http://paypal.com/"> and then,
> in its unload handler, changing that into a location.href="http://paypa1.com/"
> navigation; or something similar where the user types in the location bar
> and the page hijacks that navigation.
>
> If it turns out that you can't ever block a cross-origin navigation,
> though, that's a lot easier to fix. :-)
>
> It's not that simple though. Browsers agree on this page that we should go
> to the second of the two cross-origin navigations (replace "false" with
> "1" in the script to run the test):
>
>    http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1778
>
> This one too (frame nav):
>
>    http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1780
>
> So this is presumably specific to fragment identifiers. And sure enough,
> when we change the latter one above to changing to a fragment identifier,
> it works as you describe:
>
>    http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1782
>
> (Things aren't so simple in this example (same-page nav):
>
>    http://software.hixie.ch/utilities/js/live-dom-viewer/saved/1784
>
> ...where Firefox no longer exhibits the restraint we're looking for here,
> but Chrome and Opera still do.)
>
> Anyway, yeah, looks like step 6 is just bogus. I've removed it. This now
> means that fragment identifier navigations just happen without screwing
> around with ongoing loads.
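A toy model of the outcome after removing step 6, assuming a hypothetical `SessionHistory` class (not spec text or any browser's code): a synchronous fragment navigation adds its entry without touching the in-flight load, and the entry for the cross-document load is only created when that load finishes. Replaying steps 1-3 from the issue produces the first of the two acceptable histories listed there.

```javascript
// Toy session-history model (hypothetical; not the spec's data
// structures or any browser's implementation).
class SessionHistory {
  constructor() {
    this.entries = [];   // URLs, oldest first
    this.index = -1;     // index of the current entry
    this.pending = null; // URL of a cross-document load still in flight
  }

  // Add a new entry after the current one, pruning any forward entries.
  appendEntry(url) {
    this.entries.splice(this.index + 1);
    this.entries.push(url);
    this.index = this.entries.length - 1;
  }

  startLoad(url) { this.pending = url; }

  // Fragment navigation is synchronous and leaves any pending
  // cross-document load alone.
  navigateToFragment(url) { this.appendEntry(url); }

  // Called when the network load matures; its entry is created only now.
  finishLoad() {
    if (this.pending === null) return;
    this.appendEntry(this.pending);
    this.pending = null;
  }

  get current() { return this.entries[this.index]; }
}

const h = new SessionHistory();
h.startLoad("a.html");
h.finishLoad();                   // 1. load a.html
h.startLoad("b.html");            // 2. start loading b.html
h.navigateToFragment("a.html#h"); // 3. load a.html#h
h.finishLoad();                   //    b.html matures afterwards
console.log(h.entries.join(" -> "), "| current:", h.current);
// a.html -> a.html#h -> b.html | current: b.html
```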
>
>
>> == Issue #2 ==
>>
>> Suppose again that evil.com controls a frame F, and evil.com causes F to
>>
>>   1. load a.html
>>   2. load a.html#h
>>   3. start loading b.html
>>   4. go back
>>
>> When we go back, we traverse the history [2] from a.html#h to a.html.
>> Per the spec, this doesn't cancel the load of b.html.
>>
>> This caused a problem for us in Firefox because we create a session
>> history entry for b.html at the beginning of step 3 and insert it after
>> the current one.  Then, when the load of b.html completes, we use
>> whichever session history entry happens to be after the current one,
>> assuming that it was the session history entry we created earlier. [...]
>>
>> The fix for this bug is not as simple as merely ensuring that the
>> session history entry's URL matches the document's URL.  Due to hash
>> navigations and pushstate, these URLs may not match even when we're
>> behaving correctly.
>>
>> We fixed this bug by cancelling the load of b.html when you go back.
>> This matches Chrome's behavior in my tests [3].
>>
>> Notice that this means we're cancelling an outstanding network load due
>> to a synchronous same-document load, which I said in part 1 breaks the
>> web.  But based on the (lack of) feedback we've received from our test
>> audience, it seems that cancelling the load of b.html does /not/ break
>> the web if the navigation from a.html to a.html#h is a history
>> navigation.
>>
>> The right thing to do is probably to load b.html after a.html, so the
>> final session history is
>>
>>   a.html <-- oldest
>>   b.html <-- current.
>>
>> I /think/ this is what the spec says should happen, but I'm not sure.
>> But matching the spec here would be difficult in our current
>> architecture, and anyway wouldn't match the one other browser I was able
>> to test, so perhaps the spec should be changed to match.
>
> The way the spec is written, if I'm not mistaken, you only create the new
> session history entry when you're ready to make it active. So I don't
> think the spec has the problem you ran into; as you describe, it just
> works.
>
> However, if it doesn't match browsers, that's of little comfort.
>
> I've changed the spec so that traversing the history by a delta always
> cancels any pending navigations unless you're in the middle of an unload,
> in which case it just aborts the algorithm entirely.
>
> I've also made back()/forward()/go() not work during the document's unload
> handler, since that could be used for griefing. I'm tempted to disable it
> entirely for all docs a la alert(), but I've no idea if that's Web-
> compatible and I suspect not.
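A sketch of the revised traversal rules as a toy model, assuming a hypothetical `ToyHistory` class with an `unloading` flag (not spec text or any browser's code): traversing by a delta cancels any pending navigation, and does nothing at all while the document's unload handler is running. Replaying the Issue #2 scenario lands on a.html with the b.html load cancelled. For simplicity, go(0), which reloads in real browsers, just stays put here, and an out-of-range delta still cancels the pending load but does not move.

```javascript
// Toy model (hypothetical names; not spec text or browser code) of
// history traversal by a delta after the change described above.
class ToyHistory {
  constructor(url) {
    this.entries = [url];   // URLs, oldest first
    this.index = 0;         // index of the current entry
    this.pending = null;    // in-flight cross-document load, if any
    this.unloading = false; // true while the unload handler is running
  }

  // Synchronous same-document (fragment) navigation.
  pushFragment(url) {
    this.entries.splice(this.index + 1);
    this.entries.push(url);
    this.index = this.entries.length - 1;
  }

  startLoad(url) { this.pending = url; }

  go(delta) {
    if (this.unloading) return; // abort entirely during unload
    this.pending = null;        // otherwise always cancel pending loads
    const target = this.index + delta;
    if (target < 0 || target >= this.entries.length) return;
    this.index = target;        // go(0) just stays put in this model
  }

  back() { this.go(-1); }
  forward() { this.go(1); }
  get current() { return this.entries[this.index]; }
}

// Issue #2: load a.html, then a.html#h, start loading b.html, go back.
const hist = new ToyHistory("a.html");
hist.pushFragment("a.html#h");
hist.startLoad("b.html");
hist.back();
console.log(hist.current, hist.pending); // a.html null

// back() called from inside an unload handler is ignored.
const u = new ToyHistory("x.html");
u.pushFragment("x.html#1");
u.unloading = true;
u.back();
console.log(u.current); // x.html#1
```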
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 19 September 2012 01:33:45 UTC