Re: Cancellation architectural observations from Alexander Fritze on 2015-03-04 (public-script-coord@w3.org from January to March 2015)

From: Alexander Fritze <alex@onilabs.com>
Date: Wed, 4 Mar 2015 09:41:25 -0600
To: es-discuss <es-discuss@mozilla.org>
Cc: "public-script-coord@w3.org" <public-script-coord@w3.org>
Message-ID: <CAGso3h5h4h08N1-XjZXeBwSmQZTAo6xN8CBu8S90raGA+OWuuA@mail.gmail.com>

I know ES is taking a different route here - keeping async sitting on
top of the language - but I can't resist wading in on the cancellation
discussion and giving a perspective from our experience with Stratified
JavaScript (SJS).

One of the most surprising things we found is how crucial cancellation
is to making concurrent logic composable, and how easy it is to fold it
into the existing JavaScript base language.

TL;DR: If asynchronous code is recast as blocking code, then handling
cancellation/aborting cleanup is straightforward by leveraging and
extending the try/catch/finally construct.


**** Background

SJS (http://stratifiedjs.org) is a project that has been going on for
the past 5 years or so, with the goal of folding async into the base
JS language, and building a homogeneous JS client/server framework on
top of it (https://conductance.io). We've gained a great deal of
experience with it, and we've been using it to build large web apps.

The basic idea behind SJS is to recast asynchronous code as blocking
code. E.g. we can make a blocking 'pause' function with the following
code:

  function pause(delay) {
    waitfor() {
      setTimeout(resume, delay);
    }
  }

(for more details on the waitfor-resume construct see
https://conductance.io/reference#sjs:%23language/syntax::waitfor-resume)

We can now use 'pause' to code blocking logic in a straightforward way, e.g.:

  for (var i=0; i<100; ++i) {
    document.getElementById('foo').style.left = i + 'px';
    pause(100); // wait 100ms
  }
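
For readers who want to try the idea without an SJS compiler, a rough
plain-JavaScript approximation is sketched below. It assumes an engine
with promises and async functions, and it is not how SJS works
internally: 'pause' returns a promise rather than blocking, and
'countUp' and 'onStep' are hypothetical names made up for this
illustration.

```javascript
// Plain-JavaScript approximation, NOT SJS: 'pause' returns a promise
// that settles after 'delay' milliseconds instead of blocking.
function pause(delay) {
  return new Promise(function (resolve) {
    setTimeout(resolve, delay);
  });
}

// Hypothetical stand-in for the loop above: calls onStep(i) for each
// step and waits 'delay' milliseconds between steps.
async function countUp(limit, delay, onStep) {
  for (var i = 0; i < limit; ++i) {
    onStep(i);
    await pause(delay);
  }
}
```

The loop body still reads sequentially, but every caller has to be an
async function as well, which is the "async on top of the language"
route rather than the SJS one.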

Similarly to 'pause', we can cast other asynchronous functions into
blocking form. E.g. a function that waits for a DOM event could look
like this:

  function waitforEvent(elem, event) {
    waitfor(var rv) {
      elem.addEventListener(event, resume, true);
    }
    return rv;
  }

And a function that asynchronously fetches a document via
XMLHttpRequest like this:

  function httpGet(url) {
    var request = new XMLHttpRequest()
    request.open("GET", url, true);
    waitfor() {
      request.onreadystatechange = function() {
        if (request.readyState == 4) resume();
      };
      request.send(null);
    }
    return request.responseText;
  }


These can then be used as part of any normal JS control flow, e.g.:

  if (waitforEvent(document, 'click').target == some_dom_element) {
    do_something_with_response(httpGet('http://foo.bar'))
  }



**** Adding in concurrency

For performing multiple asynchronous codepaths simultaneously and
orchestrating their interactions, SJS adds a couple of 'structured
concurrency combinators' to JS: waitfor-and & waitfor-or.

E.g. simultaneously kicking off requests to cnn and bbc and then
proceeding with the first result looks something like this:

  var news;

  waitfor {
    news = httpGet('http://cnn.com');
  }
  or {
    news = httpGet('http://bbc.com');
  }

  ... now do something with the news ...


These constructs work very well with functional abstraction &
composition. E.g. defining the code above as 'function getNews()', we
can layer more concurrency around it:

  var result;

  waitfor {
    result = getNews();
  }
  or {
    waitforEvent(cancel_button, 'click');
    result = 'The user cancelled!';
  }
  or {
    pause(10000);
    result = 'Timeout!';
  }

  ...

Being able to abstract and meaningfully compose concurrent strands of
logic that themselves might be composed of complicated strands of
concurrent logic is very powerful: It allows us to build large
concurrent programs using a tractable functional decomposition
approach.



**** Now the interesting bit: Retraction/Cancellation

Now, the problem is that with the 'pause', 'waitforEvent' and
'httpGet' functions defined as above, this piece of code will not
properly clean up after itself.
If e.g. the user aborts, the requests to cnn and bbc will still be
pending even though they are not needed any more, potentially wasting
server time and network resources.
Also, no matter which of the code paths 'wins', we'll always leak the
click event listener on the cancel_button.

But as it turns out, it is quite easy to fold
cancellation/abortion/cleanup into the language when asynchronous code
is sequentialized as it is in SJS.

The key is to realize two things:

1. 'being cancelled' is just another mode of exiting a block of code -
the others being via normal control flow or by an exception being
thrown, and

2. cancellation flow is similar to exception propagation. We want to
handle cancellation starting from the blocked site propagating upwards
along the call stack. A cancellation is really very similar to an
exception, the main difference being that an exception is generated at
the bottom of the call stack, whereas a cancellation is generated at
the top of the call stack.


Now, JS already has the try/catch/finally construct that handles
cleanup after an exception (catch clause), or unconditional cleanup
(finally clause), and this construct can be extended to handle the
cancellation case too:

SJS just ensures that 'finally' clauses are also honored when code is
being cancelled, thereby preserving the semantics that try/finally
cleans up unconditionally, no matter how a block of code is exited.

Furthermore, SJS adds a new 'retract' clause to try/catch/finally.
Much like a 'catch' clause is executed when an exception is thrown,
the 'retract' clause is executed when a block of code is cancelled (or
'retracted' as we say in SJS) from 'the outside'.

With try/finally/retract, it becomes straightforward to add cleanup
code to our blocking functions:

  function pause(delay) {
    try {
      waitfor() {
        var id = setTimeout(resume, delay);
      }
    }
    retract {
      clearTimeout(id);
    }
  }


  function waitforEvent(elem, event) {
    waitfor(var rv) {
      elem.addEventListener(event, resume, true);
    }
    finally {
      elem.removeEventListener(event, resume, true);
    }
    return rv;
  }

  function httpGet(url) {
    var request = new XMLHttpRequest()
    request.open("GET", url, true);
    waitfor() {
      request.onreadystatechange = function() {
        if (request.readyState == 4) resume();
      };
      request.send(null);
    }
    retract {
      request.abort();
    }
    return request.responseText;
  }


(In the last two functions we used an SJS shorthand, whereby a
'finally' or 'retract' block can be tacked right onto a waitfor(){...}
block, without wrapping it in a try {...})


With our blocking functions amended in this way, we can now compose
them without caring about cancellation. Appropriate cleanup will be
handled automatically for all code paths.
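
To make the retraction idea concrete outside SJS, here is a minimal
plain-JavaScript sketch. It assumes a runtime that provides
AbortController, and it only approximates the SJS semantics: the
cancellation signal has to be threaded through by hand, which is
exactly the bookkeeping that 'retract' removes. The name 'firstOf',
the 'outerSignal' parameter and the 'retracted' error are hypothetical.

```javascript
// Plain-JavaScript sketch, NOT SJS. All names are illustrative.

// Promise-based pause that clears its timer when it is cancelled.
function pause(delay, signal) {
  return new Promise(function (resolve, reject) {
    if (signal.aborted) { reject(new Error('retracted')); return; }
    var id = setTimeout(function () {
      signal.removeEventListener('abort', onAbort);
      resolve();
    }, delay);
    function onAbort() {
      clearTimeout(id);               // plays the role of the 'retract' clause
      reject(new Error('retracted'));
    }
    signal.addEventListener('abort', onAbort, { once: true });
  });
}

// Rough analogue of waitfor/or: start every branch, keep the first
// result, and cancel the remaining branches. An optional outer signal
// lets an enclosing firstOf cancel this one.
async function firstOf(branches, outerSignal) {
  var controller = new AbortController();
  function forward() { controller.abort(); }
  if (outerSignal) {
    if (outerSignal.aborted) controller.abort();
    else outerSignal.addEventListener('abort', forward, { once: true });
  }
  var running = branches.map(function (branch) {
    return branch(controller.signal);
  });
  try {
    return await Promise.race(running);
  } finally {
    controller.abort();               // cancel whichever branches lost
    if (outerSignal) outerSignal.removeEventListener('abort', forward);
  }
}
```

Because a losing branch clears its timer when it is cancelled, a long
pause in one branch does not keep the program alive after another
branch has won, and passing the signal into a nested firstOf lets the
cancellation travel down through composed branches.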



On Mon, Mar 2, 2015 at 1:06 AM, Dean Tribble <tribble@e-dean.com> wrote:
> Another thread here brought up the challenge of supporting cancellation in
> an async environment. I spent some time on that particular challenge a few
> years ago, and it turned out to be bigger and more interesting than it
> appeared on the surface. In another thread, Ron Buckton pointed at the
> .NET approach and its use in JavaScript:
>
>>
>> AsyncJS (http://github.com/rbuckton/asyncjs) uses a separate abstraction
>> for cancellation based on the .NET CancellationTokenSource/CancellationToken
>> types. You can find more information about this abstraction in the MSDN
>> documentation here:
>> https://msdn.microsoft.com/en-us/library/dd997364(v=vs.110).aspx
>
>
> It's great that asyncjs already has started using it. I was surprised at how
> well the cancellationToken approach worked in both small applications and
> when extended to a very large async system. I'll summarize some of the
> architectural observations, especially from extending it to async:
>
> Cancel requests, not results
> Promises are like object references for async; any particular promise might
> be returned or passed to more than one client. Usually, programmers would be
> surprised if a returned or passed in reference just got ripped out from
> under them by another client. This is especially obvious when considering a
> library that gets a promise passed into it. Using "cancel" on the promise is
> like having delete on object references; it's dangerous to use, and
> unreliable to have used by others.
>
> Cancellation is heterogeneous
> It can be misleading to think about canceling a single activity. In most
> systems, when cancellation happens, many unrelated tasks may need to be
> cancelled for the same reason. For example, if a user hits a stop button on
> a large incremental query after they see the first few results, what should
> happen?
>
> - The async fetch of more query results should be terminated and the
>   connection closed.
> - Background computation to process the remote results into renderable
>   form should be stopped.
> - Rendering of not-yet-rendered content should be stopped. This might
>   include retrieval of secondary content for the items no longer of
>   interest (e.g., album covers for the songs found by a complicated
>   content search).
> - The animation of "loading more" should be stopped, and should be
>   replaced with "user cancelled".
> - etc.
>
> Some of these are different levels of abstraction, and for any non-trivial
> application, there isn't a single piece of code that can know to terminate
> all these activities. This kind of system also requires that cancellation
> support is consistent across many very different types of components. But if
> each activity takes a cancellationToken, in the above example, they just get
> passed the one that would be cancelled if the user hits stop and the right
> thing happens.
>
> Cancellation should be smart
> Libraries can and should be smart about how they cancel. In the case of an
> async query, once the result of a query from the server has come back, it
> may make sense to finish parsing and caching it rather than just reflexively
> discarding it. In the case of a brokerage system, for example, the round
> trip to the servers to get recent data is the expensive part. Once that's
> been kicked off and a result is coming back, having it available in a local
> cache in case the user asks again is efficient. If the application spawned
> another worker, it may be more efficient to let the worker complete (so that
> you can reuse it) rather than abruptly terminate it (requiring discarding of
> the running worker and cached state).
>
> Cancellation is a race
> In an async system, new activities may be getting continuously scheduled by
> tasks that are themselves scheduled but not currently running. The act of
> cancelling needs to run in this environment. When cancel starts, you can
> think of it as a signal racing out to catch up with all the computations
> launched to achieve the now-cancelled objective. Some of those may choose to
> complete (see the caching example above). Some may potentially keep
> launching more work before that work itself gets signaled (yeah it's a bug
> but people write buggy code). In an async system, cancellation is not
> prompt. Thus, it's infeasible to ask "has cancellation finished?" because
> that's not a well defined state. Indeed, there can be code scheduled that
> should and does not get cancelled (e.g., the result processor for a pub/sub
> system), but that schedules work that will be cancelled (parse the
> publication of an update to the now-cancelled query).
>
> Cancellation is "don't care"
> Because smart cancellation sometimes doesn't stop anything, and because in an
> async environment cancellation is racing with progress, it is at most "best
> effort". When a set of computations is cancelled, the party canceling the
> activities is saying "I no longer care whether this completes". That is
> importantly different from saying "I want to prevent this from completing".
> The former is broadly usable resource reduction. The latter is only usefully
> achieved in systems with expensive engineering around atomicity and
> transactions. It was amazing how much simpler cancellation logic becomes
> when it's "don't care".
>
> Cancellation requires separation of concerns
> In the pattern where more than one thing gets cancelled, the source of the
> cancellation is rarely one of the things to be cancelled. It would be a
> surprise if a library called for a cancellable activity (load this image)
> cancelled an unrelated server query just because they cared about the same
> cancellation event. I find it interesting that the separation between
> cancellation token and cancellation source mirrors that separation between a
> promise and its resolver.
>
> Cancellation recovery is transient
> As a task progresses, the cleanup action may change. In the example above,
> if the data table requests more results upon scrolling, its cancellation
> behavior when there's an outstanding query for more data is likely to be
> quite different than when it's got everything it needs displayed for the
> current page. That's the reason why the "register" method returns a
> capability to unregister the action.
>
>
> I don't want to derail the other threads on the topic, but thought it useful
> to start articulating some of the architectural background for a consistent
> async cancellation architecture.
>
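
The token/source split and the "register" method described in the
quoted message can be sketched in a few lines of plain JavaScript. This
is an illustration only, not the asyncjs or .NET API:
'CancellationSource', 'token', 'isCancelled', 'register' and 'cancel'
as written here are hypothetical names with assumed behavior.

```javascript
// Hypothetical sketch of a cancellation source/token pair.
function CancellationSource() {
  var callbacks = new Set();
  var cancelled = false;

  // The token is the half that gets handed to cancellable activities.
  this.token = {
    isCancelled: function () { return cancelled; },
    // Registers a cleanup action; returns a function that unregisters it.
    register: function (callback) {
      if (cancelled) { callback(); return function () {}; }
      callbacks.add(callback);
      return function unregister() { callbacks.delete(callback); };
    }
  };

  // Only the holder of the source can request cancellation.
  this.cancel = function () {
    if (cancelled) return;
    cancelled = true;
    var pending = Array.from(callbacks);
    callbacks.clear();
    pending.forEach(function (callback) { callback(); });
  };
}

// Usage: several unrelated activities share one token.
var source = new CancellationSource();
var log = [];
source.token.register(function () { log.push('close connection'); });
var unregister = source.token.register(function () { log.push('stop rendering'); });
source.token.register(function () { log.push('show "user cancelled"'); });

unregister();     // rendering already finished, so its cleanup no longer applies
source.cancel();  // e.g. the user hit the stop button
// log is now ['close connection', 'show "user cancelled"']
```

Handing out only the token keeps the ability to cancel with whoever
owns the source, and the unregister function covers the case where an
activity's cleanup action changes or stops applying as it progresses.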
Received on Thursday, 5 March 2015 12:17:19 UTC