Re: Web workers: synchronously handling events from Jonas Sicking on 2010-12-29 (public-webapps@w3.org from October to December 2010)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 29 Dec 2010 01:56:48 -0800
To: Glenn Maynard <glenn@zewt.org>
Cc: public-webapps@w3.org
Message-ID: <AANLkTi=5Dpme5Wmb8hjCtOS+vr8D7yTrOkaNvvZp8Vvo@mail.gmail.com>
On Sun, Dec 26, 2010 at 4:29 PM, Glenn Maynard <glenn@zewt.org> wrote:
> Haven't been able to find this in the spec: is there a way to allow
> processing messages synchronously during a number-crunching worker
> thread?
>
> A typical use case is a CPU-intensive task that needs to be aborted
> due to user action.  For example, it might take 250ms to run a search
> on a local database.  This can be triggered on each keystroke,
> updating as the user types.  If the user enters another letter into
> the text field before the previous search completes, that search may
> no longer be needed; the running task should be cancelled so it can
> start on the new text.  However, if a "cancel" message is sent to the
> thread, it won't receive it until it returns from the work it's doing.
>
> Creating a cancellation message port to run periodically would deal
> with this nicely, where a port.dispatch() function would dispatch the
> first waiting event that came from that port, if any:
>
>    var cancelling = false;
>    cancellation_port.onmessage = function(event) { cancelling = true; }
>    while(!finished && !cancelling)
>    {
>        if(cancellation_port.dispatch())
>            continue; /* if a message was received, cancelling may have been modified */
>        /* do work */
>    }
>
> Terminating the whole worker thread is the blunt way to do it; that's
> no good since it requires starting a new thread for every keystroke,
> and there may be significant startup costs (eg. loading search data).
>
> The only way I know to do this currently is to return periodically to
> allow events to run, and to resume the work with setTimeout(f, 1).
> That's ugly and requires writing algorithms in a specific,
> inconvenient way that shouldn't be required in a thread.  Browsers
> clamping timeouts to a minimum of 5-10ms also breaks this approach;
> hopefully they won't do that from worker threads, but I have a feeling
> they will.
>
> I'd hate to be stuck with ugly messaging hacks to achieve this, eg.
> having to use other mechanisms not meant for cross-thread messaging,
> like a database.  Am I missing something in the API?

I definitely agree that workers need more features to take advantage
of the fact that they are running on their own event loop. The one
you are asking for is one of them.

We could add something like:

boolean checkPendingMessages();

which would return true if there are pending messages. The script
running in the worker could use this information to return to the
event loop to process these messages only when needed. One downside
with this API is that there is a risk that people could write:

if (checkPendingMessages()) {
  doStuff();
}
... code here ...
if (checkPendingMessages()) {
  continueToDoStuff();
}

and expect that either both if statements are entered or neither is,
not realizing that the return value from checkPendingMessages could
change at any point. I'm not terribly worried about this risk, but
it's definitely there.
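
As a rough sketch of how a worker might use such a call: the loop below
reads the value once per step and acts on it immediately. Note that
checkPendingMessages is only the proposal above, not an existing API, so
it is passed in as a parameter and faked in the usage; countMatches, the
state object and the sample data are made-up names for illustration.

```javascript
// Sketch only: checkPendingMessages() is the proposed call, not a real API.
// It is injected as a parameter so the example can run anywhere.
function countMatches(items, query, state, checkPendingMessages) {
  // state = { index, matches } lets the caller resume where we stopped.
  while (state.index < items.length) {
    if (items[state.index].includes(query)) {
      state.matches++;
    }
    state.index++;
    // Ask once per step and act on the answer right away; the answer
    // may be different the next time we ask.
    if (checkPendingMessages()) {
      return false; // unfinished: go back to the event loop
    }
  }
  return true; // finished
}

// Usage with a fake that reports a pending message on its third call.
let calls = 0;
const fakeCheck = () => ++calls === 3;
const items = ["apple", "apricot", "banana", "grape", "pineapple"];
const state = { index: 0, matches: 0 };

// Interrupted after three items. A real worker would return at this
// point and resume later, once the pending message has been handled.
const finishedFirst = countMatches(items, "ap", state, fakeCheck);
const finishedSecond = countMatches(items, "ap", state, fakeCheck);
```

Keeping the progress in a separate state object is one way to make the
"return to the event loop, then resume" step cheap; other designs are
possible.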

An alternative solution would be something like:

MessageInfo getMessageIfExists();

which would return an object containing the message data if a message
was pending and remove the message from the queue of pending messages.
If there are no pending messages, null is returned and the message
queue remains empty. This makes it significantly harder to write code
like the above. However, it might make coding somewhat more awkward,
since you will likely have to deal with messages arriving two ways:
through the normal event loop and through getMessageIfExists.
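
A minimal sketch of polling with such a call, under some assumptions:
getMessageIfExists is only the proposal above, so it is stubbed here
with a plain array standing in for the queue; the `data` property on
the returned object and the "ping"/"cancel" message values are
invented for the example.

```javascript
// Sketch only: getMessageIfExists() is the proposed call, not a real API.
// A plain array stands in for the worker's queue of pending messages.
const pendingMessages = [{ data: "ping" }, { data: "cancel" }];

function getMessageIfExists() {
  // Remove and return the oldest message, or null if none is waiting.
  return pendingMessages.length > 0 ? pendingMessages.shift() : null;
}

function sumUntilCancelled(numbers) {
  let total = 0;
  const seen = [];
  for (let i = 0; i < numbers.length; i++) {
    // Taking a message here removes it from the queue, so the normal
    // onmessage handler will never see it: it must be handled now.
    const msg = getMessageIfExists();
    if (msg !== null) {
      seen.push(msg.data);
      if (msg.data === "cancel") {
        return { cancelled: true, processed: i, total, seen };
      }
    }
    total += numbers[i]; // one unit of "work"
  }
  return { cancelled: false, processed: numbers.length, total, seen };
}

const result = sumUntilCancelled([10, 20, 30, 40]);
```

The "ping" message shows the awkward part: anything pulled off the queue
inside the loop has to be dealt with there, in addition to whatever the
regular message handler does.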


Unrelated to the above question, I do think we should add an API like

MessageInfo getMessage();

which would return information about the next pending message and
remove said message from the queue. If no message is pending when the
function is called, the function waits until a message arrives and
only then returns. This could make code considerably cleaner and
easier to write, since you don't have to return to the main event
loop to retrieve additional data. It doesn't, however, solve the use
case described in the initial email in this thread, but since it's
related I thought it was worth bringing up here.
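
To illustrate the kind of straight-line code this would allow, here is a
sketch under stated assumptions: getMessage is only the proposal above,
and the stand-in below cannot actually wait, so it throws where the real
call would block. The message shapes (`type`, `text`, `value`) and the
handleSearch function are hypothetical.

```javascript
// Sketch only: getMessage() is the proposed blocking call, not a real API.
// This stand-in cannot block, so it throws where the real call would wait.
const inbox = [
  { data: { type: "query", text: "web" } },
  { data: { type: "limit", value: 2 } },
];

function getMessage() {
  if (inbox.length === 0) {
    throw new Error("the real getMessage() would wait here");
  }
  return inbox.shift();
}

function handleSearch(corpus) {
  // Each piece of input is read in order, without returning to the
  // event loop and without splitting the logic across callbacks.
  const query = getMessage().data; // first message: what to search for
  const limit = getMessage().data; // second message: how many results
  return corpus
    .filter((entry) => entry.includes(query.text))
    .slice(0, limit.value);
}

const hits = handleSearch(["web workers", "webapps", "websockets", "dom"]);
```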

/ Jonas
Received on Wednesday, 29 December 2010 09:57:42 UTC