Re: Sync API for workers from Jonas Sicking on 2012-09-03 (public-webapps@w3.org from July to September 2012)

From: Jonas Sicking <jonas@sicking.cc>
Date: Mon, 3 Sep 2012 14:32:55 -0700
To: David Bruant <bruant.d@gmail.com>
Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <CA+c2ei8Z-CiyTA57MwYdVKb_0F6HDgzMhEGuKa82k13LQBhVAw@mail.gmail.com>

Hi All,

I'd like to start by clearing up some confusion here. That's why I'm
responding to the first email in this thread.

We at Mozilla have no interest in creating an API which runs the risk
of causing dead-locks. I would expect this to be true of other browser
vendors too, though obviously I can't speak for them.

This is why all the proposals that we have been discussing have been
to allow *dedicated* workers to block on receiving messages *only* from
their parent. Dedicated workers always create tree-like structures,
and so if children can only block on their parents, you can't end up
with a situation where two actors are blocked on each other.

It seems hard to ensure that deadlocks can't happen if we try to allow
blocking calls on generic MessagePorts, which is why we haven't been
interested in doing that. I'm not saying it's impossible, but if
someone wants to propose this, please keep in mind that we're not
interested in proposals which allow deadlocks, so you'll need to prove
that your proposal can't cause deadlocks.
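
As an illustration only (not part of any proposal), the argument can
be sketched as a cycle check on a "who is blocked on whom" map. The
function name findDeadlock and the actor names are hypothetical. With
parent-only blocking every chain ends at the window, which never
blocks; with generic ports two workers can wait on each other.

```javascript
// waitsOn maps each blocked actor to the one actor it is blocked on.
// Returns true if following those links ever revisits an actor.
function findDeadlock(waitsOn) {
  for (const start of waitsOn.keys()) {
    const seen = new Set();
    let cur = start;
    while (waitsOn.has(cur)) {
      if (seen.has(cur)) return true; // came back to an actor: a cycle
      seen.add(cur);
      cur = waitsOn.get(cur);
    }
  }
  return false;
}

// Dedicated workers blocking only on their parent: a tree.
const tree = new Map([
  ["worker1", "window"],
  ["worker2", "worker1"],
  ["worker3", "worker1"],
]);

// Generic ports: nothing stops two workers from blocking on each other.
const ports = new Map([
  ["worker1", "worker2"],
  ["worker2", "worker1"],
]);

console.log(findDeadlock(tree), findDeadlock(ports));
```

The tree case reports no cycle and the ports case reports one.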

It's been mentioned in this thread, and elsewhere, that we need to
take caution to not allow deadlocks to happen. That's exactly what we
are doing by only allowing blocking calls from dedicated workers to
their parents.

The other thing that I wanted to talk about is use cases. It has been
claimed in this thread that synchronous message passing isn't needed
and that people can just write code using async patterns. While this
is true, I would say that writing asynchronous code is dramatically
more complicated than writing synchronous code.
This is one of the big reasons that we have workers at all. If writing
asynchronous callbacks was almost as easy as writing blocking code,
then we could simply ask people to return to the event loop and
asynchronously continue their computation through a callback. There
are other reasons for workers to exist, but this is one of them.

So yes, it's definitely the case that synchronous blocking code
doesn't allow any new use cases that were impossible before. But it
makes certain code dramatically easier to write, which is of big value
to authors.

There is also another use-case which has been brought up. As the web
platform is becoming more powerful, people have started converting
code written for other platforms to JavaScript+HTML. For example,
Emscripten[1] and Mandreel[2] allow recompiling C++ code to JavaScript
which is then run in a web browser. Many times such code relies on
APIs or libraries which contain blocking calls. Technically it might
be possible to automatically rewrite such code to use asynchronous
coding patterns, but so far I don't think anyone has managed to do
that.

One of the big use cases I am interested in solving (though I can't
speak for other people at Mozilla) is to allow libraries to be written
and imported into workers which expose easy-to-use synchronous APIs,
and whose implementation makes blocking calls to the parent in order
to implement the API. Such a library would of course require part of
the library to also be running in the parent so that it could handle
the incoming messages.

For example you could imagine a library which implements the
synchronous IndexedDB API since browsers so far have not implemented
it. Or a library which implements a DOM which allows a worker to
modify part of a document rendered by the parent window.

So with that in mind, let me express some opinions on the three
proposals Olli mentioned in [3]

Proposal 3, i.e. a blocking waitForMessage() function which returns
the next message event, is something which has come up several times
in the past. There's certainly a lot of logic to it, however there are
some pretty important problems with it.

Consider the following scenario:

1. Worker starts running a task, say a message handler in response to a
websocket message.
2. Main thread sends async messages A and B to the worker. These
messages are added to the worker's message queue.
3. While still inside of the task started in step 1, the worker
decides that it needs to send a synchronous message to the main
thread. So it sends an asynchronous message, X, and starts polling
messages using waitForMessage().
4. It first receives messages A and B, but since they aren't the reply
to the message X sent in step 3, it keeps polling. Messages A and B
end up in the local "events" array.
5. Message X arrives in the main thread and the main thread performs
the calculation and responds with message X'. X' is added to the
worker's event queue.
6. Main thread sends async message C to the worker. This message is
added to the worker's message queue.
7. The worker keeps polling and now gets message X', so it stops
polling and uses the data in X' as result.
8. The worker keeps running the task and eventually gets to the
while-loop which processes the "events" array.
9. The first message in the array is A which the worker dispatches,
causing the handler for A to start running.
10. The handler for A also wants to send a synchronous message to the
main thread. So it sends an asynchronous message Y and starts polling
messages using waitForMessage().
11. It first receives message C, but since this isn't the reply to Y,
it keeps polling. C ends up in the local "events" array. Note that
this is a different "events" variable from the one in step 4 since
they are both local variables in different stack frames.
12. The main thread receives Y and sends message Y' back.
13. The worker keeps polling and now gets message Y', so it stops
polling and uses the data in Y' as result.
14. The worker keeps running the task and eventually gets to the
while-loop which processes the "events" array. This is the array from
step 11.
15. The first and only message in the array is C which the worker
dispatches, causing the handler for C to start running.

Note that here we end up running the handler for the C message while
we still haven't handled the B message, even though the main thread
sent B before C, and even though neither of them is related to the
synchronous communication that the worker wanted to do.
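
The scenario can be replayed with a small single-threaded simulation.
This is only a sketch under assumptions: waitForMessage() is modeled
as taking the next entry from an array, sendToMain() is a hypothetical
stand-in for the main thread, and syncCall() is an assumed version of
the polling pattern, replaying its set-aside messages right after the
reply arrives.

```javascript
// Simulation only: waitForMessage() is the proposed API, modeled here as
// "take the next entry from the worker's message queue".
function simulateWaitForMessage() {
  const queue = ["A", "B"];   // step 2: A and B are already queued
  const dispatched = [];      // order in which ordinary handlers run

  // Hypothetical stand-in for the main thread.
  function sendToMain(name) {
    if (name === "X") {
      queue.push("X'");       // step 5: reply to X
      queue.push("C");        // step 6: unrelated async message
    } else if (name === "Y") {
      queue.push("Y'");       // step 12: reply to Y
    }
  }

  function waitForMessage() {
    if (queue.length === 0) throw new Error("a real worker would block here");
    return queue.shift();
  }

  // Assumed library pattern: poll until the reply arrives, then replay
  // whatever was set aside, from this same stack frame.
  function syncCall(name) {
    const events = [];        // local to this call, as in steps 4 and 11
    sendToMain(name);
    let msg = waitForMessage();
    while (msg !== name + "'") {
      events.push(msg);
      msg = waitForMessage();
    }
    while (events.length > 0) dispatch(events.shift());
    return msg;
  }

  function dispatch(name) {
    dispatched.push(name);
    if (name === "A") syncCall("Y");   // step 10: A's handler blocks too
  }

  syncCall("X");              // step 3: the task from step 1 blocks on X
  return dispatched;
}

console.log(simulateWaitForMessage());
```

The handlers run in the order A, C, B: C is replayed from the inner
call's local array before the outer call gets back to B.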

The problem is that waitForMessage basically forces you to pull out
events and then do something which amounts to spinning an event loop.
As soon as you start having multiple event loops like this, you run
into lots of complexities.

Another problem you have is that the A, B and C events aren't run from
the event loop like normal events. They are instead run from whatever
callstack existed when someone decided to make a synchronous call to the
parent. This will give web developers exactly the same problem as
we've had with Gecko code spinning the event loop. When doing
something like that, you have to be absolutely sure that all code
which exists up your call stack can deal with all of these messages
getting dispatched. And all of those messages have to be able to deal
with being dispatched under the existing callstack.

I think anyone who has worked on large codebases can testify to the
evils of having arbitrary points in the code which can spin the event
loop. In Gecko some of the few places where we spin the event loop are
for synchronous XMLHttpRequest and for dialogs like alert() and
prompt(). What makes the situation somewhat more tolerable is that all
of those cases only happen when we are running "webpage JS", and we
know that "webpage JS" can basically do anything and so we always try
to ensure that we are in a stable state before calling webpage JS.
Hence it's not as big of a deal when the page spins the event loop.
Even so we've run into plenty of bugs due to this event loop spinning.
And I would imagine there are lots of pages out there which break on
an intermittent basis because things like network events can fire
while an alert() dialog is being displayed.

I would not want to force javascript developers to deal with the
complexities of risking that the event loop might get spun any time
they call into a library. They don't stand the same risk as Gecko has
of exploitable crashes. But they will have exactly the same risks of
getting bugs as Gecko has. I.e. they won't expect that some message
handler in the worker runs in the middle of them calling into a jQuery
function.

The only way to solve these problems is to redo all message
dispatching and create a global "events" array which is processed from
a point in the code when you know that your callstack is shallow. This
would also solve the B and C reordering issue. However it requires
that people completely reorder their code to do all event dispatch,
rather than using the normal onmessage/addEventListener callback.
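
A minimal sketch of that alternative, again as a simulation with
hypothetical names (sendToMain, syncCall, pending): the synchronous
call only defers unrelated messages into one shared array, and they
are dispatched solely from a top-level loop where the call stack is
shallow.

```javascript
// Simulation only: one shared "pending" array replaces the per-call
// local arrays, and nothing is dispatched from inside syncCall().
function simulateGlobalQueue() {
  const queue = ["A", "B"];   // messages already sent by the main thread
  const pending = [];         // the single global "events" array
  const dispatched = [];

  // Hypothetical stand-in for the main thread.
  function sendToMain(name) {
    if (name === "X") {
      queue.push("X'");       // reply to X
      queue.push("C");        // unrelated async message
    } else if (name === "Y") {
      queue.push("Y'");       // reply to Y
    }
  }

  function syncCall(name) {
    sendToMain(name);
    for (;;) {
      if (queue.length === 0) throw new Error("a real worker would block here");
      const msg = queue.shift();
      if (msg === name + "'") return msg;
      pending.push(msg);      // defer; never dispatch from this stack
    }
  }

  function dispatch(name) {
    dispatched.push(name);
    if (name === "A") syncCall("Y");   // A's handler makes a sync call
  }

  syncCall("X");              // the initial task makes a sync call

  // Framework-owned loop with a shallow stack: deferred messages were
  // taken from the queue earlier, so they go first.
  while (pending.length > 0 || queue.length > 0) {
    dispatch(pending.length > 0 ? pending.shift() : queue.shift());
  }
  return dispatched;
}

console.log(simulateGlobalQueue());
```

Here the handlers run in the order A, B, C, but only because every
message goes through the framework's loop instead of onmessage.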

To put it another way, waitForMessage() makes it impossible to write
an independent library that sends synchronous messages to the main
thread. You are instead forced to use frameworks which take over all
your message handling. This alone makes this solution a non-starter
for me since it means not solving the use case of libraries which use
synchronous messages in their implementation.

We certainly could say that we don't want to solve the use case of
libraries using synchronous messages and force anyone that uses
waitForMessage() to rewrite their code to use some sort of framework,
i.e. essentially saying that you can't use "onmessage" or
"addEventListener" for message handling as soon as you have code that
uses waitForMessage(). However that still would leave a big risk that
people would write code like the one in the last example in [3]
causing their pages to intermittently be buggy due to the way events
get reordered and handled.

Another problem, in addition to the problems mentioned above, is that
the code ends up prioritizing message events over other types of tasks
that are in the worker's event queue. Basically the waitForMessage()
function pulls all message events out of the event queue, which means
that they lose their relative order compared to pending timeouts, XHR
events, IndexedDB events, WebSocket events, etc.
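
To make the effect concrete, here is a small simulation with a mixed
task queue. The task names and the shape of the queue entries are
made up for illustration, and the reply X' is queued up front to keep
the sketch short.

```javascript
// Simulation only: the worker's event queue holds tasks of several types,
// and waitForMessage() is modeled as removing the first *message* task.
function simulateMixedQueue() {
  const tasks = [
    { type: "timer", name: "T1" },     // timeout queued first
    { type: "message", name: "A" },    // unrelated message
    { type: "xhr", name: "R1" },       // XHR event
    { type: "message", name: "X'" },   // reply the worker is waiting for
  ];
  const ran = [];

  function waitForMessage() {
    const i = tasks.findIndex((t) => t.type === "message");
    if (i === -1) throw new Error("a real worker would block here");
    return tasks.splice(i, 1)[0];      // skips over non-message tasks
  }

  function syncCall(replyName) {
    const events = [];
    let msg = waitForMessage();
    while (msg.name !== replyName) {
      events.push(msg);
      msg = waitForMessage();
    }
    for (const e of events) ran.push(e.name);  // replay set-aside messages
    return msg;
  }

  syncCall("X'");
  // Back in the normal event loop: run whatever is left, in queue order.
  while (tasks.length > 0) ran.push(tasks.shift().name);
  return ran;
}

console.log(simulateMixedQueue());
```

Message A runs before timer T1 even though T1 was queued first, so
the run order is A, T1, R1.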

Using the global "events" array and a messaging framework you can make
sure to interleave incoming messages with other tasks. But this adds
yet more complexity that pages would have to implement as soon as they
want to do synchronous messages. And messages would still have forever
lost the order they originally had in the event queue compared to
other tasks. That part can't be solved without introducing yet more
complex APIs.

So in short, I don't think a waitForMessage() function is a workable solution.

Proposal 1.1 is nifty in that it allows us to use events while dealing
with replies from multiple handlers. But it seems like it adds a feature
that doesn't have any good use cases (at least I haven't heard any),
solely for the purpose of giving us a good reason for using events.
The result is both more code for us, and more API surface and syntax
for developers.

So I strongly prefer doing proposal 1 or 2 instead.

I don't think that the event reordering that they are causing is a big
problem. When sending a synchronous message to a parent you *want* the
return message to be given priority over other incoming messages. That
is even what the code in proposal 3 does, it just forces developers to
do so themselves. I think the subtle problems that proposal 3 causes,
as explained above, are much bigger problems.

I don't have a strong preference between 1 and 2 though. The way I see it is:

Proposal 1 advantages:
* More consistent with messaging syntax for async messages.
* Allows attaching additional listeners for incoming syncmessages
which do things like debug logging.

Proposal 2 advantages:
* Takes care of the message-type handling automatically which both
makes it easier for developers, and means that they won't forget to do
so.
* No risk that multiple replies are sent.

The first bullet in "proposal 2 advantages" is especially nice given
that one of the goals of syncmessages is IMO to allow libraries to run
in workers and implement synchronous (i.e. author friendly) APIs which
are implemented using syncmessages. So I would expect multiple types
of sync messages to be common.

[1] https://github.com/kripken/emscripten
[2] http://www.mandreel.com/
[3] http://lists.w3.org/Archives/Public/public-webapps/2012JulSep/0632.html

/ Jonas
Received on Monday, 3 September 2012 21:33:53 UTC