- From: Jonas Sicking <jonas@sicking.cc>
- Date: Mon, 3 Sep 2012 14:32:55 -0700
- To: David Bruant <bruant.d@gmail.com>
- Cc: "public-webapps@w3.org" <public-webapps@w3.org>
Hi All,

I'd like to start by clearing up some confusion here. That's why I'm responding to the first email in this thread.

We at Mozilla have no interest in creating an API which runs the risk of causing deadlocks. I would expect this to be true of other browser vendors too, though obviously I can't speak for them.

This is why all the proposals that we have been discussing have been to allow *dedicated* workers to block on receiving messages *only* from their parent. Dedicated workers always create tree-like structures, and so if children can only block on their parents, you can't end up with a situation where two actors are blocked on each other.

It seems hard to ensure that deadlocks can't happen if we try to allow blocking calls on generic MessagePorts, which is why we haven't been interested in doing that. I'm not saying it's impossible, but if someone wants to propose this, please keep in mind that we're not interested in proposals which allow deadlocks, so you'll need to prove that your proposal can't cause deadlocks.

It's been mentioned in this thread, and elsewhere, that we need to take care not to allow deadlocks to happen. That's exactly what we are doing by only allowing blocking calls from dedicated workers to their parents.

The other thing that I wanted to talk about is use cases. It has been claimed in this thread that synchronous message passing isn't needed and that people can just write code using async patterns. While this is absolutely true, I would also say that writing asynchronous code is dramatically more complicated than writing synchronous code. This is one of the big reasons that we have workers at all. If writing asynchronous callbacks was almost as easy as writing blocking code, then we could simply ask people to return to the event loop and asynchronously continue their computation through a callback. There are other reasons for workers to exist, but this is one of them.
So yes, it's definitely the case that synchronous blocking code doesn't allow any new use cases that were impossible before. But it makes certain code dramatically easier to write, which is of big value to authors.

There is also another use case which has been brought up. As the web platform is becoming more powerful, people have started converting code written for other platforms to javascript+html. For example, emscripten[1] and mandreel[2] allow recompiling C++ code to javascript which is then run in a web browser. Many times such code relies on APIs or libraries which contain blocking calls. Technically it might be possible to automatically rewrite such code to use asynchronous coding patterns, but so far I don't think anyone has managed to do that.

One of the big use cases I am interested in solving (though I can't speak for other people at Mozilla) is to allow libraries to be written and imported into workers which expose easy-to-use synchronous APIs, and whose implementation makes blocking calls to the parent in order to implement the API. Such a library would of course require part of the library to also be running in the parent so that it could handle the incoming messages.

For example you could imagine a library which implements the synchronous IndexedDB API, since browsers so far have not implemented it. Or a library which implements a DOM which allows a worker to modify part of a document rendered by the parent window.

So with that in mind, let me express some opinions on the three proposals Olli mentioned in [3].

Proposal 3, i.e. a blocking waitForMessage() function which returns the next message event, is something which has come up several times in the past. There's certainly a lot of logic to it, however there are some pretty important problems with it. Consider the following scenario:

1. Worker starts running a task, say a message handler in response to a websocket message.
2. Main thread sends async messages A and B to the worker.
   These messages are added to the worker's message queue.
3. While still inside of the task started in step 1, the worker decides that it needs to send a synchronous message to the main thread. So it sends an asynchronous message, X, and starts polling messages using waitForMessage().
4. It first receives messages A and B, but since they aren't the reply to the message X sent in step 3, it keeps polling. Messages A and B end up in the local "events" array.
5. Message X arrives in the main thread and the main thread performs the calculation and responds with message X'. X' is added to the worker's message queue.
6. Main thread sends async message C to the worker. This message is added to the worker's message queue.
7. The worker keeps polling and now gets message X', so it stops polling and uses the data in X' as result.
8. The worker keeps running the task and eventually gets to the while-loop which processes the "events" array.
9. The first message in the array is A which the worker dispatches, causing the handler for A to start running.
10. The handler for A also wants to send a synchronous message to the main thread. So it sends an asynchronous message Y and starts polling messages using waitForMessage().
11. It first receives message C, but since this isn't the reply to Y, it keeps polling. C ends up in the local "events" array. Note that this is a different "events" variable from the one in step 4 since they are both local variables in different stack frames.
12. The main thread receives Y and sends message Y' back.
13. The worker keeps polling and now gets message Y', so it stops polling and uses the data in Y' as result.
14. The worker keeps running the task and eventually gets to the while-loop which processes the "events" array. This is the array from step 11.
15. The first and only message in the array is C which the worker dispatches, causing the handler for C to start running.
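
The steps above can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions, not code from any of the proposals: the `queue`, `outbox`, `runMainThread`, `sendSync` and `dispatch` names are all hypothetical, the "main thread" is a scripted function rather than a real thread, and `waitForMessage()` here is a stand-in for the proposed blocking call. For brevity each `sendSync` call drains its local "events" array right before returning instead of later in the task, which does not change the resulting order.

```javascript
// Simulated state. None of these names come from the proposals.
const queue = [];    // the worker's incoming message queue
const outbox = [];   // messages the worker has sent to the main thread
const handled = [];  // order in which async message handlers start

// Scripted "main thread": answers one pending request from the worker.
function runMainThread() {
  const request = outbox.shift();
  if (request === undefined) throw new Error("worker would block forever");
  queue.push({ type: "reply", to: request });        // steps 5 and 12
  if (request === "X") {
    queue.push({ type: "async", name: "C" });        // step 6
  }
}

// Stand-in for the proposed blocking waitForMessage(): when nothing is
// queued, let the simulated main thread run until something arrives.
function waitForMessage() {
  while (queue.length === 0) runMainThread();
  return queue.shift();
}

// Send a message and poll until its reply arrives, stashing unrelated
// messages in an "events" array that is local to this stack frame.
function sendSync(name) {
  outbox.push(name);
  const events = [];
  let reply;
  for (;;) {
    const msg = waitForMessage();
    if (msg.type === "reply" && msg.to === name) {
      reply = msg;
      break;
    }
    events.push(msg);
  }
  // The while-loop from steps 8 and 14.
  while (events.length > 0) dispatch(events.shift());
  return reply;
}

// Handlers for the async messages. Only A makes a nested sync call.
function dispatch(msg) {
  handled.push(msg.name);
  if (msg.name === "A") sendSync("Y");               // step 10
}

queue.push({ type: "async", name: "A" },
           { type: "async", name: "B" });            // step 2
sendSync("X");                                       // step 3

console.log(handled.join(", "));                     // prints "A, C, B"
```

In this model the main thread sent A, B, C in that order, yet the handler for C starts before the handler for B, because C was captured by the inner stack frame's "events" array while B was still waiting in the outer one.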
Note that here we end up running the handler for the C message before we have handled the B message, even though the main thread sent B before C, and even though neither of them is related to the synchronous communication that the worker wanted to do.

The problem is that waitForMessage() basically forces you to pull out events and then do something which amounts to spinning an event loop. As soon as you start having multiple event loops like this, you run into lots of complexities.

Another problem you have is that the A, B and C events aren't run from the event loop like normal events. They are instead run from whatever callstack existed when someone decided to make a synchronous call to the parent. This will give web developers exactly the same problem as we've had with Gecko code spinning the event loop. When doing something like that, you have to be absolutely sure that all code which exists up your call stack can deal with all of these messages getting dispatched. And all of those message handlers have to be able to deal with being dispatched under the existing callstack.

I think anyone who has worked on large codebases can testify to the evils of having arbitrary points in the code which can spin the event loop.

In Gecko, some of the few places where we spin the event loop are synchronous XMLHttpRequest and dialogs like alert() and prompt(). What makes the situation somewhat more tolerable is that all of those cases only happen when we are running "webpage JS", and we know that "webpage JS" can basically do anything, so we always try to ensure that we are in a stable state before calling webpage JS. Hence it's not as big of a deal when the page spins the event loop. Even so we've run into plenty of bugs due to this event loop spinning. And I would imagine there are lots of pages out there which break on an intermittent basis because things like network events can fire while an alert() dialog is being displayed.
I would not want to force javascript developers to deal with the complexities of risking that the event loop might get spun any time they call into a library. They don't stand the same risk as Gecko has of exploitable crashes. But they will have exactly the same risks of getting bugs as Gecko has. I.e. they won't expect that some message handler in the worker runs in the middle of them calling into a jQuery function.

The only way to solve these problems is to redo all message dispatching and create a global "events" array which is processed from a point in the code when you know that your callstack is shallow. This would also solve the B and C reordering issue. However it requires that people completely reorganize their code to do all event dispatch themselves, rather than using the normal onmessage/addEventListener callback.

To put it another way, waitForMessage() makes it impossible to write an independent library that sends synchronous messages to the main thread. You are instead forced to use frameworks which take over all your message handling. This alone makes this solution a non-starter for me since it means not solving the use case of libraries which use synchronous messages in their implementation.

We certainly could say that we don't want to solve the use case of libraries using synchronous messages and force anyone that uses waitForMessage() to rewrite their code to use some sort of framework, i.e. essentially saying that you can't use "onmessage" or "addEventListener" for message handling as soon as you have code that uses waitForMessage(). However that still would leave a big risk that people would write code like the one in the last example in [3], causing their pages to intermittently be buggy due to the way events get reordered and handled.

Another problem, in addition to the problems mentioned above, is that the code ends up prioritizing message events over other types of tasks that are in the worker's event queue.
Basically the waitForMessage() function pulls all message events out of the event queue, which means that they lose their relative order compared to pending timeouts, XHR events, IndexedDB events, WebSocket events, etc. Using the global "events" array and a messaging framework you can make sure to interleave incoming messages with other tasks. But this adds yet more complexity that pages would have to implement as soon as they want to do synchronous messages. And messages would still have forever lost the order they originally had in the event queue compared to other tasks. That part can't be solved without introducing yet more complex APIs.

So in short, I don't think a waitForMessage() function is a workable solution.

Proposal 1.1 is nifty in that it allows us to use events while dealing with replies from multiple handlers. But it seems like it adds a feature that doesn't have any good use cases (at least I haven't heard any), solely for the purpose of giving us a good reason for using events. The result is both more code for us, and more API surface and syntax for developers.

So I strongly prefer doing proposal 1 or 2 instead. I don't think that the event reordering that they are causing is a big problem. When sending a synchronous message to a parent you *want* the return message to be given priority over other incoming messages. That is even what the code in proposal 3 does, it just forces developers to do so themselves. I think the subtle problems that proposal 3 causes, as explained above, are much bigger problems.

I don't have a strong preference between 1 and 2 though. The way I see it is:

Proposal 1 advantages:
* More consistent with messaging syntax for async messages.
* Allows attaching additional listeners for incoming syncmessages which do things like debug logging.

Proposal 2 advantages:
* Takes care of the message-type handling automatically, which both makes it easier for developers to do so, and means that they won't forget to do so.
* No risk that multiple replies are sent.

The first bullet in "proposal 2 advantages" is especially nice given that one of the goals of syncmessages is IMO to allow libraries to run in workers and implement synchronous (i.e. author friendly) APIs which are implemented using syncmessages. So I would expect multiple types of sync messages to be common.

[1] https://github.com/kripken/emscripten
[2] http://www.mandreel.com/
[3] http://lists.w3.org/Archives/Public/public-webapps/2012JulSep/0632.html

/ Jonas
Received on Monday, 3 September 2012 21:33:53 UTC