Re: Sync API for workers from Glenn Maynard on 2012-09-06 (public-webapps@w3.org from July to September 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Wed, 5 Sep 2012 22:07:59 -0500
To: Jonas Sicking <jonas@sicking.cc>
Cc: Andrea Marchesini <amarchesini@mozilla.com>, David Bruant <bruant.d@gmail.com>, "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <CABirCh8gV1n7ZXtCL_U+haftTG0Ga9Vzm4uOj+Yppc79ctksew@mail.gmail.com>
On Wed, Sep 5, 2012 at 2:49 AM, Jonas Sicking <jonas@sicking.cc> wrote:

>  The problem with a "Only allow blocking on children, except that
> window can't block on its children" is that you can never block on a
> computation which is implemented in the main thread. I think that cuts
> out some major use cases since todays browsers have many APIs which
> are only implemented in the main thread.
>

You can't have both--you have to choose one of 1: allow blocking upwards,
2: allow blocking downwards, or 3: allow deadlocks.  (I believe #1 is more
useful than #2, but each proposal can go both ways.  I'm ignoring more
complex deadlock detection algorithms that can allow both #1 and #2, of
course, since that's a lot harder.)

> The fact that all the examples that people have used while we have
> been discussing synchronous messaging have spun event loops in
> attempts to deal with messages that couldn't be handled by the
> synchronous poller makes me very much think that so will web
> developers.
>

getMessage doesn't spin the event loop.  "Spinning the event loop" means
that tasks are run from task queues (such as asynchronous callbacks) which
might not be expecting to run, and that tasks might be run recursively;
none of that happens here.  All this does is block until a message is
available on a specified port (or ports), and then return it--it's just a
blocking call, like sync XHR or FileReaderSync.

> I also wonder if what you are describing doesn't make more sense when
> communicating with a child worker and blocking on receiving a response
> from it.
>

That's what I meant, based on a use case you brought up: "a library which
implements the synchronous IndexedDB API".  I think that's by far the most
interesting category of use cases raised for this feature so far, the
ability to implement sync APIs from async APIs (or several async APIs).

> Another trait that this loses is the ability to terminate a worker as
> soon as we know that a synchronous response can't be sent. I.e. in
> proposal 1 and 2 the implementation can terminate the worker as soon
> as the object with the .reply() function is GCed. Note that this
> doesn't expose any GC behavior since a "forever blocked" worker
> behaves exactly the same as a terminated worker. I.e. neither will
> ever execute any code.
>

It's the same: terminate when the other MessagePort is GC'd or its
port.close is called.

(The MessagePort cross-process GC issues might sometimes prevent that, but
that's just another instance of the issue that already exists.  By the way,
do you happen to remember where that issue was last discussed in detail?
I'd like to refresh my memory on the details of this problem.)

> Fewer APIs isn't the same thing as a simpler API. On the contrary, I
> think trying to fit too much functionality into the same set of
> functions can easily result in more complexity.
>

Sure, but I do think they're the same in this case.

> I think this is fairly well illustrated by the set of rules that you
> ended up having to set up in order to make the "blocking permitted"
> flags work out correctly.


Explaining this to users is simple: "if you want to block on a port, it
needs to only ever be transferred above its other side, not below".

> And your algorithm produces weird edge
> cases, such as that it matters if someone sets up a message "proxy"
> which forwards all messages from one channel to another, rather than
> just passes on one end of a channel. With such a "proxy" your ports
> end up touching more threads and so are more likely to clear the
> "blocking permitted" flag. In all other cases such a proxy is
> transparent.
>

The previous proposals allow nothing *but* blocking on your parent (or
child), so if you have threads A <-> B <-> C and you want to pass messages
from C to A, you *have* to proxy messages across (and keep thread B alive
forever as a result).  That's a big part of why we have MessagePorts to
begin with.  That aside, the iteration below removes most of the cases
where you can't block, eg. passing a port up to your parent and then down
to a sibling.



This is a bit wordier, but I think it's easier to understand, since all it
cares about is where the port was originally created.

- Add a "direction" flag to ports, which may have the values "up", "down",
"disallowed" and "initial", and is initially "initial".
- Add an "original thread" value to ports, which is set to the current
thread at MessageChannel creation time.  This value is preserved across
structured clone.
- When a thread receives a port, compare its "original thread" with the
current thread.
  - If the "root" thread of "original thread" is not the same as that of
the current thread, mark the port as "disallowed".
  - Otherwise, if "original thread" is the current thread, mark the port as
"initial".
  - Otherwise, if "original thread" is a descendant of the current thread,
mark the port as "up".
  - Otherwise, mark the port as "down".

The UI thread and shared workers are "root" threads.  The root thread of a
dedicated worker is its one ancestor thread which is a root thread.
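As a rough illustration, the classification steps above can be sketched in TypeScript.  This is only a model under stated assumptions: the Thread interface, rootOf, isDescendant and classify are hypothetical names invented for the sketch, and threads are modelled as plain objects with a parent link rather than real workers.

```typescript
// Hypothetical model; none of these names come from a spec.
type Direction = "up" | "down" | "disallowed" | "initial";

interface Thread {
  name: string;
  parent: Thread | null; // null for root threads (UI thread, shared workers)
}

// Walk up the parent links until a root thread is reached.
function rootOf(t: Thread): Thread {
  let cur = t;
  while (cur.parent !== null) cur = cur.parent;
  return cur;
}

// True if `candidate` is a strict descendant of `ancestor`.
function isDescendant(candidate: Thread, ancestor: Thread): boolean {
  for (let cur = candidate.parent; cur !== null; cur = cur.parent) {
    if (cur === ancestor) return true;
  }
  return false;
}

// Which mark a receiving thread would request for a port, following
// the four cases in the order they are listed.
function classify(original: Thread, current: Thread): Direction {
  if (rootOf(original) !== rootOf(current)) return "disallowed";
  if (original === current) return "initial";
  if (isDescendant(original, current)) return "up";
  return "down";
}

// Sample tree: uiThread -> workerA -> workerB, uiThread -> workerC,
// plus a separate shared-worker tree.
const uiThread: Thread = { name: "ui", parent: null };
const workerA: Thread = { name: "A", parent: uiThread };
const workerB: Thread = { name: "B", parent: workerA };
const workerC: Thread = { name: "C", parent: uiThread };
const sharedRoot: Thread = { name: "shared", parent: null };
const sharedChild: Thread = { name: "shared-child", parent: sharedRoot };

console.log(classify(workerB, uiThread));    // "up"
console.log(classify(workerB, workerB));     // "initial"
console.log(classify(uiThread, workerB));    // "down"
console.log(classify(workerB, sharedChild)); // "disallowed"
```

The result of classify is only the requested mark; the "mark a port as X" steps decide what the port's flag actually becomes.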

When required to "mark a port as X", do the following:
 - If the port's "direction" flag is already equal to X, terminate these
steps.
 - If the port's "direction" flag is not equal to "initial", let X be
"disallowed".
 - If X is equal to "initial", let X be "disallowed".
 - If X is "up", let opposite be "down".  If X is "down", let opposite be
"up".  Otherwise, let opposite be "disallowed".
 - Set the port's "direction" flag to X.
 - Signal the other side of the port, instructing the thread to mark that
port as opposite.
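A minimal TypeScript sketch of the marking steps follows, as a model only.  PortModel, createChannel and markPort are hypothetical names, and the cross-thread signal is modelled as a direct recursive call.  One assumption to flag: the sketch performs the "already equal to X" check before the "not equal to initial" check, because with those two checks in the other order the signal echoed back from the other side would turn a freshly marked port into "disallowed".

```typescript
// Hypothetical model; none of these names come from a spec.
type Direction = "up" | "down" | "disallowed" | "initial";

class PortModel {
  direction: Direction = "initial";
  other: PortModel | null = null; // the entangled port
}

// Create two entangled ports, both starting as "initial".
function createChannel(): [PortModel, PortModel] {
  const a = new PortModel();
  const b = new PortModel();
  a.other = b;
  b.other = a;
  return [a, b];
}

function markPort(port: PortModel, requested: Direction): void {
  let x: Direction = requested;
  // Already marked as requested: nothing to do (this also stops the echo).
  if (port.direction === x) return;
  // A port that has already been marked may not change to another mark.
  if (port.direction !== "initial") x = "disallowed";
  // "initial" is never assigned by marking.
  if (x === "initial") x = "disallowed";
  const opposite: Direction =
    x === "up" ? "down" : x === "down" ? "up" : "disallowed";
  port.direction = x;
  // Stand-in for signalling the thread that holds the other side.
  if (port.other !== null) markPort(port.other, opposite);
}

const [p1, p2] = createChannel();
const snapshot = (): string => `${p1.direction}/${p2.direction}`;
const states: string[] = [];

markPort(p1, "up");
states.push(snapshot()); // "up/down"
markPort(p1, "up");
states.push(snapshot()); // "up/down" (repeating the same mark is a no-op)
markPort(p1, "down");
states.push(snapshot()); // "disallowed/disallowed"
console.log(states.join(", "));
```

The recursion always terminates: each call either returns immediately or moves a port to a mark it can only leave by becoming "disallowed".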

If a blocking getMessage is invoked:
 - If "direction" is not "up", throw an exception.  If, during a blocking
call, the port is no longer marked "up", throw an exception.

This can be summed up for users simply: "to allow blocking on a port, only
transfer it up from where it starts, and only transfer the other side down
from where it starts".  Aside from that, and staying within the same thread
tree, you can pass the ports around however you like.

This also means that for the basic case of libraries that create a
black-boxed thread and return a port to talk to it, you don't really have
to care about any of this--within that tree, you can toss it around all you
want.

I also changed the "ascendant/descendant"-based scheme into "descendant or
not descendant".  This has two big advantages.  First, it means you can pass
ports anywhere up the tree, even to sibling and "uncle" nodes--anything
that isn't a descendant.  This means that in a lot of cases (where ports
come from a "leaf" worker), you don't have to care about this at all.
Second, this allows shared workers to block on their own dedicated
workers.  This means shared workers can use things like "simulated sync
IndexedDB" implementations, too.

To allow blocking on parents instead of children, if that's what we end up
wanting, just flip the "direction is not up" step to "direction is not
down".
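To make the "flip" concrete, here is a small TypeScript sketch of the check a blocking getMessage would perform, with the permitted direction as a parameter.  The names mayBlock, assertMayBlock and BlockingNotPermittedError are hypothetical; the proposal only says "throw an exception" and does not name an exception type.

```typescript
// Hypothetical model; none of these names come from a spec.
type Direction = "up" | "down" | "disallowed" | "initial";

// Assumed exception type, invented for this sketch.
class BlockingNotPermittedError extends Error {}

// blockOn = "up" corresponds to the step as written in the proposal;
// blockOn = "down" corresponds to the flipped variant.
function mayBlock(direction: Direction, blockOn: "up" | "down" = "up"): boolean {
  return direction === blockOn;
}

// What a blocking getMessage would call before (and while) blocking.
function assertMayBlock(direction: Direction, blockOn: "up" | "down" = "up"): void {
  if (!mayBlock(direction, blockOn)) {
    throw new BlockingNotPermittedError(
      `port is marked "${direction}"; blocking requires "${blockOn}"`
    );
  }
}

console.log(mayBlock("up"));           // true
console.log(mayBlock("down"));         // false
console.log(mayBlock("down", "down")); // true
```

Either way, ports marked "initial" or "disallowed" can never be blocked on.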


On Wed, Sep 5, 2012 at 9:03 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> The part that I dislike about having single channel used for both sync
> and async messaging is that you end up with one or more async
> listeners which expect to get notified about all incoming messages,
> but then you have an API which "steals" a message away from those
> listeners. On top of that it has to do that stealing without any way
> of ensuring ensuring that it actually steals the right message.


That's exactly the reason to use MessagePorts: to categorize messages.

> But being (mostly) agnostic to if the other side is using sync
> messages or not doesn't mean that the other side uses both sync and
> async messaging!


But you still have to do extra work on the sending side to support both
sync and async receiving, since you have to hand it the right type of
channel.

Couldn't we just make calling getMessage permanently disable .onmessage
dispatching (perhaps until the port is posted again)?  That would make it
very hard to accidentally use both, while encapsulating knowledge about
which way it's being used to the receiver, so the sender doesn't need to
carefully send the receiver the right "type" of MessageChannel.  (I really
don't feel this is necessary, but I'd prefer it to multiple MessagePort
interfaces.)
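A rough TypeScript model of that idea, to show the intended behaviour only.  ModelPort and its deliver method are hypothetical, and since this single-threaded model cannot block, its getMessage returns undefined on an empty queue where the real call would wait.

```typescript
// Hypothetical model; not the real MessagePort interface.
class ModelPort {
  onmessage: ((data: unknown) => void) | null = null;
  private queue: unknown[] = [];
  private syncMode = false;

  // Stand-in for a message arriving from the other side.
  deliver(data: unknown): void {
    if (!this.syncMode && this.onmessage !== null) {
      this.onmessage(data); // normal async dispatch
    } else {
      this.queue.push(data); // held for getMessage (or a later handler)
    }
  }

  // The first call permanently switches the port to sync use, so
  // onmessage is never dispatched again.  A real getMessage would block
  // until a message arrives; this model returns undefined instead.
  getMessage(): unknown {
    this.syncMode = true;
    return this.queue.shift();
  }
}

const demoPort = new ModelPort();
const seen: unknown[] = [];
demoPort.onmessage = (data) => { seen.push(data); };

demoPort.deliver("a");                      // dispatched to onmessage
const firstSync = demoPort.getMessage();    // undefined: queue is empty
demoPort.deliver("b");                      // no longer dispatched
const secondSync = demoPort.getMessage();   // "b"
console.log(seen, firstSync, secondSync);
```

The sender calls the same deliver path in both cases, which is the point: only the receiver knows whether the port is being used synchronously.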

-- 
Glenn Maynard
Received on Thursday, 6 September 2012 03:08:28 UTC