Re: Sync API for workers from Glenn Maynard on 2012-09-04 (public-webapps@w3.org from July to September 2012)

From: Glenn Maynard <glenn@zewt.org>
Date: Mon, 3 Sep 2012 22:55:29 -0500
To: Jonas Sicking <jonas@sicking.cc>
Cc: Andrea Marchesini <amarchesini@mozilla.com>, David Bruant <bruant.d@gmail.com>, "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <CABirCh-EGZcK_cC_Kn0+zGX9-GH8SvM0zzKCaS0PWvCLT7MO3Q@mail.gmail.com>

On Mon, Sep 3, 2012 at 9:30 PM, Jonas Sicking <jonas@sicking.cc> wrote:

> We can't generically block on children since we can't let the main
> window block on a child. That would effectively permit synchronous IO
> from the main thread which is not something that we want to allow.
>

The UI thread would never be allowed to block, of course.  The "getMessage"
API itself would never even be exposed in the UI thread, regardless of the
state of this flag.  I picture this being methods on
DedicatedWorkerGlobalScope and SharedWorkerGlobalScope.
(SharedWorkerGlobalScope's version would only get the zero-timeout polling
version.)

> Also, the last "Otherwise, the "blocking permitted" flag of both
> "port" and its entangled port are cleared." has to apply any time when
> sending a port through a generic port rather than through a dedicated
> worker parent/child? When communicating with a generic port we never
> have any idea what is on the receiving end. And what is on the
> receiving end can change between the time when a message is sent, and
> when it is received.
>

Well, you can simplify the algorithm by always clearing the flag when
posting through anything but a dedicated worker's port.  That's
straightforward since you always know at send time what the relationship to
the receiver is--if it's through worker.postMessage it's to a child, and if
it's through postMessage on DedicatedWorkerGlobalScope it's to the parent.
It would be nice to do this in the more generic way, but this would be
enough for a lot of cases.
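
A minimal sketch of that simplified rule, as a plain-object model rather than
real MessagePorts.  The names makeChannel, onTransfer, blockingPermitted and
the "via" labels are hypothetical, and the flag starting out set is an
assumption; the up/down rules for dedicated-worker transfers are not modeled.

```javascript
// Hypothetical model of an entangled port pair. `blockingPermitted`
// mirrors the "blocking permitted" flag; assumed to start out set.
function makeChannel() {
  const a = { blockingPermitted: true, entangled: null };
  const b = { blockingPermitted: true, entangled: a };
  a.entangled = b;
  return [a, b];
}

// `via` says how the port is being sent (assumed labels):
//   "worker"  - worker.postMessage, i.e. to a dedicated child
//   "parent"  - postMessage on DedicatedWorkerGlobalScope, i.e. to the parent
//   "generic" - any other MessagePort
function onTransfer(port, via) {
  if (via === "worker" || via === "parent") {
    // The relationship to the receiver is known at send time, so the
    // flag is left alone here (the proposal's up/down rules would apply).
    return;
  }
  // Unknown receiver: clear the flag on both ends of the channel.
  port.blockingPermitted = false;
  port.entangled.blockingPermitted = false;
}
```

In this model, onTransfer(p, "generic") clears the flag on both p and its
entangled port, while the two dedicated-worker paths leave it untouched.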

I suspect there's a way to make the general-case version work, though.  For
example, when a port is transferred to another thread, include the ID of
the sending thread as part of the metadata of the transfer.  The receiver
then knows where the port came from, and it knows itself, so it can see
where it lies relative to the sender to determine whether it was an "up",
"down" or a transfer that always invalidates both sides (sent to a worker
that is neither an ancestor nor a descendant).  If it determines that the
transfer invalidated the other side, then it sends a message across the
pipe saying "whoever you are, you need to clear your blocking-permitted
flag".  This would apply even if the other side changes hands in the
meantime, since once that flag is cleared, it stays cleared permanently.
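
The receiver-side check could be sketched roughly like this.  Everything
here is hypothetical: the thread records, classifyTransfer, onPortReceived
and the tellPeerToClear callback are illustrative names, and the sketch
assumes the receiver can resolve the sender's thread ID to its place in the
worker tree.  "up" is taken to mean the receiver is an ancestor of the
sender; a transfer within the same thread falls into the "invalidate" case
here, which is also an assumption.

```javascript
// Hypothetical thread record: an id plus a parent (null for the UI thread).
function makeThread(id, parent) {
  return { id: id, parent: parent || null };
}

// True if `a` is a strict ancestor of `b` in the worker tree.
function isAncestor(a, b) {
  for (let t = b.parent; t !== null; t = t.parent) {
    if (t === a) return true;
  }
  return false;
}

// Classify a transfer from `sender` to `receiver`.
function classifyTransfer(sender, receiver) {
  if (isAncestor(receiver, sender)) return "up";
  if (isAncestor(sender, receiver)) return "down";
  return "invalidate"; // neither an ancestor nor a descendant
}

// Receiver-side handling. `tellPeerToClear` stands in for sending the
// "clear your blocking-permitted flag" message across the pipe.
function onPortReceived(port, sender, self, tellPeerToClear) {
  const kind = classifyTransfer(sender, self);
  if (kind === "invalidate") {
    port.blockingPermitted = false;
    tellPeerToClear();
  }
  return kind;
}
```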

All that aside, do any implementations actually put dedicated workers in a
different process than their creator (if so, I'm curious as to why)?  This
should all be very simple if you don't do that, and I can't think of any
reason to--and good reasons not to, eg. fast ArrayBuffer transfers become
much harder.  Shared workers may cross processes, but dedicated workers?

> Your proposal makes it possible for pages to avoid the problems
> described in my email by setting up a separate channel used for
> synchronous messages. But some of the problems still remain. As soon
> as a message channel is used for both synchronous and asynchronous
> messages you can easily get into trouble. If someone calls the
> blocking waitForMessage() function and receive a message which was
> intended to be delivered asynchronously there is no good recourse.
> Basically any time that happens there are only bad options available,
> many of which have subtle problems that only happen intermittently
> like the ones I described in my initial email.
>
> Since that is the case, I think the best solution is to always force
> separate channels to be used for synchronous and asynchronous
> messages.
>

If you have messages that must be received synchronously, and other
messages that must be received asynchronously, then that's precisely a time
when you'd want to use MessagePorts to separate them.  That's what they're
for.  It's the same as using separate MessagePorts when you have two
unrelated libraries receiving their own messages, so each library only sees
messages intended for it.
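
As a rough illustration of that separation, here is an in-memory model with
one port per kind of message.  ModelPort is a hypothetical stand-in, not the
real MessagePort interface; its getMessage is a zero-timeout poll standing in
for the proposed blocking call, and dispatchAll stands in for event-loop
delivery.

```javascript
// Hypothetical in-memory stand-in for the receiving end of a port.
class ModelPort {
  constructor() {
    this.queue = [];
  }
  postMessage(data) {
    this.queue.push(data);
  }
  // Zero-timeout poll; returns null when nothing is queued.
  getMessage() {
    return this.queue.length > 0 ? this.queue.shift() : null;
  }
  // Hand every queued message to `handler`, as onmessage would see it.
  dispatchAll(handler) {
    while (this.queue.length > 0) {
      handler({ data: this.queue.shift() });
    }
  }
}

const syncPort = new ModelPort();   // only messages meant to be waited on
const asyncPort = new ModelPort();  // only messages meant for onmessage

asyncPort.postMessage("progress: 50%");
syncPort.postMessage("reply: a large animal");

// The synchronous reader only ever sees messages from its own port...
const reply = syncPort.getMessage();

// ...and the asynchronous handler only sees the rest.
const seenAsync = [];
asyncPort.dispatchAll(function (e) { seenAsync.push(e.data); });
```

Because each reader drains only its own port, a synchronous wait can never
swallow a message that was meant to be delivered asynchronously.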

I agree that APIs that encourage people to write brittle code should be
squinted at carefully, and we should definitely examine all APIs for that
problem, but really I don't think it's the case here.

It seems much simpler to me to have only one kind of MessagePort, each
representing only one message channel.  Importantly, the sending side
doesn't have to know whether the receiving side is using a sync API to
receive it or not--in other words, that information doesn't have to be part
of the user's messaging protocol.  As a simple example, you can have a
worker thread whose protocol is simply:

- Send a message to the worker's port with a word.
- The worker sends a message to its parent with the word's definition.
(The mechanism of this lookup is black-boxed--it might be IndexedDB, or a
network request, or a complex combination.)

This means that the caller can use this worker's API synchronously or
asynchronously, without needing to define two interfaces and without the
child knowing the difference.  You can use it synchronously (if you're in a
worker yourself):

var worker = createDictionaryWorker();
worker.postMessage("elephant");
var definition = getMessage(worker); // wait for the answer

or asynchronously (from a worker or the UI thread):

var worker = createDictionaryWorker();
worker.postMessage("elephant");
worker.onmessage = function(e) { var definition = e.data; };

The worker doesn't care:

onmessage = function(e) {
    // The payload of a MessageEvent is in e.data.
    if(e.data == "elephant")
        postMessage("a large animal");
    else
        postMessage("don't know"); // worst dictionary ever
};

> All in all this is a much more complicated setup though. I think it'd
> be worth keeping the simpler API like the 1 or 2 proposals even if we
> do introduce SyncMessageChannel since that likely covers the majority
> of use cases.
>

Those proposals seem much more complex to me.  You can't send a message
that will be received synchronously unless the other side prompts you for
one first; you have to care whether the other side is acting synchronously
or asynchronously.  It's a bunch of new concepts ("synchronous messages",
"message replies"), instead of a simple (to users, at least) addition to
MessagePorts.

-- 
Glenn Maynard
Received on Tuesday, 4 September 2012 03:55:58 UTC