[whatwg] Web Workers and MessagePort feedback from Jonas Sicking on 2008-08-05 (public-whatwg-archive@w3.org from August 2008)

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 05 Aug 2008 14:52:39 -0700
Message-ID: <4898CBA7.2080801@sicking.cc>
Aaron Boodman wrote:
> I'm still digesting the Web Worker proposal, but here is some
> feedback. Sorry it is a bit long.
> 
> 
> Structural API stuff:
> 
> - I still haven't really internalized the need to either have workers
> speak directly to anyone other than the person who created them, or
> the other use cases that MessageChannels are intended for. There is a
> lot of complexity here. I think we need to add some requirements and
> motivations to the top of the doc, and some code samples showing the
> intended usage before implementors can really decide whether it's
> worth taking on.
> 
> - It seems like we might want an object that represents workers. This
> would allow us to put the 'onload' and 'onerror' events from MessagePort
> there, instead of on MessagePort, which makes more sense to me (I
> don't know what it means for a MessagePort to 'load' or 'error'
> outside of the context of a worker). MessagePort.onunload could then
> change to 'onclose' to go with the close() method.
> 
> It seems like over time, we might want to be able to perform other
> operations on a worker and having the worker object might be handy.

Agreed.

> I know this is weird wrt GC when combined with MessagePorts, and I
> don't have a proposed solution.

I don't think we should say much regarding GC at all. All we should say 
is that GC should not affect the operation of the page. I.e. it is not 
allowed to GC a Worker that someone still has references to, or a 
Worker that has XHR loads in progress or timers pending.

Very few other specs mention GC and I haven't noticed that ever being a 
problem. For example, everyone agrees that it's a bug that Gecko 
sometimes GCs the parent of a node, if you're not actively holding any 
references to anything in the parent chain.

> - It's odd to me that the way to establish a channel to a worker
> depends on whether you are the creator of the worker or not. The
> creator gets a MessagePort to a new channel back from createWorker(),
> but any other function must pass a new MessagePort over the original
> one, and the worker must know to use that secondary port to talk back.
> 
> I would prefer to see something like:
> 
> void Worker.postMessage(DOMString message)
> void Worker.postMessage(DOMString message, MessagePort port)
> 
> That way the way to establish a new channel is the same for all
> callers. It also has the advantage of looking similar to a window's
> postMessage API.

Agreed.

> Here is how the previous two suggestions would look together:
> 
> var worker = new Worker("foo.js");
> worker.onload = function() { ... }
> worker.onerror = function() { ... }
> worker.onunload = function() { ... }  // called when the worker shuts down
> worker.postMessage("hello!");

So I really like this API. However it makes it completely impossible to 
ever pass worker objects across threads. I.e. we could never allow:

worker1.postMessage("...", worker2);

This would be very strange if we had .onload, .onerror etc on the worker 
object itself since those properties wouldn't make much sense living in 
multiple "threads" at once.

While I agree direct communication between sibling workers is an 
edge case, it's something I would prefer not to make impossible for 
future versions of the spec.

Though I just realized that we could cover that case using only 
MessagePorts. So we say that you can only communicate with your creator, 
and any children using direct .postMessage. If you want more complex 
communication patterns, then set up MessagePorts.
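
A rough sketch of that split, using a toy, synchronous model (ToyPort, ToyWorker and createChannel are made-up names for illustration, not anything in the draft, and real delivery would be asynchronous):

```javascript
// Toy model only: ToyPort, ToyWorker and createChannel are hypothetical.
class ToyPort {
  constructor() {
    this.peer = null;      // the entangled port at the other end
    this.onmessage = null; // handler set by whoever holds this port
  }
  postMessage(message) {
    // Deliver synchronously to the other end, if anyone is listening there.
    if (this.peer && this.peer.onmessage) {
      this.peer.onmessage({ data: message });
    }
  }
}

function createChannel() {
  const a = new ToyPort();
  const b = new ToyPort();
  a.peer = b;
  b.peer = a;
  return [a, b];
}

class ToyWorker {
  constructor() {
    this.received = [];     // everything this worker has been sent
    this.siblingPort = null;
  }
  // Same shape as the proposed Worker.postMessage(message, port).
  postMessage(message, port) {
    this.received.push(message);
    if (port) {
      this.siblingPort = port;
      port.onmessage = (event) => this.received.push(event.data);
    }
  }
}

// The creator talks to its children directly and hands each one end of a channel.
const worker1 = new ToyWorker();
const worker2 = new ToyWorker();
const [port1, port2] = createChannel();
worker1.postMessage("port to your sibling", port1);
worker2.postMessage("port to your sibling", port2);

// Inside worker1: talk to the sibling over the port, never via worker2 itself.
worker1.siblingPort.postMessage("hello from worker1");
```

The creator only ever uses direct .postMessage on its own children, and the siblings only ever touch the ports they were handed, so no worker object has to cross threads.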

> - The spec says that as soon as a worker is not reachable (determined
> via GC) from any MessagePort, it is eligible for shutdown. Shutdown
> would attempt to finish all queued messages, but not allow any new
> ones.
> 
> This concerns me because it means that workers will have different
> behavior depending on GC timing. If a worker is not referenced from
> any port, and it sends an XHR, that XHR may or may not be sent
> depending on when GC runs. This is different than how XHR behaves
> normally. Typically, XHR objects that have outstanding IO but no
> referrers will not be GC'd until they complete or fail.
> 
> Finally this does not allow use cases such as creating a worker to
> synchronize a local database with the network without ever sending
> notifications back to the parent.
> 
> Maybe workers should stay alive as long as any of the following are true:
> 
> - There is script running in them
> - There are messages to them queued
> - There is a messageport alive anywhere that could send messages to them
> - There are "asynchronous operations" (xhr, timers, database
> operations) inside them outstanding

Agreed. Like I said above, I think the less we say about GC the better. 
GC effects should not be noticeable to the page.
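
To make the list above concrete, the lifetime rule boils down to something like the following check. The function name and the fields on the state record are invented for this sketch, and how a UA decides that a port is still alive is left open:

```javascript
// Hypothetical bookkeeping record; a real UA would track this internally.
function workerMustStayAlive(state) {
  return (
    state.scriptRunning ||          // there is script running in the worker
    state.queuedMessages > 0 ||     // messages to it are still queued
    state.livePorts > 0 ||          // some MessagePort could still send to it
    state.pendingAsyncOps > 0       // XHR, timers, database operations outstanding
  );
}

// A worker nobody references, but with an XHR in flight, is kept alive.
const syncingWorker = { scriptRunning: false, queuedMessages: 0, livePorts: 0, pendingAsyncOps: 1 };
// A worker with nothing left to do may be shut down.
const idleWorker = { scriptRunning: false, queuedMessages: 0, livePorts: 0, pendingAsyncOps: 0 };
```

With a rule of this shape, the database-sync worker from the example above keeps running even if its creator drops every reference to it.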

> - Why is there an ownerWindow property on MessagePort? If I understand
> correctly, this is just a synonym for the 'window' object of the
> currently executing script context.  I think it should go away.

If we put postMessage directly on the Worker object we don't need to 
mention MessagePorts in the Web Workers spec at all. They can just be an 
orthogonal spec.

> - The purpose of 'import()' on WindowWorker was not immediately
> obvious to me from its name. Should it be 'importScript()'? or
> 'includeScript()' maybe?
> 
> - Should import() accept an array of URLs, so that the UA can fetch
> them in parallel if it has the ability to do that?

Agreed on both.
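
For the array form, a minimal sketch of what "fetch in parallel" could mean, assuming the scripts still run in the order given. importScriptsSketch, fetchText and runScript are hypothetical stand-ins, not names from the draft:

```javascript
// Hypothetical helper; fetchText and runScript are injected stand-ins.
async function importScriptsSketch(urls, fetchText, runScript) {
  // Start every fetch before awaiting any of them, so they can overlap.
  const pending = urls.map((url) => fetchText(url));
  // Wait for all sources, then run them in the order the URLs were given.
  const sources = await Promise.all(pending);
  sources.forEach((source, i) => runScript(urls[i], source));
}

// Fake fetch that records when each request was started.
const started = [];
const executed = [];
const fakeFetch = (url) => {
  started.push(url);
  return Promise.resolve("/* " + url + " */");
};
const done = importScriptsSketch(["a.js", "b.js"], fakeFetch, (url) => executed.push(url));
```

Both fetches are issued before either script runs, which is the freedom a single-URL import() would not give the UA.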

> - The string URL property on the WindowWorker interface is less useful
> than the parsed structure that window.location has. Can we use
> something like this instead, except making it read-only?

Why do we need it at all?

If we do think it's useful, most of the uses that I've seen for the 
parsed URL structure have been to set the .hash in order to scroll around 
on a page or communicate between iframes of different origins (ugh!!). 
Neither of these applies here I'd say.

> - The "front-line" nomenclature was a bit weird to me. How about "top-level"?

I didn't try to grok this part yet. Is it just about establishing 
lifetime of the worker objects? If so, see my previous comments about GC.

> - Would it be too weird to have createWorker overloaded to take an
> optional name parameter? This would make the behavior similar to
> window.open(), which either opens a new window or reuses an existing
> window with the same name.

What would it be used for? window.open uses the name so you can target 
links at it, which doesn't seem like it applies here either.

/ Jonas
Received on Tuesday, 5 August 2008 14:52:39 UTC