[whatwg] Workers feedback from Ian Hickson on 2008-08-06 (public-whatwg-archive@w3.org from August 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 6 Aug 2008 11:24:53 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0808052357450.5140@hixie.dreamhostps.com>
Summary:

 * I've written an intro section which shows how the API is expected to be 
   used. I've tried to illustrate each use case that people raised. I will 
   add more tomorrow.

 * I've completely decoupled workers and Window objects.

 * I've moved APIs to a "utils" object, so that we rarely, if ever, have 
   to add members to the global scope (reduces chances of future 
   collisions).

 * I've simplified the way message channels and ports work.

 * I've replaced the URL string with a Location object.


On Mon, 4 Aug 2008, Jonas Sicking wrote:
> 
> So the first comment is the 'window' and 'self' properties. I don't see 
> a reason for these.

We need some self-reference so that people can check for the presence of 
members on the global scope.

'window' was there to allow library re-use. I've now removed it, leaving 
only 'self'.

I have also simplified the spec to remove the Window concepts from the 
workers.

I also moved all the APIs to a "utils" object, leaving the global scope 
with only:

  self - self reference for checking presence of APIs
  location - the address of the script
  name - the name of the worker, if it is shared
  closing - whether the worker is shutting down
  close() - to shut the worker down
  utils - all the APIs
  onconnect - to receive new connections
  onunload - to run any shut down code
  port - the first connection

I can move more of this to 'utils' if people want. Opinions?
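
As a rough illustration of why a self-reference is useful, here is a 
minimal sketch of checking for members on the global scope. The 
supports() helper and the fakeSelf stand-in object are hypothetical and 
are not part of the spec; inside a real worker the scope passed in would 
simply be 'self'.

```javascript
// Hypothetical helper (not part of the spec): report whether a member
// is present on the given global scope object.
function supports(scope, name) {
  return typeof scope[name] !== "undefined";
}

// Stand-in for a worker's global scope, so the sketch runs anywhere.
// In a real worker you would pass 'self' instead.
const fakeSelf = { utils: {}, close() {} };
fakeSelf.self = fakeSelf; // the self-reference described above

const hasUtils = supports(fakeSelf.self, "utils");       // true
const hasDocument = supports(fakeSelf.self, "document"); // false
```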


> The fact that the only way to communicate between workers and the main 
> browser context is through MessagePorts seems unnecessarily complex as 
> well as differing from how windows communicates using postMessage.

We can't use the Window object postMessage() communication method, because 
it relies on the objects being able to have references to each other. 

I've tried to simplify the MessagePort interface as follows:

 * messages are now queued, and won't be delivered until either the 
   'start()' method on the port is called, or the 'onmessage' attribute is 
   set to some value.

 * messages sent to a port whose other side is suspended are now queued, 
   instead of the port becoming inactive.

 * I've made the worker receive its first port as a property of the global 
   object (port) instead of having to listen to the 'connect' event 
   (though the connect event still fires, so you can do shared workers).
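
The queueing rule in the first point can be sketched with a toy model. 
This is only an illustration under simplifying assumptions: QueuedPort, 
its receive() method, and its internals are hypothetical names, delivery 
is synchronous, and messages stay queued if start() is called before any 
handler is set. It is not the spec's MessagePort.

```javascript
// Hypothetical toy model of the queueing rule; not the spec's MessagePort.
class QueuedPort {
  constructor() {
    this._queue = [];       // messages waiting for delivery
    this._started = false;  // true once delivery has been enabled
    this._onmessage = null; // current handler, if any
  }

  // Setting 'onmessage' implicitly enables delivery.
  set onmessage(handler) {
    this._onmessage = handler;
    this.start();
  }

  get onmessage() {
    return this._onmessage;
  }

  // Enable delivery and flush anything queued so far.
  start() {
    this._started = true;
    this._flush();
  }

  // Stands in for a message arriving from the other side of the channel.
  receive(data) {
    this._queue.push(data);
    if (this._started) this._flush();
  }

  // Deliver queued messages in order while a handler is available.
  _flush() {
    while (this._onmessage && this._queue.length > 0) {
      this._onmessage({ data: this._queue.shift() });
    }
  }
}

// Usage: messages that arrive before a handler exists are not lost.
const port = new QueuedPort();
port.receive("first");
port.receive("second");

const seen = [];
port.onmessage = (e) => seen.push(e.data);
// seen now holds "first" and "second", in that order.
```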


On Mon, 4 Aug 2008, Aaron Boodman wrote:
> 
> So for example, I would be for moving over a subset of the navigator and 
> location objects as-is (these seem to work well), but against moving 
> over the document.cookie "interface" (it works poorly).

I agree with porting some subset of 'navigator' over, though since the 
relevant parts of 'navigator' aren't defined even for HTML5 yet, I haven't 
yet done this. There's an issue marker in the spec about this. What bits 
would you like defined?


On Tue, 5 Aug 2008, Aaron Boodman wrote:
> 
> The protocol, host, hostname, port, pathname, and search properties are 
> all very useful. An application might want to compare the origin of a 
> message it receives with its own host and port, for example.

Ok, I've provided a cut-down Location interface.
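
The comparison described in the quoted use case could look roughly like 
the following. This is a sketch under assumptions: isSameOrigin() is a 
hypothetical helper, the addresses are made up, and a URL object stands 
in for the worker's 'location' so that the example runs outside a worker. 
Only the 'protocol' and 'host' properties are read.

```javascript
// Hypothetical helper: does a message's origin match this script's
// own location? 'loc' stands in for the worker's 'location' object.
// 'host' covers the hostname plus any non-default port.
function isSameOrigin(loc, messageOrigin) {
  const origin = new URL(messageOrigin);
  return origin.protocol === loc.protocol && origin.host === loc.host;
}

// Stand-in for the worker's location; the address is invented.
const loc = new URL("https://example.com:8443/workers/sync.js");

const samePort = isSameOrigin(loc, "https://example.com:8443"); // true
const otherPort = isSameOrigin(loc, "https://example.com");     // false
```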


On Tue, 5 Aug 2008, Aaron Boodman wrote:
> 
> - It seems like we might want an object that represents workers. This 
> would allow us put the 'onload' and 'onerror' events from MessagePort 
> there, instead of on MessagePort, which makes more sense to me (I don't 
> know what it means for a MessagePort to 'load' or 'error' outside of the 
> context of a worker).

The main reason for not having a separate Worker object is that I couldn't 
find anything that would go on it other than the port. You'd still want 
the unload messages going to whoever "owns" the port, not whoever created 
the worker, if you passed the port around. Basically, adding a Worker 
object just seemed like it would double the number of objects, and 
potentially the complexity if we also allow Worker objects to be sent 
along channels, without really providing any new features.


> MessagePort.onunload could then change to 'onclose' to go with the 
> close() method.

The main reason I used 'unload' and 'close' is consistency with how the 
rest of the platform works. (With a Window, you call window.close() to 
invoke window.onunload.) I can change that if people want, though I do 
think consistency is worth keeping here.


> - It's odd to me that the way to establish a channel to a worker depends 
> on whether you are the creator of the worker or not. The creator gets a 
> MessagePort to a new channel back from createWorker(), but any other 
> function must pass a new MessagePort over the original one, and the 
> worker must know to use that secondary port to talk back.

In the old mechanism, from the worker's point of view there was only one 
way to get a new connection, onconnect. The changes to simplify the 
mechanism actually introduced a new mechanism, so it is true that we now 
have two mechanisms, one for the initial creation and one for others. Is 
that a problem?

(I don't think it is; you can use either mechanism, as I will show 
tomorrow in the shared worker examples.)


> I would prefer to see something like:
> 
> void Worker.postMessage(DOMString message)
> void Worker.postMessage(DOMString message, MessagePort port)
> 
> That way the way to establish a new channel is the same for all callers. 
> It also has the advantage of looking similar to a window's postMessage 
> API.

With the exception of Worker being called MessagePort, that's exactly the 
API we have now.


> Here is how the previous two suggestions would look together:
> 
> var worker = new Worker("foo.js");
> worker.onload = function() { ... }
> worker.onerror = function() { ... }
> worker.onunload = function() { ... }  // called when the worker shuts down
> worker.sendMessage("hello!");
> 
> var channel = new MessageChannel();
> channel.port1.onmessage = function(e) { ... }
> worker.sendMessage("please return my call", channel.port2);
>
> // called when the channel is closed, either because the worker shut down taking
> // the other end of the port with it, or because the other end of the
> // port was GC'd, or because the other port was explicitly closed.
> channel.port1.onclose = function() { ... }

The above would in fact work right now, unchanged.


> - The spec says that as soon as a worker is not reachable (determined 
> via GC) from any MessagePort, it is eligible for shutdown. Shutdown 
> would attempt to finish all queued messages, but not allow any new ones.
> 
> This concerns me because it means that workers will have different 
> behavior depending on GC timing. If a worker is not referenced from any 
> port, and it sends an XHR, that XHR may or may not be sent depending on 
> when GC runs. This is different than how XHR behaves normally. 
> Typically, XHR objects that have outstanding IO but no referers will not 
> be GC'd until they complete or fail.

We could say that XHR network I/O must complete, but are you saying you 
want the callbacks to fire as well? If so, what prevents an evil site from 
just setting up an infinite sequence of callbacks and having this 
invisible worker do work forever on the user's machine without the user's 
knowledge?

Note that I've simplified a lot of the GC-related stuff. I couldn't remove 
it all, as this does interact with GC to some extent.


> Finally this does not allow use cases such as creating a worker to 
> synchronize a local database with the network without ever sending 
> notifications back to the parent.

This is addressed now, in that a worker automatically keeps a reference to 
its original creator (in the 'port' property of the global object), so 
such workers won't get GC'ed even if they never communicate, until the 
creator dies.


> Maybe workers should stay alive as long as any of the following are true:
> 
> - There is script running in them
> - There are messages to them queued
> - There is a messageport alive anywhere that could send messages to them
> - There are "asynchronous operations" (xhr, timers, database
> operations) inside them outstanding

The last of these is easy to abuse to keep the workers alive hours beyond 
the point at which the user thinks they are dead. I think this is a 
serious problem.


> API nitpickery
> 
> - Why is there an ownerWindow property on MessagePort? If I understand 
> correctly, this is just a synonym for the 'window' object of the 
> currently executing script context.  I think it should go away.

It's gone.


> - I'm curious as to why MessagePort and WindowWorker do not implement 
> EventTarget. It seems like we may as well reuse it. And at least for 
> WindowWorker, it seems like the same problems of having multiple 
> functions clobber each other that motivated EventTarget would apply.

They do.


> - The purpose of 'import()' on WindowWorker was not immediately obvious 
> to me from its name. Should it be 'importScript()'? or 'includeScript()' 
> maybe?

Changed to importScript().


> - Should import() accept an array of URLs, so that the UA can fetch them 
> in parallel if it has the ability to do that?

We could do that if you like. Is it needed?


> - The "front-line" nomenclature was a bit weird to me. How about 
> "top-level"?

I wanted to avoid "top-level" because I used that with browsing contexts. 
I agree that "front-line" isn't a good term.


> - Would it be too weird to have createWorker overloaded to take an 
> optional name parameter? This would make the behavior similar to 
> window.open(), which either opens a new window or reuses an existing 
> window with the same name.

People seem to dislike overloading in general, but I don't mind. Anyone 
against this?
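
For discussion, the name-based reuse being proposed might behave roughly 
like this toy model. Everything here is an assumption made for 
illustration: the namedWorkers registry and the plain record objects are 
hypothetical, and the sketch returns the same record on reuse rather than 
a new port onto the existing worker.

```javascript
// Hypothetical registry modelling window.open()-style reuse by name.
const namedWorkers = new Map();

// Toy createWorker(): with a name, reuse an existing entry if there is
// one; without a name, always create a fresh entry.
function createWorker(url, name) {
  if (name !== undefined && namedWorkers.has(name)) {
    return namedWorkers.get(name);
  }
  // Placeholder record standing in for whatever the real API returns.
  const worker = { url: url, name: name === undefined ? null : name };
  if (name !== undefined) {
    namedWorkers.set(name, worker);
  }
  return worker;
}

// Usage: the second named call reuses the first entry.
const first = createWorker("sync.js", "sync");
const again = createWorker("sync.js", "sync");
const anonymous = createWorker("sync.js");
// first === again, while 'anonymous' is a separate record.
```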

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 6 August 2008 04:24:53 UTC