[whatwg] Workers feedback from Aaron Boodman on 2008-08-07 (public-whatwg-archive@w3.org from August 2008)

From: Aaron Boodman <aa@google.com>
Date: Wed, 6 Aug 2008 19:12:57 -0700
Message-ID: <278fd46c0808061912i46b6564g3512176eb8b50776@mail.gmail.com>

On Wed, Aug 6, 2008 at 4:24 AM, Ian Hickson <ian at hixie.ch> wrote:
>  * I've written an intro section which shows how the API is expected to be
>   used. I've tried to illustrate each use case that people raised. I will
>   add more tomorrow.

Thanks, that helps a lot.

>  * I've moved APIs to a "utils" object, so that we rarely, if ever, have
>   to add members to the global scope (reduces chances of future
>   collisions).

I am opposed to the utils object. I don't see any precedent for this
anywhere, and it just feels ugly to me. I liked it the way you had it
before, with these APIs in a shared base interface.

>  * I've replaced the URL string with a Location object.

Thanks :).

> I've tried to simplify the MessagePort interface as follows:
>
>  * messages are now queued, and won't be delivered until either the
>   'start()' method on the port is called, or the 'onmessage' attribute is
>   set to some value.
>
>  * messages are now queued, instead of a port becoming inactive when its
>   other side is suspended.

Can you explain the rationale for these two changes?
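
For anyone following along, here is a minimal sketch of the queuing behaviour described in the first bullet, assuming a toy in-process model. QueuedPort and deliver() are hypothetical names made up for illustration, not the spec's MessagePort; the only behaviour taken from the text above is that messages are held until start() is called or onmessage is set.

```javascript
// Hypothetical model of a port that queues messages until it is started.
// Not the spec's MessagePort; all names here are illustrative only.
class QueuedPort {
  constructor() {
    this._queue = [];
    this._started = false;
    this._onmessage = null;
  }

  // Setting onmessage implicitly starts delivery, as the bullet describes.
  set onmessage(handler) {
    this._onmessage = handler;
    this.start();
  }
  get onmessage() {
    return this._onmessage;
  }

  // Explicit start: flush anything that arrived before the port was ready.
  start() {
    this._started = true;
    this._flush();
  }

  // Called by the "other side"; the message is held until the port starts.
  deliver(data) {
    this._queue.push(data);
    if (this._started) this._flush();
  }

  _flush() {
    while (this._onmessage && this._queue.length > 0) {
      this._onmessage({ data: this._queue.shift() });
    }
  }
}

const port = new QueuedPort();
port.deliver("early 1");
port.deliver("early 2");

const received = [];
port.onmessage = (e) => received.push(e.data); // starts the port and flushes the queue
port.deliver("late");
console.log(received);
```

In this model nothing sent before the handler is attached gets lost, which is presumably the point of the change.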

>  * I've made the worker receive its first port as a property of the global
>   object (port) instead of having to listen to the 'connect' event
>   (though the connect event still fires, so you can do shared workers).

I liked it the way you had it before. I'd rather the first connection
to a worker wasn't a special case, either for the worker or for the
worker's creator.

That's also one reason why I like having a separate Worker object and
having the two-step process of creating the worker, then sending it a
message. It means that creating a new channel to a worker is always
the same.

> On Mon, 4 Aug 2008, Aaron Boodman wrote:
>>
>> So for example, I would be for moving over a subset of the navigator and
>> location objects as-is (these seem to work well), but against moving
>> over the document.cookie "interface" (it works poorly).
>
> I agree with porting some subset of 'navigator' over, though since the
> relevant parts of 'navigator' aren't defined even for HTML5 yet, I haven't
> yet done this. There's an issue marker in the spec about this. What bits
> would you like defined?

The ones that are most often used for browser detection are most important, so:

- appName
- appCodeName
- appVersion
- platform
- userAgent

I know the whole business of browser detection is a big mess right
now, so if you're working on defining something better, I'd be open to
having some combination of the old navigator object and that new thing
in workers. But there is a lot of code that is very carefully crafted
to analyze the navigator object, so maybe it's best not to mess with
that too much.

> On Tue, 5 Aug 2008, Aaron Boodman wrote:
>>
>> - It seems like we might want an object that represents workers. This
>> would allow us to put the 'onload' and 'onerror' events from MessagePort
>> there, instead of on MessagePort, which makes more sense to me (I don't
>> know what it means for a MessagePort to 'load' or 'error' outside of the
>> context of a worker).
>
> The main reason for not having a separate Worker object is that I couldn't
> find anything that would go on it other than the port. You'd still want
> the unload messages going to whoever "owns" the port, not whoever created
> the worker, if you passed the port around. Basically, adding a Worker
> object just seemed like it would double the number of objects, and
> potentially the complexity if we also allow Worker objects to be sent
> along channels, without really providing any new features.

I think that 'load', 'error', and 'unload' could go on the worker. As
far as I can tell, the only thing 'load' and 'error' are used for is
telling the creator of a worker that the worker loaded or failed to
load. In that case, it seems wrong to throw them on MessagePort, since
MessagePorts are also used for many other things.

I also still think that Workers could have their own sendMessage. The
messages sent to this would be delivered to the worker as 'message'
events targeted at WorkerGlobalObject (eliminating the need for
onconnect?). This would make Workers and postMessage very similar to
Window and postMessage, which seems nice to me.
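
A rough sketch of that shape, assuming a single-threaded toy model: ModelWorker, ModelWorkerGlobal, and the reply() helper on the port are all hypothetical stand-ins, and delivery is synchronous here, where a real worker would deliver asynchronously.

```javascript
// Hypothetical, single-threaded model of the proposal: a Worker object with
// its own sendMessage(), whose messages arrive as 'message' events on the
// worker's global object. Real delivery would be asynchronous.
class ModelWorkerGlobal {
  constructor() {
    this.onmessage = null;
  }
}

class ModelWorker {
  constructor(script) {
    this.global = new ModelWorkerGlobal();
    script(this.global); // stands in for loading and running "foo.js"
  }

  // Every caller uses the same entry point; passing a port is optional.
  sendMessage(message, port) {
    if (this.global.onmessage) {
      this.global.onmessage({ data: message, port: port || null });
    }
  }
}

const log = [];
const worker = new ModelWorker((self) => {
  self.onmessage = (e) => {
    log.push(e.data);
    if (e.port) e.port.reply("call returned"); // answer over the supplied channel
  };
});

worker.sendMessage("hello!");
worker.sendMessage("please return my call", { reply: (text) => log.push(text) });
console.log(log);
```

The first message and every later one go through the same path, so there is no special case for the initial connection.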

>> MessagePort.onunload could then change to 'onclose' to go with the
>> close() method.
>
> The main reason I used 'unload' and 'close' is consistency with how the
> rest of the platform works. (With a Window, you call window.close() to
> invoke window.onunload.) I can change that if people want, though I do
> think consistency is worth keeping here.

I think the concept of a port becoming inactive is interesting in all
the cases MessagePorts are used, so this should stay. In fact, should
it be called 'oninactive'?

>> I would prefer to see something like:
>>
>> void Worker.postMessage(DOMString message)
>> void Worker.postMessage(DOMString message, MessagePort port)
>>
>> That way the way to establish a new channel is the same for all callers.
>> It also has the advantage of looking similar to a window's postMessage
>> API.
>
> With the exception of Worker being called MessagePort, that's exactly the
> API we have now.
>
>> Here is how the previous two suggestions would look together:
>>
>> var worker = new Worker("foo.js");
>> worker.onload = function() { ... }
>> worker.onerror = function() { ... }
>> worker.onunload = function() { ... }  // called when the worker shuts down
>> worker.sendMessage("hello!");
>>
>> var channel = new MessageChannel();
>> channel.port1.onmessage = function(e) { ... }
>> worker.sendMessage("please return my call", channel.port2);
>>
>> // called when the channel is closed, either because the worker shut down taking
>> // the other end of the port with it, or because the other end of the
>> // port was GC'd, or because the other port was explicitly closed.
>> channel.port1.onclose = function() { ... }
>
> The above would in fact work right now, unchanged.

Fair enough. I still think the distinction between Workers and
MessagePorts may be important.

>> - The spec says that as soon as a worker is not reachable (determined
>> via GC) from any MessagePort, it is eligible for shutdown. Shutdown
>> would attempt to finish all queued messages, but not allow any new ones.
>>
>> This concerns me because it means that workers will have different
>> behavior depending on GC timing. If a worker is not referenced from any
>> port, and it sends an XHR, that XHR may or may not be sent depending on
>> when GC runs. This is different than how XHR behaves normally.
>> Typically, XHR objects that have outstanding IO but no referrers will not
>> be GC'd until they complete or fail.
>
> We could say that XHR network I/O must complete, but are you saying you
> want the callbacks to fire as well? If so, what prevents an evil site from
> just setting up an infinite sequence of callbacks and having this
> invisible worker do work forever on the user's machine without the user's
> knowledge?
>
> Note that I've simplified a lot of the GC-related stuff. I couldn't remove
> it all, as this does interact with GC to some extent.
>
>> Finally this does not allow use cases such as creating a worker to
>> synchronize a local database with the network without ever sending
>> notifications back to the parent.
>
> This is addressed now, in that a worker automatically keeps a reference to
> its original creator (in the 'port' property of the global object), so
> these won't get GC'ed even if they never communicate, until the creator
> dies.
>
>> Maybe workers should stay alive as long as any of the following are true:
>>
>> - There is script running in them
>> - There are messages to them queued
>> - There is a messageport alive anywhere that could send messages to them
>> - There are "asynchronous operations" (xhr, timers, database
>> operations) inside them outstanding
>
> The latter is easy to abuse to keep the workers alive hours beyond the
> point at which the user thinks they are dead. I think this is a serious
> problem.

We talked this out on IRC, but for those playing along at home:

The idea of having a special relationship between a worker and the
page that created it bothers me. Especially since otherwise, your
proposal elegantly makes all workers independent of particular pages.
One example of how this stinks is that page A could create a worker
and then share it with page B. If the worker creates an XHR, and then
page A shuts down before page B, the behavior is different than if the
pages shut down in the reverse order.

So, I propose the following:

Workers should stay alive as long as:
- They are running script
- They have messages queued for them
- They have a MessagePort alive somewhere that could potentially send
them a message
- They have 'asynchronous tasks' pending inside them (timers, xhr,
database transactions, etc)

The last point could allow workers to stay alive forever, past the
point when any page using the worker has been closed, which would be
bad. So, the browser should forcibly shoot any worker that is, for any
reason, still running once every page or worker that has ever used it
unloads.

The tricky bit is that for this model to be completely consistent, I
think that workers themselves have to also be considered 'asynchronous
tasks'. This is OK I think, since at the end of the day, pages will be
unloaded and everything will get taken down. But I haven't thought it
through all the way, and I admit it does feel really complex. Better
ideas welcome.
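
The rule proposed above can be written down as a predicate. This is only a sketch under assumed bookkeeping: the field names (liveUsers, runningScript, queuedMessages, reachablePorts, pendingAsyncTasks) are hypothetical, and liveUsers is assumed to count the pages and workers that have ever used this worker and are still loaded.

```javascript
// Hypothetical bookkeeping for one worker; field names are illustrative.
function shouldStayAlive(w) {
  // Forced shutdown: nothing that ever used the worker is still loaded.
  if (w.liveUsers === 0) return false;

  return (
    w.runningScript ||
    w.queuedMessages > 0 ||
    w.reachablePorts > 0 ||
    w.pendingAsyncTasks > 0 // timers, XHR, database transactions, child workers
  );
}

// A worker quietly syncing a database: no ports, but one XHR outstanding.
const syncing = {
  liveUsers: 1,
  runningScript: false,
  queuedMessages: 0,
  reachablePorts: 0,
  pendingAsyncTasks: 1,
};
const orphaned = { ...syncing, liveUsers: 0 };     // every user has unloaded
const idle = { ...syncing, pendingAsyncTasks: 0 }; // nothing left to do

console.log(shouldStayAlive(syncing), shouldStayAlive(orphaned), shouldStayAlive(idle));
// prints: true false false
```

Note that the result no longer depends on which page created the worker or on the order in which pages unload, only on whether any user is left at all.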

>> - I'm curious as to why MessagePort and WindowWorker do not implement
>> EventTarget. It seems like we may as well reuse it. And at least for
>> WindowWorker, it seems like the same problems of having multiple
>> functions clobber each other that motivated EventTarget would apply.
>
> They do.

They didn't, but you've fixed it now :)

>> - Should import() accept an array of URLs, so that the UA can fetch them
>> in parallel if it has the ability to do that?
>
> We could do that if you like. Is it needed?

With the connection limits being upped in all the browsers, I think
this would be a good thing to have from the beginning.
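
A sketch of what the array form buys, assuming promise-style plumbing that the current draft does not have: importAll(), fetchScript, and evaluate are hypothetical names, and the fetcher below is a stand-in so the example runs without a network.

```javascript
// Hypothetical importAll(): start every fetch before waiting on any of them,
// then evaluate the results in the order the URLs were listed.
function importAll(urls, fetchScript, evaluate) {
  const pending = urls.map((url) => fetchScript(url)); // all requests start here
  return Promise.all(pending).then((sources) => {
    sources.forEach((source) => evaluate(source)); // list order, not arrival order
  });
}

// Stand-in fetcher so the sketch runs without a network.
const started = [];
const fakeFetch = (url) => {
  started.push(url);
  return Promise.resolve("/* source of " + url + " */");
};

const evaluated = [];
const done = importAll(["a.js", "b.js", "c.js"], fakeFetch, (src) => evaluated.push(src));

console.log(started); // all three requests have been issued by this point
done.then(() => console.log(evaluated.length)); // evaluation happens once all have arrived
```

With one URL per call, the UA cannot know about the second script until the first has been fetched and run; with the list, it can overlap the requests and still execute them in order.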

>> - Would it be too weird to have createWorker overloaded to take an
>> optional name parameter? This would make the behavior similar to
>> window.open(), which either opens a new window or reuses an existing
>> window with the same name.
>
> People seem to dislike overloading in general, but I don't mind. Anyone
> against this?

On Wed, Aug 6, 2008 at 11:53 AM, Chris Prince <cprince at google.com> wrote:
> My current thinking is that the best API design for createWorker() is:
>   MessagePort createWorker(worker_body, [WorkerOptions])
>
> The reason: workers are a powerful concept, and it's very likely we'll
> want to extend them over time.
>
> The 'name' option is just one such case.  Here are a few others:
>
>  - 'language' for non-JS workers (e.g. 'text/python' or 'application/llvm')
>  - 'isContent' to pass a string or Blob instead of a url
>  - 'lifetime' for running beyond the lifetime of a page
>  - etc.
>
> I'd say other options are likely to be just as 'important' as name, so
> I wouldn't special-case that parameter.  A 'WorkerOptions' parameter
> supports naming, but future expansion as well.

FWIW, Chris's suggestion is also fine with me. In general, I like
these options objects since they are easily extensible.
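
To illustrate the extensibility argument, here is a minimal sketch of an options-bag signature. The option names come from Chris's list; the default values, and the fact that this toy createWorker() returns its resolved settings instead of a MessagePort, are assumptions made purely for illustration.

```javascript
// Hypothetical createWorker(body, options) with an extensible options bag.
// Option names follow Chris's list; the defaults are assumptions.
const DEFAULT_OPTIONS = {
  name: null,
  language: "text/javascript",
  isContent: false,
  lifetime: "page",
};

function createWorker(workerBody, options = {}) {
  const settings = { ...DEFAULT_OPTIONS, ...options };
  // A real implementation would start the worker and return a MessagePort;
  // this model only returns what it resolved.
  return { workerBody, settings };
}

const plain = createWorker("foo.js");
const named = createWorker("sync.js", { name: "sync" });
console.log(plain.settings.name, named.settings.name);
```

Adding a new option later means adding a key to the defaults; existing callers keep working unchanged, which is not true of adding positional parameters.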

- a
Received on Wednesday, 6 August 2008 19:12:57 UTC