RE: Workers from Ian Hickson on 2008-08-27 (public-html@w3.org from August 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 27 Aug 2008 10:18:19 +0000 (UTC)
To: Justin James <j_james@mindspring.com>
Cc: public-html@w3.org
Message-ID: <Pine.LNX.4.62.0808270950350.14795@hixie.dreamhostps.com>

On Sun, 10 Aug 2008, Justin James wrote:
> > 
> > I was going to add a note, but then I noticed that it actually already 
> > says it twice -- once in the "create a worker" algorithm, and once in 
> > the "run a worker" algorithm. What should I add to make it clearer?
> 
> I think that making it step #1 in the enumerated list would do the 
> trick. The last time I looked at it, I realized that the reason that I 
> kept missing it, is because I was looking at the list to see what was 
> happening, but it is in the paragraph before the list. Since it *is* a 
> step in creating the worker, I think that adding it to the list would 
> be reasonable.

Done.


> > > I agree that different platforms will have different cap/throttle 
> > > levels. But the code authors need to be able to check to see if they 
> > > hit it!
> > 
> > Why?
> 
> Because it is *very* common to take an "alternate" route if a thread 
> will not run immediately. Some use cases:
> 
> * For a critical task, if I've hit the limit, I may choose to *not* 
> create a separate thread, and instead choose to run it in the primary 
> thread:
>
> if (Window.WorkerLimitMet) {
>    eval(GetURL(url));
> } else {
>    createWorker(url);
> }

I don't really buy that example (you'll hit network limits long before CPU 
limits for I/O tasks), and I can't really think of any realistic ones, so 
I'm not convinced of this use case.


> * For a time-sensitive, but unimportant task (say, putting up graphic 
> "please wait" in response to user input that will only be on the screen 
> for a second or so), it is better to just bypass the logic altogether 
> than to wait on it:
>
> if (!Window.WorkerLimitMet) {
>    createWorker(url);
> }

You'd never use a worker for UI-related tasks, since the workers can't get 
to the UI. What realistic cases would there be for worker-level tasks that 
are unimportant enough that you could just not do them?


> * Some applications may very well wish to limit or restrict user input 
> until the queue can accept more work. For example:
>
> while (Window.WorkerLimitMet) {
>    Form1.SubmitButton.Enabled = false;
>    sleep(100);
> }

Users are quite capable of noticing when their computer is under load. I 
don't think it makes sense to artificially limit how much work the 
computer can do like this.


> If we can't dictate how many workers may run at once due to platform 
> limits, then developers need to know when they are at those limits.

We don't provide a way for applications to know when they hit other 
limits, and I don't really see this as special.


> Doing something onMouseOver() is a good example. If someone is wildly 
> waving their mouse, better to start dropping it than to queue up 
> workers. Think about this kind of code for a moment:
> 
> onMouseOver = "createWorker(urlToScript)"
> 
> user starts waving their mouse wildly...

I can't see _any_ valid reason to _ever_ create a worker from mouse 
movements. What possible use case could that have? Just create one worker 
and queue work up with it.
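
A minimal sketch of that pattern, assuming a worker object that accepts 
jobs through postMessage(). The names createJobSender, maxPending and 
fakeWorker are hypothetical, and the cap on outstanding jobs is an added 
assumption for illustration, not something the draft defines:

```javascript
// One long-lived worker receives every job; no worker is created per event.
// `worker` is any object with a postMessage() method.
function createJobSender(worker, maxPending) {
  let pending = 0; // jobs sent but not yet reported as finished
  return {
    // Send a job unless too many are outstanding. Returns true if sent.
    send(job) {
      if (pending >= maxPending) return false; // drop the job instead of piling up
      pending += 1;
      worker.postMessage(job);
      return true;
    },
    // Call this when the worker reports that a job has finished.
    done() {
      if (pending > 0) pending -= 1;
    },
    get pending() {
      return pending;
    },
  };
}

// Hypothetical stand-in for a real worker: it only records what it was sent.
const fakeWorker = {
  received: [],
  postMessage(msg) {
    this.received.push(msg);
  },
};

const sender = createJobSender(fakeWorker, 2);
const results = ["a", "b", "c"].map((job) => sender.send(job)); // third is dropped
sender.done(); // one job finished, so there is room again
const afterDone = sender.send("d");
```

In a page, the mouse handler would call send() and the worker's reply 
handler would call done(), so wild mouse movement costs at most a few 
dropped jobs rather than a pile of new workers.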


> > It could also create a worker, but run it slowly.
> 
> It *could*, but that would be supremely dumb behavior; each thread takes 
> up space in memory, regardless of whether or not it is running.

Workers aren't _that_ expensive. If a worker is using 100% CPU on a core, 
you'll run out of cores long before you run out of memory. Running workers 
slowly (sharing cores) seems much more reasonable than not running them at 
all.


> > I don't know how we would even go about testing such requirements.
> 
> That's why I suggest we define what a throttling mechanism is allowed to 
> do, and what it is not allowed to do, and provide a mechanism for 
> detecting throttle and an overload of createWorker() that accepts a 
> timeout value. There is a reason why implementations of various "thread 
> pool" type objects provide this functionality, and it isn't for the sake 
> of needing extra documentation. :)

This may be something we'll have to add in future, but for now I really 
don't see this as something critical enough for the first version.


> > > For example:
> > >
> > > for (i = 0; i <= 1000000; i++) {
> > > arrayOfMessagePorts[i] = createWorker(arrayOfURLs[i]);
> > > }
> > >
> > > Yes, I know that it is an extreme example (not really, if you want 
> > > to do something to the individual pixels of an image in 
> > > parallel...), but it illustrates the problem well.
> > 
> > How is this different to, say:
> > 
> >    for (i = 0; i <= 1000000; i++) {
> >      arrayOfDocuments[i] =
> >        document.implementation.createDocument(null, null, null);
> >    }
> > 
> > ...?
> 
> It's the same at a technical level, but quite different from a 
> programmer's viewpoint. A programmer, writing what you wrote, has the 
> expectation that they are creating 1,000,000 objects, and knows it 
> before the code even runs, and can make the decision to do it based on 
> that information up front. A programmer writing what I wrote does not 
> know in advance how many objects they are creating (they know that 
> eventually 1,000,000 objects will have been created, but have no idea how 
> many will be in scope at any given time), and depending on the UA, it 
> may or may not run well. So it's a matter of perception, not technical.

I don't buy that. If you are firing 1000000 workers back to back, you 
don't expect them to complete quickly enough that you only have 10 or so 
active at a time. The whole point of workers is that you use them for long 
computations; if they could return that quickly, then using workers would 
just add unnecessary overhead.


> I'm stating that the spec needs to explicitly state that this is 
> *undefined* and up to the UA.

It already does:

# User agents may impose implementation-specific limits on otherwise 
# unconstrained inputs, e.g. to prevent denial of service attacks, to 
# guard against running out of memory, or to work around 
# platform-specific limitations.
 -- http://www.whatwg.org/specs/web-workers/current-work/#conformance


> > This seems unlikely. All use cases I can think of for running many 
> > scripts will all be running the same one (or few) scripts, not many 
> > many different ones.
> 
> Since as far as I can tell, the only way to pass parameters to these 
> scripts is via the URL itself, I think that you are missing out.

You can pass parameters using postMessage().
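
A rough sketch of what that could look like, assuming messages are plain 
strings. The helpers encodeParams and decodeParams, the parameter names, 
and the stub port are all hypothetical:

```javascript
// Page side: pack the initial parameters into a single string message.
function encodeParams(params) {
  return JSON.stringify(params);
}

// Worker side: unpack the parameters inside the message handler.
function decodeParams(data) {
  return JSON.parse(data);
}

// Hypothetical stand-in for a message port. A real port delivers the
// message asynchronously to the other side; this stub calls its own
// handler synchronously so the sketch runs on its own.
const port = {
  onmessage: null,
  postMessage(data) {
    if (this.onmessage) this.onmessage({ data });
  },
};

let seen = null;
port.onmessage = (event) => {
  seen = decodeParams(event.data); // the worker now has its parameters
};
port.postMessage(encodeParams({ colorCode: "#ff0000", threshold: 10 }));
```

The worker script's URL stays the same for every job; only the message 
changes.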


> Let's say you want to do some image processing, so you're going through 
> the pixels of an image:
> 
> var sBaseURL = 'http://www.domain.com/scripts/pixelprocess.aspx?colorCode=';
> 
> for (x = 0; x < image.width; x++) {
>    for (y = 0; y < image.height; y++) {
>       messagePorts[x, y] =
>          createWorker(sBaseURL + image.pixels[x, y].color);
>    }
> }

Good lord, don't do that.

Just shard the image into a few pieces and postMessage() the data from 
each shard to a worker. Creating one worker per pixel of an image is 
completely ridiculous.
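
As a sketch of the sharding idea, assuming the pixels are available as a 
flat array and that each worker is reachable through an object with 
postMessage(). The names shardPixels and dispatchShards, the stub ports, 
and the comma-separated string encoding are hypothetical:

```javascript
// Split `pixels` into at most `shardCount` contiguous slices of similar size.
function shardPixels(pixels, shardCount) {
  if (shardCount < 1) throw new RangeError("shardCount must be at least 1");
  const size = Math.ceil(pixels.length / shardCount);
  const shards = [];
  for (let start = 0; start < pixels.length; start += size) {
    shards.push(pixels.slice(start, start + size));
  }
  return shards;
}

// Send one shard to each port, encoded as a comma-separated string.
// Returns the number of shards actually sent.
function dispatchShards(pixels, ports) {
  const shards = shardPixels(pixels, ports.length);
  shards.forEach((shard, i) => ports[i].postMessage(shard.join(",")));
  return shards.length;
}

// Hypothetical stand-ins for four workers: each records what it receives.
const ports = [0, 1, 2, 3].map(() => ({
  received: [],
  postMessage(msg) {
    this.received.push(msg);
  },
}));

const pixels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const sent = dispatchShards(pixels, ports);
```

Ten pixels over four workers gives shards of three, three, three and one 
pixel, so the number of workers tracks the number of cores rather than 
the size of the image.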


> > This again is just a limitation of IE's implementation. (Though one 
> > has to wonder, why would you generate a URL of more than 32KB? 
> > Wouldn't it make more sense to generate the part that changes, and 
> > then fetch the rest as part of an importScripts() call?)
> 
> You wouldn't want to generate a *URL* of more than 32 KB, but you quite 
> often have a *script* of more than 32 KB!

You wouldn't have 32KB of script that changes each time. You'd just have a 
small bit of code changing each time, and the rest could be imported, and 
not part of the URL.


> I'm finding that an absolutely huge hole in this implementation is in 
> passing initial parameters. The only way I am seeing to pass parameters 
> in, is with the message port system. The example in the current draft 
> involves a whopping TEN (10), yes, TEN (10) lines of code in order to 
> extract TWO (2) parameters as initial input. That is simply 
> unacceptable.

This will be solved when we allow structured data passing later.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 27 August 2008 10:18:34 UTC