[whatwg] Limit on number of parallel Workers.

I include below, for the record, a set of e-mails on the topic of setting 
limits on Workers to avoid DoS attacks.

As with other such topics, the HTML5 spec allows more or less any 
arbitrary behaviour in the face of hardware limitations. There are a 
variety of different implementation strategies, and these will vary 
based on the target hardware. How to handle a million new workers will be 
different on a system with a million cores and little memory than on a 
system with one core but terabytes of memory, or on a system with 100 slow 
cores vs one with 10 fast cores.

I have therefore not added any text to the spec on the matter. Please let 
me know if you think there should really be something in the spec on this.


On Tue, 9 Jun 2009, Dmitry Titov wrote:
> 
> In Chromium, workers are going to have their own separate processes, at 
> least for now. So we quickly found that "while(true) foo = new 
> Worker(...)" consumes the OS resources :-) In fact, this will kill other 
> browsers too, and on some systems the unbounded number of threads will 
> effectively "freeze" the system beyond the browser.
> 
> We are thinking about how to reasonably place limits on the resources 
> consumed by a 'sea of workers'. Obviously, one could just limit the 
> maximum number of parallel workers available to a page, a domain, or the 
> browser. But what do you do when the limit is reached? The Worker() 
> constructor could return null or throw an exception. However, that seems 
> to go against the spirit of the spec, since it usually does not deal 
> with resource constraints. So it makes sense to look for an 
> implementation that behaves as sensibly as possible.
> 
> The current idea is to let pages create as many Worker objects as 
> requested, but not necessarily start them right away, so no resources 
> are allocated beyond the thin JS wrapper. As workers terminate and their 
> number drops below the limit, more workers from the "ready queue" can be 
> started. This allows implementation limits to be supported without 
> exposing them.
> 
> This is similar to how a 'sea of XHRs' would behave. The test page at 
> http://www.figushki.com/test/xhr/xhr10000.html creates 10,000 async XHR 
> requests to distinct URLs and then waits for all of them to complete. 
> While it's obviously impossible to have 10,000 HTTP connections in 
> parallel, all XHRs will be completed, given time.
> 
> Does it sound like a good way to avoid the resource crunch caused by a 
> high number of workers?
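
For illustration, here is a minimal page-side sketch of the "ready queue" 
idea described above. The QueuedWorker wrapper and the limit of 16 are 
assumptions made for the example (message/error handler forwarding, and 
workers that exit on their own, are ignored); a real implementation would 
enforce such a queue inside the browser rather than in page script.

  // Illustrative only: cap the number of Workers actually running and
  // defer the rest in a queue, starting them as earlier ones terminate.
  var MAX_RUNNING = 16;              // assumed limit, not from the spec
  var running = 0;
  var readyQueue = [];

  function QueuedWorker(url) {
    this.url = url;
    this.worker = null;              // real Worker created only when started
    this.pending = [];               // messages buffered until then
    readyQueue.push(this);
    drainQueue();
  }

  QueuedWorker.prototype.postMessage = function (msg) {
    if (this.worker) this.worker.postMessage(msg);
    else this.pending.push(msg);
  };

  QueuedWorker.prototype.terminate = function () {
    if (this.worker) {
      this.worker.terminate();
      running--;
      drainQueue();                  // a slot opened up; start the next one
    } else {
      var idx = readyQueue.indexOf(this);
      if (idx !== -1) readyQueue.splice(idx, 1);
    }
  };

  function drainQueue() {
    while (running < MAX_RUNNING && readyQueue.length) {
      var q = readyQueue.shift();
      q.worker = new Worker(q.url);  // resources allocated only at this point
      running++;
      q.pending.forEach(function (m) { q.worker.postMessage(m); });
      q.pending = [];
    }
  }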

On Tue, 9 Jun 2009, Oliver Hunt wrote:
>
> I believe that it will be difficult to have such a limit, as sites may 
> rely on GC to collect Workers that are no longer running (so the number 
> of running threads is non-deterministic), and in the context of 
> mixed-source content ("mash-ups") it will be difficult for any content 
> source to be sure it isn't going to contribute to that limit.  Obviously 
> a UA shouldn't crash, but I believe that it is up to the UA to determine 
> how to achieve this -- e.g. a limit that allows a 1:1 relationship 
> between workers and processes will be much lower than the limit for an 
> implementation that has a worker-per-thread model, or an m:n 
> relationship between workers and threads/processes.  Limiting the 
> specification simply because one implementation mechanism has certain 
> limits, when there are many alternative implementation models, seems 
> like a bad idea.
> 
> I believe that if there are going to be any worker-related limits, they 
> should realistically be a lower bound on the number of workers rather 
> than an upper bound.

On Tue, 9 Jun 2009, Jonas Sicking wrote:
> 
> This is the solution that Firefox 3.5 uses. We use a pool of relatively 
> few OS threads (5 or so, IIRC). This pool then runs worker tasks as they 
> are scheduled. So for example, if you create 1000 worker objects, those 
> 5 threads will take turns executing the initial scripts one at a time. 
> If you then send a message using postMessage to 500 of those workers, 
> and the other 500 call setTimeout in their initial script, the same 
> threads will take turns running those 1000 tasks (500 message events and 
> 500 timer callbacks).
> 
> This is somewhat simplified, and things are a little more complicated 
> due to how we handle synchronous network loads (during which we freeze 
> an OS thread and remove it from the pool), but the above is the basic 
> idea.
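
Below is a very rough, single-threaded toy model of that m:n scheduling 
policy. The "runners" stand in for OS threads (which run concurrently in 
the real, native-code implementation); the names, the tick-based loop, and 
the pool size of 5 are illustrative only.

  // Toy model: a fixed pool of runners takes turns pulling
  // run-to-completion tasks (initial scripts, message events, timer
  // callbacks) off one shared queue.
  var POOL_SIZE = 5;
  var taskQueue = [];                 // entries: { label, run }

  function schedule(label, run) {
    taskQueue.push({ label: label, run: run });
  }

  function tick() {
    // Each "tick", every idle runner grabs the next queued task, if any.
    for (var runner = 0; runner < POOL_SIZE && taskQueue.length; runner++) {
      var task = taskQueue.shift();
      task.run();                     // one task runs to completion
    }
  }

  // 500 postMessage deliveries plus 500 timer callbacks become 1000 queued
  // tasks, all serviced by just 5 runners over successive ticks.
  for (var i = 0; i < 500; i++) schedule('message-' + i, function () {});
  for (var j = 0; j < 500; j++) schedule('timeout-' + j, function () {});
  while (taskQueue.length) tick();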

On Tue, 9 Jun 2009, Michael Nordman wrote:
> 
> That's a really good model. Scalable, and it degrades nicely. The only 
> problem is with very long-running operations, where a worker script 
> doesn't return in a timely fashion. If enough of them do that, all the 
> others starve. What does FF do about that, or do you anticipate that in 
> practice it won't be an issue?
> 
> WebKit dedicates an OS thread per worker. Chrome goes even further (for 
> now at least) with a process per worker. The 1:1 mapping is probably 
> overkill, as most workers will probably spend most of their life asleep, 
> just waiting for a message.

On Thu, 11 Jun 2009, Robert O'Callahan wrote:
> 
> You probably still want a global limit, or else malicious sites can DoS 
> your entire OS by spawning workers in many synthetic domains. Making the 
> limit per-eTLD instead of per-domain would help a bit, but maybe not 
> very much. The same goes for other kinds of resources; there's really no 
> perfect solution to DoS attacks against browsers, AFAICT.

On Wed, 10 Jun 2009, John Abd-El-Malek wrote:
>
> The current thinking is a smaller limit per page (i.e., including all 
> iframes and external scripts), say around 16 workers, and then a global 
> limit for all loaded pages, say around 64 or 128.  The benefit of two 
> limits is to reduce the chance of pages behaving differently depending 
> on what other sites are currently loaded.
>
> We plan on increasing these limits by a fair amount once we are able to 
> run multiple JS threads in a process.  It's just that even when we do 
> that, we'll still want to have some limits, and we wanted to use the 
> same approach now.
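
Such a two-level check might look roughly like the sketch below; the 
numbers (16 per page, 64 browser-wide) come from the discussion above, but 
the function and variable names are hypothetical and nothing here is 
mandated by the spec.

  // Hypothetical two-level admission check: a per-page cap plus a global,
  // browser-wide cap on the number of running workers.
  var PER_PAGE_LIMIT = 16;
  var GLOBAL_LIMIT = 64;

  var globalRunning = 0;
  var perPageRunning = {};            // page id -> count of started workers

  function canStartWorker(pageId) {
    var pageCount = perPageRunning[pageId] || 0;
    return pageCount < PER_PAGE_LIMIT && globalRunning < GLOBAL_LIMIT;
  }

  function noteWorkerStarted(pageId) {
    perPageRunning[pageId] = (perPageRunning[pageId] || 0) + 1;
    globalRunning++;
  }

  function noteWorkerExited(pageId) {
    perPageRunning[pageId]--;
    globalRunning--;
    // At this point a deferred worker (per the "ready queue" idea earlier
    // in the thread) could be started.
  }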

On Wed, 10 Jun 2009, Jonas Sicking wrote:
> 
> We do see it as a problem, but not big enough of a problem that we
> needed to solve it in the initial version.
> 
> It's not really a problem for most types of calculations: as long as 
> the number of threads is larger than the number of cores, we'll still 
> finish all tasks as quickly as the CPU is able to. Even for long-running 
> operations, if they're operations that the user wants anyway, it doesn't 
> really matter whether the jobs all run in parallel or staggered after 
> each other, as long as all CPU cores are kept busy.
> 
> There are some scenarios it doesn't work so well for. For example, a 
> worker that runs more or less indefinitely and produces more and more 
> accurate results the longer it runs, or something like a Folding@home 
> style website that performs calculations for as long as the user is on 
> the site and submits them to the server.
> 
> If enough of those workers are scheduled it will block everything else.
> 
> This is all solvable, of course; there's a lot of tweaking we can do. 
> But we figured we wanted to get some data on how people use workers 
> before spending too much time developing a perfect scheduling solution.
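
One page-side way to cope with that starvation scenario under a pooled 
scheduler is for a long-running worker to do a bounded chunk of work per 
task and then yield with setTimeout(0), so that other workers' queued 
tasks get a turn. A sketch follows, in which the dummy accumulator and the 
50 ms slice are arbitrary stand-ins for the real computation.

  // Worker script: slice an effectively unbounded computation into short
  // run-to-completion tasks, reporting progressively better results.
  var total = 0;
  var i = 0;
  var ITERATIONS = 1e8;               // stand-in for "runs almost forever"

  function step() {
    var deadline = Date.now() + 50;   // ~50 ms per slice, an arbitrary choice
    while (Date.now() < deadline && i < ITERATIONS) {
      total += Math.sqrt(i++);        // stand-in for real work
    }
    postMessage({ progress: i / ITERATIONS, partial: total });
    if (i < ITERATIONS) setTimeout(step, 0);  // yield; continue in a later task
  }
  step();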

On Wed, 10 Jun 2009, Michael Nordman wrote:
> 
> I never did like the Gears model (a 1:1 mapping with a thread). We were 
> stuck with strong thread affinity due to other constraints (script 
> engines, COM/XPCOM). But we could have allowed multiple workers to 
> reside in a single thread: a thread-pool (perhaps per-origin) sort of 
> arrangement, where once a worker was put on a particular thread it 
> stayed there until end of life.
> 
> Your FF model has more flexibility: give a worker a slice (where slice 
> == run-to-completion) on any thread in the pool, with no thread affinity 
> whatsoever (if I understand correctly).
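
For contrast with the free-pool model, here is a toy sketch of that 
sticky, per-origin thread-affinity arrangement; the pool size, the 
least-loaded placement policy, and all of the names are assumptions made 
purely for illustration.

  // Toy model: a worker is assigned to one of a small per-origin pool of
  // "threads" when first scheduled, and keeps that assignment until
  // end of life.
  var THREADS_PER_ORIGIN = 4;
  var assignment = {};                // workerId -> thread index
  var originLoad = {};                // origin -> per-thread worker counts

  function assignThread(origin, workerId) {
    if (workerId in assignment) return assignment[workerId];   // sticky
    var load = originLoad[origin];
    if (!load) {
      load = originLoad[origin] = [];
      for (var t = 0; t < THREADS_PER_ORIGIN; t++) load.push(0);
    }
    var thread = load.indexOf(Math.min.apply(null, load));     // least loaded
    load[thread]++;
    assignment[workerId] = thread;
    return thread;
  }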

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 7 July 2009 16:59:48 UTC