[whatwg] Worker feedback from Ian Hickson on 2009-03-28 (public-whatwg-archive@w3.org from March 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Sat, 28 Mar 2009 01:23:57 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0903252223490.25082@hixie.dreamhostps.com>
On Thu, 5 Mar 2009, Jonas Sicking wrote:
> 
> Allowing cookie to be set would unfortunately create a synchronous 
> communication channel between the worker and the main window. This is 
> something that we need to avoid to prevent users from having to deal 
> with locking and other thread related issues.
> 
> For what it's worth, this is a problem that exists with the localStorage 
> API that is also exposed in current workers draft. Something that also 
> needs to be fixed.

I have continued to not include cookies in, and have removed localStorage 
from, the workers API.


> It seems like it should be fine to allow reading cookies in dedicated 
> workers though.

Wouldn't this run into the same problems?


On Thu, 19 Mar 2009, Drew Wilson wrote:
>
> The WebWorkers spec states:
> "DedicatedWorkerGlobalScope objects act as if they had an implicit
> MessagePort associated with them"
> 
> MessagePorts will queue up events until the owner either explicitly 
> invokes start() on them, or sets the onmessage attribute. Is the intent 
> that dedicated workers also implement this same functionality for their 
> implicit port (i.e. if I create a dedicated worker, and immediately post 
> a message to it, but the worker doesn't actually set an onmessage 
> handler, should that event be queued until such a time as the worker 
> does set an onmessage handler)?

No (because we don't want to expose the start() method on workers). I've 
made this clear by explicitly saying when the port is opened.


> There's a similar issue with cross-window postMessage(). I've been 
> playing around with the current implementation in Chrome/WebKit, and 
> code like this:
> 
> function newWindow() {
>   var childWin = window.open();
>   childWin.document.location = "http://example.com/child.html";
> 
>   childWin.postMessage("hi new window", "*");
> 
> }
> 
> ...does not result in the message handler in the new window being 
> called, because the window isn't loaded at the time the message is 
> posted (it works just fine after the new window has loaded/executed its 
> script). I'm curious whether this is just a bug in the early 
> implementation, or if this is indeed the expected behavior -- if so, 
> then it makes it difficult to do cross-domain messaging as you have this 
> race condition on startup.

This is the expected behaviour, but there is no race condition; navigation 
happens as part of the event loop, and thus will always happen after the 
script has finished. (Specifically, the "update the session history with 
the new page" algorithm always gets run as a task.)
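A toy model of the event loop makes the ordering concrete. It is a simplification, not the spec's algorithm, and EventLoop, navigate() and the childWin object are hypothetical stand-ins. The navigation is itself a queued task, so it cannot run until the opener's script has returned. As a result, the postMessage() call always happens while childWin still holds its initial about:blank document:

```javascript
// Toy model of an HTML event loop: tasks run one at a time, in FIFO
// order, and a running task is never interrupted by another task.
// EventLoop, navigate() and childWin are hypothetical stand-ins.
class EventLoop {
  constructor() {
    this.queue = [];
    this.log = [];
  }
  queueTask(fn) {
    this.queue.push(fn);
  }
  run() {
    while (this.queue.length > 0) {
      const task = this.queue.shift();
      task(); // runs to completion before the next task starts
    }
  }
}

const loop = new EventLoop();
const childWin = { documentURL: "about:blank" };

// Navigation does not swap documents immediately; "update the session
// history with the new page" is queued as a task.
function navigate(win, url) {
  loop.queueTask(() => {
    win.documentURL = url;
    loop.log.push("navigated to " + url);
  });
}

// The opener's script, as in the quoted newWindow() example.
loop.queueTask(() => {
  navigate(childWin, "http://example.com/child.html");
  // Still the initial document: the navigation task has not run yet.
  loop.log.push("postMessage targets " + childWin.documentURL);
});

loop.run();
console.log(loop.log);
```

The outcome does not depend on timing. One way for the opener to reach child.html is to let that page announce itself once it has loaded, for example by posting a message to window.opener, and to reply only after that.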


On Thu, 19 Mar 2009, Dmitry Titov wrote:
> On Tue, Mar 17, 2009 at 11:57 AM, Dmitry Titov <dimich at chromium.org> wrote:
> >
> > I can't find a place in the spec that defines the text encoding 
> > used to decode the script of the Web Worker.
> >
> > For example, section 4.3.1, in "running a script", step 2, defines 
> > that for the <script> element - the encoding is 'inherited' from 
> > Document and can be overridden by 'charset' attribute or HTTP header.
> >
> > But for Workers, there is no specific instructions. I would assume 
> > (probably incorrectly) that the Web Workers behave like <script> 
> > element in this regard - so the encoding should be inherited from the 
> > 'parent Document', but this seems to contradict the specific 
> > requirements for URLs in Workers to be encoded using UTF-8. It also 
> > feels the spirit of the Workers spec is leaning to UTF-8 everywhere :-)
> >
> > So in the absence of HTTP header, what text encoding should be used to 
> > decode the Worker scripts, including nested Workers and 
> > importScripts(...) targets?
>
> FYI: per IRC talk, the answer is the scripts should be using UTF-8 in 
> the absence of explicit override. Spec likely will reflect this more 
> cleanly at some point.

This is now defined here:

   http://www.whatwg.org/specs/web-workers/current-work/#decode-a-script-resource


On Thu, 19 Mar 2009, Anne van Kesteren wrote:
> 
> Can we get away with not following Content-Type at all? As we do for 
> text/event-stream and text/cache-manifest, both of which simply require 
> UTF-8.

text/javascript is defined to take a charset parameter.


On Thu, 19 Mar 2009, Jonas Sicking wrote:
>
> The problem is if we want to allow people to use already existing 
> scripts then they are likely often not in UTF-8.
> 
> Most scripts will probably not work out of the box anyway since there is 
> no access to DOM. But purely computational libraries should work.

It does seem reasonable to support non-UTF-8 scripts, given the legacy of 
existing scripts.


On Thu, 19 Mar 2009, Anne van Kesteren wrote:
> 
> Would such libraries have a lot of non-UTF-8 characters? Also, it's not 
> that hard to encode something as UTF-8 these days and the reduced 
> complexity would be a nice benefit.

We already have the complexity for scripts, so I don't think it's a huge 
deal. The spec does the same as for <script src> (except for not having an 
inline explicit override like charset="").
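Here is a sketch of the decoding rule as summarised in this thread. It uses the charset parameter of the Content-Type header when that parameter is present and the encoding is supported, and falls back to UTF-8 otherwise. The function names are hypothetical and the Content-Type parsing is deliberately simplistic. Any byte-order-mark handling the spec may also define is not modelled. The sketch relies only on the standard TextDecoder API, which is available in browsers and in Node.js.

```javascript
// Sketch: pick the encoding for a worker script or importScripts() target.
// Honour a supported charset parameter on Content-Type; otherwise use UTF-8.
// Hypothetical helper names; BOM handling is intentionally omitted.
function charsetFromContentType(contentType) {
  if (!contentType) return null;
  // Matches e.g. 'text/javascript; charset=ISO-8859-1' or charset="UTF-8".
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

function decodeWorkerScript(bytes, contentType) {
  const label = charsetFromContentType(contentType);
  if (label) {
    try {
      return new TextDecoder(label).decode(bytes);
    } catch (e) {
      // TextDecoder throws a RangeError for labels it does not support;
      // in that case fall through to the UTF-8 default.
      if (!(e instanceof RangeError)) throw e;
    }
  }
  return new TextDecoder("utf-8").decode(bytes);
}
```

Unlike `<script src>`, the worker has no element on which a charset="" attribute could provide an inline override, so only the header can change the default.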


On Fri, 20 Mar 2009, Alexey Proskuryakov wrote:
> On 20.03.2009, at 1:43, Anne van Kesteren wrote:
> > 
> > Can we get away with not following Content-Type at all? As we do for 
> > text/event-stream and text/cache-manifest, both of which simply require 
> > UTF-8.
> 
> Good point - but to the contrary, I think that charset from Content-Type 
> should always be honored; adding special cases is an unnecessary 
> complication.
> 
> Formally, a proxy can re-encode any text/* resource and expect the 
> client to honor Content-Type charset over <meta> and built-in 
> preconceptions, although I think that such proxies are extremely rare in 
> this millennium.

I think, given text/css, text/html, and text/xml all have character 
encoding declarations inline, transcoding is not going to work in 
practice. I think the better solution would be to remove the rules that 
make text/* an issue in the standards world (it's not an issue in the 
"real" world).

For new formats, though, I think just supporting UTF-8 is a big win.


On Fri, 20 Mar 2009, Julian Reschke wrote:
> 
> An easy way to avoid this issue is not to use a text/* content type.

I think the abuse of application/* for text types has been widely shown to 
be a mistake. The "issue" doesn't, in practice, exist.


On Fri, 20 Mar 2009, Drew Wilson wrote:
> 
> Good point. Cookie-based auth is not a great use case, because as you 
> point out, you could just do this by passing credentials to the server 
> via an XHR request and have it set your cookies. I guess the motivation 
> for allowing cookies to be set from workers is the same as the 
> motivation for allowing web-page script to set cookies - perhaps this 
> motivation is deprecated now that we have localStorage but even 
> localStorage doesn't seem to have the nice cross-sub-domain sharing that 
> cookies allow.

I figured the use case for setting cookies was to track user preferences 
that the server might need to know about. I'm not sure that makes sense in 
a worker.

Another use case would be keeping track of what has been done so far; for 
this I guess it would make sense to have a localStorage API for shared 
workers (scoped to their name). I haven't added this yet, though.


> > > Gears had an explicit permissions variable applications could check, 
> > > which seems valuable - do we do anything similar elsewhere in HTML5 
> > > that we could use as a model here?
> >
> > HTML5 so far has avoided anything that requires explicit permission 
> > grants, because they are generally a bad idea from a security 
> > perspective (users will grant any permissions the system asks them 
> > for).
> 
> The Database spec has a strong implication that applications can request 
> a larger DB quota, which will result in the user being prompted for 
> permission either immediately, or at the point that the default quota is 
> exceeded. So it's not without precedent, I think. Or maybe I'm just 
> misreading this:
> 
> User agents are expected to use the display name and the estimated 
> database size to optimize the user experience. For example, a user agent 
> could use the estimated size to suggest an initial quota to the user. 
> This allows a site that is aware that it will try to use hundreds of 
> megabytes to declare this upfront, instead of the user agent prompting 
> the user for permission to increase the quota every five megabytes.

There are many ways to expose this, e.g. asynchronously as a drop-down 
infobar, or as a pie chart showing the disk usage that the user can click 
on to increase the allocation whenever they want, etc.


> To be clear, are you saying that our philosophy is to leave any 
> permissions granting up to the individual user agent (i.e. not described 
> in the spec)? Or that we're trying to avoid specifying functionality 
> that might be invasive enough to require permissions?

Both.


> In fact, I'd go further - I don't think we should even *have* names for
> persistent workers (the use case for having names is "what if I want to run
> the same worker multiple times without having to host multiple scripts",
> which I don't think really applies to persistent workers).

Makes sense.


> Also, one of the things I'd like to experiment with in my implementation 
> is allowing cross-domain access to workers (this is required if you want 
> workers to be able to communicate/share resources across domains, since 
> workers don't have access to any of the cross-domain functionality that 
> window-based script has) - getting rid of the "name" and always having 
> persistent workers identified by their script url helps enable this, and 
> avoids some security issues, such as the ones described in this old 
> Gears proposal I came across: 
> http://code.google.com/p/gears/wiki/CrossOriginAPI

True.


> > > Additionally, there's no good way for workers under different 
> > > domains to talk to one another (a window can use the cross-domain 
> > > messaging functionality to talk to other domains, but there's no 
> > > analog for this for workers).
> >
> > This has been intentionally delayed while we wait for more 
> > implementation experience.
> 
> I'm hoping to experiment with this some (per my earlier comment), so 
> hopefully I'll be able to report back with some interesting data points 
> (or at least my miserable failure will serve as an object lesson for 
> future implementors :).

Great! :-)


On Fri, 20 Mar 2009, Jeremy Orlow wrote:
>
> Now that Web Storage spec has been split out, there are several parts of 
> the Web Worker spec that should no longer point at the HTML 5 spec.  An 
> example is the following section, but I'm sure there are others: 
> http://dev.w3.org/html5/workers/#apis-defined-in-other-specifications

Fixed.


Regarding the suggestion of an explicit yield() that would push a 
continuation of the script onto the task queue, I haven't added this for 
now. I think it's something to bear in mind for implementors; if it 
becomes practical to do this, we should definitely revisit it.


On Tue, 24 Mar 2009, Robert O'Callahan wrote:
> 
> It's possible that people might want to implement something that's 
> equivalent to the storage mutex in observable behaviour, but allows more 
> parallelism, such as speculative execution or finer-grained locking when 
> the implementation can prove it's safe. I assume implementors of HTML5 
> already understand that that's allowed.

The spec allows implementations to do whatever they want so long as the 
black-box behaviour is equivalent. I hope this is understood by all 
implementors!


> - added navigator.releaseLock().
> 
> This name could be confusing to developers, because there is no 
> corresponding explicit acquireLock(), which there usually is in an API 
> that exposes releaseLock().
> 
> navigator.allowInterruption() maybe?

It doesn't really allow interruptions, but it allows storage changes... 
I've changed it to navigator.getStorageUpdates().
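The new name reflects what the call is for. While a script holds the storage mutex, no other script can change storage, so its view is stable. Calling the method lets changes made by other contexts become visible. Below is a toy, single-threaded model of that behaviour. StorageMutexModel and its methods are hypothetical. A real user agent would block the waiting context rather than queue its write, and this is not the spec's algorithm.

```javascript
// Toy model of the storage mutex: one lock shared by every script that
// touches document.cookie or localStorage. A script takes it implicitly on
// first access and keeps it until its task ends or it calls
// getStorageUpdates(). All names here are hypothetical.
class StorageMutexModel {
  constructor() {
    this.holder = null;
    this.store = new Map();
    this.pending = []; // writes from other contexts waiting for the lock
  }
  acquire(ctx) {
    if (this.holder === null) this.holder = ctx;
    if (this.holder !== ctx) throw new Error(ctx + " would block");
  }
  read(ctx, key) {
    this.acquire(ctx);
    return this.store.get(key);
  }
  write(ctx, key, value) {
    if (this.holder !== null && this.holder !== ctx) {
      this.pending.push({ ctx, key, value }); // a real UA would block here
      return false;
    }
    this.acquire(ctx);
    this.store.set(key, value);
    return true;
  }
  // Runs at the end of a task, or early via getStorageUpdates().
  release(ctx) {
    if (this.holder !== ctx) return;
    this.holder = null;
    // Let each waiting writer run its own short critical section.
    for (const { ctx: other, key, value } of this.pending.splice(0)) {
      this.holder = other;
      this.store.set(key, value);
      this.holder = null;
    }
  }
  getStorageUpdates(ctx) {
    this.release(ctx);
  }
}

const m = new StorageMutexModel();
m.write("A", "count", "1");
m.release("A"); // A's first task ends

const first = m.read("A", "count");      // A's new task takes the lock
m.write("B", "count", "2");              // B must wait: A holds the lock
const stillFirst = m.read("A", "count"); // A's view stays stable
m.getStorageUpdates("A");                // let pending changes in
const updated = m.read("A", "count");    // reacquires and sees B's write
console.log(first, stillFirst, updated); // 1 1 2
```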


> It would be possible to use something like getLockedFeatures for workers 
> while using implicit locking for the main thread.

The problem with any locks that cross from workers to windows is that 
workers have a tendency to be long-lived, so it would lead to a lot of 
time waiting for locks. I don't think we should make that possible.


> Now, with the storage mutex, are there any cases you know of where 
> serializability fails? If there are, it may be worth noting them in the 
> spec. If there aren't, why not simply write serializability into the 
> spec?

Just writing that something must be true doesn't make it true. :-) I think 
it's safer for us to make the design explicitly enforce this rather than 
say that browser vendors must figure out where it might be broken and 
enforce it themselves.


> When two sets of unrelated browser contexts become related (e.g., C 
> loads A into an iframe), I imagined you would join A's lock and C's lock 
> into a single lock covering the new set of related browser contexts, 
> which is safe to do if at most one of those locks is currently held. 
> When this happens due to a document being created with origin A in C's 
> iframe, it happens asynchronously in C, right? So at that point C's lock 
> is not held by currently running script in C (although it might be held 
> by code in another domain which is already related to C), and we can 
> block the join operation in C until one of the two locks is released.
> 
> Then in your example, suppose C loads A's document first. Then C's lock 
> and A's lock are joined to make a CA-lock. Then suppose D ("another 
> window just like C") loads B's document; D's lock and B's lock are 
> merged to make the DB-lock. Now suppose C loads B. The two remaining 
> locks are merged to form a single CADB-lock. No deadlock is possible.

Yes, if we have one lock per group, where a group is the transitive 
closure of all origins and browsing contexts related by origin or by 
membership in a unit of related browsing contexts, then any navigation 
that would join two groups could be required to grab the other group's 
lock before navigating. I'm not going to mention that in the spec, 
though; I'll leave that up to browser vendors to implement as an 
optimisation if they think it's worth it.
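The grouping itself is a classic union-find problem. Below is a minimal sketch in which origins and browsing contexts are plain string labels. The LockGroups class is hypothetical, and actually acquiring the two locks before a join is not modelled. The scenario at the end replays the C/A/D/B example quoted above.

```javascript
// Sketch: one lock per group of related browsing contexts/origins,
// maintained with union-find. Labels are plain strings; LockGroups is a
// hypothetical helper, and lock acquisition itself is not modelled.
class LockGroups {
  constructor() {
    this.parent = new Map();
  }
  find(x) {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every node on the path straight at the root.
    while (this.parent.get(x) !== root) {
      const next = this.parent.get(x);
      this.parent.set(x, root);
      x = next;
    }
    return root;
  }
  // Called when a navigation would relate `a` and `b`; in a real UA the
  // navigating context would first hold both groups' locks.
  join(a, b) {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(rb, ra);
  }
  sameLock(a, b) {
    return this.find(a) === this.find(b);
  }
}

// C loads A, D loads B, then C loads B.
const groups = new LockGroups();
groups.join("C", "A");                  // CA-lock
groups.join("D", "B");                  // DB-lock
console.log(groups.sameLock("A", "B")); // false: still two locks
groups.join("C", "B");                  // CADB-lock
console.log(groups.sameLock("A", "D")); // true: one lock remains
```

Because a join only ever merges two whole groups, no context ever needs to hold a lock from one group while waiting on another group's lock indefinitely. That is why the example cannot deadlock.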


> > If it can be shown that it is not ever possible for script in one 
> > origin to synchronously invoke script in another origin, then I guess 
> > we could have per-origin locks instead of a single lock.
> 
> I'm not sure why synchronous invocation across origins matters.

Because it would enable a deadlock. Suppose origin A's lock is owned by 
some window 1. In window 2, a script in origin B takes B's lock and then 
synchronously calls a script in origin A, which tries to take A's lock and 
blocks. If the script from origin A in window 1 now synchronously calls a 
script in origin B, which tries to take B's lock, each window is waiting 
for a lock the other holds, and you have deadlock.
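That scenario can be checked mechanically with a wait-for graph. Each lock is a node, and an edge X -> Y means "a script holding lock X is blocked waiting for lock Y". A cycle means deadlock. The hasDeadlock() helper below is a hypothetical illustration of this standard technique, not anything the spec defines.

```javascript
// Sketch: deadlock detection on a wait-for graph via depth-first search.
// edges: array of [heldLock, wantedLock] pairs. Hypothetical helper.
function hasDeadlock(edges) {
  const graph = new Map();
  for (const [from, to] of edges) {
    if (!graph.has(from)) graph.set(from, []);
    graph.get(from).push(to);
  }
  const ON_STACK = 1;
  const DONE = 2;
  const state = new Map();
  function visit(node) {
    if (state.get(node) === ON_STACK) return true; // back edge: cycle
    if (state.get(node) === DONE) return false;
    state.set(node, ON_STACK);
    for (const next of graph.get(node) || []) {
      if (visit(next)) return true;
    }
    state.set(node, DONE);
    return false;
  }
  for (const node of graph.keys()) {
    if (visit(node)) return true;
  }
  return false;
}

// Window 2 holds B's lock and waits for A's; window 1 holds A's lock and
// waits for B's.
console.log(hasDeadlock([["B", "A"], ["A", "B"]])); // true

// Waiting in one direction only is not a deadlock.
console.log(hasDeadlock([["B", "A"]])); // false
```

With a single global lock there are no edges between distinct locks, so the graph can never contain such a cycle.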


> I think what matters here is whether there's a synchronous operation 
> that can cause two browsing contexts to become related that previously 
> weren't.

Setting cookies that are visible cross-domain might be one such 
mechanism, as mentioned by Drew:


On Tue, 24 Mar 2009, Drew Wilson wrote:
> 
> The spec doesn't seem to say this explicitly, but it implies that 
> server-initiated cookie changes that happen in parallel with script 
> execution (for example, if the user agent fetches an image while script 
> is executing) will still be reflected in document.cookie even though the 
> storage mutex may be held. Is that correct (the intent is that only 
> script modifications are locked out, not changes due to network 
> activity)?

I suppose that network activity should also wait for the lock. I've made 
that happen.


> Cookies have a cross-domain aspect (multiple subdomains can share cookie 
> state at the top domain) - does this impact the specification of the 
> storage mutex since we need to lockout multiple domains?

There's only one lock, so that should work fine.


> Finally, are we making any exceptions for things that block the current 
> thread of execution (like displaying alert() dialogs or sync XHR), or are we 
> also guaranteeing that other subdomains will still be locked out?

I suppose it would be bad to lock everyone when an alert is up, so I've 
made all the modal dialog features release the lock.

Sync XHR should also release the lock, then reacquire it briefly to set 
cookies, and then release it once more. I've contacted the XHR editor 
about this.
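The intended sequence of lock operations can be sketched as a log. Lock, modalAlert() and syncXhr() are hypothetical, and fetchBlocking is an injected stand-in for the network. Contention between contexts is not modelled: acquire() never waits here, whereas in a browser it could block.

```javascript
// Toy model of the storage mutex around blocking operations.
// Lock, modalAlert() and syncXhr() are hypothetical names.
class Lock {
  constructor(log) {
    this.held = false;
    this.log = log;
  }
  acquire() {
    this.held = true;
    this.log.push("acquire");
  }
  release() {
    if (!this.held) return;
    this.held = false;
    this.log.push("release");
  }
}

// Modal dialogs simply drop the lock; the script would reacquire it
// implicitly the next time it touches storage.
function modalAlert(lock, message) {
  lock.release();
  lock.log.push("alert: " + message);
}

// Sync XHR: no lock during the network round-trip, a brief critical
// section to apply Set-Cookie headers, then release once more.
function syncXhr(lock, url, fetchBlocking, cookieJar) {
  lock.release();
  const response = fetchBlocking(url); // { body, setCookies }
  lock.log.push("fetched " + url);
  lock.acquire();
  for (const cookie of response.setCookies) cookieJar.push(cookie);
  lock.release();
  return response.body;
}

const log = [];
const lock = new Lock(log);
const jar = [];
lock.acquire(); // e.g. the script read document.cookie earlier
modalAlert(lock, "hi");
lock.acquire(); // the script touches storage again
const body = syncXhr(lock, "/data", () => ({ body: "ok", setCookies: ["sid=1"] }), jar);
console.log(log.join(", "));
```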


On Tue, 24 Mar 2009, Drew Wilson wrote:
>
> Is ApplicationCache intended to apply to workers? The application cache 
> API isn't available to workers, but I'm guessing the intent is that if 
> an application creates a dedicated worker then worker requests (like 
> importScripts()) would come out of the cache inherited from the parent 
> document. If not, then it seems impossible to support running workers 
> when in offline mode.

On Wed, 25 Mar 2009, Michael Nordman wrote:
>
> How's this for a starting point for how these things interact...
> * Dedicated worker contexts should be associated with an appcache according
> to the same resource loading and cache selection logic used for child
> browsing contexts. (So just like navigating an iframe).

On Wed, 25 Mar 2009, Drew Wilson wrote:
> 
> Since dedicated workers are tightly tied (1:1) with a specific top-level 
> browsing context, I'd say that they should use the same appcache as the 
> document that started them.

On Tue, 24 Mar 2009, Drew Wilson wrote:
> 
> Since SharedWorkers are shared by multiple windows, there's some 
> ambiguity about which app cache it should use (perhaps always the one 
> from the creator window?) - it seems like an app might get different 
> SharedWorkers() loading from different app caches depending on the order 
> in which different windows create them, which seems like a dubious 
> outcome.

On Wed, 25 Mar 2009, Michael Nordman wrote:
>
> * Shared (or persistent) worker contexts should be associated with an
> appcache according to the same resource loading and cache selection logic
> used for top-level browsing contexts. (So just like navigating a window.)
> 
> What does a shared (or persistent) worker do when the appcache its
> associated with is updated? Is there a way to "reload" itself with the new
> script in the latest version of the appcache? What about message ports
> between the worker and other contexts?

On Wed, 25 Mar 2009, Drew Wilson wrote:
> 
> That may make sense for Shared workers, I think. For persistent workers I
> think this is a problem - persistent workers need a way to manage their own
> app cache, since they are not guaranteed to have any open windows/documents
> associated with them. My concern about this is that app cache manifests are
> only specified via the manifest attribute on the <html> element, which makes
> them only applicable to HTML documents (you can't associate a manifest with
> a worker since there's no document to put the manifest attribute in).
>
> One could imagine that the worker would reload its javascript via 
> importScripts(). It kind of assumes that the script is idempotent, 
> though.

On Wed, 25 Mar 2009, David Levin wrote:
> 
> Similarly one could use nested workers (which I like because it gives 
> the new script a new global object). The shared/persistent worker would 
> start a nested worker.  Then for a reload, it could shut down the 
> current nested worker and start up a new one.
> 
> Regarding message ports, it would be up to the implementation to decide 
> if the shared/persistent worker followed a pointer to implementation 
> pattern or if it handed out message ports directly to the nested worker.
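The proxying ("pointer to implementation") variant can be sketched as follows. The shared worker keeps its logic in a nested worker and forwards messages to it. It reloads by terminating that worker and starting a fresh one, which also gets a fresh global object. ReloadableImpl is hypothetical, and createWorker is injected so the sketch runs outside a browser; inside a shared worker it would be `(url) => new Worker(url)`. Broadcasting every reply to all clients is a simplification chosen for brevity.

```javascript
// Sketch: a shared worker proxies client ports to a nested worker that it
// can restart. Hypothetical class; createWorker is injected.
class ReloadableImpl {
  constructor(createWorker, url) {
    this.createWorker = createWorker;
    this.url = url;
    this.ports = new Set();
    this.start();
  }
  start() {
    this.child = this.createWorker(this.url);
    // Simplification: fan every reply out to all connected clients.
    this.child.onmessage = (e) => {
      for (const port of this.ports) port.postMessage(e.data);
    };
  }
  // Call from the shared worker's onconnect handler with e.ports[0].
  addClient(port) {
    this.ports.add(port);
    // Always forwards to the current child, even after a reload.
    port.onmessage = (e) => this.child.postMessage(e.data);
  }
  reload() {
    this.child.terminate();
    this.start(); // clients keep their ports; only the implementation changes
  }
}
```

Handing each port straight to the nested worker, as Drew Wilson suggests below, avoids the extra hop. The cost is that the shared worker then needs some way to get the ports back before a restart.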

On Wed, 25 Mar 2009, Drew Wilson wrote:
>
> Good point - I like the idea of nested workers, especially if the
> SharedWorker uses the pattern where it just passes off all incoming message
> ports directly to the nested worker so it doesn't have to proxy messages.
> It'd have to have some app-specific mechanism to get them all back when it
> wants to restart the nested worker, though :)

On Thu, 26 Mar 2009, Alexey Proskuryakov wrote:
>
> Letting faceless background processes update themselves without user 
> consent is not necessarily desirable. I think that they need browser UI 
> for this, and/or associated HTML configuration pages that could (among 
> other duties) trigger application cache update.
> 
> So in my opinion, this is pretty much a sub-task of defining what UI is 
> necessary for persistent workers in the browser, not a question of 
> exposing application cache APIs to them.

On Thu, 26 Mar 2009, Drew Wilson wrote:
> 
> I'd be curious about why you think this is a problem, especially given 
> the existence of importScripts() and XHR which allow workers to load 
> scripts dynamically anyway.
> 
> ApplicationCache for persistent workers would enable them to continue 
> running even when offline - I don't see that it introduces any new 
> security/permission wrinkles, though. If you don't provide something 
> like that, then you'll have workers doing things like using XHR to 
> download script, store it in the data store, then eval() it at load time 
> to roll their own manual offline support.

On Thu, 26 Mar 2009, Alexey Proskuryakov wrote:
> 
> importScripts() will only allow dynamic loading if any URL prefixes are 
> designated as "NETWORK" in the manifest, which security sensitive users 
> may potentially detect and block. The level of support for this in 
> browsers, firewalls, anti-viruses and other software will obviously 
> depend on future usage patterns and threats, but the possibility is 
> there.
> 
> But I was looking at this in terms of a model for users, not any 
> specific security threats - if we think of persistent workers as an 
> equivalent of native applications that need installation, then we should 
> consider that native applications don't usually update themselves 
> without user consent.

On Thu, 26 Mar 2009, Drew Wilson wrote:
> 
> It seems like a common model is for offline-enabled applications to 
> store their javascript in the ApplicationCache, and encourage users to 
> create desktop links to access those apps even when offline. Should 
> these applications (which for all intents are "installed") also prompt 
> users before updating? Are you suggesting that user agents may want to 
> require explicit user permission when any application invokes 
> ApplicationCache.update()? That might be a reasonable approach if a 
> given user agent wants to enforce some kind of "no silent update" 
> policy...

I think it makes sense to treat dedicated workers as simple subresources, 
not separate browsing contexts, and that they should thus just use the 
application cache of their parent browsing contexts. This is what WebKit 
does, according to ap.

I've now done this in the spec.


For shared workers, I see these options:

 - Not allow app caches, so shared workers don't work when offline. That 
   seems bad.

 - Same as suggested for dedicated workers above -- use the creator's 
   cache, so at least one client will get the version they expect. Other 
   clients will have no idea what version they're talking to, the creator 
   would have an unusual relationship with the worker (it would be able 
   to call swapCache() but nobody else would), and once the creator goes 
   away, there will be a zombie relationship.

 - Pick an appcache more or less at random, like when you view an image in 
   a top-level browsing context. Clients will have no idea which version 
   they're talking to.

 - Allow workers to specify a manifest using some sort of comment syntax.
   Nobody knows what version they'll get, but at least it's always the 
   same version, and it's always up to date.

Using the creator's cache is the option that minimises the number of 
clients that are confused, but it also makes the debugging experience 
differ most from the case where there are two apps using the worker.

Using an appcache selected the same way we would pick one for images has 
the minor benefit of being somewhat consistent with how window.open() 
works, and we could say that window.open() and new SharedWorker are 
somewhat similar.

I have picked this route for now. Implementation feedback is welcome in 
determining if this is a good idea.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 27 March 2009 18:23:57 UTC