[whatwg] Worker feedback from Jeremy Orlow on 2009-04-02 (public-whatwg-archive@w3.org from April 2009)

From: Jeremy Orlow <jorlow@google.com>
Date: Thu, 2 Apr 2009 13:00:29 -0700
Message-ID: <5dd9e5c50904021300x13058d08ia75715da9972e004@mail.gmail.com>
On Wed, Apr 1, 2009 at 3:17 PM, Robert O'Callahan <robert at ocallahan.org> wrote:

> On Thu, Apr 2, 2009 at 11:02 AM, Robert O'Callahan <robert at ocallahan.org> wrote:
>
>>  (Note that you can provide hen read-only scripts are easy to optimize for
>> full parallelism using )
>
>
> Oops!
>
> I was going to point out that you can use a reader/writer lock to implement
> serializability while allowing read-only scripts to run in parallel, so if
> the argument is that most scripts are read-only then that means it shouldn't
> be hard to get pretty good parallelism.


The problem is escalating the lock.  If your script does a read and then a
write, and you do this in 2 workers/windows/etc you can get a deadlock
unless you have the ability to roll back one of the two scripts to before
the read which took a shared lock.  If both scripts have an 'alert("hi!");'
then you're totally screwed, though.
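
To make the escalation problem concrete, here is a rough sketch of it as a
toy model. Everything in it (ReadWriteLock, tryUpgrade, the script ids "A"
and "B") is made up for illustration and is not any browser's actual
implementation; it just walks through the interleaving by hand.

```javascript
// Toy model of a reader/writer lock. All names are hypothetical.
class ReadWriteLock {
  constructor() {
    this.readers = new Set(); // ids of scripts holding the shared lock
    this.writer = null;       // id of the script holding the exclusive lock
  }
  acquireShared(id) {
    if (this.writer !== null) return false; // a writer blocks all readers
    this.readers.add(id);
    return true;
  }
  // Upgrade shared -> exclusive. Only possible when no *other* reader remains.
  tryUpgrade(id) {
    if (this.writer !== null) return false;
    for (const r of this.readers) {
      if (r !== id) return false;
    }
    this.readers.delete(id);
    this.writer = id;
    return true;
  }
  releaseShared(id) {
    this.readers.delete(id);
  }
}

const lock = new ReadWriteLock();

// Both scripts do their read first, so both hold the shared lock.
lock.acquireShared("A");
lock.acquireShared("B");

// Now both want to write. Each waits for the other to drop its shared lock.
const aUpgraded = lock.tryUpgrade("A");
const bUpgraded = lock.tryUpgrade("B");
const deadlocked = !aUpgraded && !bUpgraded;

// The only way out: roll B back to before its read, releasing its shared lock.
lock.releaseShared("B");
const aUpgradedAfterRollback = lock.tryUpgrade("A");

console.log({ deadlocked, aUpgradedAfterRollback });
```

The rollback step is the part that stops working once B has already done
something visible to the user between its read and its write.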

There's been a LOT of CS research done on automatically handling the details
of concurrency.  The problem has to become pretty constrained (especially in
terms of stuff you can't roll back, like user input) before you can create
something halfway efficient.


On Wed, Apr 1, 2009 at 3:02 PM, Robert O'Callahan <robert at ocallahan.org>
 wrote:

> On Thu, Apr 2, 2009 at 7:18 AM, Michael Nordman <michaeln at google.com>
>  wrote:
>
>> I suggest that we can come up with a design that makes both of these camps
>> happy and that should be our goal here.
>> To that end... what if...
>>
>> interface Store {
>>   void putItem(string name, string value);
>>
>>   string getItem(string name);
>>   // calling getItem multiple times prior to script completion with the
>> same name is guaranteed to return the same value
>>   // (unless the current script had called putItem; if a different script
>> had called putItem concurrently, the current script won't see that)
>>
>>   void transact(func transactCallback);
>>   // is not guaranteed to execute if the page is unloaded prior to the
>> lock being acquired
>>   // is guaranteed to NOT execute if called from within onunload
>>   // but... really... if you need transactional semantics, maybe you
>> should be using a Database?
>>
>>   attribute int length;
>>   // may only be accessed within a transactCallback, otherwise throws an
>> exception
>>
>>   string getItemByIndex(int i);
>>   // may only be accessed within a transactCallback, otherwise throws an
>> exception
>> };
>>
>
>>
>> document.cookie;
>> // has the same safe to read multiple times semantics as store.getItem()
>>
>>
>> So there are no locking semantics (outside of the transact method)... and
>> multiple reads are not error prone.
>>
>> WDYT?
>>
>
> getItem stability is helpful for read-only scripts but no help for
> read-write scripts. For example, outside a transaction, two scripts doing
> putItem('x', getItem('x') + 1) can race and lose an increment.
>
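
For anyone following along, the lost increment can be sketched like this.
It is a hand-interleaved simulation, not real concurrency: the Map-backed
getItem/putItem and makeIncrementScript are stand-ins I made up, and I'm
using numbers rather than the strings the proposed interface declares so
that "+ 1" actually increments.

```javascript
// A plain in-memory stand-in for the proposed Store (hypothetical).
const store = new Map([["x", 0]]);
const getItem = (name) => store.get(name);
const putItem = (name, value) => store.set(name, value);

// Each "script" is split into its read step and its write step so the two
// can be interleaved by hand, the way two event loops might interleave them.
function makeIncrementScript() {
  let seen;
  return {
    read() { seen = getItem("x"); },
    write() { putItem("x", seen + 1); },
  };
}

const scriptA = makeIncrementScript();
const scriptB = makeIncrementScript();

scriptA.read();  // A sees 0
scriptB.read();  // B sees 0
scriptA.write(); // x becomes 1
scriptB.write(); // x becomes 1 again; A's increment is lost

const finalValue = getItem("x");
console.log(finalValue); // 1, although two increments ran
```

Note that stable getItem results don't help here: each script's reads are
perfectly consistent, and the update is still lost.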

Totally agree that it doesn't quite work yet.

But what if setItem were to watch for unserializable behavior and throw a
transactCallback when it happens?  This solves the silent data corruption
problem, though reproducing the circumstances that'd cause this is
obviously racy.  Of course, reproducing the deadlocks or very slow script
execution behavior is also racy.
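
Here is one way that kind of check could look, purely as a sketch. It
assumes a per-script "session" that remembers the version of each key it
read, and a write that throws if the key has moved on since. VersionedStore,
openSession and ConflictError are names I invented for the example; nothing
like them is in the proposal.

```javascript
class ConflictError extends Error {}

class VersionedStore {
  constructor() {
    this.data = new Map(); // name -> { value, version }
  }
  // Each script gets its own session that remembers which versions it read.
  openSession() {
    const store = this;
    const seenVersions = new Map();
    return {
      getItem(name) {
        const entry = store.data.get(name) || { value: undefined, version: 0 };
        if (!seenVersions.has(name)) seenVersions.set(name, entry.version);
        return entry.value;
      },
      setItem(name, value) {
        const current = store.data.get(name) || { value: undefined, version: 0 };
        const seen = seenVersions.get(name);
        // Someone else wrote this key after we read it: not serializable.
        if (seen !== undefined && seen !== current.version) {
          throw new ConflictError(`"${name}" changed since this script read it`);
        }
        const next = { value, version: current.version + 1 };
        store.data.set(name, next);
        seenVersions.set(name, next.version);
      },
    };
  }
}

const store = new VersionedStore();
store.openSession().setItem("x", 0);

const a = store.openSession();
const b = store.openSession();
const aSaw = a.getItem("x");
const bSaw = b.getItem("x");
a.setItem("x", aSaw + 1); // fine, nobody wrote x since A read it

let conflictDetected = false;
try {
  b.setItem("x", bSaw + 1); // B's read is stale by now
} catch (e) {
  conflictDetected = e instanceof ConflictError;
}

console.log(conflictDetected, store.data.get("x").value);
```

This only catches read-then-write conflicts on the same key. A blind write
with no prior read goes through unchecked, so it is a starting point rather
than a full serializability check.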



> Addressing the larger context ... More than anything else, I'm channeling
> my experiences at IBM Research writing race detection tools for Java
> programs ( http://portal.acm.org/citation.cfm?id=781528 and others), and
> what I learned there about programmers with a range of skill levels
> grappling with shared memory (or in our case, shared storage) concurrency. I
> passionately, violently believe that Web programmers cannot and should not
> have to deal with it. It's simply a matter of implementing what programmers
> expect: that by default, a chunk of sequential code will do what it says
> without (occasional, random) interference from outside.
>

I definitely see pros and cons to providing a single-threaded version of
the world to all developers (both advanced and beginner), but this really
isn't what we should be debating right now.

What we should be debating is whether advanced, cross-event-loop APIs should
be kept simple enough that any beginner web developer can use them (at the
expense of performance and simplicity within the browser) or if we should be
finding a compromise that can be kept fast, simple (causing fewer bugs!), and
somewhat harder to program for.

If someone wants to cross the event loop (except in the document.cookie
case, which is a pretty special one), they should have to deal with more
complexity in some form.  Personally, I'd like to see a solution that does
not involve locks of any sort (software transactional memory?).



> I realize that this creates major implementation difficulties for parallel
> browsers, which I believe will be all browsers. "Evil", "troubling" and
> "onerous" are perhaps understatements... But it will be far better in the
> long run to put those burdens on browser developers than to kick them
> upstairs to Web developers. If it turns out that there is a compelling
> performance boost that can *only* be achieved by relaxing serializability,
> then I could be convinced ... but we are very far from proving that.
>

Like I said, a LOT of research has been done on concurrency.  Basically, if
you're not really careful about how you construct your language and the
abstractions you have for concurrency, you can really easily back yourself
into a corner that you semantically can't get out of (no matter how good of
a programmer you are).
Received on Thursday, 2 April 2009 13:00:29 UTC