RE: Workers from Justin James on 2008-07-20 (public-html@w3.org from July 2008)

From: Justin James <j_james@mindspring.com>
Date: Sun, 20 Jul 2008 00:36:36 -0400
To: "'Ian Hickson'" <ian@hixie.ch>
Cc: <public-html@w3.org>
Message-ID: <016e01c8ea22$31f0ee40$95d2cac0$@com>
> -----Original Message-----
> From: Ian Hickson [mailto:ian@hixie.ch]
> Sent: Friday, July 18, 2008 5:14 PM
> To: Justin James
> Cc: public-html@w3.org
> Subject: RE: Workers
> 
> On Fri, 18 Jul 2008, Justin James wrote:
> >
> > I think that you would be much happier with just making an async version of
> > eval() which "jails" itself:
> >
> > function RemoteScriptExample(scriptURL) {
> >    evalAsync("eval(downloadMethod(scriptURL));");
> > }
> >
> > function LocalScriptExample() {
> >    evalAsync("//Do stuff asynchronously here");
> > }
> 
> How would you communicate with such a mechanism?

I suppose it could take a second argument for a thread-safe messaging
object.
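
Roughly what I have in mind, as a sketch only: evalAsync and makePort are names I am making up here, a timer stands in for the separate thread, and new Function does not actually "jail" anything, so this shows the calling convention and nothing more.

```javascript
// Hypothetical messaging object: the only channel between caller and code.
function makePort() {
  return {
    onmessage: null,
    postMessage(data) {
      if (this.onmessage) this.onmessage(data);
    }
  };
}

// Hypothetical evalAsync: returns at once and runs the source later.
// setTimeout is a stand-in for a real background thread.
function evalAsync(source, port) {
  setTimeout(() => {
    // The evaluated code receives the port as its single argument.
    const run = new Function("port", source);
    run(port);
  }, 0);
}

const replyPort = makePort();
replyPort.onmessage = (data) => console.log("worker said:", data);
evalAsync("port.postMessage(6 * 7);", replyPort);
// logs "worker said: 42" once the timer fires
```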

> In general, the Gears team found that people much prefer just giving a URL
> than giving a string, so it would be:
> 
>    evalAsync(url);
> 
> ...which is fundamentally no different than what the spec does:
> 
>    createWorker(url);
> 
> ...except that the spec returns a message port:
> 
>    var port = createWorker(url);

Yes, I am sure that if I saw the world through the eyes of the Gears team, that
might seem like the best way to do it. But I'm from a more traditional
background, and frankly, the idea of passing a URL to a script seems
incredibly backwards and clumsy. Offhand, I cannot recall ever seeing a
system of any sort where you basically say, "execute the code located by
this reference". I've seen a lot of systems where it says, "load code from
this location into memory and make it accessible to me"; that is simply
loading a library or module. But to execute code, sight unseen, and trust
the file system/server/whoever absolutely? No way.

> I don't really mind which spec we put it in. Would you like to propose
> this to the ECMAScript committee? (I don't really have the bandwidth to
> join another group as well.)

Sadly, I do not either; this group has me well beyond my limits as it is
(you can tell by my "batches" of responses on my days). It isn't that
important to me where it lands.

> There's no new object for workers in this proposal, actually, from the
> caller side. In fact, as far as I can tell what the current Workers spec
> does and what you propose are identical modulo the method name, as shown
> above.

Throughout the draft, it makes references to the "WindowWorker object". I
want to see a *function* for executing the work in a thread (even if it is a
method of the Window object), not a "WindowWorker object" with a
hidden/invisible "Execute" method.

You also brought up the possibility of using the data: URI scheme to
replicate this functionality. Examination of RFC 2397
(http://tools.ietf.org/html/rfc2397) shows that this is a wholly inadequate
approach:

"The "data:" URL scheme is only useful for short values. Note that
some applications that use URLs may impose a length limit; for
example, URLs embedded within <A> anchors in HTML have a length limit
determined by the SGML declaration for HTML [RFC1866]. The LITLEN
(1024) limits the number of characters which can appear in a single
attribute value literal, the ATTSPLEN (2100) limits the sum of all
lengths of all attribute value specifications which appear in a tag,
and the TAGLEN (2100) limits the overall length of a tag."

Thanks in part to a REALLY shoddy spec (it doesn't define "for sure" just
how long the data can be) and in part to uneven implementations of URI/URL
length maximums amongst browsers (last I checked, at least), it is
impossible for a developer to rely upon data: to carry a script reliably.
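
To put a number on it, here is a small sketch. The helper names are mine, and the limit is whatever the caller assumes the browser enforces; I use the LITLEN figure from the quote above purely as an example.

```javascript
// Wrap a script in a data: URL. Percent-encoding makes it longer than
// the source whenever the script contains reserved characters.
function scriptToDataUrl(source) {
  return "data:text/javascript," + encodeURIComponent(source);
}

// maxLength is an assumption supplied by the caller, not a value that
// any spec guarantees.
function fitsUrlLimit(url, maxLength) {
  return url.length <= maxLength;
}

const smallScript = "x = 1;";
const bigScript = "x = x + 1;".repeat(500);

console.log(fitsUrlLimit(scriptToDataUrl(smallScript), 1024));
console.log(fitsUrlLimit(scriptToDataUrl(bigScript), 1024));
```

Even a modest generated script blows past a limit of that size, and the encoding overhead only makes it worse.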

Therefore, if you want to support any use cases (the ones I list below are
just a sampling of potential use cases) of self-storing/self-generating code
(and why not, since ECMAScript is quite clearly geared towards *precisely*
that kind of work!), doing it the way the Gears team suggests is not the
right approach.

> Why is this a major shortcoming? I haven't heard of any use cases for why
> this would be necessary.
> 
> Unless this is something that is commonly done -- and the Gears team's
> experience suggests it is not -- it seems like data: URLs are enough for
> this.

The Gears team has a limited field of vision, based upon their needs. They
are not every developer. ECMAScript is a dynamic language. One of the most
powerful constructs in a dynamic language (as anyone with a background in
Lisp [yes, I know that it is a functional language], Perl, Ruby, etc. can
tell you) is the eval() statement (or equivalent). The ability for scripts
to self-store and self-generate code to be executed at run time is insanely
useful in a great many cases; I remember the smile on my face the first
time that I realized that $template =~ s/<%(.*?)%>/$1/e; created my very own
templating engine in ONE line of code. All throughout ECMAScript, it is
clear that it was always intended to be used as a dynamic language and not
as a static language that just happens to be run in a JIT-compiled scenario.
Asynchronous eval() fits the ECMAScript spirit; the WindowWorker object feels
too much like the BackgroundWorker object from .NET (bleh).
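
The same trick in ECMAScript, as a rough sketch (renderTemplate is a made-up name, and this trusts the template completely, which is the whole point of eval()). In Perl itself, I believe evaluating the captured text as code takes the /ee modifier, plus /g to hit every tag.

```javascript
// Replace every <% ... %> tag with the result of evaluating its contents.
// (0, eval) is an indirect eval, so the code runs in the global scope
// rather than inside this function.
function renderTemplate(template) {
  return template.replace(/<%(.*?)%>/g, (match, code) => String((0, eval)(code)));
}

console.log(renderTemplate("Total: <% 2 * 21 %> items"));
// prints "Total: 42 items"
```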

Here are a few use cases for a proper asynchronous eval():

* Connections to the HTTP server are expensive. Why go back to the server to
request a script when it could have been delivered on the initial document
load to begin with?

* Maybe the script/code to be executed is a result of user input. Why post
that code to a server, only to get it back to execute it (I showed above why
the data: URL approach is not a good one)?

* Anyone performing complex calculations (say, image editing; as repellent
as Web-based image editors are to me, a lot of folks are working on them, it
seems like).

* Anyone writing something that is really adaptive to user behavior would
benefit from self-generated code.

* Anyone looking to leverage the idle CPUs of their clients, instead of
burning server CPU cycles with HTTP session initiations.

Basically, if we are going to go along with the wave of Web applications,
let's go whole hog, grab the bull by the horns (a lot of farm metaphors, I
know...), and really make these things CAPABLE, and not just a way of
asynchronously doing the same AJAX-y tasks that the Gears team is focused
on. An asynchronous eval() gets us there. An asynchronous "pull a script
from a URL and execute it" does not.

One final note on the existing draft: I also find it problematic that the
locating and procurement of the script located with the URL parameter does
*not* occur in a separate thread. Considering that in many cases, the HTTP
connect/transmit/close cycle takes a huge portion (if not the majority) of
the total execution time, I think you lose most of the intended benefit by
having that be in the main thread. Much better, if we decide we must go this
route, is to re-write the logic to go like so:

1) Create object.
2) Populate with URL.
3) Call a hidden/invisible "Execute" method of WindowWorker in a separate
thread.
4) "Execute" is written to say "eval(downloadToString(URL)); closing =
true;" (or something along those lines).

That will give a MUCH better performance boost than calling the URL fetch
before making a new thread.
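
The four steps above can be sketched like this. Every name is hypothetical (createWorker here is my stand-in, not the draft's method, and downloadToString is supplied by the caller), and a promise chain stands in for the separate thread, so it only illustrates the ordering: the caller gets control back before the fetch even starts.

```javascript
// Steps 1 and 2: create the object and populate it with the URL.
// Steps 3 and 4: "Execute" runs off the caller's path, and the download
// happens inside it rather than before it.
function createWorker(url, downloadToString) {
  const worker = { url: url, closing: false, done: null };
  worker.done = Promise.resolve()
    .then(() => downloadToString(url))   // fetch happens in the background
    .then((source) => {
      (0, eval)(source);                 // run the downloaded script
      worker.closing = true;             // mirrors "closing = true" above
    });
  return worker;
}
```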

Also, the spec may need to have a cap/throttle built in, so developers don't
try to do things that force browser vendors to install a cap/throttle which
then breaks code.
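
For illustration only, a cap could behave something like the sketch below. makeThrottle and maxActive are invented names, and queueing the overflow (rather than refusing it) is just one possible policy.

```javascript
// Allow at most maxActive tasks at once; extra tasks wait in a queue.
// Each task receives a release() callback to call when it finishes.
function makeThrottle(maxActive) {
  let active = 0;
  const waiting = [];

  function release() {
    active -= 1;
    if (waiting.length > 0) {
      start(waiting.shift());
    }
  }

  function start(task) {
    active += 1;
    task(release);
  }

  return {
    submit(task) {
      if (active < maxActive) {
        start(task);
      } else {
        waiting.push(task);
      }
    },
    activeCount() { return active; },
    queuedCount() { return waiting.length; }
  };
}
```

If the cap is part of the spec, code written against it behaves the same everywhere, instead of breaking when a vendor bolts one on later.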

J.Ja
Received on Sunday, 20 July 2008 04:37:30 UTC