RE: Workers from Justin James on 2008-07-22 (public-html@w3.org from July 2008)

From: Justin James <j_james@mindspring.com>
Date: Tue, 22 Jul 2008 00:34:11 -0400
To: "'Justin James'" <j_james@mindspring.com>, "'Ian Hickson'" <ian@hixie.ch>
Cc: <public-html@w3.org>
Message-ID: <00e901c8ebb4$307d5560$91780020$@com>
Ian -

I think that it is critical that you read this white paper from Microsoft
regarding data: URLs in IE 8:

http://code.msdn.microsoft.com/Release/ProjectReleases.aspx?ProjectName=ie8whitepapers&ReleaseId=575

Some highlights:

* Data URLs are not allowed to contain scripts

* There is a 32 KB cap on data: URLs

* Something we both missed in the data: URL RFC is that the only characters
allowed in a data: URL are characters allowed in an URL; this makes putting
useful code in a data: URL impossible.
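A short JavaScript sketch of what the character restriction means in practice: script text has to be percent-encoded before it can sit in a data: URL (RFC 2397 also allows base64). The helper names, the `text/javascript` media type, and reading the 32 KB cap as 32 * 1024 characters of the whole URL are assumptions made for illustration, not details taken from the white paper.

```javascript
// Hypothetical helpers: percent-encode script text into a data: URL and
// check it against a 32 KB cap (assumed here to mean 32 * 1024 characters
// of the whole URL, including the "data:" prefix).
var DATA_URL_CAP = 32 * 1024;

function toScriptDataUrl(source) {
  // encodeURIComponent escapes everything outside its unreserved set,
  // so quotes, spaces, "+", ";" and newlines all become %XX sequences.
  return "data:text/javascript," + encodeURIComponent(source);
}

function fitsDataUrlCap(url) {
  return url.length <= DATA_URL_CAP;
}

var source = 'postMessage("hello " + (1 + 2));';
var url = toScriptDataUrl(source);
console.log(url);
// data:text/javascript,postMessage(%22hello%20%22%20%2B%20(1%20%2B%202))%3B

// Decoding the payload after the first comma recovers the original text.
console.log(decodeURIComponent(url.slice(url.indexOf(",") + 1)) === source); // true
console.log(fitsDataUrlCap(url)); // true
```

Every escaped character grows to three characters (`%XX`), so the encoded URL is longer and harder to read than the script it carries.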

It is clear to me, after reading this paper, that the idea of data: URLs
being used for HTML to self-contain and/or self-generate worker scripts is
not realistic.

Again, I am not saying that the importing of scripts from an URL should not
be possible, but I am continuing to say that there needs to be a way to
self-store and self-generate code to go into the worker.
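One way to "self-generate" worker code, sketched in JavaScript under the assumption that the engine's `Function.prototype.toString` returns the function's source text (modern engines do for ordinary user-defined functions): serialize a function already present in the page into a self-contained script string. `buildWorkerSource` and `sumRange` are hypothetical names.

```javascript
// Hypothetical sketch: build worker script text from a function the page
// already contains, instead of fetching a separate file.
function buildWorkerSource(fn, args) {
  // Wrap the function source in parentheses so it parses as an expression,
  // then call it with JSON-serialized arguments baked into the text.
  return "(" + fn.toString() + ").apply(null, " + JSON.stringify(args) + ");";
}

function sumRange(from, to) {
  var total = 0;
  for (var i = from; i <= to; i++) total += i;
  return total;
}

var generated = buildWorkerSource(sumRange, [1, 100]);
console.log(generated);
// Evaluating the text here (outside any worker) only checks that it runs.
console.log(eval(generated)); // 5050
```

A limitation of this approach: `toString` captures only the function's text, not the variables it closes over, so anything the worker needs must be passed in explicitly (here through `JSON.stringify`).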

J.Ja


> -----Original Message-----
> From: Justin James [mailto:j_james@mindspring.com]
> Sent: Monday, July 21, 2008 11:50 AM
> To: 'Ian Hickson'
> Cc: 'public-html@w3.org'
> Subject: RE: Workers
> 
> > -----Original Message-----
> > From: public-html-request@w3.org [mailto:public-html-request@w3.org] On
> > Behalf Of Ian Hickson
> > Sent: Sunday, July 20, 2008 2:29 AM
> > To: Justin James
> > Cc: public-html@w3.org
> > Subject: RE: Workers
> > On Sun, 20 Jul 2008, Justin James wrote:
> > > >
> > > > How would you communicate with such a mechanism?
> > >
> > > I suppose it could take a second argument for a thread-safe
> > > messaging object.
> >
> > That's basically what MessagePorts are, and basically how
> > createWorker() works, except that it creates the ports for you as a
> > convenience:
> 
> Yup, I know.
> 
> > > >    var port = createWorker(url);
> > >
> > > Yes, I am sure that if I saw the world from the eyes of the Gears
> > > team, that might seem like the best way to do it. But I'm from a more
> > > traditional background, and frankly, the idea of passing an URL to a
> > > script seems incredibly backwards and clumsy. Offhand, I cannot recall
> > > ever seeing a system of any sort where you basically say, "execute the
> > > code located by this reference".
> >
> > It's exactly how the Web works today:
> >
> >    <script src="url"></script>
> 
> QUESTION: How well would a "javascript: URL" work with your approach?
> If the answer is "Great!" (I suspect so), then ignore the next
> paragraph.
> 
> Not the *entire* world, which is my point. There are still tons of
> people in-lining code into the HTML. Heck, look at what ASP.Net spews
> out. Even using data: URLs, their code suddenly gets very, very ugly
> (*uglier* if you ask me) very quickly.
> 
> > > I want to see a *function* for executing the work in a thread (even
> > > if it is a method of the Window object), not a "WindowWorker object"
> > > with a hidden/invisible "Execute" method.
> >
> > The WindowWorker object isn't how you execute code, it's just the
> > global object. Whatever mechanism we use, we have to have a global
> > object.
> 
> Yeah, I later figured this out. This draft is *really* difficult to
> follow, and reading the thread with Andrew, I am not the only one who
> is having a hard time reading it. I can't really put my finger on it,
> but it feels more like the summary of a conversation between a group of
> people who already intimately know the subject and just need to have it
> on paper than an actual spec. I know, that's why it's a draft and not
> the final form. :)
> 
> > As far as I can tell, data: URLs of megabytes in length work fine in
> > all major shipping browsers that support data: URLs. Can you give an
> > example of a major browser that supports data: URLs but doesn't support
> > long enough data: URLs to handle the script you want to handle? (And
> > why would you have that script in text form instead of accessible from
> > a URL?)
> 
> I got burned so many times in the mid-90's by browser URL length
> problems, that I have not tried to exceed 255 characters in an URL
> since then.
> 
> What you are saying, though, is this: code that works correctly in one
> 100% spec-compliant browser may not work correctly in a different 100%
> spec-compliant browser. And that is not an acceptable situation.
> 
> The more I write and revise my responses to your message, the more I
> realize that probably 99% of my objections are caused by the lack of a
> proper specification around data: URLs. I have submitted a spec
> proposal via the bug tracker to add it to the HTML spec. :)
> 
> > I respect your opinion, but practical experience from actual Web
> > authors writing code with experimental Workers implementations has
> > more weight. :-)
> 
> I never discounted the Gears team's experience. I'm just saying that
> the world is a lot bigger than their experience. There is a TON of
> existing practical experience showing the value of self-
> storing/modifying/generating code out there. The fact that no one is
> actually doing it with this experimental worker implementation is
> probably related to the fact that like 30 (or less!) people on the
> planet are working with experimental worker implementations, and that
> those use cases are less common, particularly on the Web, than what the
> Gears team is doing.
> 
> Regarding your responses to my example use cases (no need to go through
> them individually)... your responses are all valid. While writing a
> response to each one, I realized that you will probably never see eye-
> to-eye with me on it, because we hail from different backgrounds. For
> me, doing things in a dynamic/functional language way is fairly
> intuitive and natural. It is clear that you approach these things much
> more from the angle of a static language. There is nothing inherently
> better or worse about either viewpoint, either.
> 
> At this point, like I said, my only objection to creating workers from
> an URL is that the data: URL spec is so problematic.
> 
> > > One final note on the existing draft: I also find it problematic
> > > that the locating and procurement of the script located with the URL
> > > parameter does *not* occur in a separate thread. Considering that in
> > > many cases the HTTP connect/transmit/close cycle takes a huge portion
> > > (if not the majority) of the total execution time, I think you lose
> > > most of the intended benefit by having that be in the main thread.
> >
> > I think you're misreading the spec. The fetching of the resource
> > happens asynchronously.
> 
> Like Andrew Fedoniouk, I completely missed the part of the spec (right
> above the numbered list) that specified the "separate and parallel
> execution" of those steps. I think that, at the very least, the draft
> should be updated to make this very clear. Would it be possible to make
> the first step in the list "create a separate thread, and execute the
> rest of the steps in that thread", or otherwise make this really
> obvious? Without the creation of the thread listed among the steps, it
> is easy to overlook it as part of the process.
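A toy JavaScript model of the behavior discussed here, assuming nothing about the draft beyond its "separate and parallel" wording: `createWorker` hands back a port at once and queues the fetch-and-run steps as a separate task. `makeRuntime`, `runPending`, and the `started` flag are hypothetical scaffolding so the ordering can be observed without a browser.

```javascript
// Hypothetical model, not the draft's actual algorithm: createWorker()
// returns immediately, and fetching plus running the script are queued
// as a separate task that runs later.
function makeRuntime(fetchScript) {
  var pending = []; // stands in for the steps run "separately and in parallel"
  var log = [];

  function createWorker(url) {
    var port = { url: url, started: false };
    // Hand the remaining steps to another task instead of doing them now.
    pending.push(function () {
      var source = fetchScript(url); // the fetch happens here, not in the caller
      port.started = true;
      log.push("running " + url + " (" + source.length + " chars)");
    });
    log.push("createWorker returned for " + url);
    return port;
  }

  function runPending() {
    while (pending.length > 0) pending.shift()();
  }

  return { createWorker: createWorker, runPending: runPending, log: log };
}

var rt = makeRuntime(function (url) { return "/* script from " + url + " */"; });
var port = rt.createWorker("worker.js");
console.log(port.started); // false: creation only queued the work
rt.runPending();
console.log(port.started); // true
console.log(rt.log);
```

The same model shows why returning from `createWorker` says nothing, by itself, about whether the worker has started.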
> 
> > Browsers are allowed to throttle the code as much as they like. We
> > can't really do anything else since user agents run on such varied
> > hardware that there's no way to really guarantee particular
> > performance characteristics anyway.
> 
> I agree that different platforms will have different cap/throttle
> levels. But the code authors need to be able to check to see if they
> hit it! Some code may want to treat hitting the throttle as an error
> condition; others may want to simply ignore it. Also, the spec needs to
> clearly enunciate *precisely* the way a throttle is implemented, so that
> at the very least all browsers handle it the same. Does hitting the
> throttle cause the call to create the worker to block? Or is the worker created,
> but execution is delayed, which would allow the calling code to
> continue (at the expense of the memory used by a separate object)? Or
> does it throw an error (such as "cannot create thread")? For example:
> 
> for (var i = 0; i <= 1000000; i++) {
>     arrayOfMessagePorts[i] = createWorker(arrayOfURLs[i]);
> }
> 
> Yes, I know that it is an extreme example (not really, if you want to
> do something to the individual pixels of an image in parallel...), but
> it illustrates the problem well. If the createWorker(URL) method does
> not block, you can easily trash the RAM situation by creating a million
> thread objects like this. From my experience doing just this in other
> languages, I can tell you that it gets ugly. On the other hand, without
> our draft explicitly stating how a browser should perform a throttle,
> the developer has no clue how to write code and take the possibility of
> throttling into account.
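A sketch of how an author could throttle at the application level instead of relying on unspecified user-agent behavior: cap the number of live workers and queue the rest, so a loop over a million jobs keeps at most `maxLive` workers alive at once. `createPool` and `startWorker` are hypothetical; `startWorker` stands in for whatever actually creates a worker and must call `done` when that worker finishes.

```javascript
// Hypothetical application-level throttle: never keep more than maxLive
// workers alive; extra jobs wait in a FIFO queue as plain descriptors.
function createPool(maxLive, startWorker) {
  var live = 0;
  var queue = [];

  // Start queued jobs while there is spare capacity.
  function pump() {
    while (live < maxLive && queue.length > 0) {
      var job = queue.shift();
      live++;
      startWorker(job, done);
    }
  }

  // Called by startWorker's worker when it finishes; frees a slot.
  function done() {
    live--;
    pump();
  }

  return {
    submit: function (job) { queue.push(job); pump(); },
    stats: function () { return { live: live, queued: queue.length }; }
  };
}

// Usage: completions are collected and fired by hand to simulate workers finishing.
var finishers = [];
var pool = createPool(2, function (job, done) {
  finishers.push(done); // stand-in for creating a real worker for `job`
});
for (var i = 0; i < 5; i++) pool.submit("pixel-block-" + i);
console.log(pool.stats()); // { live: 2, queued: 3 }
finishers.shift()(); // one worker finishes, so the next job starts
console.log(pool.stats()); // { live: 2, queued: 2 }
```

Queued jobs are held only as plain descriptors (here, strings) until a slot frees up.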
> 
> I propose the following changes to make this situation (and other
> pitfalls of parallel execution) far less dangerous to authors:
> 
> * Overloading the createWorker() method to accept a time span (in
> ticks or milliseconds) as a timeout value.
> 
> * Defining user agent "throttling" as *always* blocking when calling
> the createWorker() method.
> 
> * Make it clear in the draft that just because the environment has
> returned from createWorker() does *not* mean that execution of the
> worker logic has started. Language such as, "creation of a worker
> object does not guarantee that the worker is executing, it only
> guarantees that the worker has been queued for execution" should be
> extremely helpful. This is very, VERY important!
> 
> * Make it clear in the draft that queued workers do not necessarily
> begin execution in the same order that they were created.
> 
> * Make it clear in the draft that the environment does not have to
> devote equal resources to each worker. Therefore, even if Worker A
> started execution before Worker B, and they should both take the
> same amount of time to execute, Worker B could finish long before
> Worker A. Just because. :)
> 
> The first two changes would allow the previous example to be modified
> like so:
> 
> for (var i = 0; i <= 1000000; i++) {
>     // Assuming the timeout is measured in milliseconds
>     arrayOfMessagePorts[i] = createWorker(arrayOfURLs[i], 1000);
>     if (arrayOfMessagePorts[i] == null) {
>         // Throttling is occurring; try waiting a while and re-creating the worker!
>         // Or just exit, if it is not a big deal.
>     }
> }
> 
> This allows applications to gracefully recover, slow down, or whatever is
> needed in the case of a throttling scenario, without jamming up the
> whole system.
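A sketch of the "try waiting a while and re-creating the worker" branch above, written against the proposed `createWorker(url, timeout)` overload, which is a proposal in this thread and not a shipped API. The creator function is passed in so the sketch runs without any worker implementation; `createWithRetry`, `fakeCreateWorker`, and the exponential back-off schedule are hypothetical choices.

```javascript
// Hypothetical retry helper: `create` follows the proposed overload and
// returns null when throttled. Waits use setTimeout so the caller is
// never blocked while backing off.
function createWithRetry(create, url, timeoutMs, maxAttempts, callback) {
  var attempt = 0;
  function tryOnce() {
    attempt++;
    var port = create(url, timeoutMs);
    if (port !== null || attempt >= maxAttempts) {
      callback(port, attempt); // port is null if every attempt was throttled
      return;
    }
    // Throttled: back off exponentially (10 ms, 20 ms, 40 ms, ...) and retry.
    setTimeout(tryOnce, 10 * Math.pow(2, attempt - 1));
  }
  tryOnce();
}

// Usage with a fake creator that is "throttled" on its first two calls.
var calls = 0;
function fakeCreateWorker(url, timeoutMs) {
  calls++;
  return calls <= 2 ? null : { url: url };
}
createWithRetry(fakeCreateWorker, "blur.js", 1000, 5, function (port, attempts) {
  console.log(port, attempts); // { url: 'blur.js' } 3
});
```

Waiting with `setTimeout` rather than a busy loop keeps the page responsive, in line with the goal of not jamming up the whole system.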
> 
> Hope this all makes sense, and helps!
> 
> J.Ja
Received on Tuesday, 22 July 2008 04:35:11 UTC