- From: Justin James <j_james@mindspring.com>
- Date: Mon, 21 Jul 2008 11:50:24 -0400
- To: "'Ian Hickson'" <ian@hixie.ch>
- Cc: <public-html@w3.org>
> -----Original Message-----
> From: public-html-request@w3.org [mailto:public-html-request@w3.org] On
> Behalf Of Ian Hickson
> Sent: Sunday, July 20, 2008 2:29 AM
> To: Justin James
> Cc: public-html@w3.org
> Subject: RE: Workers
>
> On Sun, 20 Jul 2008, Justin James wrote:
> > >
> > > How would you communicate with such a mechanism?
> >
> > I suppose it could take a second argument for a thread-safe messaging
> > object.
>
> That's basically what MessagePorts are, and basically how createWorker()
> works, except that it creates the ports for you as a convenience:

Yup, I know.

> >    var port = createWorker(url);
> >
> > Yes, I am sure that if I saw the world from the eyes of the Gears team,
> > that might seem like the best way to do it. But I'm from a more
> > traditional background, and frankly, the idea of passing an URL to a
> > script seems incredibly backwards and clumsy. Offhand, I cannot recall
> > ever seeing a system of any sort where you basically say, "execute the
> > code located by this reference".
>
> It's exactly how the Web works today:
>
>    <script src="url"></script>

QUESTION: How well would a "javascript: URL" work with your approach? If
the answer is "Great!" (I suspect so), then ignore the next paragraph.

Not the *entire* world, which is my point. There are still tons of people
in-lining code into the HTML. Heck, look at what ASP.Net spews out. Even
using data: URLs, their code suddenly gets very, very ugly (*uglier* if you
ask me) very quickly.

> > I want to see a *function* for executing the work in a thread (even if
> > it is a method of the Window object), not a "WindowWorker object" with
> > a hidden/invisible "Execute" method.
>
> The WindowWorker object isn't how you execute code, it's just the global
> object. Whatever mechanism we use, we have to have a global object.

Yeah, I later figured this out. This draft is *really* difficult to follow,
and reading the thread with Andrew, I am not the only one who is having a
hard time reading it. I can't really put my finger on it, but it feels more
like the summary of a conversation between a group of people who already
intimately know the subject and just need to have it on paper than an
actual spec. I know, that's why it's a draft and not the final form. :)

> As far as I can tell, data: URLs of megabytes in length work fine in all
> major shipping browsers that support data: URLs. Can you give an example
> of a major browser that supports data: URLs but doesn't support long
> enough data: URLs to handle the script you want to handle? (And why would
> you have that script in text form instead of accessible from a URL?)

I got burned so many times in the mid-90's by browser URL length problems
that I have not tried to exceed 255 characters in an URL since then. What
you are saying, though, is this: code that works correctly in one 100%
spec-compliant browser may not work correctly in a different 100%
spec-compliant browser. And that is not an acceptable situation.

The more I write and revise my responses to your message, the more I
realize that probably 99% of my objections are caused by the lack of a
proper specification around data: URLs. I have submitted a spec proposal
via the bug tracker to add it to the HTML spec. :)
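To show what I mean about in-lined code getting ugly once it has to travel
through a data: URL, here is a trivial sketch. (I am assuming that
createWorker() will accept a data: URL at all, and that the worker script
talks over a "port" object; neither is something I can point to in the
draft, so treat the names as my guesses, not the spec's.)

// Worker logic written as a string, the way people in-line script today...
var code = "port.onmessage = function (e) { port.postMessage(e.data * 2); };";

// ...and the same logic crammed into a data: URL for createWorker().
var port = createWorker("data:text/javascript," + encodeURIComponent(code));

That is survivable for a one-liner, but once the script is more than a few
lines, with quoting, escaping, and URL-encoding layered on top, it turns
into exactly the kind of mess I am describing.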
> I respect your opinion, but practical experience from actual Web authors
> writing code with experimental Workers implementations have more
> weight. :-)

I never discounted the Gears team's experience. I'm just saying that the
world is a lot bigger than their experience. There is a TON of existing
practical experience showing the value of self-storing/modifying/generating
code out there. The fact that no one is actually doing it with this
experimental worker implementation is probably related to the fact that
like 30 (or fewer!) people on the planet are working with experimental
worker implementations, and that those use cases are less common,
particularly on the Web, than what the Gears team is doing.

Regarding your responses to my example use cases (no need to go through
them individually)... your responses are all valid. While writing a
response to each one, I realized that you will probably never see
eye-to-eye with me on it, because we hail from different backgrounds. For
me, doing things in a dynamic/functional language way is fairly intuitive
and natural. It is clear that you approach these things much more from the
angle of a static language. There is nothing inherently better or worse
about either viewpoint, either. At this point, like I said, the only thing
I still disagree with about creation from a URL is the fact that the data:
URL spec is so problematic.

> > One final note on the existing draft: I also find it problematic that
> > the locating and procurement of the script located with the URL
> > parameter does *not* occur in a separate thread. Considering that in
> > many cases, the HTTP connect/transmit/close cycle takes a huge portion
> > (if not the majority) of the total execution time, I think you lose
> > most of the intended benefit by having that be in the main thread.
>
> I think you're misreading the spec. The fetching of the resource happens
> asynchronously.

Like Andrew Fedoniouk, I completely missed the part of the spec (right
above the numbered list) that specified the "separate and parallel
execution" of those steps. I think that, at the very least, the draft
should be updated to make this very clear. Would it be possible to make
the first step in the list, "create a separate thread, and execute the
rest of the steps in that thread", or somehow otherwise make this really
obvious? Without including the creation of the thread in the steps, it is
easy to overlook it as part of the process.

> Browsers are allowed to throttle the code as much as they like. We can't
> really do anything else since user agents run on such varied hardware
> that there's no way to really guarantee particular performance
> characteristics anyway.

I agree that different platforms will have different cap/throttle levels.
But code authors need to be able to check whether they have hit them! Some
code may want to treat hitting the throttle as an error condition; other
code may want to simply ignore it. Also, the spec needs to clearly
enunciate *precisely* the way a throttle is implemented, so that, at the
very least, all browsers handle it the same. Does hitting the throttle
cause the call that creates the worker to block? Or is the worker created,
but its execution delayed, which would allow the calling code to continue
(at the expense of the memory used by a separate object)? Or does it throw
an error (such as "cannot create thread")?

For example:

for (i = 0; i <= 1000000; i++) {
   arrayOfMessagePorts[i] = createWorker(arrayOfURLs[i]);
}

Yes, I know that it is an extreme example (not really, if you want to do
something to the individual pixels of an image in parallel...), but it
illustrates the problem well.
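To make that image example a little more concrete, here is roughly what I
picture. (The tile variables and the idea that the returned port exposes an
onmessage handler are my own guesses for illustration, not anything the
draft promises.)

var tilePorts = [];
for (var t = 0; t < numTiles; t++) {
   // One worker per tile of the image; numTiles, tileURLs, and
   // drawProcessedTile are all made up for the sake of the example.
   tilePorts[t] = createWorker(tileURLs[t]);
   tilePorts[t].onmessage = function (event) {
      drawProcessedTile(event.data);
   };
}

Even a modestly sized image sliced into tiles puts you in the thousands of
workers very quickly, which is exactly where the throttling question starts
to matter.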
If the createWorker(URL) method does not block, you can easily trash the
RAM situation by creating a million thread objects like this. From my
experience doing just this in other languages, I can tell you that it gets
ugly. On the other hand, without the draft explicitly stating how a browser
should perform a throttle, the developer has no clue how to write code that
takes the possibility of throttling into account.

I propose the following changes to make this situation (and other pitfalls
of parallel execution) far less dangerous to authors:

* Overload the createWorker() method to accept a time span (in ticks or
  milliseconds) as a timeout value.
* Define user agent "throttling" as *always* blocking when calling the
  createWorker() method.
* Make it clear in the draft that just because the environment has returned
  from createWorker() does *not* mean that execution of the worker logic
  has started. Language such as, "creation of a worker object does not
  guarantee that the worker is executing, it only guarantees that the
  worker has been queued for execution" would be extremely helpful. This is
  very, VERY important!
* Make it clear in the draft that queued workers do not necessarily begin
  execution in the same order that they were created.
* Make it clear in the draft that the environment does not have to devote
  equal resources to each worker. Therefore, even if Worker A started
  execution before Worker B, and they should both take the same amount of
  time to execute, Worker B could finish long before Worker A. Just
  because. :)

What the first two changes accomplish is the ability to modify the previous
example like so:

for (i = 0; i <= 1000000; i++) {
   // Assuming the timeout is measured in milliseconds
   arrayOfMessagePorts[i] = createWorker(arrayOfURLs[i], 1000);
   if (arrayOfMessagePorts[i] == null) {
      // Throttling is occurring; try waiting a while and re-creating the
      // worker! Or just exit, if it is not a big deal.
   }
}

This allows applications to gracefully recover, slow down, or do whatever
is needed in a throttling scenario, without jamming up the whole system.

Hope this all makes sense, and helps!

J.Ja
Received on Monday, 21 July 2008 15:51:20 UTC