- From: Justin James <j_james@mindspring.com>
- Date: Tue, 22 Jul 2008 00:34:11 -0400
- To: "'Justin James'" <j_james@mindspring.com>, "'Ian Hickson'" <ian@hixie.ch>
- Cc: <public-html@w3.org>
Ian -

I think that it is critical that you read this white paper from Microsoft regarding data: URLs in IE 8:

http://code.msdn.microsoft.com/Release/ProjectReleases.aspx?ProjectName=ie8whitepapers&ReleaseId=575

Some highlights:

* Data URLs are not allowed to contain scripts
* There is a 32 KB cap on data: URLs
* Something we both missed in the data: URL RFC is that the only characters allowed in a data: URL are characters allowed in an URL; this makes putting useful code in a data: URL impossible.

It is clear to me, after reading this paper, that the idea of data: URLs being used for HTML to self-contain and/or self-generate worker scripts is not realistic.

Again, I am not saying that the importing of scripts from an URL should not be possible, but I am continuing to say that there needs to be a way to self-store and self-generate code to go into the worker.

J.Ja

> -----Original Message-----
> From: Justin James [mailto:j_james@mindspring.com]
> Sent: Monday, July 21, 2008 11:50 AM
> To: 'Ian Hickson'
> Cc: 'public-html@w3.org'
> Subject: RE: Workers
>
> > -----Original Message-----
> > From: public-html-request@w3.org [mailto:public-html-request@w3.org] On Behalf Of Ian Hickson
> > Sent: Sunday, July 20, 2008 2:29 AM
> > To: Justin James
> > Cc: public-html@w3.org
> > Subject: RE: Workers
> >
> > On Sun, 20 Jul 2008, Justin James wrote:
> > > >
> > > > How would you communicate with such a mechanism?
> > >
> > > I suppose it could take a second argument for a thread-safe messaging object.
> >
> > That's basically what MessagePorts are, and basically how createWorker() works, except that it creates the ports for you as a convenience:
>
> Yup, I know.
>
> >
> >    var port = createWorker(url);
> >
> > > Yes, I am sure that if I saw the world from the eyes of the Gears team, that might seem like the best way to do it.
> > > But I'm from a more traditional background, and frankly, the idea of passing an URL to a script seems incredibly backwards and clumsy. Offhand, I cannot recall ever seeing a system of any sort where you basically say, "execute the code located by this reference".
> >
> > It's exactly how the Web works today:
> >
> >    <script src="url"></script>
>
> QUESTION: How well would a "javascript: URL" work with your approach? If the answer is "Great!" (I suspect so), then ignore the next paragraph.
>
> Not the *entire* world, which is my point. There are still tons of people in-lining code into the HTML. Heck, look at what ASP.Net spews out. Even using data: URLs, their code suddenly gets very, very ugly (*uglier* if you ask me) very quickly.
>
> > > I want to see a *function* for executing the work in a thread (even if it is a method of the Window object), not a "WindowWorker object" with a hidden/invisible "Execute" method.
> >
> > The WindowWorker object isn't how you execute code, it's just the global object. Whatever mechanism we use, we have to have a global object.
>
> Yeah, I later figured this out. This draft is *really* difficult to follow, and reading the thread with Andrew, I am not the only one who is having a hard time reading it. I can't really put my finger on it, but it feels more like the summary of a conversation between a group of people who already intimately know the subject and just need to have it on paper than an actual spec. I know, that's why it's a draft and not the final form. :)
>
> > As far as I can tell, data: URLs of megabytes in length work fine in all major shipping browsers that support data: URLs. Can you give an example of a major browser that supports data: URLs but doesn't support long enough data: URLs to handle the script you want to handle? (And why would you have that script in text form instead of accessible from a URL?)
>
> I got burned so many times in the mid-90's by browser URL length problems, that I have not tried to exceed 255 characters in an URL since then.
>
> What you are saying though is this: Code that works correctly in one 100% spec-compliant browser may not work correctly in a different 100% spec-compliant browser. And that is not an acceptable situation.
>
> The more I write and revise my responses to your message, the more I realize that probably 99% of my objections are caused by the lack of a proper specification around data: URLs. I have submitted a spec proposal via the bug tracker to add it to the HTML spec. :)
>
> > I respect your opinion, but practical experience from actual Web authors writing code with experimental Workers implementations have more weight. :-)
>
> I never discounted the Gears team's experience. I'm just saying that the world is a lot bigger than their experience. There is a TON of existing practical experience showing the value of self-storing/modifying/generating code out there. The fact that no one is actually doing it with this experimental worker implementation is probably related to the fact that like 30 (or less!) people on the planet are working with experimental worker implementations, and that those use cases are less common, particularly on the Web, than what the Gears team is doing.
>
> Regarding your responses to my example use cases (no need to go through them individually)... your responses are all valid. While writing a response to each one, I realized that you will probably never see eye-to-eye with me on it, because we hail from different backgrounds. For me, doing things in a dynamic/functional language way is fairly intuitive and natural. It is clear that you approach these things much more from the angle of a static language. There is nothing inherently better or worse about either viewpoint, either.
>
> At this point, like I said, the only thing I disagree with about creation from a URL is the fact that the data: URL spec is so problematic.
>
> > > One final note on the existing draft: I also find it problematic that the locating and procurement of the script located with the URL parameter does *not* occur in a separate thread. Considering that in many cases, the HTTP connect/transmit/close cycle takes a huge portion (if not the majority) of the total execution time, I think you lose most of the intended benefit by having that be in the main thread.
> >
> > I think you're misreading the spec. The fetching of the resource happens asynchronously.
>
> Like Andrew Fedoniouk, I completely missed the part of the spec (right above the numbered list) that specified the "separate and parallel execution" of those steps. I think that, at the very least, the draft should be updated to make this very clear. Would it be possible to make the first step in the list, "create a separate thread, and execute the rest of the steps in that thread", or somehow otherwise make this really obvious? Without including the creation of the thread in the steps, it is easy to overlook it as part of the process.
>
> > Browsers are allowed to throttle the code as much as they like. We can't really do anything else since user agents run on such varied hardware that there's no way to really guarantee particular performance characteristics anyway.
>
> I agree that different platforms will have different cap/throttle levels. But the code authors need to be able to check to see if they hit it! Some code may want to treat hitting the throttle as an error condition, others may want to simply ignore it. Also, the spec needs to clearly enunciate *precisely* the way a throttle is implemented, so at the very least, all browsers handle it the same.
> Does hitting the throttle cause the call to create the worker to block? Or is the worker created, but execution is delayed, which would allow the calling code to continue (at the expense of the memory used by a separate object)? Or does it throw an error (such as "cannot create thread")? For example:
>
> for (i = 0; i <= 1000000; i++) {
>   arrayOfMessagePorts[i] = createWorker(arrayOfURLs[i]);
> }
>
> Yes, I know that it is an extreme example (not really, if you want to do something to the individual pixels of an image in parallel...), but it illustrates the problem well. If the createWorker(URL) method does not block, you can easily trash the RAM situation by creating a million thread objects like this. From my experience doing just this in other languages, I can tell you that it gets ugly. On the other hand, without our draft explicitly stating how a browser should perform a throttle, the developer has no clue how to write code and take the possibility of throttling into account.
>
> I propose the following changes to make this situation (and other pitfalls of parallel execution) far less dangerous to authors:
>
> * Overloading the createWorker() method to accept a time span (in ticks or milliseconds) as a timeout value.
>
> * Defining user agent "throttling" as *always* blocking when calling the createWorker() method.
>
> * Make it clear in the draft that just because the environment has returned from createWorker() does *not* mean that execution of the worker logic has started. Language such as, "creation of a worker object does not guarantee that the worker is executing, it only guarantees that the worker has been queued for execution" should be extremely helpful. This is very, VERY important!
>
> * Make it clear in the draft that queued workers do not necessarily begin execution in the same order that they were created.
>
> * Make it clear in the draft that the environment does not have to devote equal resources to each worker. Therefore, even if Worker A started execution before Worker B, and both should take the same amount of time to execute, Worker B could finish long before Worker A. Just because. :)
>
> What the first two changes accomplish is that the previous example can be modified like so:
>
> for (i = 0; i <= 1000000; i++) {
>   //Assuming timeout is measured in milliseconds
>   arrayOfMessagePorts[i] = createWorker(arrayOfURLs[i], 1000);
>   if (arrayOfMessagePorts[i] == null) {
>     //Throttling is occurring, try waiting a while and re-creating the worker!
>     //Or just exit, if it is not a big deal.
>   }
> }
>
> This allows applications to gracefully recover, slow down, or whatever is needed in the case of a throttling scenario, without jamming up the whole system.
>
> Hope this all makes sense, and helps!
>
> J.Ja
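
To make the "try waiting a while and re-creating the worker" comment in the quoted loop concrete, here is a rough sketch of what the retry could look like. Everything in it is an assumption for illustration: createWorker(url, timeout) returning null when throttled is only the change proposed in this thread, not anything in the draft, and makeThrottledCreateWorker and createWorkers are made-up names. The stub simply pretends that the user agent allows a fixed number of live workers; it does not block, which a real implementation of the proposal would do for up to the timeout.

```javascript
// Hypothetical stand-in for the proposed createWorker(url, timeoutMs).
// NOT a real browser API: it simulates a user agent that allows at most
// `limit` live workers and signals throttling by returning null.
function makeThrottledCreateWorker(limit) {
  let live = 0;
  return function createWorker(url, timeoutMs) {
    if (live >= limit) {
      // Under the proposal, the call would block for up to timeoutMs here.
      return null;
    }
    live += 1;
    let closed = false;
    return {
      url: url,
      // Releasing a worker frees a slot for later createWorker() calls.
      close: function () {
        if (!closed) {
          closed = true;
          live -= 1;
        }
      }
    };
  };
}

// Try to create one worker per URL. When throttled, retry up to
// maxRetries more times, then give up on that URL and record it.
function createWorkers(createWorker, urls, timeoutMs, maxRetries) {
  const ports = [];
  const failed = [];
  for (let i = 0; i < urls.length; i++) {
    let port = null;
    for (let attempt = 0; attempt <= maxRetries && port === null; attempt++) {
      port = createWorker(urls[i], timeoutMs);
    }
    if (port === null) {
      failed.push(urls[i]);
    } else {
      ports.push(port);
    }
  }
  return { ports: ports, failed: failed };
}

// Usage: with a limit of two workers, the third URL is reported as failed.
const createWorker = makeThrottledCreateWorker(2);
const result = createWorkers(createWorker, ["a.js", "b.js", "c.js"], 1000, 1);
console.log(result.ports.length, result.failed);
```

The point of returning the failed URLs instead of throwing is that the caller decides whether throttling is an error or something to ignore, which is the choice the proposal is meant to give authors.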
Received on Tuesday, 22 July 2008 04:35:11 UTC