- From: David Bruant <bruant@enseirb-matmeca.fr>
- Date: Mon, 07 Dec 2009 16:38:03 -0800
> The reason WebWorkers don't have access to the DOM is concurrency. For
> example, to loop through a list of children I need to first read the
> number of childrens, then have a for loop which starts at 0 and ends
> at length-1. If you have two threads that can access the DOM
> concurrently, then one could change the number of children while the
> other was looping through the list, which would cause bugs in the
> program. The only way to fix this is to make the DOM a monitor or
> introduce semaphores, but then you would have to change the way the
> DOM is accessed in HTML5, breaking backwards compatibility, which is
> not a good idea.
>
> A better solution to your problem is to load fragments of the entire
> document using AJAX and then insert those fragments into the main
> document, when they are needed. You rarely need to see the entire
> document at once anyways.
>
> Marius Gundersen

>> One good way I have found would be to cut the whole page into several
>> parts (one the server side, what is already done in the multi-page
>> version) and to launch several workers. Each worker gets one part of the
>> whole page in the background and could give it to the browsing context
>> which will append the right part at the right place.
>>
>
> As others have noted, the slowness turns out to not be parsing, but to be
> a bunch of scripts that are doing various things such as adding the
> sidebar annotations, setting up the <dfn> cross-references, and generating
> the short table of contents.
>
> Plus, since browsers don't have thread-safe DOM implementations, we
> actually can't expose the DOM in workers. Maybe one day. :-)
>
> -- Ian Hickson

=> I'm sorry for the misunderstanding. I shouldn't have said "the DOM API". To be as accurate as I can be, I want to provide the DOMImplementation interface (http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-102161490) to the workers. As I'm going to explain, the point is to be able to create a document and then a documentFragment.

I will explain my point through another use case. (Sorry for the confusion with the HTML5 one-page version.) Let us imagine that I want to build a single page from several non-HTML sources of information. They can be in different formats (RSS, data obtained from XML-RPC requests, any other kind of XML file, JSON...). For this example, I suppose that each source is a JSON file with its own structure (different properties, different nesting). Each source needs its own processing.

As I said in my first e-mail, there are 3 main steps before my page is displayed fully loaded. For each source of content, we have to:
(1) get the content
(2) transform it into a DOM tree (as a documentFragment, or as a string that represents an HTML fragment, for example)
(3) append this to the main document at the right place (which triggers graphical rendering)
This last step is either an appendChild or an ".innerHTML =" and must be done in the main browsing context; there is no choice.

Let us imagine that I want one worker per source. For the moment, WebWorkers can do step (1) independently (thanks to XMLHttpRequest). When each worker receives its JSON string, this string must be transformed into an HTML DOM tree (2) (let us say a <table>, for example). Because none of the DOM Core API is currently available to WebWorkers, we have two solutions to turn the JSON string received in (1) into an HTML DOM tree:
(2.1) Send the JSON string (or the resulting object, whatever) to the main document, which creates a documentFragment, walks through the JSON object and appends the <table>, <tbody>, <tr>s, <td>s and contents to this fragment, for all the sources.
(2.2) Each worker creates a string which looks like "<table id=blabla><tbody><tr class="blibli"><td>1</td><td>2</td></tr><tr class="blibli"><td>3</td><td>37</td></tr></tbody></table>" with "+=" while walking through the JSON object.
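
A minimal sketch of the string building in (2.2), to make the hand-managed tags concrete. The function names, and the assumption that the parsed JSON boils down to an array of rows, are hypothetical and only for illustration.

```javascript
// Hypothetical helper: escape text so cell contents cannot break the markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the <table> markup with "+=", as in (2.2).
// `rows` is assumed to be an array of arrays taken from the parsed JSON.
function rowsToTableHtml(rows, id) {
  var html = '<table id="' + escapeHtml(id) + '"><tbody>';
  for (var i = 0; i < rows.length; i++) {
    html += '<tr class="blibli">';
    for (var j = 0; j < rows[i].length; j++) {
      html += "<td>" + escapeHtml(rows[i][j]) + "</td>";
    }
    html += "</tr>"; // every opened tag has to be closed by hand
  }
  html += "</tbody></table>";
  return html;
}
```

Nothing here checks that the tags are balanced: forgetting one of the closing "+=" lines still yields a string, just not the intended tree.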
The worker then sends this string through postMessage(), and the main browsing context can do "rightPlace.innerHTML = e.data" (where e.data is the string).

With (2.1), we have the document/documentFragment/Element/Node abstraction, but we lose all the parallelism, because the main browsing context handles all the sources of information (creating a documentFragment and doing all the appending for each source).

With (2.2), we have the parallelism, because each worker handles one source. However, we lose the DOM abstraction. I hope that I have made the string ridiculously long enough to convince you that it is not a good solution. For complicated examples, in my experience, using += and .innerHTML is always a source of errors, especially because of closing tags. These problems don't occur when developing with the DOM abstraction.

My proposal is:
(2.3) Assuming that we have access to the DOMImplementation interface, we can create an object implementing the Document interface which is DIFFERENT from the main document object, and I insist on this point. I am NOT proposing to give access to the main document (the one which "created the workers"). Thanks to this document, we can create a separate documentFragment in each worker and do, in parallel, the documentFragment appending described in (2.1).

The receiving context could have the following code:

    function onmessage_handler(e) {
        /* Some code to identify which worker it was and where its
        ** documentFragment should be inserted in the document. */
        var rightPlace = some_function(e); // An element in the main document.
        var df = e.data; // The documentFragment sent by the worker.
        rightPlace.appendChild(df);
    }

postMessage-ing an Element/Document/DocumentFragment/Node can cause a problem because of the references to a document that they contain (a worker must NOT have access to the main document, and the main document must NOT have access to a worker document either). As far as I can see, there are only two potential problems when postMessage-ing such objects "from one document to another":
* From the Node interface: ownerDocument. For this, it can be decided that when postMessage is called on a node, this node and its whole subtree are automatically adoptNode-ed by the main document (window.document) of the receiving context.
* From the Node interface: parentNode. When a node is postMessage-ed, if its parentNode is a reference to a node in the worker context (a document or documentFragment), we can automatically do an importNode() from the main document (window.document) on it. By the way, documents and documentFragments have no parent (http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1060184317), so this step is not even necessary for them.

This way, between the postMessage() context and the event.data context (used in the onmessage handler), I have broken all the references to a document living in a different, asynchronously running context (if I have forgotten some, tell me, I can propose a solution for them too). I have described a safe means to send a Document/DocumentFragment/Element/Node from a worker to the main browsing context. The other direction shouldn't be hard to get either.

I think that providing the DOMImplementation interface is a good way for implementors to provide a light-weight DOM implementation (because the DOM API needs of workers are not the same as those of documents as we know them now). I may be wrong.

Note: With a DOMImplementation available, the document response entity body of XMLHttpRequest has no reason to be null anymore.

Thanks for your time and your feedback,
David
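
As a rough illustration of the idea behind (2.3), here is a sketch that keeps a tree abstraction on the worker side without any DOM access. This is a stand-in technique, not the DOMImplementation proposal itself: the worker describes the tree as plain objects (which can go through postMessage), and the main context turns that description into nodes owned by its own document. The names describeTable and materialize are hypothetical; only createElement, createTextNode and appendChild are real DOM calls.

```javascript
// Worker side: describe a table as plain objects instead of concatenating markup.
// `rows` is assumed to be an array of arrays taken from the parsed JSON.
function describeTable(rows) {
  return {
    tag: "table",
    children: [{
      tag: "tbody",
      children: rows.map(function (row) {
        return {
          tag: "tr",
          children: row.map(function (cell) {
            return { tag: "td", text: String(cell) };
          })
        };
      })
    }]
  };
}

// Main side: build real nodes from the description, all owned by `doc`
// (window.document in a browser), so no reference to another document exists.
function materialize(desc, doc) {
  var el = doc.createElement(desc.tag);
  if (desc.text !== undefined) {
    el.appendChild(doc.createTextNode(desc.text));
  }
  (desc.children || []).forEach(function (child) {
    el.appendChild(materialize(child, doc));
  });
  return el;
}
```

In the receiving handler, something like rightPlace.appendChild(materialize(e.data, document)) would play the role of the appendChild shown earlier. The tree building on the main side is still sequential, which is exactly the cost the proposal tries to move into the workers.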
Received on Monday, 7 December 2009 16:38:03 UTC