- From: John Boyer <boyerj@ca.ibm.com>
- Date: Fri, 30 Sep 2011 10:07:27 -0700
- To: Nick Van den Bleeken <Nick.Van.den.Bleeken@inventivegroup.com>
- Cc: "<public-forms@w3.org>"<public-forms@w3.org>
- Message-ID: <OF087BD3E2.82F51792-ON8825791B.005A8575-8825791B.005E1143@ca.ibm.com>
Hi Nick, This would at least help with getting the guarantees that would be useful to authors. However, XForms is a consumer of the XPath technology, and XPath defines the ways in which it can be extended, namely by adding functions. It is best to minimize the extent to which implementers have to hack the consumed XPath processor, particularly when their is a defined extension mechanism that can achieve the same effect. As for xforms authors not having to worry about performance, they already do that when choosing the kinds of xforms constructs to use, not to mention the kinds of xpath expressions to use or to avoid. It's also interesting that you chose next-sibling::*[1] because I was originally thinking that I really wanted foo[2], foo[3], foo[last()-1], and foo[last()-2] to be constant time, or more generally that foo[k] and foo[last()-k] would be O(k). So, in one processor someone would write foo[3] and get good performance, but in another they'd get poor performance unless they changed their form to use foo[1]/next-sibling::*[1]/next-sibling::*[1], at which point the form loses performance in the first processor. Next up, I bet users would ask why foo[number(some/node)] doesn't achieve constant performance when some/node contains 1. Given that functions are the defined extension point, it sure seemed like a good idea to use that mechanism because then you also have a single, documented way for an author to achieve the particular effect they want, if they want it. The function gets the author to say exactly what they're doing so that we don't have to write an AI system to discover the endless sequence of similar xpath expression patterns that people might think of to do a specific thing. Put another way, if we're going to hack the xpath processor, why not say that *all* numeric predicates should get constant performance? foo[n/2] is constant in every other language than xpath, so why not xpath too? The answer, of course, is that you can't consistently get this performance across all xpath axes without blowing memory usage out to O(n^2), which of course takes that long to initialize. The egregious one is descendant::node()[n/2]. Getting constant time on that operation for all nodes is just not going to happen. The functions also give authors something consistent to rely on. If you care about performance, use these functions. If there's not a function, then there's not an optimized way of doing it. From a language consistency standpoint, this seems better than suggesting "use a predicate and hope it is one of the ones that your chosen implementation optimized". Cheers, John From: Nick Van den Bleeken <Nick.Van.den.Bleeken@inventivegroup.com> To: John Boyer/CanWest/IBM@IBMCA Cc: "<public-forms@w3.org>" <public-forms@w3.org> Date: 09/30/2011 05:58 AM Subject: Re: Proposed note for insert and new functions for XForms 2.0 Sent by: public-forms-request@w3.org John, I might be a bit biased because I use XPath on a day-to-day basis and really like it ;) But I'm not really in favor of adding functions that get the evaluation speed you would expect (and love) and their dual native XPath expressions that behave poorly. If you are concerned that implementations will behave not optimal for foo[1], foo[last()], next-sibling::*[1] and previous-sibling::*[1] we could add language to the spec saying that an implementation 'should' evaluate expressions like those in a constant time. On the other hand I know that for example Saxon does a lot effort rewriting/simplifying XPath expressions when staticly analyzing the XPath expression that go way beyond the simple optimizations that you are requesting. I won't like it that you have to know all 'special' XForms extension functions, and know when to use them, to get optimal performance. One of the nice things of XForms is, that the author doesn't have to worry for performance, because the implementation optimizes the form (I have to admit that there are implementations that do a better job in optimizing forms then others, but due to a healthy competition others will follow automatically). Kind regards, Nick Van den Bleeken R&D Manager Phone: +32 3 821 01 70 Office fax: +32 3 821 01 71 nick.van.den.bleeken@inventivegroup.com www.inventivedesigners.com On 29 Sep 2011, at 22:01, John Boyer wrote: Hi Nick, It is better to think about the problem from the viewpoint of the XForms author, not the XForms implementer. W3C usually doesn't discuss performance that much; it was once suggested to me that they view it as a market differentiator for implementers, but that's a frustrating approach for the consumer of the standards-based technology. >From the consumer's viewpoint, I'd like features in the language that I can directly express that are *guaranteed* to produce consistent results across implementations, where consistent means not just information results but also user experience. As an XForms author I can't rely on all implementers to make certain optimizations that are not called out in the specification. I'd like to have language features that I can rely on, or at the very least that I SHOULD be able to rely on (in the RFC 2119 sense). If foo[1] is a constant time operation in my favorite processor, and a linear time operation in a number of other processors, then I might implement a feature like table sorting that works almost instantly in one processor but takes 5 seconds to complete in another. The results of both processors are informationally the same (which is what the W3C almost exclusively focuses on), but they are not consistent from a user experience standpoint. On a more technical level, it seemed like you were OK with adding last-child(), as you didn't say "same question" on that one. Well, it turns out that I justified first-child() before last-child() because it seemed like the right order and made sense for talking about stacks, but the one I really need most for sorting is last-child(), and if you're up for that, then I ought to be able to get first-child() just out of a need for language consistency. So let me assume that you're also suggesting that my implementation should also optimize foo[last()] rather than adding last-child(). It turns out that some of the things I find myself trying to do, even just to do some kinds of sort methods, involve trying to get not just the first or last child, but the first or last two or three elements. It becomes much easier to just give the XForms author a few extra efficient functions like next-sibling() and prev-sibling() rather than coming up with an increasingly long laundry list of specialized xpath patterns that they would have to memorize as the "safe" ones to use... in my implementation. You might optimize different expressions that do the same things... Anyway, on top of that, I suspect that a lot of really good uses of next-sibling() and prev-sibling will come when they are used in combination with XPath variables, which means characterizing optimizations involving across actions, and therefore across xpath expression evaluations. For want of five of the simplest functions imaginable, it seems a lot to ask XForms authors to hope that all implementers would write static analyzers that intuit what they're trying to do and then do it better than the native XPath engine is normally capable of. It seems way better to extend the XPath engine with functions that all authors can then count on. Best regards, John M. Boyer, Ph.D. Distinguished Engineer, IBM Forms and Smarter Web Applications IBM Canada Software Lab, Victoria E-Mail: boyerj@ca.ibm.com Blog: http://www.ibm.com/developerworks/blogs/page/JohnBoyer Blog RSS feed: http://www.ibm.com/developerworks/blogs/rss/JohnBoyer?flavor=rssdw From: Nick Van den Bleeken <Nick.Van.den.Bleeken@inventivegroup.com > To: John Boyer/CanWest/IBM@IBMCA Cc: "<public-forms@w3.org>" <public-forms@w3.org> Date: 09/29/2011 11:33 AM Subject: Re: Proposed note for insert and new functions for XForms 2.0 Sent by: public-forms-request@w3.org John, I included some remarks/questions inline. Kind regards, Nick Van den Bleeken R&D Manager Phone: +32 3 821 01 70 Office fax: +32 3 821 01 71 nick.van.den.bleeken@inventivegroup.com www.inventivedesigners.com <Mail Attachment.png><Mail Attachment.png><Mail Attachment.png> On 28 Sep 2011, at 23:51, John Boyer wrote: RECOMMENDATION: Add a function with signature along the lines of "nodeset first-child(nodeset?, boolean?)" that returns the first child node of a given node, or the context node if no first parameter. The function SHOULD achieve O(1) on the operation. By default ignore whitespace text nodes, but if second parameter given and true, then include them in result. With this function implemented efficiently (and the efficient insert above), N stack operations will take O(N) processing time, and without it, the result is O(N^2). Why couldn't you just rewrite foo[1] to ibm:first-child(foo) during the static optimization phase when parsing the XPath expression (with ibm:first-child your proprietary extension function)? I think/hope that your XPath engine already does a lot other static optimizations when 'compiling/analizing' the XPath expression. Efficiently implementing queue-based algorithms requires the ability to push on one side of a list and pop from the other. Given the ability described above to efficiently insert at the beginning of the list, it seems we only need a way to efficiently identify the last child of a node in constant time. RECOMMENDATION: Add a function with signature along the lines of "nodeset last-child(nodeset?, boolean?)" that returns the last child node of a given node, or the context node if no first parameter. The function SHOULD achieve O(1) on the operation. By default ignore whitespace text nodes, but if second parameter given and true, then include them in result. Incidentally, the last-child() function also allows efficient insertion at the end of a list of children of a node by using it in the insert nodeset. This is helpful in creating an efficient sort that constructs its result in least to greatest order. The efficient first-child() and last-child() functions also appear to enable optimal O(N log N) sorting via XForms action loops, when used in combination with the iterate attribute. Without having at least first-child(), the best sort performance appears to be O(N log N log log N). A number of more useful data structures become possible to implement if it is also possible to obtain the parent, next sibling, and previous sibling of a node in constant time. These functions would prove convenient in implementing some sorting methods in the most natural ways, but efficient versions of several comparison-based data structures (e.g. various priority queues, trees and graphs) appear to become possible as well. Same remark/question why don't do this during the static optimization phase when parsing the XPath expression. RECOMMENDATION: Add functions with signatures along the lines of "nodeset parent(nodeset?)" returns the parent of a given node, or the context node if no first parameter. "nodeset next-sibling(nodeset?, boolean?)" returns the next sibling of a given node, or the context node if no first parameter. "nodeset prev-sibling(nodeset?, boolean?)" returns the previous sibling of a given node, or the context node if no first parameter. These functions SHOULD achieve O(1) on the operation. For latter two functions, ignore whitespace text nodes by default, but if second parameter given and true, then include them in result. For all of the functions, clearly if the first, last, next, previous or parent node doesn't exist, then an empty nodeset is returned. Finally, the eval() function is already on the list for XForms 2.0 with the use case from Michael Sperberg-McQueen of being able to use a repeat to drill into and edit arbitrarily structured XML. A second use case for the function pertains to algorithms like sorting as well as comparison-based data structure management. Given an eval() function, it would be possible to allow a form author to provide the comparison expression as a string so that the sort or comparison data structure could compare strings, numbers, dates or even secondary keys in case of primary key equality. This would help separate the algorithm or data structure from the application details. Same remark/question why don't do this during the static optimization phase when parsing the XPath expression. RECOMMENDATION: Add function with signature along lines of "object eval(string, nodeset?, number?)" that evaluates an xpath given in the string using the evaluation context of the eval() function, or in the context of a nodeset and a node at a given position, if those params are specified. definitely +1 for this one there is already a wiki page for this one http://www.w3.org/MarkUp/Forms/wiki/Eval_function Inventive Designers' Email Disclaimer: http://www.inventivedesigners.com/email-disclaimer -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. Inventive Designers' Email Disclaimer: http://www.inventivedesigners.com/email-disclaimer
Attachments
Received on Friday, 30 September 2011 17:07:59 UTC