- From: Javier Godoy <rjgodoy@fich.unl.edu.ar>
- Date: Tue, 22 Jan 2008 03:11:18 -0200
- To: "Michael Kay" <mhk@mhk.me.uk>, <public-qt-comments@w3.org>
- Cc: "'Sharon Adler'" <sca@us.ibm.com>, "'Andrew Eisenberg'" <andrew.eisenberg@us.ibm.com>, "'Jim Melton'" <jim.melton@acm.org>, "Hugo Minni" <hminni4k@yahoo.com.ar>
Thank you very much for your opinions.

Michael Kay wrote,
[http://lists.w3.org/Archives/Public/www-xpath-comments/2008JanMar/0001.html]

> A couple of editorial points first:
>
> (a) you should surely be referring to the XPath 2.0 Recommendation of 23
> January 2007 rather than the Proposed Recommendation of 21 November 2006.
> (I would also suggest that you avoid referring to a specifically dated
> version, so that you refer the reader to the latest edition at any given
> time, which may incorporate errata.)

Thanks for pointing out this error. Indeed, I was working with REC-xpath20-20070123, but the bibliographical database I used pointed to an older version. I hadn't noticed that.

---------

> (c) since the namespace prefix "xs" is often used to refer to the XML
> Schema namespace, it might be clearer to your readers if you chose
> a prefix other than "XS" - perhaps "WXS"?

Good point. I thought there would be no confusion, since "xs" is not normatively bound to XML Schema, but now I realize it will be clearer if I use a different prefix. I will change it. WXS is a good alternative.

---------

> Now a general policy point:
>
> (d) there are many people who seem to perceive a need for subsetting
> XPath, with a variety of objectives that usually include (i) reducing the
> cost of
> implementation, and (ii) making it harder for users to specify expressions
> that will be expensive to evaluate.

Objective (i) is not actually our goal (it could be a consequence of some restrictions as I had understood them but, as you stated, modifying an existing XPath implementation to avoid expensive operations would *increase* the implementation costs). The phrase "reduce the cost of implementing this specification" will be removed because it is misleading.

Objective (ii) is closer to ours, but it is not intended as a way of protecting the system (i.e. "avoid expensive queries as a security measure"), since there are many other valid (and required, even after subsetting) expressions which are too expensive. Implementors will have to deal with these expressions (and reject them if appropriate). Instead, our interest is to provide servers with a polite way of rejecting expressions which are not useful (see point (g) below).

> The designers of such subsets seem to
> come up with a wide variety of different solutions to this problem. This
> variety can only confuse users.

The Query Schema Description (Section 5) provides (or tries to provide) a way of advertising this variety. Additional elements might be included in the query schema description if that helps with this purpose.

> It also makes it less likely that an
> implementor can take an existing XPath implementation and reuse it, which
> by the law of unintended consequences actually increases costs for
> implementors. Despite the difficulty of finding a rational basis for
> deciding which features to include in a subset and which to exclude, I
> think there is something to be said for having an XPath 2.0 subset
> defined by the responsible W3C working groups (XSL and XQuery)
> and then strongly discouraging other groups from defining their own
> subsets.

All features of XPath 2.0 MAY be supported, since none of them is actually forbidden. If implementors think that reusing a full-featured XPath component fits their requirements, they are able to do so. On the other hand, if they consider that such a component might involve expensive operations or storage, they are allowed to drop some optional features.

The subsetting conforms to Appendix F of XPath 2.0, since the syntactic or semantic definitions of XPath are not modified (e.g., I say that queries MAY fail if some numeric predicates are specified, because they are outside the minimal subset; however, this does not alter element ordering, and does not modify the semantics of numeric predicates).

OPTIONAL features (as defined in draft-godoy-webdav-xmlsearch) are to be understood as in RFC 2119, Section 5 (definition of keywords "MAY" and "OPTIONAL"):

"An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.)"

Such interoperation may be insufficient as proposed in the current version of my draft. Again, the Query Schema Description should be augmented.

---------

> Now some detailed technical points:
>
> (e) an implementation that does not support descendant,
> descendant-or-self, or "//" is going to be pretty unusable.
> Searching for elements at
> arbitrary depth is a great user convenience, and is essential in the case
> of
> recursive document structures. If you're going to make some of the axes
> optional, I
> suggest you choose the same subset as XQuery chose.

The difference with respect to XQuery's required axes is that XQuery's set includes the descendant and descendant-or-self axes (namespace is deprecated in XQuery, while I'm not sure about inheriting such deprecation).

In my draft, the rationale for making descendant and descendant-or-self optional was that they don't add expressive power if elements occur at a well-known position within the tree. The "//" abbreviation was made optional because it depends on the descendant-or-self axis.

Thinking carefully about this point, it seems that either:

- there will be elements at arbitrary depth (quite possible), in which case descendant and descendant-or-self would be convenient for selecting them; or
- there will be no elements at arbitrary depth (e.g., a structure defined by a very simple schema or DTD), in which case it would be easy for implementors to optimize the query by other means.

For instance, if we have:

  <!ELEMENT metadata (title, author+, comment*) >
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (firstname, lastname)>
  <!ELEMENT firstname (#PCDATA) >
  <!ELEMENT lastname (#PCDATA) >

the expression "//lastname" could only refer to "/metadata/author/lastname" and could be optimized in that way.

It seems "descendant" and "descendant-or-self" should be REQUIRED, since there is no advantage in making them OPTIONAL. The "//" abbreviation would then be allowed too.

---------

> (f) you define the minimum set of functions that an implementation must
> supply as being empty (no functions). There are some functions such as
> not() and count() that I would consider absolutely indispensible.

Agreed. The minimum function signatures should be revised.

---------

> (g) I don't think the restrictions you propose for numeric predicates
> assist with either of your design objectives (reduced implementation
> cost, throttled performance). They just make the language less
> orthogonal and less interoperable.

The idea behind this requirement was to allow content to be stored in an "optimized" form (maybe a relational database or any other implementation-dependent solution).

IMHO, the impact of supporting numeric predicates depends on the kind of sequence they apply to. For instance, supporting numeric predicates in an AxisExpr selecting the child axis only requires some indexes, whose size is proportional to the number of children. On the other hand, supporting these predicates in an AxisExpr selecting the descendant axis would require either calculating the element position on the fly, or storing an index whose size is proportional to the number of (ancestor, descendant) pairs. If the numeric predicate applies to a FilterExpr, then indexes may not help, since many different FilterExprs are allowed and it would be overwhelming to index all of them.

Maybe the restriction is too drastic, but... what is the meaning of the i-th element within a sequence which is not semantically ordered? (Even though such an element is well-defined, because element ordering is well-defined.)

For instance, if we have

  <!ELEMENT metadata (title, author+, comment*) >
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (firstname, lastname)>
  <!--author elements are ordered, most relevant author first-->
  <!ELEMENT comment (#PCDATA)>
  <!--ordering of comment elements is not significant-->

"/metadata/author[1]" is meaningful, while "/metadata/comment[1]" is not (although both are valid XPath expressions).

I MAY forget everything about the comment order when storing information in my "optimized storage", because it is not required by the application context. Why should I be forbidden to do so, only because it is required by XPath? (I don't mean that it is XPath's fault; it is only that full-featured XPath is too expressive for my hypothetical simple schema.)

Regards,
Javier
Received on Tuesday, 22 January 2008 05:12:13 UTC