- From: Michael Dyck <MichaelDyck@home.com>
- Date: Mon, 09 Apr 2001 23:36:06 -0700
- To: www-xml-query-comments@w3.org
Comments on:
XQuery: A Query Language for XML
W3C Working Draft 15 February 2001

----------------------------------------------------------------------------
2. The XQuery Language

Figure 1
    The horizontal arrowhead arc is somewhat misleading. To me, it suggests
    "becomes", i.e. one tree is transformed into the other. I suppose what
    you want to convey is that the two E nodes are unconnected, but ordered
    (in the data model). Perhaps a better way to convey this would be
    something like:

           + - - - - +
           .         .
           E         E
           |         |
        +--+--+    +-+-+
        |     |    | | |
        T     E    T E E

    [best viewed using a fixed-width font]

    That is, the horizontal-crossbar-with-short-verticals denotes an
    ordered set of nodes, whether they're the top-level E nodes, or the
    children of a node. Those two cases would still be distinguished, by
    the style of the line (solid or dashed), and also by the lack of a
    "parent" for the sequence of top-level nodes.

    Also, if these are meant to be documents, each "top-level" E should
    actually be the child of a top-level root node.

----------------------------------------------------------------------------
2.1 Path Expressions

para 2 ("In XQuery..."):

"the result of a path expression is an ordered list of nodes"
    To my thinking, the phrase "ordered list" is redundant: "list" implies
    order. (And thus, "unordered list" is nonsense.) Note that section 2.8
    agrees that every list has an order -- the distinction is whether that
    order is significant or not.

"each node includes its descendant nodes"
    I don't think "includes" is a very good verb here, as it suggests that
    the descendant nodes are somehow *constituents* of the ancestor node.
    Perhaps change "includes" to "is connected to".

"forest.)"
    The period should go outside the close-paren.

"the result can be thought of as an ordered forest"
    Perhaps, but I'm not sure it's wise to use that way of thinking in this
    specification.
    For instance, you use it when you say "the top-level nodes in the path
    expression result", but I think this phrase is more open to
    misinterpretation. For example, if the result contains a node and one
    of its ancestors, someone might think that only the ancestor is a
    "top-level" node. Or maybe it's only a top-level node if it's a
    document element node. Or a root node. And so on. But if you stick to
    the "list of nodes" way of thinking, I think there's less chance of
    confusion.

para 4 ("A path expression can begin with..."):

    This para leaves out relative paths.

"determined by the environment in which the query is executed"
    Actually, it's determined by the context in which the path expression
    is evaluated, which may be different.

"(Q1) In the second chapter..."
    Put this, and all other "(Qn)" paras in this section, in bold italic,
    to match the rest of the document.

'document("zoo.xml")/chapter[2]//...' (and similarly throughout)
    If the document() function returns the root node of the document, this
    expression will not work, because chapter[2] is obviously not a child
    of the root node. See
    <http://lists.w3.org/Archives/Public/www-xml-query-comments/2001Mar/0002.html>

para 8 ("It is sometimes desirable..."):

"The first element in a list has ordinal number 1. The ordinal numbers of
elements in a list are not affected by the presence of other types of nodes
such as comments or processing instructions."
    This is only true if you change "list" to "list of elements" (which
    makes the second sentence pointless). For instance, in
        whatever/node()[RANGE 2 TO 5]
    the list of nodes "presented" to the predicate may include elements,
    comments, etc., so the ordinal numbers of the elements will certainly
    be affected by the presence of other types of nodes.

----------------------------------------------------------------------------
2.2 Element Constructors

para 1

"an optional list of expressions that provide the content of the element"
    Maybe change "content" to "content and attributes".
    But that might make it sound like "all attributes", which it isn't
    necessarily. Maybe "content and (optionally) attributes"? Maybe just
    leave it.

para 4 ("In the following example...")

"Note that, when a start-tag contains a variable name, the matching end-tag
must contain the same variable name"
    What if it were a different variable with the same value?

"example.)"
    The period belongs after the close-paren.

----------------------------------------------------------------------------
2.3 FLWR Expressions

para 1

"these clauses must appear in a specific order"
    Actually, Appendix B allows FOR and LET clauses in any order. Moreover,
    I think the syntax should allow similar freedom for WHERE clauses,
    instead of forcing them to occur after all the FOR and LET clauses.

para 3

"The result of the FOR-clause is a list of tuples, each of which contains a
binding for each of the variables."
    If XQuery is going to use XPath's concept of Evaluation Context, then I
    think you'll have trouble squaring that with this "list of tuples of
    bindings" model. To stick with the Evaluation Context model, you'd say
    something like:
        For each value yielded by the expression, a binding of the
        variable to that value is added to the context used to evaluate
        the rest of the construct.

"The variables are bound to individual nodes ..."
    Change "nodes" to "values", since the result of the expression might be
    a list of strings, or numbers, etc. (Similarly in following
    paragraphs.)

"the binding-tuples represent the cross-product of the node-lists returned
by all the expressions"
    This only makes sense if the expressions are independent. If instead,
    one expression uses a variable that is bound to the result of a
    previous expression, e.g.
        FOR $c IN //chapter, $s IN $c/section
    you can't really talk about a cross-product.

"Each variable in a FOR-clause can be thought of as iterating over the
nodes returned by its repsective expression."
    So it doesn't *really*, but we can think of it that way?
    I'm inclined to think that it really does iterate over those nodes. Is
    there a sense in which it doesn't?

para 4

"the variable $x" (twice)
    Put "$x" in a <CODE> element?

para 5

"The number of tuples generated by a FOR/LET sequence is the product of the
cardinalities of the node-lists returned by the expressions in the
FOR-clauses."
    Again, this only makes sense if the expressions are independent.

"determined by the order of their bound elements in the input document"
    They might be bound to non-element nodes, or non-node values. Even if
    they are bound to elements, those elements might not come from a single
    input document.

para 7 ("The RETURN-clause...")

"the RETURN-clause is executed on each tuple, in order, and the order of
results is preserved in the output document."
    But mightn't the results be nodes from the input document(s)? In which
    case, the stated order might not be document order, and might contain
    duplicate nodes, which would seem to disagree with section 2.1, para 2.

para 8

'a document named "bib.xml" that contains a list of <book> elements'
    It would be nice if you said what the top-level element was. <bib>?

para before Q10
[The distinct() function eliminates duplicates from a list of elements.]

"Two elements are considered to be duplicates if their values (including
name, attributes, and normalized content) are equal."
    The problem with this definition is that the result of a query can vary
    depending on *which* of two duplicate elements is retained. I'm not
    sure what "normalized content" means, but even setting that aside, two
    duplicate elements still have different contexts. For instance, if each
    <chapter> begins with a <section> entitled "Introduction", what does
        distinct(//section/title[.="Introduction"])/..
    yield? A <section>, presumably, but which one?

    Some alternatives:
    (1) In the definition of distinct(), explicitly say which of several
        duplicate elements is retained.
    (2) Say that the matter is implementation-defined.
    (3) Say that the matter is undefined.
    (4) Somehow prohibit "up-navigation" from the result of distinct().
    (5) Restrict distinct() to operate on a list of strings rather than a
        list of elements.

"The result of the distinct function is an unordered set of elements."
    The phrase "unordered set" is redundant: a set *is* unordered.

para before Q14:

"This example uses ... number(element), which returns the content of an
element expressed as a number."
    How come Q14 uses number() in
        2 * number($e)
    but Q13 didn't use it in
        avg(//book/price)
    or
        $b/price > $a
    or
        $b/price - $a
    ?

----------------------------------------------------------------------------
2.4 Operators in Expressions

para 2

"Each instance of the XML Query data model ... is a forest that includes a
total ordering ... among all its nodes."
    Even attribute nodes? (XPath says that the relative order of attribute
    nodes is implementation-dependent.)

    According to XPath, an element node "occurs before" its children. Is it
    BEFORE them? Are they AFTER it?

"BEFORE and AFTER do not require their operands to have a local ordering."
    What's a local ordering?

----------------------------------------------------------------------------
2.7 Filtering

para 1

"those nodes that are present at any level in the first operand and are
also top-level nodes in the second operand"
    This is another use of the "ordered forest" way of thinking that I
    discouraged in section 2.1. Moreover, you're using "the operand" to
    mean "the value of the operand", although I suppose I can tolerate that
    shorthand. :)

----------------------------------------------------------------------------
2.9 Functions

para 4

"XQuery Version 1 does not allow user-defined functions to be overloaded--
that is, it does not allow multiple functions to be declared with the same
name and the same number of parameters."
    So if multiple functions were declared with the same name but
    *different* numbers of parameters, that would not be overloading, and
    would be allowed?

"some of the built-in functions in the XQuery core library are overloaded--
for example, the string function of XPath can convert an instance of almost
any type into a string."
    That's not really overloading -- there's only one declaration of the
    function. It just has a very general parameter type.

"The process of finding the best available function for a given function is
called function resolution."
    If XQuery doesn't allow function overloading, why is there more than
    one "available function" to choose from? The name of the function (and
    perhaps the number of arguments) should uniquely determine (at most)
    one function. Points 1 through 4 are still necessary, but only to
    determine whether the invocation is compatible with the declaration of
    that function.

para 5

"calledfunction"
    Insert space.

point 4, para 3 ("This rule generalizes...")

"a list whose individual members are the results of invoking the function"
    But if any of the results of the multiple invocations are lists, they
    won't be individual members of the uber-result, since they'll be
    flattened into it.

"invoking the function on tuples of arguments taken from the Cartesian
product of the N input lists."
    Does the order of those invocations matter?

para before Q22

"resursively-that"
    Change hyphen to double hyphen.

"emptyand"
    Insert space.

para after Q24

"reachablefunction"
    Insert space.

----------------------------------------------------------------------------
2.10 User-Defined Datatypes

para 1

"used to define an element" & "might define an element"
    Change "element" to "element-type"?

point 1

"the implicit input document"
    The input might not be a single document.

"documentfunction"
    Insert space.

Q26

".../emp[location = ..."
    Change "emp" to "emp_type"?

----------------------------------------------------------------------------
3 Querying Relational Data

Q29

'the notation "SORTBY(.)" ...
causes the <pno> elements to be sorted by their content'
    But this does not necessarily put them in numeric order, which is what
    the English statement of the query specified.

----------------------------------------------------------------------------

-Michael Dyck
Received on Tuesday, 10 April 2001 02:37:22 UTC