Comments on XQuery draft

Comments on:
XQuery: A Query Language for XML
W3C Working Draft 15 February 2001

----------------------------------------------------------------------------
2. The XQuery Language

Figure 1

    The horizontal arrowhead arc is somewhat misleading. To me, it suggests
    "becomes", i.e. one tree is transformed into the other. I suppose what
    you want to convey is that the two E nodes are unconnected, but ordered
    (in the data model). Perhaps a better way to convey this would be
    something like:

       + - - - - +
       .         .
       E         E
       |         |
    +--+--+    +-+-+
    |  |  |    |   |
    T  E  T    E   E

    [best viewed using a fixed-width font] That is, the horizontal-crossbar-
    with-short-verticals denotes an ordered set of nodes, whether they're
    the top-level E nodes, or the children of a node. Those two cases would
    still be distinguished, by the style of the line (solid or dashed), and
    also by the lack of a "parent" for the sequence of top-level nodes.

    Also, if these are meant to be documents, each "top-level" E should
    actually be the child of a top-level root node.

----------------------------------------------------------------------------
2.1 Path Expressions

para 2 ("In XQuery..."):

"the result of a path expression is an ordered list of nodes"
    To my mind, the phrase "ordered list" is redundant: "list" implies
    order. (And thus, "unordered list" is nonsense.)
    Note that section 2.8 agrees that every list has an order -- the
    distinction is whether that order is significant or not.

"each node includes its descendant nodes"
    I don't think "includes" is a very good verb here, as it suggests that
    the descendant nodes are somehow *constituents* of the ancestor node.
    Perhaps change "includes" to "is connected to".

"forest.)"
    The period should go outside the close-paren.

"the result can be thought of as an ordered forest"
    Perhaps, but I'm not sure it's wise to use that way of thinking in this
    specification. For instance, you use it when you say
        "the top-level nodes in the path expression result",
    but I think this phrase is more open to misinterpretation. For instance,
    if the result contains a node and one of its ancestors, someone might
    think that only the ancestor is a "top-level" node. Or maybe it's only
    a top-level node if it's a document element node. Or a root node. And so
    on.  But if you stick to the "list of nodes" way of thinking, I think
    there's less chance of confusion.

para 4 ("A path expression can begin with..."):
    This para leaves out relative paths.

"determined by the environment in which the query is executed"
    Actually, it's determined by the context in which the path expression is
    evaluated, which may be different.

"(Q1) In the second chapter..."
    Put this, and all other "(Qn)" paras in this section, in bold italic, to
    match the rest of the document.

'document("zoo.xml")/chapter[2]//...' (and similarly throughout)
    If the document() function returns the root node of the document, this
    expression will not work, because chapter[2] is obviously not a child of
    the root node. See
<http://lists.w3.org/Archives/Public/www-xml-query-comments/2001Mar/0002.html>

para 8 ("It is sometimes desirable..."):

"The first element in a list has ordinal number 1. The ordinal numbers of
elements in a list are not affected by the presence of other types of nodes
such as comments or processing instructions." 
    This is only true if you change "list" to "list of elements" (which
    makes the second sentence pointless). For instance, in
        whatever/node()[RANGE 2 TO 5]
    the list of nodes "presented" to the predicate may include elements,
    comments, etc., so the ordinal numbers of the elements will certainly be
    affected by the presence of other types of nodes.
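
    Here is a minimal Python sketch of the point above. It assumes a
    hypothetical model in which a parent's children are (kind, name) pairs,
    and in which RANGE m TO n keeps the nodes whose 1-based ordinal lies in
    [m, n]. The helpers range_predicate and names are illustrative and are
    not part of the draft.

```python
# Hypothetical model: a parent's children as (kind, name) pairs, in order.
children = [
    ("element", "a"),
    ("comment", None),
    ("element", "b"),
    ("processing-instruction", "pi"),
    ("element", "c"),
    ("element", "d"),
]


def range_predicate(nodes, start, end):
    """Keep nodes whose 1-based ordinal is in [start, end] (like RANGE start TO end)."""
    return [n for i, n in enumerate(nodes, start=1) if start <= i <= end]


def names(nodes):
    """Names of the element nodes in a list."""
    return [name for kind, name in nodes if kind == "element"]


# whatever/node()[RANGE 2 TO 5]: ordinals count comments and PIs too.
via_node_list = names(range_predicate(children, 2, 5))

# The same range applied to the list of elements only.
elements = [n for n in children if n[0] == "element"]
via_element_list = names(range_predicate(elements, 2, 5))

print(via_node_list)     # ['b', 'c']
print(via_element_list)  # ['b', 'c', 'd']
```

    Element d is selected in one case but not the other, because the
    comment and the processing instruction shift the elements' ordinals.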

----------------------------------------------------------------------------
2.2 Element Constructors

para 1
"an optional list of expressions that provide the content of the element"
    Maybe change "content" to "content and attributes". But that might make
    it sound like "all attributes", which it isn't necessarily.
    Maybe "content and (optionally) attributes"? Maybe just leave it.

para 4 ("In the following example...")
"Note that, when a start-tag contains a variable name, the matching end-tag
must contain the same variable name"
    What if it were a different variable with the same value?

"example.)"
    The period belongs after the close-paren.

----------------------------------------------------------------------------
2.3 FLWR Expressions

para 1
"these clauses must appear in a specific order"
    Actually, Appendix B allows FOR and LET clauses in any order. Moreover,
    I think the syntax should allow similar freedom for WHERE clauses,
    instead of forcing them to occur after all the FOR and LET clauses.

para 3
"The result of the FOR-clause is a list of tuples, each of which contains a
binding for each of the variables."
    If XQuery is going to use XPath's concept of Evaluation Context, then I
    think you'll have trouble squaring that with this "list of tuples of
    bindings" model. To stick with the Evaluation Context model, you'd say
    something like:
        For each value yielded by the expression, a binding of the
        variable to that value is added to the context used to evaluate
        the rest of the construct.

"The variables are bound to individual nodes ..."
    Change "nodes" to "values", since the result of the expression might be
    a list of strings, or numbers, etc. (Similarly in following paragraphs.)

"the binding-tuples represent the cross-product of the node-lists returned
by all the expressions"
    This only makes sense if the expressions are independent. If, instead,
    one expression uses a variable that is bound to the result of a previous
    expression, e.g.
        FOR $c IN //chapter, $s IN $c/section
    you can't really talk about a cross-product.
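
    The following Python sketch illustrates both this point and the
    Evaluation Context reading suggested for para 3. Each FOR variable is
    bound by extending the context in which the remaining bindings are
    evaluated. The chapter/section data and the for_clause helper are
    hypothetical.

```python
# Hypothetical document: each chapter lists its own sections.
chapters = [
    {"title": "Ch1", "sections": ["1.1", "1.2"]},
    {"title": "Ch2", "sections": ["2.1"]},
    {"title": "Ch3", "sections": ["3.1", "3.2", "3.3"]},
]


def for_clause(bindings, context=None):
    """Evaluate a FOR clause by extending the context one variable at a time.

    bindings: list of (name, expr) pairs; expr maps the current context
    (a dict of variable bindings) to a list of values.
    Yields one context per binding tuple.
    """
    context = context or {}
    if not bindings:
        yield dict(context)
        return
    (name, expr), rest = bindings[0], bindings[1:]
    for value in expr(context):
        # Later expressions see this binding in their context.
        yield from for_clause(rest, {**context, name: value})


# FOR $c IN //chapter, $s IN $c/section
tuples = list(for_clause([
    ("c", lambda ctx: chapters),
    ("s", lambda ctx: ctx["c"]["sections"]),
]))

all_sections = [s for ch in chapters for s in ch["sections"]]
print(len(tuples))                        # 6: each chapter paired with its own sections
print(len(chapters) * len(all_sections))  # 18: what a cross-product would give
```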

"Each variable in a FOR-clause can be thought of as iterating over the nodes
returned by its repsective expression."
    So it doesn't *really*, but we can think of it that way? I'm inclined to
    think that it really does iterate over those nodes. Is there a sense in
    which it doesn't?

para 4
"the variable $x" (twice)
    Put "$x" in a <CODE> element?

para 5
"The number of tuples generated by a FOR/LET sequence is the product of the
cardinalities of the node-lists returned by the expressions in the
FOR-clauses."
    Again, this only makes sense if the expressions are independent.

"determined by the order of their bound elements in the input document"
    They might be bound to non-element nodes, or non-node values.
    Even if they are bound to elements, those elements might not come from
    a single input document.

para 7 ("The RETURN-clause...")
"the RETURN-clause is executed on each tuple, in order, and the order of
results is preserved in the output document."
    But mightn't the results be nodes from the input document(s)? In which
    case, the stated order might not be document order, and might contain
    duplicate nodes, which would seem to disagree with section 2.1, para 2.
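
    For instance, the following Python sketch uses hypothetical data. It
    roughly models a query that iterates over authors and returns each book
    they wrote. A book with two authors comes back twice, and the books do
    not come back in document order.

```python
# Hypothetical input: <book> elements in document order, with their authors.
books = [
    {"pos": 1, "title": "A", "authors": ["Smith", "Jones"]},
    {"pos": 2, "title": "B", "authors": ["Jones"]},
]
authors = ["Jones", "Smith"]  # order in which the outer FOR visits authors

# Roughly: FOR $a IN <authors>, $b IN <books> WHERE $a is an author of $b RETURN $b
# Each tuple's RETURN value is appended in tuple order.
result = [b for a in authors for b in books if a in b["authors"]]

positions = [b["pos"] for b in result]
print([b["title"] for b in result])          # ['A', 'B', 'A']
print(positions == sorted(set(positions)))   # False: duplicates, not document order
```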

para 8
'a document named "bib.xml" that contains a list of <book> elements'
    It would be nice if you said what the top-level element was. <bib>?

para before Q10
[The distinct() function eliminates duplicates from a list of elements.]
"Two elements are considered to be duplicates if their values (including
name, attributes, and normalized content) are equal."
    The problem with this definition is that the result of a query can vary
    depending on *which* of two duplicate elements is retained. I'm not sure
    what "normalized content" means, but even setting that aside, two
    duplicate elements still have different contexts. For instance, if each
    <chapter> begins with a <section> entitled "Introduction", what does
        distinct(//section/title[.="Introduction"])/..
    yield? A <section>, presumably, but which one?

    Some alternatives:
    (1) In the definition of distinct(), explicitly say which of several
        duplicate elements is retained.
    (2) Say that the matter is implementation-defined.
    (3) Say that the matter is undefined.
    (4) Somehow prohibit "up-navigation" from the result of distinct().
    (5) Restrict distinct() to operate on a list of strings rather than a
        list of elements.
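
    The following Python sketch shows the ambiguity. It assumes,
    hypothetically, that duplicates are judged by value alone. Two
    implementations of "distinct" are shown: one keeps the first duplicate
    and the other keeps the last. They give different answers to the "/.."
    step. Both helpers are illustrative.

```python
# Hypothetical <title> nodes: equal values, different parents.
titles = [
    {"value": "Introduction", "parent": "section in chapter 1"},
    {"value": "Introduction", "parent": "section in chapter 2"},
]


def distinct_keep_first(nodes):
    """Drop later nodes whose value equals that of an earlier node."""
    seen, kept = set(), []
    for node in nodes:
        if node["value"] not in seen:
            seen.add(node["value"])
            kept.append(node)
    return kept


def distinct_keep_last(nodes):
    """Drop earlier nodes whose value equals that of a later node."""
    return list(reversed(distinct_keep_first(list(reversed(nodes)))))


# distinct(//section/title[.="Introduction"])/..
parents_first = [t["parent"] for t in distinct_keep_first(titles)]
parents_last = [t["parent"] for t in distinct_keep_last(titles)]
print(parents_first)  # ['section in chapter 1']
print(parents_last)   # ['section in chapter 2']
```

    Both results are consistent with the draft's definition of duplicates.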

"The result of the distinct function is an unordered set of elements."
    The phrase "unordered set" is redundant: a set *is* unordered.

para before Q14:
"This example uses ... number(element), which returns the content of an
element expressed as a number."
    How come Q14 uses number() in
        2 * number($e)
    but Q13 didn't use it in
        avg(//book/price)
    or
        $b/price > $a
    or
        $b/price - $a
    ?    

----------------------------------------------------------------------------
2.4 Operators in Expressions

para 2
"Each instance of the XML Query data model ... is a forest that includes a
total ordering ... among all its nodes."
    Even attribute nodes? (XPath says that the relative order of attribute
    nodes is implementation-dependent.)

    According to XPath, an element node "occurs before" its children.
    Is it BEFORE them? Are they AFTER it?

"BEFORE and AFTER do not require their operands to have a local ordering."
    What's a local ordering?

----------------------------------------------------------------------------
2.7 Filtering

para 1
"those nodes that are present at any level in the first operand and are also
top-level nodes in the second operand"
    This is another use of the "ordered forest" way of thinking that I 
    discouraged in section 2.1.

    Moreover, you're using "the operand" to mean "the value of the operand",
    although I suppose I can tolerate that shorthand. :)

----------------------------------------------------------------------------
2.9 Functions

para 4
"XQuery Version 1 does not allow user-defined functions to be overloaded--
that is, it does not allow multiple functions to be declared with the same
name and the same number of parameters."
    So if multiple functions were declared with the same name but
    *different* numbers of parameters, that would not be overloading,
    and would be allowed?

"some of the built-in functions in the XQuery core library are overloaded--
for example, the string function of XPath can convert an instance of almost
any type into a string."
    That's not really overloading -- there's only one declaration of the
    function. It just has a very general parameter type.

"The process of finding the best available function for a given function is
called function resolution."
    If XQuery doesn't allow function overloading, why is there more than one
    "available function" to choose from?  The name of the function (and
    perhaps the number of arguments) should uniquely determine (at most) one
    function. Points 1 through 4 are still necessary, but only to determine
    whether the invocation is compatible with the declaration of that
    function.
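
    The following Python sketch shows that simpler scheme. It assumes,
    hypothetically, that functions are declared in a table keyed by name and
    arity, and that same-name declarations with different arities are
    allowed. A lookup yields at most one candidate, so no "best" function
    has to be chosen. The remaining checks only test whether the call is
    compatible with that one declaration. The function name "depth" is made
    up.

```python
# Hypothetical registry of user-defined functions, keyed by (name, arity).
registry = {}


def declare(name, params):
    """Declare a function; a second declaration with the same name and arity is an error."""
    key = (name, len(params))
    if key in registry:
        raise ValueError(f"{name}/{len(params)} is already declared")
    registry[key] = params


def resolve(name, args):
    """Return the one declaration matching this call, or None if there is none."""
    return registry.get((name, len(args)))


declare("depth", ["node"])
declare("depth", ["node", "limit"])

print(resolve("depth", ["n"]))     # ['node']
print(resolve("depth", ["n", 3]))  # ['node', 'limit']
print(resolve("depth", []))        # None
```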

para 5
"calledfunction"
    Insert space.

point 4, para 3 ("This rule generalizes...")
"a list whose individual members are the results of invoking the function"
    But if any of the results of the multiple invocations are lists, they
    won't be individual members of the uber-result, since they'll be
    flattened into it.

"invoking the function on tuples of arguments taken from the Cartesian
product of the N input lists."
    Does the order of those invocations matter?
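
    Both questions show up in a small Python sketch. It assumes,
    hypothetically, that list-valued results are flattened into the overall
    result and that the argument tuples are enumerated with
    itertools.product. Four invocations yield eight members. Enumerating
    the same Cartesian product in a different order changes the order of
    the result.

```python
from itertools import product


def apply_over_lists(f, *arg_lists):
    """Invoke f on each tuple from the Cartesian product of arg_lists,
    flattening list-valued results into one result list (hypothetical model)."""
    result = []
    for args in product(*arg_lists):
        value = f(*args)
        if isinstance(value, list):
            result.extend(value)  # a list result loses its boundaries
        else:
            result.append(value)
    return result


def pair(x, y):
    """A function whose result is itself a list."""
    return [x, y]


# Vary the second argument fastest...
row_major = apply_over_lists(pair, [1, 2], ["a", "b"])
# ...or the first argument fastest: same invocations, different order.
col_major = apply_over_lists(lambda y, x: pair(x, y), ["a", "b"], [1, 2])

print(row_major)  # [1, 'a', 1, 'b', 2, 'a', 2, 'b']
print(col_major)  # [1, 'a', 2, 'a', 1, 'b', 2, 'b']
```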

para before Q22
"resursively-that"
    Change hyphen to double hyphen.

"emptyand"
    Insert space.

para after Q24
"reachablefunction"
    Insert space.

----------------------------------------------------------------------------
2.10 User-Defined Datatypes

para 1
"used to define an element" & "might define an element"
    Change "element" to "element-type"?

point 1
"the implicit input document"
    The input might not be a single document.

"documentfunction"
    Insert space.

Q26
".../emp[location = ..."
    Change "emp" to "emp_type"?

----------------------------------------------------------------------------
3 Querying Relational Data

Q29
'the notation "SORTBY(.)" ... causes the <pno> elements to be sorted by
their content'
    But this does not necessarily put them in numeric order, which is what
    the English statement of the query specified.
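
    For example, suppose (hypothetically) that the part numbers were digit
    strings of varying length. Sorting by string content and sorting by
    numeric value would then disagree, as this Python sketch shows.

```python
pnos = ["10", "9", "100", "23"]  # hypothetical <pno> contents

by_content = sorted(pnos)          # compares the strings character by character
by_number = sorted(pnos, key=int)  # compares the numeric values

print(by_content)  # ['10', '100', '23', '9']
print(by_number)   # ['9', '10', '23', '100']
```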

----------------------------------------------------------------------------

-Michael Dyck

Received on Tuesday, 10 April 2001 02:37:22 UTC