- From: Michael Rys <mrys@microsoft.com>
- Date: Sat, 24 Feb 2001 13:38:53 -0800
- To: "W3C XML Query WG (E-mail) (E-mail)" <w3c-xml-query-wg@w3.org>
- Cc: "'www-xml-query-comments@w3.org'" <www-xml-query-comments@w3.org>
The following contains feedback on the XQuery document (based on the Jan 5th draft) by our prototype developers (I apologize for the delay in forwarding). I tried to cross-check with the WD release to make sure that I do not repeat fixed issues, and with the issues list at http://www.w3.org/XML/Group/2001/02/issues.xml (which is not consistent with the document's issues list!!!). Issues that are in the document but not at the separate address above are repeated on purpose.

In addition, we gave feedback on the normative grammar directly to the syntax editors, some of which appears to have been included already in the latest grammar drafts. Once the working group decides on the final approach (the XQuery element constructor syntax or the XML embedded grammars), we will be happy to provide an LALR(1) grammar for consideration for the normative grammar description part.

My apologies for the formatting, but it is XML transformed into HTML and the XML is somewhat normalized and unordered, so I provide the text based on the HTML. The feedback is ordered according to sections. My comments and remarks are tagged with <MR>.

Best regards
Michael

Issues grouped by section

2 The XQuery Language

Issue #2: multiline comments (Section 2: The XQuery Language, paragraph 6; normal priority, syntax)

XQuery should be enhanced to allow multiline comments. If the standard doesn't do this, vendors are sure to do it anyway, just as T-SQL adds /*...*/ comments to SQL.

2.1 Path Expressions

Issue #3: duplicate removal (Section 2.1: Path Expressions, paragraph 2; critical, semantics)

<MR>Make sure that all operations in XQuery's path expression language that need to perform duplicate elimination on lists define which duplicate is removed.
A potential candidate is the union operator |.</MR>

Issue #5: consider character RANGEs (Section 2.1: Path Expressions, paragraph 9; normal priority, semantics)

When I worked with Pavel Curtis (formerly at PARC/Xerox) on a language project (MOO), we found that it was also extremely useful to allow character ranges, like RANGE 'a' TO 'z'. In general, the RANGE operator should work anywhere a list of values is required. For example, the expression

    FOR $i IN RANGE 1 TO 5 RETURN <number>$i</number>

should be a valid XQuery.

<MR>This is actually a feature request for a more general RANGE operator that generates a list of values in the given range. The character range of course needs to be collation sensitive and thus needs a reference to a collation order when used to generate the range list.</MR>

Issue #6: better RANGE syntax (Section 2.1: Path Expressions, paragraph 9; high priority, syntax)

XPath has such a compact syntax that the currently proposed RANGE operator syntax is extremely awkward in comparison (or do you intend to introduce arbitrary XQuery expressions into XPath predicates?). I propose the following alternative syntax, using the character sequence ... as a range operator:

    RANGE a TO b

becomes

    a...b

This syntax is more XPath-like. You should also consider introducing a special lexical sequence or function for the length of the enclosing node set, e.g.,

    chapter[2...$]

or else

    chapter[2...length()]

<MR>The original proposal used .. which would overload the parent operator ..</MR>

Issue #7: improve description of dereference (Section 2.1: Path Expressions, paragraph 11; low priority, editorial)

This description of the dereference operator is pretty confusing. For example, is the name following a dereference operator an element name (exactly the same as an XPath name test) or a type name? (Both explanations are given <MR>* stands for any #type# should probably be #name#</MR>.) Improve the exposition.
What happens in a query like

    FOR $node IN (//procedure UNION
                  FOR $p IN //procedure[1],
                      $e IN //* AFTER ($p//incision)[1] BEFORE ($p//incision)[2]
                  RETURN shallow($e) )
    RETURN $node/idref->procedure

(modified from example Q16) -- can the dereference operator result in shallow node copies, or only the original nodes (presumably both)? In general, we can create XML that is no longer valid (in the sense that multiple elements share the same id value) -- what happens to dereference then?

Issue #8: computed references (Section 2.1: Path Expressions, paragraph 11; normal priority, semantics)

The dereference operator applies only to path expressions. However, in general, one will want to be able to compute a reference and then dereference it. This functionality needs to be added to XQuery. E.g.,

    concat('E', '3')->emp/@mgr

Issue #9: default namespace is poorly defined (Section 2.1: Path Expressions, paragraph 17; high priority, syntax)

The syntax

    NAMESPACE DEFAULT = "uri"

precludes the use of a prefix named DEFAULT (without putting the name DEFAULT in single-quotes). This would be avoided if you change the syntax slightly to

    DEFAULT NAMESPACE = "uri"

<MR>We proposed that change for the grammar and it seems to have been added to the latest grammar versions.</MR>

Issue #36: no local namespace decls (Section 2.1: Path Expressions, paragraph 16; critical, semantics)

The namespace declarations are effectively global. But in XML, namespace declarations are local to a particular element. XQuery needs to allow us to declare namespaces on an element and then override those declarations on subelements.

2.2 Element Constructors

Issue #10: quote usage is awkward (Section 2.2: Element Constructors, paragraph 8; critical, syntax)

Changing the usage of quotes (as used in XPath, XSLT, and XML) is going to cause a lot of user confusion and other problems. For one thing, it means that no one can leverage any existing parsing code they have, because the quoting rules have changed.
But more importantly, it will cause problems for tools that auto-generate XQuery expressions. Suppose a tool wants to auto-generate a query from an existing chunk of XML. Instead of being able to copy that chunk of XML into XQuery, the chunk has to be specially serialized using the XQuery quoting rules (but only if the chunk contains an XQuery keyword).

Also, note that as XQuery goes through future versions, the keyword set may expand (e.g., you may add an UPDATE keyword). These quoting rules will break backwards compatibility with older queries that use the new keywords as unquoted identifiers.

Some alternatives:

* (recommended) Introduce unambiguous leading characters for all identifiers. The grammar is almost like this already -- for example, in the expression <FOR it's already clear that this is an element constructor, because of the leading < symbol. Similarly for variable names, which always begin with $. If you add an @ symbol in front of attribute names in element constructors, then I think the only remaining cases are names in path expressions (which is a problem XPath has already -- for example, the XPath and/or/@and) and prefix and function names. You need to use the same approach for path expressions that XPath uses, to be consistent with XPath. I think it would be fine to require that prefix and function names (which are always local to the query, and not data-dependent) cannot collide with keyword names.

* Reserve keywords without possibility for use as names. This alternative is unacceptable, but we mention it because so many programming languages use it (including Java, C, C++).

* Introduce sufficient token lookahead to disambiguate identifiers from keywords. This solution requires sitting down with the grammar and figuring out what the rules should be. It may complicate the grammar description, but a solution (if one exists) is unlikely to require more than two or three tokens of lookahead.
<MR>This is an issue that would disappear with the XML based syntax under discussion. I left it in since no decision has yet been made.</MR>

Issue #11: no computed attribute constructor (Section 2.2: Element Constructors, paragraph 3; critical, syntax)

There appears to be no computed attribute constructor. That is, if the variable $a contains the name of my attribute, I cannot build the element <foo $a="value"/>.

<MR>We have discussions on this but I could not find it in the issues list yet.</MR>

Issue #12: literal string representation different from both XPath and XML (Section 2.2: Element Constructors, paragraph 7; normal priority, syntax)

Note that the syntax for escaped quote characters in string literals differs from the XPath grammar (XPath does not allow escape characters - how broken is that?!) and the XML grammar (which uses entities). This may be problematic for some scenarios (e.g., automatic XQuery generation).

Issue #37: well-formedness constraints (Section 2.2: Element Constructors, paragraph 1; critical, semantics)

XML syntax is mixed into the XQuery syntax. How well-formed must it be? For example, for <author>, must all the characters comply with the XML 1.0 standard? And must the end tag match the start tag?

<MR>This may be addressed by the XML based grammar. Otherwise, we need to mention the constraints on what can be in the close tag based on the open tag.</MR>

Issue #38: validity constraints (Section 2.2: Element Constructors, paragraph 1; critical, semantics)

Are the results of expressions validated? If a schema type is associated with an element, but the element does not satisfy the schema constraints, does an error occur? What about scalar values that do not match pattern facets? Etc., etc.
<MR>We have discussions on this on the mailing lists, but I could not find it in the issues list yet.</MR>

Issue #39: non-element ctors (Section 2.2: Element Constructors, paragraph 5; normal priority, semantics)

In example Q8, why can't the construction of comments and processing instructions be the same as the element constructor? Instead of using comment("Houston, we have a problem"), it could be simply <!-- Houston, we have a problem -->.

<MR>We have discussions on this on the mailing lists, but I could not find it in the issues list yet.</MR>

2.3 FLWR Expressions

Issue #13: tuple order of FOR (Section 2.3: FLWR Expressions, paragraph 5; normal priority, semantics)

The description of FLWR expressions says that "The tuples generated by the FOR/LET sequence have an order that is determined by the order of their bound elements in the input document, with the first bound variable taking precedence, followed by the second bound variable, and so on." But of course, there may be many input documents, not just one. Also, there may not be a document at all. Also, it is important to point out to the reader that this ordering is different from XPath (which uses reverse document order for the reverse axes). Also, you should point out that unlike path expressions (which remove duplicate nodes), the XQuery

    FOR $a IN path, $b IN path

generates a cross-product of path with itself (not a single iteration through path).

Issue #14: result of RETURN (Section 2.3: FLWR Expressions, paragraph 7; critical, semantics)

The result type of a RETURN expression does not seem to be limited to "nodes, ordered forests of nodes, or primitive values" as described.
Some examples:

* list of primitives:
      FOR $a IN document("zoo.xml") LET $b := path_selecting_a_list_of_scalar_types RETURN $b
* empty result:
      FOR $a IN document("zoo.xml") RETURN ()
* list of both primitives and nodes:
      FOR $a IN document("zoo.xml") RETURN 3,<foo/>
* unordered forest of nodes:
      FOR $a IN distinct(document("zoo.xml")) RETURN <foo>$a</foo>

If there are semantic constraints on RETURN that exceed the existing syntactical constraints, then these need to be clearly defined. We also need to know which constraints can be determined at analysis-time, and which can only be determined at execution-time (e.g., an empty result).

Issue #15: result of an unordered RETURN (Section 2.3: FLWR Expressions, paragraph 7; critical, semantics)

I'm especially concerned about the last example given for issue 14:

    FOR $a IN distinct(document("zoo.xml")) RETURN <foo>$a</foo>

How are the results of an unordered RETURN supposed to be serialized out (or otherwise ordered later)? Is this expression supposed to be illegal unless I apply an explicit sort or group-by clause? Note that even if the FOR loop is unordered, the RETURN result might still be unambiguously ordered (if it is independent of the FOR). This is a mess.

<MR>I left out the issue that relates to xquery-unordered-collections (distinct should only remove duplicates and a toset operation should do what distinct does now).</MR>

Issue #17: expressions as literal content (Section 2.3: FLWR Expressions, paragraph 17; high priority, syntax)

This syntax suffers from the problem that it is not immediately clear to the user which characters will be literally echoed into the result, and which are operators in the syntax and will be interpreted.
Consider example Q13, or an even simpler version of it:

    LET $a := avg(//book/price)
    FOR $b in /book
    RETURN <diff>$b/price - $a</diff>

Presumably (the spec does not explain the results for most of the examples) this is supposed to return a result that looks like

    <diff>1.50</diff><diff>-3.24</diff>

But the user might have expected the result

    <diff>7.50 - 6.00</diff><diff>2.76 - 6.00</diff>

I suppose this alternate result is supported through the quoting rules

    LET $a := avg(//book/price)
    FOR $b in /book
    RETURN <diff>$b/price "-" $a</diff>

but the user cannot determine by simple inspection that the quotes were needed (that the hyphen would be interpreted instead of echoed). It would be much easier both for language parsers and users if interpreted expressions were syntactically distinguished from literal values. For example,

    LET $a := avg(//book/price)
    FOR $b in /book
    RETURN <diff>{ $b/price - $a }</diff>

vs.

    LET $a := avg(//book/price)
    FOR $b in /book
    RETURN <diff>{ $b/price } - { $a }</diff>

(I don't necessarily advocate the use of curly braces for this purpose; I just picked a random punctuation character to illustrate the concept.)

<MR>This is an issue that would probably disappear with the XML based syntax under discussion. I left it in since no decision has yet been made.</MR>

Issue #19: SORTBY semantics not well-defined (Section 2.3: FLWR Expressions, paragraph 20; critical, semantics)

The semantics of SORTBY are undefined. Collation order? Data type? How do the three sorts SORT BY ($b/price), SORT BY ($b/price/text()), and SORT BY (number($b/price)) differ? How does schema information affect (or not affect) a sort? What if the key set is empty? What does an expression that mixes types in the sort key, like

    FOR $h IN //holding
    RETURN <holding>$h/title</holding>
    SORT BY ( IF $h/@type="journal" THEN $h/editor ELSE number($h/price) )

(modified from Q18) return?
<MR>We have discussions on this but I could not find it in the issues list yet.</MR>

Issue #40: non-element ctors (Section 2.3: FLWR Expressions, paragraph 10; normal priority, semantics)

In example Q10, the explanation of duplicate values for distinct is still too vague. Does attribute order matter? Encodings? Maybe we can use the canonical XML spec to distinguish two element contents. For example, consider the XML

    <e a1="1" a2="2"> <child></child> </e>
    <e a2="2" a1="1"><child/></e>

<MR>I would like to see this as a request for clarification. Attribute order does not matter according to the Infoset, thus our data model does not provide for that either.</MR>

Issue #41: text() vs. data() (Section 2.3: FLWR Expressions, paragraph 21; normal priority, syntax)

In Q15, text() is used. Shouldn't this change to data() for the same reasons it's data() in the algebra spec?

<MR>This is issue 48, but I could not find it in the separate issues list yet.</MR>

2.4 Operators in Expressions

Issue #20: document order definition should be at the beginning (Section 2.4: Operators in Expressions, paragraph 2; low priority, editorial)

The explanation of ordinal position belongs at the beginning of this document, not in section 2.4.

Issue #21: definition of data model instance and global ordering (Section 2.4: Operators in Expressions, paragraph 2; critical, semantics)

In these specs, the phrase "data model instance" is hopelessly confused with the phrase "XML document or fragment". The data model spec currently says that a data model instance is a possibly unordered collection of zero or more XML documents and fragments, and the XQuery spec says that only one data model instance is the input to an XQuery. Thus, a data model instance might have no global ordering. Also, even if the top-level nodes of the data model are ordered, we know that a query fragment can result in a non-ordered list of nodes. So how does global ordering work then?
This issue really needs to be resolved and cleared up once-and-for-all. If we cannot describe consistently the data model and its ordering (or lack thereof), then how can we hope to define a query language over it? Until this issue is resolved, the BEFORE/AFTER semantics are not well-defined. For example,

    FOR $a IN document("one.xml"), $b IN document("two.xml")
    WHERE $a/title BEFORE $b/title
    RETURN $a

Issue #22: BEFORE/AFTER vs. XPath preceding/following (Section 2.4: Operators in Expressions, paragraph 2; high priority, semantics)

How do the XQuery keywords BEFORE and AFTER differ from the XPath axes preceding and following? If they don't differ, then these keywords should be removed. If they do differ, then we need examples.

Issue #23: shallow semantics not well-defined (Section 2.4: Operators in Expressions, paragraph 2; normal priority, semantics)

Does shallow() copy text, comment, or p-i content of an element? What is the global order of the resulting node? Since attributes (which may be ID-typed) are copied, does this new node have the same or a different identity from the original node?

<MR>The author of this remark seems confused between node identity and the impact of ID-types on node ID. Thus I leave this remark in the issue to make sure that the difference will be clear.</MR>

2.8 Datatypes

Issue #16: comma is problematic and unnecessary (Section 2.8: Datatypes, paragraph 5; high priority, syntax)

The use of commas to separate element constructors seems both unnecessary and problematic. How can I put a comma into the text content of an element, like <comma>,</comma>? Must I really quote it, as in <comma>","</comma>? Remove the comma from this grammar.

I understand the desire to construct lists. However, lists of elements are already clear (<a/><b/> is a list of two elements). Since lists of lists are not allowed (i.e., there is no need to distinguish between a list of three elements <a/><b/><c/> vs.
a list of two elements followed by one element <a/><b/>,<c/>), there is no added value in writing $a,$b,$c instead of just $a $b $c (even when the variables are bound to primitive values). This is also consistent with the way the values will be serialized out (e.g., idrefs/nmtokens are space-separated lists, not comma-separated lists). Also, there is no value in using square brackets to construct lists -- parentheses will work just as well (with no conflict with their use as grouping operators).

<MR>This would be solved by the XML based construction grammar.</MR>

Issue #25: namespace for builtin XQuery data types (Section 2.8: Datatypes, paragraph 4; high priority, semantics)

Built-in XQuery data types (like ELEMENT, ATTRIBUTE, and LIST) should not be represented with keywords, but instead as types in a reserved XQuery namespace. This is extensible as well as consistent and compatible with the use of XML Schema types. And there are probably a half-dozen more reasons to prefer this approach to keywords.

2.9 User-Defined Functions

Issue #26: list arguments to functions (Section 2.9: User-Defined Functions, paragraph 10; critical, semantics)

Although I very much want set semantics in XQuery, the fourth rule for function resolution needs work. Does this rule work only with single-argument functions? What about a two-argument function? Consider this query fragment:

    FOR $c IN document("customers.xml")
    LET $orders := $c//Order
    RETURN concat("O", $orders/@OrderID)

Also, what happens if sets are passed as both arguments?

Don't forget that set-based semantics conflicts strongly with XPath 1.0. Using set-semantics for XPath functions means that you create a new meaning for path expressions that is different from the XPath 1.0 spec. Experience with SQL XML shows that this new semantics needs careful definition -- it is not enough to wave hands and say that a list-typed argument to a function expecting a scalar results in a list of that function applied to each scalar.
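To make the ambiguity in Issue #26 concrete, here is a minimal Python sketch (not XQuery, and not taken from any draft) of two plausible readings of "apply the function to each scalar" once more than one argument is a list. The helper names apply_pairwise and apply_cross_product are hypothetical, as is the choice to treat a scalar as a one-item list.

```python
from itertools import product


def as_list(value):
    """Treat a scalar as a one-item list; leave lists unchanged."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def apply_pairwise(func, *args):
    """Hypothetical reading 1: walk the list arguments in lockstep."""
    lists = [as_list(a) for a in args]
    longest = max(len(items) for items in lists)
    # Repeat one-item lists (scalars) so they pair with every position.
    widened = [items * longest if len(items) == 1 else items for items in lists]
    # zip() silently stops at the shortest list; a real rule would have to
    # say what happens when the lengths differ.
    return [func(*combo) for combo in zip(*widened)]


def apply_cross_product(func, *args):
    """Hypothetical reading 2: apply to every combination of items."""
    return [func(*combo) for combo in product(*(as_list(a) for a in args))]


def concat(a, b):
    return a + b


# One scalar and one list: both readings agree.
one_list_pairwise = apply_pairwise(concat, "O", ["1", "2"])
one_list_cross = apply_cross_product(concat, "O", ["1", "2"])

# Two lists: the readings give different results.
two_lists_pairwise = apply_pairwise(concat, ["A", "B"], ["1", "2"])
two_lists_cross = apply_cross_product(concat, ["A", "B"], ["1", "2"])
```

With a single list argument the two readings coincide, which is why the rule looks harmless for the concat("O", $orders/@OrderID) fragment; they only diverge when sets are passed as both arguments.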
The interactions of this rule with the rest of the XPath language (like types -- XPath has no scalar list type, only nodeset) must be explored and defined.

<MR>See my proposal at http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2001Feb/0271.html </MR>

Issue #27: coercion rules for function return types (Section 2.9: User-Defined Functions, paragraph 14; critical, semantics)

The rules for function resolution partially sketch out how type coercions work on function arguments. However, the spec does not define how type coercions work on function return values. Consider several possible variations of this function:

    FUNCTION coerce() RETURNS xsd:integer {
      RETURN 1.0
      -- RETURN "1"
      -- RETURN [ 1 ]
      -- etc.
    }

Are these legal XQueries, and if so, how do they work?

Issue #28: connected() misses some nodes (Section 2.9: User-Defined Functions, paragraph 17; low priority, code example/sample)

In example Q23, the connected() function counts nodes connected through IDREF(s) attributes, but not IDREF(s) text content (e.g., if the element is <e>12345</e>). Also, if this function is supposed to work in general, then it should probably use descendant-or-self instead of just child (that is, $e//* instead of $e/*).

Issue #47: wrong idref instance in Q23 (Section 2.9: User-Defined Functions, paragraph 17; high priority, code example/sample)

In example Q23, the example uses a number as an ID/IDREF value, which is not a valid instance.

2.10 User-Defined Datatypes

Issue #29: schema information should be easily queryable (Section 2.10: User-Defined Datatypes, paragraph 5; normal priority, code example/sample)

In example Q26, the namespace URI is repeated in the query and the schema. Schema information such as target-namespace should be easily exposed to the query engine.
Of course, the user could do something like

    SCHEMA "myschema.xsd"
    NAMESPACE xsd = "http://www.w3.org/2000/10/XMLSchema"
    DEFAULT NAMESPACE = document("myschema.xsd")/xsd:schema/@targetNamespace/text()

but why should the user have to do that? Instead, I should be able to do something like:

    SCHEMA $myschema = "myschema.xsd"
    DEFAULT NAMESPACE = target-namespace($myschema)

If you don't define a library of such functions for XSD manipulation, every user will end up writing their own anyway.

2.11 Operations on Datatypes

Issue #30: type names need to be computable (Section 2.11: Operations on Datatypes, paragraph 2; high priority, type system)

The type argument to INSTANCEOF should be computable (e.g., $expr INSTANCEOF $type). In general, types should be dynamically computable and not limited to compile-time only.

Issue #31: type operations are incomplete (Section 2.11: Operations on Datatypes, paragraph 2 <http://www.w3.org/XML/Group/2001/01/xquery.html>; critical, type system)

The type operations described in this section do not capture all of the type relationships expressible in XSD. Presumably this section is pending the work on MSL, but in any case, there is more to types than just subtyping and coercions.

Issue #32: CAST syntax should be a function (Section 2.11: Operations on Datatypes, paragraph 2; high priority, syntax)

To be consistent with XPath, type casting should be done through functions (e.g., number(), string(), etc.). A function syntax would also be consistent with the scalar type constructors (e.g., date("2001-01-30")). If necessary, add a generic cast() function that takes two arguments - the expression to be cast, and the type to which it should be cast.

<MR>This is another proposal for resolution of issue [xquery-cast-expression].
See also recent discussions on TREAT and CAST.</MR>

3 Querying Relational Data

Issue #48: Remove the section on relational data representation in XML (Section 3: Querying Relational Data; high priority, editorial)

<MR>This section should be removed and examples on grouping and joins should be given without basing it on a (IMO) strange relational-XML mapping.</MR>

B XQuery Grammar

<MR>Generally, there should be only a BNF based LALR(1) normative grammar. The language specific grammar should disappear from the next working draft.</MR>

E XQuery Semantics <http://www.w3.org/XML/Group/2001/01/xquery.html>

Issue #35: RANGE cannot map to algebra (Section E: XQuery Semantics, paragraph 1; critical, mapping to algebra)

There is no corresponding algebra operator for RANGE.

Issue #42: contains() has no algebra equivalent (Section E: XQuery Semantics, paragraph 1; critical, mapping to algebra)

How is the function contains() mapped to the algebra?

Issue #43: FILTER has no algebra equivalent (Section E: XQuery Semantics, paragraph 1; critical, mapping to algebra)

How is the filter operator mapped to the algebra?

Issue #46: algebra operations not exposed in XQuery (Section E: XQuery Semantics, paragraph 1; critical, mapping to algebra)

What are the XQuery equivalents for the algebra concepts bag, bagtolist, and index()?
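For the RANGE operator raised in Issue #35 above (and requested in more general form in Issues #5 and #6), the list-generating behavior being asked for can be sketched in a few lines of Python. This is only an illustration, not a proposed algebra operator: the function name value_range is hypothetical, and the character case assumes plain Unicode code point order, whereas the remark on Issue #5 notes that a real character range would need to reference a collation.

```python
def value_range(start, end):
    """Hypothetical 'RANGE start TO end': an inclusive list of values."""
    if isinstance(start, int) and isinstance(end, int):
        return list(range(start, end + 1))
    if (
        isinstance(start, str)
        and isinstance(end, str)
        and len(start) == 1
        and len(end) == 1
    ):
        # Assumption: characters are ordered by Unicode code point.
        # A collation-sensitive range would need an explicit collation here.
        return [chr(cp) for cp in range(ord(start), ord(end) + 1)]
    raise TypeError("bounds must both be integers or both be single characters")


# Rough analogue of: FOR $i IN RANGE 1 TO 5 RETURN <number>$i</number>
numbers = ["<number>%d</number>" % i for i in value_range(1, 5)]

# Rough analogue of: RANGE 'a' TO 'e'
letters = value_range("a", "e")
```

In this sketch a range whose start exceeds its end simply yields an empty list; whether that, an error, or a descending list is the right behavior is one of the things a real definition would have to settle.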
Received on Saturday, 24 February 2001 16:39:34 UTC