- From: Mary F. Fernandez <mff@research.att.com>
- Date: Mon, 07 Aug 2000 13:16:09 -0400
- To: w3c-xsl-wg@w3.org
- Cc: www-xml-query-comments@w3.org
- Message-ID: <398EEED9.78E9BE00@research.att.com>
This message is in response to:
http://lists.w3.org/Archives/Public/www-xml-query-comments/2000Jul/0008.html

The Query WG appreciates the comments from the XSL WG. We respond to those comments individually below. Several issues that would benefit from additional input by the XSL WG are noted in the response.

Several of the XSL WG's comments relate to stylistic choices, such as the use of functional notation to specify the data model and the use of union types instead of subtypes to specify alternatives. We believe these choices make the technical content more precise and clear, but we encourage the XSL WG to identify when and if these choices obscure or weaken the technical content.

| The XSL WG has reviewed the XML Query Data Model document (dated 11
| May 2000) and has following comments and proposal.
|
| Specific Issues with the current document:
| -----------------------------------------
|
| (1) Overall, the way that the XML Query Data Model handles node types
|     is somewhat awkward from an XPath perspective:
|
|     - some functions are defined on the Node type and some on specific
|       types of Node; for example, isDocNode is defined on Node, but
|       children is defined separately for each kind of node
|

This is a stylistic choice. We chose to associate accessors with the most specific node type to which they apply. Node is a choice among eight node types, therefore it is logical to associate with Node the accessors that distinguish between its alternatives. Similarly, children is defined only for those nodes (DocNode, ElemNode) that have child elements; it is not defined on the most general Node type.

|
|     - it uses union types rather than more familiar subtyping
|

We chose the union type because the document is intended to be a specification, not a definition of an interface in a particular programming language, and because union types make other parts of the data model more precise.
For example, children of ElemNode can only contain element, namespace, or processing instruction nodes: the union type captures this constraint.

|     - relationship of the isXXXNode functions to the type system is
|       unclear
|

Please clarify this comment. We have not yet introduced a type system.

|     - it uses many different separate isXXXNode functions with an
|       implicit constraint that only one of them returns true for a
|       given node, rather than a single nodeType function (as in the
|       DOM)

This choice is consistent with the functional notation. Your suggestion would require the introduction of a tag value for each node type.

| (2) The approach taken to data-typing by the XML Query Model appears
|     to have difficulty dealing with the possibility that the string
|     representation of a value of particular data-type in an element
|     can be interrupted by processing instructions and comments; for
|     example, the integer 10 might be represented by
|
|       <amount>1<?bizarre processing instruction?>0</amount>
|

This is correct and the Query WG is aware of this problem. We see three possible solutions, and we welcome the XSL WG's comments on them.

(1) Accept the current solution in which only consecutive CDATA information items are coalesced into a single scalar value, i.e., CDATA items in which there are no intervening processing instructions or comments. The disadvantage of this choice is that it is inconsistent with XML Schema, which permits the above example and recognizes the content of <amount> as the integer 10. The advantage of this solution is that it is simple. We also believe it supports the "80-20" rule, i.e., examples like the one above are unusual, and the Query WG may choose not to support such outliers.

(2) Ignore all processing instructions and comments or ignore those processing instructions and comments that are embedded in scalar content.
The disadvantages of this choice are that it would prevent querying of embedded processing instructions or comments and it would not guarantee that the Infoset to XML Query Data Model mapping is invertible. As with option (1), this solution is simple; in addition, it does not conflict with XML Schema.

(3) Adopt a data model in which embedded processing instructions and comments are preserved. The advantages are that these items are queryable and that the Infoset to Data Model mapping is invertible. The disadvantage is that the complexity of the data model will be increased for the purpose of supporting primarily outlier cases.

| (3) Given that the infoset does not define node identity, we feel
|     that the XML Query data model must define node identity
|     rigourously. It appears that while the current document assumes
|     such identity, it does not formally define it.
|

Please see the definition of the ref type in Sec. 2.1.1 (given below) and explain how to make this definition more rigorous, while not specifying a particular implementation strategy.

  "The data model provides node references as a mechanism to test and bind the identity of nodes in a given instance of the data model. The actual mechanism for implementing node identity is implementation dependent, for example, node identity might be represented by a key value, an object identifier, an XPointer value, etc.

  Ref(N) denotes a reference to a node with type N. The data model provides the function ref to create a reference to a node and the function deref to produce the node referent of a reference value. Their signatures are:

      ref   : Node -> Ref(Node)
      deref : Ref(Node) -> Node

  The function ref is surjective, i.e., it is onto. The function deref is the inverse of ref, i.e., for all nodes n, node_equal(deref(ref(n)), n) is true, where node_equal is an implementation-dependent equality operator over nodes."

| (4) The data model has the concept of constructors to describe the
|     data model of result trees.
|     In the current model, nodes have
|     accessors for parents, however, parents are not specified in
|     constructors. This appears to be a problem.
|

This inconsistency is noted. We mention here that the Query WG is currently discussing whether parent will remain an operator in the Query Algebra. The algebra is a strongly typed language, and it is difficult to infer a non-trivial type for parent. The interests of the two WGs may diverge on this issue, and the Query WG welcomes more feedback from the XSL WG on it.

| (5) We believe that the approach of representing complex schema types
|     by the data model of that type's schema as an XML document is
|     rather inadequate. This places the burden of processing the
|     schema (traversing import/include links, detecting inheritance,
|     equivalence classes etc.) on every user. We would prefer an
|     approach where element types are represented by some processed
|     form of the schema defining the type.

The Query WG agrees. This issue has already been raised in the Query WG's last call comments on XML Schema. Please see item LC-198 in http://www.w3.org/XML/Group/xmlschema-current/lcissues.html.

| (6) We also noticed the following relatively minor issues:
|
|     - It would be preferable if the XML Query Data Model used terms
|       and abbreviations consistent with XPath. For example, it should
|       use "ProcessingInstructionNode" not "PINode". The choices in
|       XPath were made conciously to maximize useability and we urge
|       the query group to build on the XPath experiences.

We agree that every effort should be made to use consistent terminology. We note, however, that related specifications, such as Infoset and DOM, also use distinct vocabularies. The issue of defining a standard vocabulary for XML data models is in the scope of the XML CG but out of scope for XML Query.

|
|     - It is not clear to us why the name of an ElemNode is a
|       *reference* to a QNameValue and not just a QNameValue.
|

This has already been corrected in a subsequent revision.

|     - It appears that InfoItemNode is not fully specified. Which info
|       items it represents is not clear and the possible answers all
|       have drawbacks: If all implementations are required to support
|       all info items, then that imposes a significant implementation
|       burden, with very little benefit for most users. If not all are
|       required, then there is a significant potential interoperability
|       problem.
|

This is a good point. In the first design of the XML Query Data Model, there was no mechanism for accessing the Infoset items from which the data model values were derived. Some members of the XML Query WG argued that preserving access to Infoset items was important for applications that might, for example, need access to individual CDATA items. The Query WG is considering eliminating this feature, because, as you suggest, it does not support the 80-20 rule. The Query WG welcomes more feedback from the XSL WG on this issue.

| Discussion Points:
| -----------------
|
| (1) Given that the XPath data model is already present, we feel
|     strongly that the XML Query data model should be built with
|     an appropriate relationship to the XPath data model.
|

We made every effort not to diverge significantly from the XPath data model, and the document notes wherever differences with XPath exist. However, the XPath data model was not an adequate basis for the XML Query Algebra, nor did it support XML Schema; it was therefore necessary to specify an adequate model in more detail.

| (2) The above approach makes sense if and only if the XML query group
|     decides to use XPath as a part of the query language. In that
|     case (i.e., if there is committement to use XPath), we feel
|     very strongly that the data models must be closely related.
|

We agree that there is a clear benefit to having, at best, a single data model and, at worst, closely related models.
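
The difference between options (1) and (2) in the response to Specific Issue (2) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: Python's standard xml.dom.minidom stands in for the Infoset, and the function names scalar_runs and scalar_ignoring_markup are hypothetical and are not part of the XML Query Data Model.

```python
from xml.dom import Node, minidom

# Node kinds treated as character data in this sketch.
TEXT_KINDS = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def scalar_runs(elem):
    """Option (1): coalesce only consecutive character data.

    Any other child (here, a processing instruction or comment)
    ends the current run, so interrupted content yields several values.
    """
    runs, current = [], []
    for child in elem.childNodes:
        if child.nodeType in TEXT_KINDS:
            current.append(child.data)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


def scalar_ignoring_markup(elem):
    """Option (2): skip non-text children and join all character data."""
    return "".join(
        child.data for child in elem.childNodes if child.nodeType in TEXT_KINDS
    )


# The example from the XSL WG's comment.
doc = minidom.parseString(
    "<amount>1<?bizarre processing instruction?>0</amount>"
)
amount = doc.documentElement

print(scalar_runs(amount))             # ['1', '0']
print(scalar_ignoring_markup(amount))  # 10
```

Under option (1) the content of <amount> is two separate values, which is why it disagrees with XML Schema's reading of the element as the integer 10. Under option (2) the value is recovered, but the processing instruction is no longer visible to a query.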
| (3) XPath clearly does not support typed values at this time. Adding
|     schema support to XPath is a vital work-item from the XPath 2.0
|     effort. Adding types to the XPath data model could be achieved by,
|     for example, adding a "type" node for each element node and
|     attribute node; the "type" node could be accessible by a new
|     "type" axis in XPath; and a new typed-value() function could
|     return the value of a element or attribute converted to the type
|     specified by its type node.
|
| (4) It is not clear to us that the semi-formal approach used in
|     defining the query data model is sufficiently rigourous. It is
|     clear that the notation leads to a more compact specification,
|     but it is not obvious whether resulting specification is more
|     rigourous than a prose specification.
|

The formalism supports our two primary goals: to provide a formal data model on which the XML Query Algebra can be built and to define a specification that can be implemented easily and correctly. In particular, the constructors and accessors of the XML Query Data Model are the most basic operators in the XML Query Algebra. Since these operators must be defined formally in some manner, we chose to define them in a separate specification using a formalism that can easily be mapped onto an implementation in a high-level programming language. This made our job of producing an implementable specification easier. It might be beneficial, however, to improve the accessibility of the specification with more prose description, and we welcome collaboration on improving the content and presentation.

-- 
Mary Fernandez                      AT&T Labs - Research
Principal Technical Staff Member    180 Park Ave., Bldg 103, E243
mff@research.att.com                Florham Park, NJ 07932-0971
http://www.research.att.com/~mff    973-360-8679 FAX: 973-360-8187
Received on Monday, 7 August 2000 13:16:19 UTC