Re: public comments from the XSL WG on the Query Data Model from Mary F. Fernandez on 2000-08-07 (www-xml-query-comments@w3.org from August 2000)

From: Mary F. Fernandez <mff@research.att.com>
Date: Mon, 07 Aug 2000 13:16:09 -0400
To: w3c-xsl-wg@w3.org
Cc: www-xml-query-comments@w3.org
Message-ID: <398EEED9.78E9BE00@research.att.com>
This message is in response to :
http://lists.w3.org/Archives/Public/www-xml-query-comments/2000Jul/0008.html

The Query WG appreciates the comments from the XSL WG.  We respond to
those comments individually below.  Several issues that would benefit
from additional input by the XSL WG are noted in the response.

Several of the XSL WG's comments relate to stylistic choices, such as
the use of functional notation to specify the data model and the use of
union types instead of subtypes to specify alternatives.  We believe
these choices make the technical content more precise and clear, but
we encourage the XSL WG to identify when and if these choices obscure
or weaken the technical content.

|  The XSL WG has reviewed the XML Query Data Model document (dated 11
|  May 2000) and has the following comments and proposal.
|
|  Specific Issues with the current document:
|  -----------------------------------------
|
|  (1) Overall, the way that the XML Query Data Model handles node types
|      is somewhat awkward from an XPath perspective:
|
|      - some functions are defined on the Node type and some on specific
|        types of Node; for example, isDocNode is defined on Node, but
|        children is defined separately for each kind of node
|
   This is a stylistic choice.  We chose to associate accessors with
   the most specific node type to which they apply.  Node is the union
   of eight alternative node types, so it is logical to associate with
   Node the accessors that distinguish between its alternatives.
   Similarly, children is defined only for those nodes (DocNode,
   ElemNode) that can have child nodes; it is not defined on the most
   general Node type.
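   To make the distinction concrete, here is a minimal Python sketch
   (not taken from the data model document) that models Node as a
   typing.Union rather than a base class.  Only three of the eight node
   kinds are included, and the field names (kids, name, text) are
   hypothetical.  isDocNode and isElemNode accept any Node, while
   children is annotated to accept only DocNode or ElemNode, so a static
   checker such as mypy would reject children(CommentNode("x")).

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical, simplified stand-ins for three of the eight node kinds.
@dataclass
class DocNode:
    kids: List["Node"] = field(default_factory=list)

@dataclass
class ElemNode:
    name: str
    kids: List["Node"] = field(default_factory=list)

@dataclass
class CommentNode:
    text: str

# Node is a union of alternatives, not a superclass.
Node = Union[DocNode, ElemNode, CommentNode]

# Accessors that distinguish between the alternatives take any Node.
def isDocNode(n: Node) -> bool:
    return isinstance(n, DocNode)

def isElemNode(n: Node) -> bool:
    return isinstance(n, ElemNode)

# children is defined only for the node kinds that can have children.
def children(n: Union[DocNode, ElemNode]) -> List[Node]:
    return list(n.kids)
```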

|
|      - it uses union types rather than more familiar subtyping
|
   We chose the union type because the document is intended to be a
   specification, not a definition of an interface in a particular
   programming language, and because union types make other parts of
   the data model more precise.  For example, the children of an
   ElemNode can only be element, namespace, or processing instruction
   nodes; the union type captures this constraint.
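   The ElemNode constraint can be sketched the same way.  In this
   hypothetical Python fragment, ElemChild is the union of the element,
   namespace, and processing-instruction node kinds named in the example
   above, and the constructor elem rejects any other kind at run time;
   in a statically typed setting the union alone would carry the
   constraint.

```python
from dataclasses import dataclass, field
from typing import List, Union, get_args

@dataclass
class NSNode:
    prefix: str
    uri: str

@dataclass
class PINode:
    target: str
    data: str

@dataclass
class CommentNode:
    text: str

@dataclass
class ElemNode:
    name: str
    kids: List["ElemChild"] = field(default_factory=list)

# Following the example in the text: element children are restricted
# to this union of node kinds (comments are deliberately excluded).
ElemChild = Union[ElemNode, NSNode, PINode]

def elem(name: str, kids: List[ElemChild]) -> ElemNode:
    """Constructor that enforces the union-type constraint at run time."""
    allowed = get_args(ElemChild)  # (ElemNode, NSNode, PINode)
    for k in kids:
        if not isinstance(k, allowed):
            raise TypeError(f"{type(k).__name__} cannot be a child of ElemNode")
    return ElemNode(name, list(kids))
```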

|      - relationship of the isXXXNode functions to the type system is
|        unclear
|
   Please clarify this comment.  We have not yet introduced a type
   system.

|      - it uses many different separate isXXXNode functions with an
|        implicit constraint that only one of them returns true for a
|        given node, rather than a single nodeType function (as in the
|        DOM)
|
   This choice is consistent with the functional notation.  Your
   suggestion would require the introduction of a tag value for each
   node type.
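   The two styles have the same expressive power, as this Python sketch
   with two hypothetical node kinds suggests: the functional style needs
   one predicate per alternative, while the DOM style needs an explicit
   tag value (here, a NodeType enum member) for every node kind.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Union

@dataclass
class ElemNode:
    name: str

@dataclass
class PINode:
    target: str

Node = Union[ElemNode, PINode]

# Functional style: one predicate per alternative; exactly one is true.
def isElemNode(n: Node) -> bool:
    return isinstance(n, ElemNode)

def isPINode(n: Node) -> bool:
    return isinstance(n, PINode)

# DOM style: a single accessor returning a tag, which requires
# introducing a tag value for every node kind.
class NodeType(Enum):
    ELEMENT = auto()
    PROCESSING_INSTRUCTION = auto()

def nodeType(n: Node) -> NodeType:
    if isinstance(n, ElemNode):
        return NodeType.ELEMENT
    if isinstance(n, PINode):
        return NodeType.PROCESSING_INSTRUCTION
    raise TypeError(f"unknown node kind: {type(n).__name__}")
```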

|  (2) The approach taken to data-typing by the XML Query Model appears
|      to have difficulty dealing with the possibility that the string
|      representation of a value of a particular data type in an element
|      can be interrupted by processing instructions and comments; for
|      example, the integer 10 might be represented by
|
|        <amount>1<?bizarre processing instruction?>0</amount>
|

   This is correct and the Query WG is aware of this problem.  We see
   three possible solutions to this problem, and we welcome the XSL
   WG's comments on these suggestions.

   (1) Accept the current solution in which only consecutive CDATA
       information items are coalesced into a single scalar value,
       i.e., CDATA items in which there are no intervening processing
       instructions or comments.  The disadvantage of this choice is
       that it is inconsistent with XML Schema, which permits the
       above example and recognizes the content of <amount> as the
       integer 10.

       The advantage of this solution is that it is simple.  We also
       believe it supports the "80-20" rule, i.e., examples like the one
       above are unusual, and the Query WG may choose not to support
       such outliers.
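   The coalescing rule of option (1) can be sketched as follows.  The
   item classes are hypothetical simplifications of Infoset character,
   processing-instruction, and comment items.  Only runs of adjacent
   character items are merged, so the <amount> example yields two
   separate scalar strings, "1" and "0", rather than the integer 10
   that XML Schema would recognize.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Chars:      # a character (CDATA) information item
    text: str

@dataclass
class PI:         # a processing-instruction information item
    target: str
    data: str

@dataclass
class Comment:    # a comment information item
    text: str

Item = Union[Chars, PI, Comment]

def coalesce(items: List[Item]) -> List[Union[str, PI, Comment]]:
    """Merge only consecutive character items into one string;
    any processing instruction or comment ends the current run."""
    out: List[Union[str, PI, Comment]] = []
    for it in items:
        if isinstance(it, Chars) and out and isinstance(out[-1], str):
            out[-1] += it.text          # extend the current run
        elif isinstance(it, Chars):
            out.append(it.text)         # start a new run
        else:
            out.append(it)              # PI/comment kept as a separator
    return out

# <amount>1<?bizarre processing instruction?>0</amount>
amount = [Chars("1"), PI("bizarre", "processing instruction"), Chars("0")]
```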

   (2) Ignore all processing instructions and comments, or ignore only
       those processing instructions and comments that are embedded in
       scalar content.  The disadvantages of this choice are that it
       would prevent querying of embedded processing instructions or
       comments and that it would not guarantee that the Infoset to XML
       Query Data Model mapping is invertible.  Like the first option,
       this solution is simple; unlike it, it does not conflict with
       XML Schema.
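   Option (2) can be sketched by dropping processing instructions and
   comments before joining character data.  With the same kind of
   hypothetical item classes, the <amount> content parses as 10, but the
   processing instruction is lost, and
   <amount>1<?bizarre processing instruction?>0</amount> and
   <amount>10</amount> map to the same value, which is why the mapping
   cannot be inverted.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Chars:
    text: str

@dataclass
class PI:
    target: str
    data: str

@dataclass
class Comment:
    text: str

Item = Union[Chars, PI, Comment]

def scalar_ignoring_pis(items: List[Item]) -> str:
    """Option (2) sketch: drop PIs and comments, then join the characters."""
    return "".join(it.text for it in items if isinstance(it, Chars))

with_pi = [Chars("1"), PI("bizarre", "processing instruction"), Chars("0")]
plain = [Chars("10")]
```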

   (3) Adopt a data model in which embedded processing instructions
       and comments are preserved.  The advantages are that these
       items are queryable and that the Infoset to Data Model mapping
       is invertible.  The disadvantage is that the complexity of the
       data model would increase primarily to support outlier cases.
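   Option (3) could keep the original item sequence and derive the
   scalar value on demand.  The ScalarContent class below is a
   hypothetical sketch, assuming the content is typed as an integer:
   the processing instruction stays queryable, and serializing the
   items reproduces the original content (character escaping is
   omitted for brevity).

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Chars:
    text: str

@dataclass
class PI:
    target: str
    data: str

@dataclass
class Comment:
    text: str

Item = Union[Chars, PI, Comment]

@dataclass
class ScalarContent:
    items: List[Item]   # original items, PIs and comments included

    def typed_value(self) -> int:
        # Assumption: the element is declared with an integer type.
        return int("".join(it.text for it in self.items if isinstance(it, Chars)))

    def pis(self) -> List[PI]:
        # Embedded processing instructions remain queryable.
        return [it for it in self.items if isinstance(it, PI)]

    def serialize(self) -> str:
        # Reconstructs the original content; no escaping of < or &.
        parts = []
        for it in self.items:
            if isinstance(it, Chars):
                parts.append(it.text)
            elif isinstance(it, PI):
                parts.append(f"<?{it.target} {it.data}?>")
            else:
                parts.append(f"<!--{it.text}-->")
        return "".join(parts)

amount = ScalarContent([Chars("1"), PI("bizarre", "processing instruction"), Chars("0")])
```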

|  (3) Given that the infoset does not define node identity, we feel
|      that the XML Query data model must define node identity
|      rigourously.  It appears that while the current document assumes
|      such identity, it does not formally define it.
|

    Please see the definition of the ref type in Sec. 2.1.1 (given
    below) and explain how to make this definition more rigorous
    without specifying a particular implementation strategy.

    "The data model provides node references as a mechanism to test
     and bind the identity of nodes in a given instance of the data
     model. The actual mechanism for implementing node identity is
     implementation dependent, for example, node identity might be
     represented by a key value, an object identifier, an XPointer
     value, etc.  Ref(N) denotes a reference to a node with type N.
     The data model provides the function ref to create a reference to
     a node and the function deref to produce the node referent of a
     reference value. Their signatures are:

              ref        : Node      -> Ref(Node)
              deref      : Ref(Node) -> Node

     The function ref is surjective, i.e., it is onto. The function
     deref is the inverse of ref, i.e., for all nodes n,
     node_equal(deref(ref(n)), n) is true, where node_equal is an
     implementation-dependent equality operator over nodes."
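   One way to read this definition is sketched below in Python.  Store,
   Ref, and the use of Python's id() as the opaque key are assumptions
   chosen only for illustration; the definition deliberately leaves the
   mechanism open.  The sketch also shows that node identity differs
   from value equality: two element nodes with the same content compare
   equal as values but receive different references.

```python
from dataclasses import dataclass
from typing import Dict, Generic, TypeVar

N = TypeVar("N")

@dataclass(frozen=True)
class Ref(Generic[N]):
    key: int   # opaque key; the real mechanism is implementation dependent

@dataclass
class ElemNode:   # hypothetical node type for the demonstration
    name: str

class Store:
    """Hypothetical node store that hands out references."""

    def __init__(self) -> None:
        # Keeping each node alive here ensures its id() is not reused.
        self._nodes: Dict[int, object] = {}

    def ref(self, node: object) -> Ref:
        k = id(node)           # identity, not value, determines the key
        self._nodes[k] = node
        return Ref(k)

    def deref(self, r: Ref) -> object:
        return self._nodes[r.key]

def node_equal(a: object, b: object) -> bool:
    # Illustrative choice of node equality: object identity.
    return a is b
```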

|  (4) The data model has the concept of constructors to describe the
|      data model of result trees. In the current model, nodes have
|      accessors for parents, however, parents are not specified in
|      constructors. This appears to be a problem.
|
    This inconsistency is noted.  We mention here that the Query WG is
    currently discussing whether parent will remain an operator in the
    Query Algebra. The algebra is a strongly typed language, and it is
    difficult to infer a non-trivial type for parent.  The interests of
    the two WGs may diverge on this issue, and the Query WG welcomes
    more feedback from the XSL WG on this issue.
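   One possible reconciliation, sketched here only as an assumption and
   not as a position of either WG, is to build trees bottom-up and let
   the constructor of the enclosing element set the parent of each
   child.  The constructor signature still does not mention a parent;
   it is inferred from nesting, at the cost of forbidding a node from
   being shared by two parents.

```python
from typing import List, Optional

class ElemNode:
    """Hypothetical element node whose parent is set by its enclosing constructor."""

    def __init__(self, name: str, kids: Optional[List["ElemNode"]] = None) -> None:
        self.name = name
        self.kids = list(kids or [])
        self._parent: Optional["ElemNode"] = None
        # Check all children first so a failure leaves none of them modified.
        if any(k._parent is not None for k in self.kids):
            raise ValueError("a node can have at most one parent")
        for k in self.kids:
            k._parent = self

def parent(n: ElemNode) -> Optional[ElemNode]:
    return n._parent
```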


|  (5) We believe that the approach of representing complex schema types
|      by the data model of that type's schema as an XML document is
|      rather inadequate. This places the burden of processing the
|      schema (traversing import/include links, detecting inheritance,
|      equivalence classes etc.) on every user. We would prefer an
|      approach where element types are represented by some processed
|      form of the schema defining the type.

    The Query WG agrees. This issue has already been raised in the
    Query WG's last call comments on XML Schema.  Please see item
    LC-198 in
    http://www.w3.org/XML/Group/xmlschema-current/lcissues.html.

|  (6) We also noticed the following relatively minor issues:
|
|      - It would be preferable if the XML Query Data Model used terms
|        and abbreviations consistent with XPath. For example, it should
|        use "ProcessingInstructionNode" not "PINode". The choices in
|        XPath were made consciously to maximize usability and we urge
|        the query group to build on the XPath experience.

    We agree that every effort should be made to use consistent
    terminology.  We note, however, that related specifications, such
    as Infoset and DOM, also use distinct vocabularies.  The issue of
    defining a standard vocabulary for XML data models is in the scope
    of the XML CG but out of scope for XML Query.

|
|    - It is not clear to us why the name of an ElemNode is a
|      *reference* to a QNameValue and not just a QNameValue.
|

    This has already been corrected in a subsequent revision.

|      - It appears that InfoItemNode is not fully specified. Which info
|        items it represents is not clear and the possible answers all
|        have drawbacks: If all implementations are required to support
|        all info items, then that imposes a significant implementation
|        burden, with very little benefit for most users. If not all are
|        required, then there is a significant potential interoperability
|        problem.
|

    This is a good point.  In the first design of the XML Query Data
    Model, there was no mechanism for accessing the Infoset items from
    which the data model values were derived.  Some members of the XML
    Query WG argued that preserving access to Infoset items was
    important for applications that might, for example, need access to
    individual CDATA items.

    The Query WG is considering eliminating this feature, because, as
    you suggest, it does not support the 80-20 rule.  The Query WG
    welcomes more feedback from the XSL WG on this issue.

|  Discussion Points:
|  -----------------
|
|  (1) Given that the XPath data model is already present, we feel
|      strongly that the XML Query data model should be built with
|      an appropriate relationship to the XPath data model.
|
    We made every effort not to diverge significantly from the XPath
    data model, and the document notes wherever differences from XPath
    do exist.  However, the XPath data model was not an adequate basis
    for the XML Query Algebra, nor did it support XML Schema;
    therefore, it was necessary to specify in more detail a model that
    was adequate.

|  (2) The above approach makes sense if and only if the XML query group
|      decides to use XPath as a part of the query language. In that
|      case (i.e., if there is a commitment to use XPath), we feel
|      very strongly that the data models must be closely related.
|

    We agree that there is a clear benefit to having, at best, a
    single data model and, at worst, closely related models.

|  (3) XPath clearly does not support typed values at this time. Adding
|      schema support to XPath is a vital work item for the XPath 2.0
|      effort. Adding types to the XPath data model could be achieved by,
|      for example, adding a "type" node for each element node and
|      attribute node; the "type" node could be accessible by a new
|      "type" axis in XPath; and a new typed-value() function could
|      return the value of an element or attribute converted to the type
|      specified by its type node.
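   The proposed typed-value() function might behave as in the following
   Python sketch.  TypeNode, the type_node field, and the CONVERTERS
   table covering four XML Schema built-in types are hypothetical; a
   real implementation would take the type from the processed schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict

# Hypothetical mapping from a few XML Schema built-in type names to converters.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "xs:string": str,
    "xs:integer": int,        # int() tolerates surrounding whitespace
    "xs:decimal": Decimal,    # so does Decimal()
    "xs:boolean": lambda s: s.strip() in ("true", "1"),
}

@dataclass
class TypeNode:
    name: str                 # e.g. "xs:integer"

@dataclass
class ElemNode:
    name: str
    string_value: str         # the untyped XPath string-value
    type_node: TypeNode       # what the proposed "type" axis would reach

def typed_value(n: ElemNode) -> object:
    """Convert the string-value to the type named by the node's type node."""
    return CONVERTERS[n.type_node.name](n.string_value)
```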
|
|  (4) It is not clear to us that the semi-formal approach used in
|      defining the query data model is sufficiently rigorous. It is
|      clear that the notation leads to a more compact specification,
|      but it is not obvious whether the resulting specification is more
|      rigorous than a prose specification.
|

    The formalism supports our two primary goals: to provide a formal
    data model on which the XML Query Algebra can be built and to
    define a specification that can be implemented easily and
    correctly.  In particular, the constructors and accessors of the
    XML Query Data Model are the most basic operators in the XML Query
    Algebra.  Since these operators must be defined formally in some
    manner, we chose to define them in a separate specification using
    a formalism that can easily be mapped onto an implementation in a
    high-level programming language.  This made our job of producing
    an implementable specification easier.

    It might be beneficial, however, to improve the accessibility of
    the specification with more prose description, and we welcome
    collaboration on improving the content and presentation.

--
Mary Fernandez                    AT&T Labs - Research
Principal Technical Staff Member  180 Park Ave., Bldg 103, E243
mff@research.att.com              Florham Park, NJ 07932-0971
http://www.research.att.com/~mff  973-360-8679 FAX: 973-360-8187
Received on Monday, 7 August 2000 13:16:19 UTC