- From: Mary F. Fernandez <mff@research.att.com>
- Date: Mon, 07 Aug 2000 13:16:09 -0400
- To: w3c-xsl-wg@w3.org
- Cc: www-xml-query-comments@w3.org
- Message-ID: <398EEED9.78E9BE00@research.att.com>
This message is in response to:
http://lists.w3.org/Archives/Public/www-xml-query-comments/2000Jul/0008.html
The Query WG appreciates the comments from the XSL WG. We respond to
those comments individually below. Several issues that would benefit
from additional input by the XSL WG are noted in the response.
Several of the XSL WG's comments relate to stylistic choices, such as
the use of functional notation to specify the data model and the use of
union types instead of subtypes to specify alternatives. We believe
these choices make the technical content more precise and clear, but
we encourage the XSL WG to identify when and if these choices obscure
or weaken the technical content.
| The XSL WG has reviewed the XML Query Data Model document (dated 11
| May 2000) and has the following comments and proposal.
|
| Specific Issues with the current document:
| -----------------------------------------
|
| (1) Overall, the way that the XML Query Data Model handles node types
| is somewhat awkward from an XPath perspective:
|
| - some functions are defined on the Node type and some on specific
| types of Node; for example, isDocNode is defined on Node, but
| children is defined separately for each kind of node
|
This is a stylistic choice. We chose to associate accessors with
the most specific node to which they apply. Node is a union of
eight node types, therefore it is logical to associate with Node
the accessors that distinguish between its alternatives. Similarly,
children is defined only for those nodes (DocNode, ElemNode) that
have child elements; it is not defined on the most general Node
type.
|
| - it uses union types rather than more familiar subtyping
|
We chose union types because the document is intended to be a
specification, not a definition of an interface in a particular
programming language, and because union types make other parts of
the data model more precise. For example, children of ElemNode can
only contain element, namespace, or processing instruction nodes:
the union type captures this constraint.
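
As an illustration only (it is not part of the data model document),
the constraint above might be sketched in Python as follows. The
class fields, the ElemChild alias, and the is_elem_child helper are
assumptions made for the example; only the list of permitted child
kinds is taken from the paragraph above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class NSNode:
    """Hypothetical namespace node."""
    prefix: str
    uri: str


@dataclass
class PINode:
    """Hypothetical processing instruction node."""
    target: str
    value: str


@dataclass
class ElemNode:
    """Hypothetical element node; its children are restricted to ElemChild."""
    name: str
    children: List["ElemChild"] = field(default_factory=list)


@dataclass
class DocNode:
    """Hypothetical document node; not a permitted child of an element."""
    root: ElemNode


# The union names exactly the alternatives permitted as element children.
ElemChild = Union[ElemNode, NSNode, PINode]


def is_elem_child(node: object) -> bool:
    """Runtime check that mirrors the ElemChild union."""
    return isinstance(node, (ElemNode, NSNode, PINode))
```

With subtyping alone, children would typically be declared as a list
of the general Node type, and the restriction would have to be stated
separately in prose.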
| - relationship of the isXXXNode functions to the type system is
| unclear
|
Please clarify this comment. We have not yet introduced a type
system.
| - it uses many different separate isXXXNode functions with an
| implicit constraint that only one of them returns true for a
| given node, rather than a single nodeType function (as in the
| DOM)
This choice is consistent with the functional notation. Your
suggestion would require the introduction of a tag value for each
node type.
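
The two styles can be compared in a small Python sketch. This is an
illustration under assumed names (NodeType, node_type, and the three
stand-in node classes are hypothetical), not text from either
specification.

```python
from enum import Enum


class DocNode:
    """Hypothetical document node."""


class ElemNode:
    """Hypothetical element node."""


class PINode:
    """Hypothetical processing instruction node."""


# Style used in the data model: one boolean accessor per alternative.
def is_doc_node(n: object) -> bool:
    return isinstance(n, DocNode)


def is_elem_node(n: object) -> bool:
    return isinstance(n, ElemNode)


def is_pi_node(n: object) -> bool:
    return isinstance(n, PINode)


# DOM-like style: a single accessor, which needs a tag value per node type.
class NodeType(Enum):
    DOC = "doc"
    ELEM = "elem"
    PI = "pi"


def node_type(n: object) -> NodeType:
    if isinstance(n, DocNode):
        return NodeType.DOC
    if isinstance(n, ElemNode):
        return NodeType.ELEM
    if isinstance(n, PINode):
        return NodeType.PI
    raise TypeError(f"not a node: {n!r}")
```

The NodeType enumeration is the extra set of tag values that the
single-function style requires.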
| (2) The approach taken to data-typing by the XML Query Model appears
| to have difficulty dealing with the possibility that the string
| representation of a value of particular data-type in an element
| can be interrupted by processing instructions and comments; for
| example, the integer 10 might be represented by
|
| <amount>1<?bizarre processing instruction?>0</amount>
|
This is correct and the Query WG is aware of this problem. We see
three possible solutions to this problem, and we welcome the XSL
WG's comments on these suggestions.
(1) Accept the current solution in which only consecutive CDATA
information items are coalesced into a single scalar value,
i.e., CDATA items in which there are no intervening processing
instructions or comments. The disadvantage of this choice is
that it is inconsistent with XML Schema, which permits the
above example and recognizes the content of <amount> as the
integer 10.
The advantage of this solution is that it is simple. We also
believe it supports the "80-20" rule, i.e., examples like the one
above are unusual, and the Query WG may choose not to support
such outliers.
(2) Ignore all processing instructions and comments or ignore
those processing instructions and comments that are embedded in
scalar content. The disadvantages of this choice are that it
would prevent querying of embedded processing instructions or
comments and it would not guarantee that the Infoset to XML
Query Data Model mapping was invertible. Like solution (1), this
solution is simple; unlike (1), it does not conflict with XML
Schema.
(3) Adopt a data model in which embedded processing instructions
and comments are preserved. The advantages are that these
items are queryable and that the Infoset to Data Model mapping is
invertible. The disadvantage is that the complexity of the
data model will be increased for the purpose of supporting
primarily outlier cases.
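
The difference between solutions (1) and (2) can be seen on the
<amount> example using Python's standard xml.dom.minidom parser.
This is only a sketch: the function names text_runs and
text_ignoring_markup are hypothetical, and the DOM is used here as a
stand-in for the Infoset.

```python
from xml.dom import Node, minidom

SOURCE = "<amount>1<?bizarre processing instruction?>0</amount>"

TEXT_KINDS = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def text_runs(elem):
    """Solution (1): coalesce only consecutive character data.

    Any other child (processing instruction, comment, ...) ends the
    current run, so interrupted content yields several values.
    """
    runs, current = [], []
    for child in elem.childNodes:
        if child.nodeType in TEXT_KINDS:
            current.append(child.data)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


def text_ignoring_markup(elem):
    """Solution (2): skip non-text children and join all character data."""
    return "".join(
        child.data for child in elem.childNodes if child.nodeType in TEXT_KINDS
    )


amount = minidom.parseString(SOURCE).documentElement
runs = text_runs(amount)                # two separate runs: "1" and "0"
joined = text_ignoring_markup(amount)   # a single string: "10"
```

Under solution (1) the content cannot be read as the integer 10
without further work, whereas int(joined) succeeds under solution (2)
at the cost of discarding the processing instruction.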
| (3) Given that the infoset does not define node identity, we feel
| that the XML Query data model must define node identity
| rigorously. It appears that while the current document assumes
| such identity, it does not formally define it.
|
Please see the definition of the ref type in Sec. 2.1.1 (given
below) and explain how to make this definition more rigorous, while
not specifying a particular implementation strategy.
"The data model provides node references as a mechanism to test
and bind the identity of nodes in a given instance of the data
model. The actual mechanism for implementing node identity is
implementation dependent, for example, node identity might be
represented by a key value, an object identifier, an XPointer
value, etc. Ref(N) denotes a reference to a node with type N.
The data model provides the function ref to create a reference to
a node and the function deref to produce the node referent of a
reference value. Their signatures are:
ref : Node -> Ref(Node)
deref : Ref(Node) -> Node
The function ref is surjective, i.e., it is onto. The function
deref is the inverse of ref, i.e., for all nodes n,
node_equal(deref(ref(n)), n) is true, where node_equal is an
implementation-dependent equality operator over nodes."
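
One possible reading of this definition, sketched in Python. The
choice to wrap the object itself in a Ref and to use object identity
for node_equal is an assumption made for the example; the quoted text
deliberately leaves the mechanism open (a key value, an object
identifier, an XPointer value, etc.).

```python
class Node:
    """Hypothetical node; two nodes with the same name are still distinct."""

    def __init__(self, name):
        self.name = name


class Ref:
    """Opaque reference to a node; here it simply holds the object."""

    def __init__(self, node):
        self._node = node


def ref(node):
    """ref : Node -> Ref(Node)"""
    return Ref(node)


def deref(reference):
    """deref : Ref(Node) -> Node"""
    return reference._node


def node_equal(a, b):
    """Assumed equality operator: identity, not structural equality."""
    return a is b
```

In this sketch node_equal(deref(ref(n)), n) holds for every node n,
while two separately constructed nodes with the same name are not
equal.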
| (4) The data model has the concept of constructors to describe the
| data model of result trees. In the current model, nodes have
| accessors for parents, however, parents are not specified in
| constructors. This appears to be a problem.
|
This inconsistency is noted. We mention here that the Query WG is
currently discussing whether parent will remain an operator in the
Query Algebra. The algebra is a strongly typed language, and it is
difficult to infer a non-trivial type for parent. The interests of
the two WGs may diverge on this issue, and the Query WG welcomes
more feedback from the XSL WG on this issue.
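
The inconsistency can be pictured with a small Python sketch. All
names are hypothetical, and the search-based parent function is only
one way an accessor could be derived; it is not a resolution proposed
by either WG.

```python
from typing import List, Optional


class ElemNode:
    """Hypothetical element node built bottom-up: the constructor
    receives children but has no parent argument."""

    def __init__(self, name: str, children: Optional[List["ElemNode"]] = None):
        self.name = name
        self.children = list(children or [])


def parent(root: ElemNode, target: ElemNode) -> Optional[ElemNode]:
    """Return the node under root whose children contain target
    (compared by identity), or None if there is no such node."""
    for child in root.children:
        if child is target:
            return root
        found = parent(child, target)
        if found is not None:
            return found
    return None
```

Because the constructor never sees a parent, the accessor in this
sketch needs a known root to search from, which is exactly the
information the constructors do not record.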
| (5) We believe that the approach of representing complex schema types
| by the data model of that type's schema as an XML document is
| rather inadequate. This places the burden of processing the
| schema (traversing import/include links, detecting inheritance,
| equivalence classes etc.) on every user. We would prefer an
| approach where element types are represented by some processed
| form of the schema defining the type.
The Query WG agrees. This issue has already been raised in the
Query WG's last call comments on XML Schema. Please see item
LC-198 in
http://www.w3.org/XML/Group/xmlschema-current/lcissues.html.
| (6) We also noticed the following relatively minor issues:
|
| - It would be preferable if the XML Query Data Model used terms
| and abbreviations consistent with XPath. For example, it should
| use "ProcessingInstructionNode" not "PINode". The choices in
| XPath were made consciously to maximize usability and we urge
| the query group to build on the XPath experiences.
We agree that every effort should be made to use consistent
terminology. We note, however, that related specifications, such
as Infoset and DOM, also use distinct vocabularies. The issue of
defining a standard vocabulary for XML data models is in the scope
of the XML CG but out of scope for XML Query.
|
| - It is not clear to us why the name of an ElemNode is a
| *reference* to a QNameValue and not just a QNameValue.
|
This has already been corrected in a subsequent revision.
| - It appears that InfoItemNode is not fully specified. Which info
| items it represents is not clear and the possible answers all
| have drawbacks: If all implementations are required to support
| all info items, then that imposes a significant implementation
| burden, with very little benefit for most users. If not all are
| required, then there is a significant potential interoperability
| problem.
|
This is a good point. In the first design of the XML Query Data
Model, there was no mechanism for accessing the Infoset items from
which the data model values were derived. Some members of the XML
Query WG argued that preserving access to Infoset items was
important for applications that might, for example, need access to
individual CDATA items.
The Query WG is considering eliminating this feature, because, as
you suggest, it does not support the 80-20 rule. The Query WG
welcomes more feedback from the XSL WG on this issue.
| Discussion Points:
| -----------------
|
| (1) Given that the XPath data model is already present, we feel
| strongly that the XML Query data model should be built with
| an appropriate relationship to the XPath data model.
|
We made every effort not to diverge significantly from the XPath
data model and specified in the document wherever differences with
XPath do exist. However, the XPath data model was not an adequate
basis for the XML Query Algebra nor did it support XML Schema,
therefore it was necessary to specify in more detail a model that
was adequate.
| (2) The above approach makes sense if and only if the XML query group
| decides to use XPath as a part of the query language. In that
| case (i.e., if there is a commitment to use XPath), we feel
| very strongly that the data models must be closely related.
|
We agree that there is a clear benefit to having, at best, a
single data model and, at worst, closely related models.
| (3) XPath clearly does not support typed values at this time. Adding
| schema support to XPath is a vital work-item from the XPath 2.0
| effort. Adding types to the XPath data model could be achieved by,
| for example, adding a "type" node for each element node and
| attribute node; the "type" node could be accessible by a new
| "type" axis in XPath; and a new typed-value() function could
| return the value of a element or attribute converted to the type
| specified by its type node.
|
| (4) It is not clear to us that the semi-formal approach used in
| defining the query data model is sufficiently rigorous. It is
| clear that the notation leads to a more compact specification,
| but it is not obvious whether the resulting specification is more
| rigorous than a prose specification.
|
The formalism supports our two primary goals: to provide a formal
data model on which the XML Query Algebra can be built and to
define a specification that can be implemented easily and
correctly. In particular, the constructors and accessors of the
XML Query Data Model are the most basic operators in the XML Query
Algebra. Since these operators must be defined formally in some
manner, we chose to define them in a separate specification using
a formalism that can easily be mapped onto an implementation in a
high-level programming language. This made our job of producing
an implementable specification easier.
It might be beneficial, however, to improve the accessibility of
the specification with more prose description, and we welcome
collaboration on improving the content and presentation.
--
Mary Fernandez AT&T Labs - Research
Principal Technical Staff Member 180 Park Ave., Bldg 103, E243
mff@research.att.com Florham Park, NJ 07932-0971
http://www.research.att.com/~mff 973-360-8679 FAX: 973-360-8187
Received on Monday, 7 August 2000 13:16:19 UTC