Re: Two comments on the XML Query Data Model.

Thanks for the reply.

On Thu, 14 Dec 2000, Mary F. Fernandez wrote:

> > 1. I believe one of the parallels to a relational table definition and
> > multiple rows in the table in the XML world is define one schema and have
> > multiple documents for it. From my glancing at the data model draft, I
> > could not find any mention of such a requirement. Is it not suitable to
> > define an XML repository/database as <schema, setOfDocs> pairs?
> 
> Yes, that is a perfectly suitable definition, although it does not
> correspond directly to the PSV Infoset, which is the basis
> for the XML Query Data Model.  An instance of the PSV Infoset 
> is a document annotated with the schema information by which
> it was validated.  The XML Query Data Model is based on this input.
> 
> It might be appropriate, however, for an *implementation* of
> the XML Query Data Model to represent an XML database as a 
> schema, document-set pair. 

I think the question is whether the query data model defines what any
query should be able to address. In this case, defining a repository as a
set of <schema, document-set> pairs would allow one to formulate queries
such as <perform this query on all documents that conform to this schema>.

I agree that the above is mostly implementation dependent, and does not
merit any further discussion.
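
For concreteness, here is a minimal sketch of what I mean, in Python. The
repository layout, the schema names and the helper query_schema are all
made up for illustration, and no validation is performed: the schema is
simply the key under which its documents are stored.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: one entry per <schema, document-set> pair.
# The schema names are illustrative keys only; nothing is validated here.
repository = {
    "book.xsd": [
        ET.fromstring("<book><title>A</title></book>"),
        ET.fromstring("<book><title>B</title></book>"),
    ],
    "person.xsd": [
        ET.fromstring("<person><name>X</name></person>"),
    ],
}

def query_schema(repo, schema, path):
    """Run one path query over every document stored under `schema`."""
    results = []
    for doc in repo.get(schema, []):
        results.extend(doc.findall(path))
    return results

# "Perform this query on all documents that conform to this schema."
titles = [t.text for t in query_schema(repository, "book.xsd", "title")]
```

Whether such a grouping belongs in the data model or only in an
implementation is exactly the point under discussion.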

> > 2. The Query WG relies on obtaining an ordered list of trees (also called
> > hedge, sometimes also called forest) for the result of a Path expression.
> > This is not provided by XPath which, I believe, defines a node set (one
> > of the four possible results of an XPath Expression) as an unordered set
> > of trees. Does not this additional requirement for the Query purposes
> > require more definitions in the data model? What I think is required is
> > that you define an order for nodes that can be returned by one path
> > expression.
> 
> You are correct: the Query Data Model supports ordered lists (forests)
> of trees, and XPath expressions compute node sets, which, when
> necessary, are sorted by a total document order.  It is somewhat
> cumbersome to define XPath's semantics in terms of ordered lists.
> Even in the absence of this incompatibility, support
> for unordered collections (e.g., bags and sets) of trees
> is required by the XML Query use cases.  Therefore, unordered
> collections of trees will be added to the XML Query Data Model
> and will be supported by the XML Query Algebra.  Unordered collections,
> however, will not appear in the first public working draft
> of the Algebra but will appear in a subsequent draft.  Given unordered
> collections, it is possible to define XPath semantics correctly
> and simply.
> 
> > 
> > A typical example of the above scenario is in traversal of links, for
> > example, you can have a book element which has author as children, and
> > also an IDREFS attribute called authors which points to person.
> > I believe such a path expression is valid -
> > book/(author | @authors -> person)
> > Now does this not require an order specification between child elements
> > and elements reachable through links? The reason is a user might expect
> > the same ordered result for the above expression as for the following
> > expression -
> > book/(@authors -> person | author)
> > Note that the above is possible even without an OR operator in the query.
> > 
> 
> A minor point: the above syntax is from Quilt, not XPath.
> XPath uses the id() function to dereference IDREF(S) values.
> As you note, the two expressions above should be equivalent.
> XPath does *not* require an order specification between child
> elements and dereferenced IDREF attributes, because all nodes
> are always sorted by *document* order.  The assumption is that
> a total document order exists on nodes, and that order may be
> implementation dependent.

I think the notions of duplicate nodes in the result of a path expression
and of order definition are quite closely related. Here are my two cents:

1. Even if the query algebra defines ordered lists, all that it is doing
is defining order for what typically is a set. Therefore the result of a
single path expression cannot have duplicate nodes.
2. Duplicate nodes can arise in several cases:
   a) You have an OR/UNION operator. Without having given it much thought,
I would suggest that in such a case the result be either unordered or
ordered by document order.
   b) You have dereference operators. I believe the dereference operation
requires an order that is not the same as document order. I would suggest
that the result follow the order that is "natural" for dereference rather
than document order. Dereference can also produce duplicates; I would
suggest that there be no duplicates.
   c) You have recursion (*), say P1/(P2)*/P3. Here again, duplicate
nodes will arise only when there is a dereference operation in the
recursion sub-expression, viz. P2. I would again recommend that there be
no duplicates and that the order be what is "natural" for dereference.

I believe that in almost every other case we will not obtain duplicate
nodes, and global document order is sufficient.

In short, the questions to be answered are whether we should return an
ordered list when we have dereference and, if so, what the order should
be.
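
To make the difference between the two orders concrete, here is a rough
sketch in Python rather than in Quilt or XPath. The sample document and
the helpers deref and dedup are made up for illustration, and I am
assuming that the "natural" order of a dereference is the order in which
the IDs are written in the IDREFS value.

```python
import xml.etree.ElementTree as ET

# Illustrative document: persons appear before the book that refers to them.
doc = ET.fromstring(
    '<lib>'
    '<person id="p1"/><person id="p2"/><person id="p3"/>'
    '<book authors="p3 p1 p3"><author>Ann</author></book>'
    '</lib>'
)

# Document order: position of each element in a pre-order traversal.
doc_pos = {el: i for i, el in enumerate(doc.iter())}
# Lookup table from ID value to element, used to follow IDREFS.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id")}

def deref(book):
    """Follow the IDREFS attribute in the order the IDs are written."""
    return [by_id[ref] for ref in book.get("authors", "").split()]

def dedup(nodes):
    """Drop repeated nodes, keeping the first occurrence of each."""
    seen, out = set(), []
    for node in nodes:
        if node not in seen:
            seen.add(node)
            out.append(node)
    return out

book = doc.find("book")
children = book.findall("author")

# Assumed "natural" dereference order, duplicates removed: p3, p1.
natural = dedup(deref(book))
# The same nodes as a set sorted by document order: p1, p3.
document = sorted(set(deref(book)), key=doc_pos.get)

# With document order, both ways of writing the union give one result.
union_ab = sorted(set(children) | set(deref(book)), key=doc_pos.get)
union_ba = sorted(set(deref(book)) | set(children), key=doc_pos.get)
```

Here natural and document contain the same two persons in opposite
orders, which is the choice I am asking about. Sorting by document order
makes the two forms of the union agree, at the cost of discarding the
order given in the IDREFS value.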

thanks and regards - murali.

Received on Thursday, 14 December 2000 13:40:37 UTC