What is the result of a path expression? from Jerome Simeon on 2001-07-17 (www-xml-query-comments@w3.org from July 2001)

From: Jerome Simeon <simeon@research.bell-labs.com>
Date: Tue, 17 Jul 2001 12:07:43 -0400
To: "Makoto Murata" <mmurata@trl.ibm.co.jp>
Cc: www-xml-query-comments@w3.org, simeon@research.bell-labs.com
Message-ID: <15188.25295.596351.176871@starling.research.bell-labs.com>
Murata-san,

This is a response to the following message, which you posted to the
XML Query Working Group's comments list:

http://lists.w3.org/Archives/Public/www-xml-query-comments/2001Jun/0010.html

> Suppose that an XML document <div><div><p/><p/></div></div> is stored 
> in test.xml.  What is the result of the following query?
> 
>    document("zoo.xml")//div
> 
> In my understanding, the result is <div><div><p/></p></div></div>
> followed by <div><p/></p></div>.  Is this correct?  
> 

This is correct. The expression document("zoo.xml")//div returns a
sequence of all div elements. According to the XPath 1.0 semantics,
this sequence is in document order, and does not contain duplicate
nodes. Please note that although no given node can appear twice in the
toplevel sequence, this does not overrule the possibility for a node
to appear both within the toplevel sequence and nested in the content
of some other node.

> 
> If this is the case, the image of a simple tree regular language
> (<div><div></div></div>, <div><div><p/></div></div>,
> <div><div><p/><p/></div></div>,...) by this query is not tree-regular.
> 

I am not sure I understand precisely what your input language is. But
I think your question can be rephrased as: "does there exist a tree
regular language and a path expression such that the image of that
language by this path expression cannot be described precisely by a
tree regular language".

This statement is true. Here is an example (maybe there exists a
simpler one). Assuming a tree regular language described by the
following type, written in the syntax of the XML Query Formal
Semantics type system:

type P    = ELEMENT p    ( () | (Div1,P,Div2) )
type Div1 = ELEMENT div1 ()
type Div2 = ELEMENT div2 ()

Documents of type Div are of the form:

<p/>

<p><div1/><p/><div2/></p>

<p><div1/><p><div1/><p/><div2/></p><div2/></p>

etc.

Now the result of applying the path expression: //(div1|div2) to such
a language results in documents of the form:

()

<div1/><div2/>

<div1/><div1/><div2/><div2/>

etc.

This language is not regular but context free and can be precisely
described by:

{ (ELEMENT div1 ())^n, (ELEMENT div2 ())^n | n \in N }

Moreover, there exist no best regular approximation for that
language. I believe a similar result was first stated in [1].

In order to solve that problem, the XQuery 1.0 Formal Semantics relies
on a specific recfactor type function which computes an approximate,
but often accurate type for the corresponding path expressions.

You can consult the Section 4.10 [Typing descendent-or-self] in the
XQuery 1.0 Formal semantics for a more detailed definition and
explanation of the typing for //.

  http://www.w3.org/TR/query-algebra/#sec_descendents

We appreciate your feedback on the XML Query specifications.  Please
let us know if this response is satisfactory. If not, please respond
to this message, explaining your concerns.

Jerome Simeon,
On behalf of the XML Query Working Group.


References

[1] Tova Milo, Dan Suciu, Victor Vianu: Typechecking for XML
Transformers. PODS 2000: 11-22


<mmurata@trl.ibm.co.jp>  writes:
> 
> Dear colleagues,
> 
> I have a question about the latest working draft "XQuery 1.0: An XML
> Query Language".
> 
> >2.1 Path Expressions
> ..
> >The result of a path expression is a sequence of nodes or primitive
> >values. The nodes in a path expression result are ordered according to
> >their position in the original hierarchy, in document order (as
> >defined in [XPath 1.0].) If the result of a path expression includes
> >nodes that are in different documents, the ordering of these nodes is
> >implementation-dependent.
> 
> Suppose that an XML document <div><div><p/><p/></div></div> is stored 
> in test.xml.  What is the result of the following query?
> 
>    document("zoo.xml")//div
> 
> In my understanding, the result is <div><div><p/></p></div></div>
> followed by <div><p/></p></div>.  Is this correct?  
> 
> If this is the case, the image of a simple tree regular language
> (<div><div></div></div>, <div><div><p/></div></div>,
> <div><div><p/><p/></div></div>,...) by this query is not tree-regular.
> 
> Or, is my understanding incorrect?  The following sentence suggests
> that the result is rather <div><div><p/></p></div></div>.  If this is
> the case, I am quite sure that queries by regular path expressions 
> always lead to hedge regular languages.
> 
> > The result of a path expression may contain
> >duplicate values (i.e., multiple nodes with the same name, type, and
> >content), but it will not contain duplicate nodes (i.e., multiple
> >occurrences of the same node).
> 
> Cheers,
> 
> <warning>Speaking for himself only</warning>
> 
> IBM Tokyo Research Lab / International University of Japan, Research Institute
> MURATA Makoto (FAMILY Given)
>
Received on Tuesday, 17 July 2001 12:09:11 UTC