- From: Jerome Simeon <simeon@research.bell-labs.com>
- Date: Tue, 17 Jul 2001 12:07:43 -0400
- To: "Makoto Murata" <mmurata@trl.ibm.co.jp>
- Cc: www-xml-query-comments@w3.org, simeon@research.bell-labs.com
Murata-san, This is a response to the following message, which you posted to the XML Query Working Group's comments list: http://lists.w3.org/Archives/Public/www-xml-query-comments/2001Jun/0010.html > Suppose that an XML document <div><div><p/><p/></div></div> is stored > in test.xml. What is the result of the following query? > > document("zoo.xml")//div > > In my understanding, the result is <div><div><p/></p></div></div> > followed by <div><p/></p></div>. Is this correct? > This is correct. The expression document("zoo.xml")//div returns a sequence of all div elements. According to the XPath 1.0 semantics, this sequence is in document order, and does not contain duplicate nodes. Please note that although no given node can appear twice in the toplevel sequence, this does not overrule the possibility for a node to appear both within the toplevel sequence and nested in the content of some other node. > > If this is the case, the image of a simple tree regular language > (<div><div></div></div>, <div><div><p/></div></div>, > <div><div><p/><p/></div></div>,...) by this query is not tree-regular. > I am not sure I understand precisely what your input language is. But I think your question can be rephrased as: "does there exist a tree regular language and a path expression such that the image of that language by this path expression cannot be described precisely by a tree regular language". This statement is true. Here is an example (maybe there exists a simpler one). Assuming a tree regular language described by the following type, written in the syntax of the XML Query Formal Semantics type system: type P = ELEMENT p ( () | (Div1,P,Div2) ) type Div1 = ELEMENT div1 () type Div2 = ELEMENT div2 () Documents of type Div are of the form: <p/> <p><div1/><p/><div2/></p> <p><div1/><p><div1/><p/><div2/></p><div2/></p> etc. Now the result of applying the path expression: //(div1|div2) to such a language results in documents of the form: () <div1/><div2/> <div1/><div1/><div2/><div2/> etc. This language is not regular but context free and can be precisely described by: { (ELEMENT div1 ())^n, (ELEMENT div2 ())^n | n \in N } Moreover, there exist no best regular approximation for that language. I believe a similar result was first stated in [1]. In order to solve that problem, the XQuery 1.0 Formal Semantics relies on a specific recfactor type function which computes an approximate, but often accurate type for the corresponding path expressions. You can consult the Section 4.10 [Typing descendent-or-self] in the XQuery 1.0 Formal semantics for a more detailed definition and explanation of the typing for //. http://www.w3.org/TR/query-algebra/#sec_descendents We appreciate your feedback on the XML Query specifications. Please let us know if this response is satisfactory. If not, please respond to this message, explaining your concerns. Jerome Simeon, On behalf of the XML Query Working Group. References [1] Tova Milo, Dan Suciu, Victor Vianu: Typechecking for XML Transformers. PODS 2000: 11-22 <mmurata@trl.ibm.co.jp> writes: > > Dear colleagues, > > I have a question about the latest working draft "XQuery 1.0: An XML > Query Language". > > >2.1 Path Expressions > .. > >The result of a path expression is a sequence of nodes or primitive > >values. The nodes in a path expression result are ordered according to > >their position in the original hierarchy, in document order (as > >defined in [XPath 1.0].) If the result of a path expression includes > >nodes that are in different documents, the ordering of these nodes is > >implementation-dependent. > > Suppose that an XML document <div><div><p/><p/></div></div> is stored > in test.xml. What is the result of the following query? > > document("zoo.xml")//div > > In my understanding, the result is <div><div><p/></p></div></div> > followed by <div><p/></p></div>. Is this correct? > > If this is the case, the image of a simple tree regular language > (<div><div></div></div>, <div><div><p/></div></div>, > <div><div><p/><p/></div></div>,...) by this query is not tree-regular. > > Or, is my understanding incorrect? The following sentence suggests > that the result is rather <div><div><p/></p></div></div>. If this is > the case, I am quite sure that queries by regular path expressions > always lead to hedge regular languages. > > > The result of a path expression may contain > >duplicate values (i.e., multiple nodes with the same name, type, and > >content), but it will not contain duplicate nodes (i.e., multiple > >occurrences of the same node). > > Cheers, > > <warning>Speaking for himself only</warning> > > IBM Tokyo Research Lab / International University of Japan, Research Institute > MURATA Makoto (FAMILY Given) >
Received on Tuesday, 17 July 2001 12:09:11 UTC