- From: Paul Cotton <pcotton@microsoft.com>
- Date: Tue, 5 Dec 2000 12:35:20 -0800
- To: www-xml-query-comments@w3.org
- Cc: Murali Mani <mani@CS.UCLA.EDU>
Paul Cotton, Microsoft Canada 17 Eleanor Drive, Nepean, Ontario K2E 6A3 Tel: (613) 225-5445 Fax: (425) 936-7329 <mailto:pcotton@microsoft.com> -----Original Message----- From: Murali Mani [mailto:mani@CS.UCLA.EDU] Sent: Tuesday, December 05, 2000 2:36 PM To: Paul Cotton; chamberl@almaden.ibm.com; mff@research.att.com; danaF@crossgain.com; petsa@us.ibm.com; mmani@almaden.ibm.com Cc: Murali Mani Subject: XML Query algebra I was studying a few aspects of XML Query Algebra. I would like to list the four items for the perusal of the Query WG members. I try not to use terms like "useful", "natural" etc, which is left for more experienced people. I would also like to mention that all these are my personal opinions, and does not reflect anyone else's opinions in any way. 1. Order definition in the query data model - Order among child nodes, document order, as well as order among nodes referenced using IDREFs are all "useful". Order among IDREFs nodes cannot be captured by document order. I believe there are numerous solutions possible, one possible solution is define "queryable properties" for each element in the document, and these properties will have the necessary order definitions. 2. Algebra: Should result of a path expression contain duplicate nodes? - The present XML Query Algebra decomposes a path expression into multiple "steps" and executes each step separately. There is no duplicate elimination at the end. Therefore from where I stand, I beleve there will be duplicates in the result we obtain. I am against this, and feel that there should not be duplicates. There could be several reasons, I outline a few I could think of: a) If duplicates are desired, one can easily write multiple path expressions. b) If we write a path expression to determine whether a data definition is cyclic (say part-subpart, parent-ancestor models), then the recursion will not terminate in case of a cycle. c) It conflicts with "simple" mathematical techniques like visit every node, and determine whether that node satisfies the path expression. 3. No regular expression operators in the path expressions in algebra and QL. I believe presence of * will definitely help in recursion in case of a recursive data definition such as part-subpart or parent-ancestor. The fourth point is clouded with some uncertainty, and this could be why I need so much space to explain one thing. 4. Absence of tree pattern match in the algebra, and QL. After some deliberation, I have come to the following tentative definition for tree pattern match: TreePatternMatch (Pred1, Pred2, ... Predn) is one where Predi can be matched by 0 or more nodes, say (Nodei1, Nodei2, ..., Nodein_i). Now tree pattern match requires that (Nodelm \neq Nodepq) for all l, m, p, q, where ... (eliminate trivial case). Why do we suddenly need this feature unheard of in the DB community before? 1. Suppose we have a type definition such as book, and an instance of this b1 -- a) RDBs allow type definitions for book as book -> (title, author, publisher) b) XML allows type definitions as book -> (title, author*, publisher) Now in RDBs, b1.author (or anything) refers to one value, but in XML b1.author refers to multiple values. Now there can be two scenarios in XML in a query like b1 [author = "" and author = ""] -- The user might or might not insist that the two authors be distinct. The scenario where the two authors are distinct is what is assisted by tree pattern match as I define. I have heard oppositions that it cannot be expressed etc, I think these are slightly lame, because we can always use something like what I used above. I believe that a decision on the "usefulness" or "importance" of tree pattern match should be taken after some more careful study. From where I stand, it appears that it has not been studied at all by the members of the Query WG I know. (I also say with sincerity that I am also not familiar with the literature in tree pattern match, which I believe has quite a bit of strong math behind it.) sample references are: a) caterpillar automata by Anne, in the Jl - Markup Technologies. b) a Ph.D thesis referred to in the above paper. also, I believe that 1, 2 and 4 are not evolutionary features, though 3 I consider it to be evolutionary. regards - murali.
Received on Tuesday, 5 December 2000 15:39:27 UTC