RE: Regarding a question on the mailing list www-ql from Murali Mani on 2001-03-26 (www-xml-query-comments@w3.org from March 2001)

From: Murali Mani <mani@CS.UCLA.EDU>
Date: Sun, 25 Mar 2001 16:11:19 -0800 (PST)
To: Paul Cotton <pcotton@microsoft.com>
cc: <www-xml-query-comments@w3.org>
Message-ID: <Pine.SOL.4.33.0103251518260.14203-100000@panther.cs.ucla.edu>
Thanks for your message. From my standpoint, I think we have reached a
consensus about the mail. The main reason for my message was some
discussion about forest which I happened to glance over in the mailing
list - the confusion there was regarding whether forest means child nodes
of the same node, or just a list of nodes, possibly with different parents.

To me, hedge is a very common term and it is difficult to find a better
term than this. But I would leave it to the research community to decide
which term is best, depending on popularity. I think one reasonable way
to differentiate the types needed for XML is to use the 6-tuple
definition for XML Schema, which a few people know, where there is a
clear distinction between tree types and hedge types. But it takes time
to reach the community, and it might not be needed at all. Also, once we
understand that a forest is a list of trees, I think anyone can just
accept it and use it. (To a graph theory person, say, forest might be a
misnomer; that person might assume that you are trying to define
set-based semantics like XPath.)
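
To make the two readings of "forest" concrete, here is a minimal Python
sketch. The Node class and the share_parent helper are hypothetical names
chosen for illustration; they do not come from the XML Query algebra or
any other specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A labelled tree node (hypothetical, for illustration only)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child as the last child of this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def share_parent(nodes: List[Node]) -> bool:
    """True when every node in the list has the same parent."""
    return len({id(n.parent) for n in nodes}) <= 1


root = Node("book")
ch1 = root.add(Node("chapter"))
ch2 = root.add(Node("chapter"))
s1 = ch1.add(Node("section"))
s2 = ch2.add(Node("section"))

# Reading 1: the ordered children of one node (a hedge in the sense above).
siblings = root.children
# Reading 2: an arbitrary list of nodes, here with different parents.
mixed = [s1, s2]
```

Under these assumptions, share_parent(siblings) holds while
share_parent(mixed) does not, which is exactly the distinction the
discussion was confused about.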

The following are just my opinions. Please view them critically.

For the algebra, I might give one suggestion. I think that in the use of
XML for the semantic web, XML is the data model which everyone sees and
issues queries over. But XML might actually be just a wrapper over several
underlying data models such as relational, object, etc. I think the XML
data model will subsume all existing data models in expressiveness (the
amount of interpretation that can be attached), and possibly also in
operations. XML may well be surpassed in the future, but I think most
people like me are still trying to understand the problems with XML, and
thinking beyond it is difficult.

(I am not aware of any analysis of operations for the XML data model
other than what is presented in the algebra. I recommend also taking an
approach that studies which operations regular tree languages are closed
under, and examines them with regard to the operations being defined in
the algebra.)

But if XML is to serve as such a rich data model, it is also necessary
that it not be overdone. I think if the algebraic optimization which the
group is working on is the common optimization across all underlying data
models, then the algebra should not be overdone; a path expression should
not be broken up. But I think you have in mind just one particular
underlying data model, say the PSV infoset, in which case it is fine.

I think you also run into the issue of duplicate nodes in a path
expression, which might be distasteful to logic people (like Wadler)?
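
As a rough illustration of how duplicates can arise, the Python sketch
below applies a parent step to each node of a list in turn. The PARENT
table and both function names are assumptions made up for this example;
this is not the algebra's actual definition of a path step.

```python
from typing import Dict, List

# Hypothetical toy document: maps each child node to its parent.
PARENT: Dict[str, str] = {"a1": "root", "a2": "root"}


def parent_step(nodes: List[str]) -> List[str]:
    """Apply a parent step to each node in turn, keeping every result."""
    return [PARENT[n] for n in nodes if n in PARENT]


def dedupe(nodes: List[str]) -> List[str]:
    """Drop repeated nodes while keeping first-occurrence order."""
    return list(dict.fromkeys(nodes))


# List semantics: the shared parent is reported once per child.
step_by_step = parent_step(["a1", "a2"])
# Set-like semantics: each node appears at most once.
as_set = dedupe(step_by_step)
```

Here step_by_step contains "root" twice, while as_set contains it once;
which of the two a path expression should return is the question raised
above.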

Right now, I should say that I am putting more and more effort into the
schema. For W3C members who are familiar with the mission of W3C, I need
not say this: XML is the backbone of the next-generation web, and schema
is the backbone of several applications such as the semantic web. Also, a
universally accepted schema is what is typically desired, though I have no
idea whether having multiple schemas is a bad idea.

It is taking us a lot of energy and time to say that, given a regular tree
grammar and W3C's XML Schema proposal, the better one is the regular tree
grammar. The correctness, to me, appears to be there. I do not know how
far our beliefs have reached researchers, or whether there are objections
to this.

At least I am glad that TREX has decided to take an approach similar to
ours. But this, which seems technically not very difficult, is where most
of my energy is going at present, and I do not think I can comment on
operations in the algebra etc. with much deep thought. I expected that the
XML Query WG would be the one to take up the issue of schema further, but
now I am not so sure.

<warning>speaking for himself only</warning>

thanks and regards - murali.

PS: Regarding whether I have technical papers, I should say that I have
none so far.

On Sun, 25 Mar 2001, Paul Cotton wrote:

> This is a response to the following message, which you posted to the XML
> Query Working Group's comments list:
>
> http://lists.w3.org/Archives/Public/www-xml-query-comments/2001Mar/0032.html
>
> The XML Query Working Group has approved the following response:
>
> Murali Mani wrote:
> > ==========================
> > In order not to overload the term forest, I would like you to consider
> > the term "hedge". Hedge is a sequence of trees -- "a hedge is to a tree
> > what a string is to character".
> >
> > Also regarding types, please consider using tree types and hedge types
> > -- a tree type is a type whose value is a tree (it can be specified as
> > a union of tree types), a hedge type is a type whose value is a hedge
> > (specified as a model group over tree types and hedge types).
>
> Thanks for the suggestion. We choose terms based on study of commonly
> used computer-science terms and based on what we believe will be most
> intuitive to users.
>
> We appreciate your feedback on the XML Query specifications. Please let
> us know if this response is satisfactory. If not, please respond to
> this message, explaining your concerns.
>
> Paul Cotton
> On behalf of the XML Query Working Group
Received on Sunday, 25 March 2001 19:11:24 UTC