- From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
- Date: Sun, 1 Mar 1998 21:04:44 -0500 (EST)
- To: Paul Prescod <papresco@technologist.com>
- Cc: Dan Connolly <connolly@w3.org>, fork@xent.ics.uci.edu, www-html@w3.org
In message "Re: Composing language descriptions: tree automata and language design?",
Paul Prescod wrote...
>
> And I also happen to think that tree automata is really important. But I
> must admit that I don't see a big connection there yet.
As for the namespace issue, forest automatons are not directly related. Forest
regular expressions, which are equivalent to forest automatons, are much more
closely related. In particular, horizontal concatenation is
immediately applicable to a schema language for XML.
> > Am I the only person who's been studying language design for
> > the last few years who hasn't studied tree autonoma?
>
> No, I first heard of it at SGML/XML 97. My interest now is in arbitrary
> transformations of DTDs and instances as is demonstrated in the paper. I
> hope Murata-san can provice examples of more advanced transformations.
> The paper only has renaming-in-context, which is not too difficult
> without tree automata.
In the case of n-ary tree automatons, we can use any tree homomorphism.
Not only renaming but a variety of rewriting (node-deletion, subtree-deletion,
node-insertion, subtree-insertion, reordering, etc.) is possible. For
the definition of linear tree homomorphisms, see the text book by Gecsec
and Steinby. (But this book is not easy to get, I am afraid.)
@book{gs:Automata,
author = "F. G\'{e}cseg and M. Steinby",
title = "Tree Automata",
publisher = "Akad\'{e}miai Kiad\'{o}, Budapest",
year= "1984",}
In the case of forest automatons, there is a subtle issue: we do not know
how many subordinates a node has and how these subordinates are related in
advance. For example, given a sequence <sec> <h1/> <p/> <h1/> <p/> <p/> </sec>,
I would like to create <sec> <subsec><head/><p/> </subsec> <subsec> <head/>
<p/> </p> </subsec> </sec>. My solution is to use an unambiguous regular
expression (e.g., (h1 p*)* ) to capture underlying structures for such sequences.
My upcoming paper for PODDP(Principles of Digital Document Processing) '98
provides an algebra and a logic-based language that are extensions of
the RDB and datalog. This paper also sketches an operator for such rewriting.
I should disclose this paper to the public soon.
In message "Composing language descriptions: tree automata and language design?",
Dan Connolly wrote...
> I'm starting to get a "where have you been all my life?" feeling.
When I said "The use of extended CFG's as document schemas is a bad idea"
in my talk at PODDP'96, everybody laughed. After explaining shortcomings
of the traditional approach and advantages of the forest automaton theory,
I repeated the same claim. This time, everybody was silent and serious.
The forest regular language theory was studied in 60's and 70's (away too
early!). Unfortunately, it has not been used for real applications, although
it is very powerful. Kil-Ho Shin (one of my colleagues at Fuji Xerox)
rediscovered the forest regular language theory when he was studying structured
documents. Since then, I have firmly believed that the forest automaton
theory deserves much more attention from document processing communities.
I am happy to learn that more and more people are interested.
Cheers,
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
Received on Monday, 2 March 1998 03:12:22 UTC