- From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
- Date: Sun, 1 Mar 1998 21:04:44 -0500 (EST)
- To: Paul Prescod <papresco@technologist.com>
- Cc: Dan Connolly <connolly@w3.org>, fork@xent.ics.uci.edu, www-html@w3.org
In message "Re: Composing language descriptions: tree automata and language design?", Paul Prescod wrote... > > And I also happen to think that tree automata is really important. But I > must admit that I don't see a big connection there yet. As for the namespace issue, forest automatons are not directly related. Forest regular expressions, which are equivalent to forest automatons, are much more closely related. In particular, horizontal concatenation is immediately applicable to a schema language for XML. > > Am I the only person who's been studying language design for > > the last few years who hasn't studied tree autonoma? > > No, I first heard of it at SGML/XML 97. My interest now is in arbitrary > transformations of DTDs and instances as is demonstrated in the paper. I > hope Murata-san can provice examples of more advanced transformations. > The paper only has renaming-in-context, which is not too difficult > without tree automata. In the case of n-ary tree automatons, we can use any tree homomorphism. Not only renaming but a variety of rewriting (node-deletion, subtree-deletion, node-insertion, subtree-insertion, reordering, etc.) is possible. For the definition of linear tree homomorphisms, see the text book by Gecsec and Steinby. (But this book is not easy to get, I am afraid.) @book{gs:Automata, author = "F. G\'{e}cseg and M. Steinby", title = "Tree Automata", publisher = "Akad\'{e}miai Kiad\'{o}, Budapest", year= "1984",} In the case of forest automatons, there is a subtle issue: we do not know how many subordinates a node has and how these subordinates are related in advance. For example, given a sequence <sec> <h1/> <p/> <h1/> <p/> <p/> </sec>, I would like to create <sec> <subsec><head/><p/> </subsec> <subsec> <head/> <p/> </p> </subsec> </sec>. My solution is to use an unambiguous regular expression (e.g., (h1 p*)* ) to capture underlying structures for such sequences. My upcoming paper for PODDP(Principles of Digital Document Processing) '98 provides an algebra and a logic-based language that are extensions of the RDB and datalog. This paper also sketches an operator for such rewriting. I should disclose this paper to the public soon. In message "Composing language descriptions: tree automata and language design?", Dan Connolly wrote... > I'm starting to get a "where have you been all my life?" feeling. When I said "The use of extended CFG's as document schemas is a bad idea" in my talk at PODDP'96, everybody laughed. After explaining shortcomings of the traditional approach and advantages of the forest automaton theory, I repeated the same claim. This time, everybody was silent and serious. The forest regular language theory was studied in 60's and 70's (away too early!). Unfortunately, it has not been used for real applications, although it is very powerful. Kil-Ho Shin (one of my colleagues at Fuji Xerox) rediscovered the forest regular language theory when he was studying structured documents. Since then, I have firmly believed that the forest automaton theory deserves much more attention from document processing communities. I am happy to learn that more and more people are interested. Cheers, Makoto Fuji Xerox Information Systems Tel: +81-44-812-7230 Fax: +81-44-812-7231 E-mail: murata@apsdc.ksp.fujixerox.co.jp
Received on Monday, 2 March 1998 03:12:22 UTC