Composing language descriptions: tree automata and language design?

Dan Connolly (connolly@w3.org)
Fri, 27 Feb 1998 22:31:49 -0600


Message-ID: <34F79335.111A@w3.org>
Date: Fri, 27 Feb 1998 22:31:49 -0600
From: Dan Connolly <connolly@w3.org>
To: fork@xent.ics.uci.edu, www-html@w3.org
CC: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
Subject: Composing language descriptions: tree automata and language design?

This is a multi-part message in MIME format.

--------------36F969537A95
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Ever since all this XML namespace stuff started, I have
been saying

	Context-Free Grammars are Very Well Studied,
	and basing standards like XML on them is a safe bet.
	(Similarly for RDF based on AI frame/slot/value)

	But you can't just take a CFG from Joe and another
	from Bob and slap them together. At least I haven't
	seen any well-studied way to do it.

Then Murata Makoto sent a pointer to this little paper:

% Forest-regular Languages and Tree-regular Languages
% MURATA Makoto
% May 26, 1995
% http://www.geocities.com/ResearchTriangle/Lab/6259/prelim1.pdf

He also posted a small example, but since it went to xml-wg,
I won't forward it. A larger example is available in:

http://www.fxis.co.jp/DMS/sgml/xml/sgmlxml97.html

I'm starting to get a "where have you been all my life?" feeling.

Am I the only person who's been studying language design for
the last few years who hasn't studied tree autonoma?
I surfed around in comp.lang.* last night just to sanity-check.
Nope. No sign of it.

It looks really cool. It looks like a good model for
adding the equivalent of Modula-3 generics[3] or maybe even ML
modules (no citation--I haven't studied these closely) to SGML.
But I'm not sure. I don't quite have my head wrapped around
it yet. I could use some help.

[3] http://www.research.digital.com/SRC/m3defn/html/generics.html
(but to do it justice, you need to read chapter 8,
"How the Language Got its Spots" from
System Programming with Modula-3 Edited by Greg Nelson Prentice Hall
Series in Innovative Technology ISBN 0-13-590464-1 L.C. QA76.66.S87 1991 

I can't find a copy on the web, but it's widely cited; e.g.
http://cs.fit.edu/~ryan/study/bibliography.html)

The classicist in me had just about given up hope of finding
a good model for composing DTDs before I saw this. I thought
we were going to have to just swim around in loosely defined
XML-based languages (where the instances were well-formed
but couldn't be machine-checked beyond that) for some time.

Rohit... have I piqued your interest? Can you spare
the time to study this forest autonoma stuff to see
if it fits into the ecology of knowledge representation
in the Web?

Anybody else?

-- 
Dan Connolly
http://www.w3.org/People/Connolly/

--------------36F969537A95
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Received: from www10.w3.org (www10.w3.org [18.23.0.20]) by anansi.w3.org (8.8.7/8.7.3) with ESMTP id DAA01947 for <connolly@anansi.w3.org>; Fri, 27 Feb 1998 03:22:14 -0500 (EST)
Received: from mail10.jump.net (mail10.jump.net [207.8.124.18]) by www10.w3.org (8.8.5/8.7.3) with ESMTP id DAA07646 for <connolly@w3.org>; Fri, 27 Feb 1998 03:22:14 -0500 (EST)
Received: from shoal by mail10.jump.net (8.8.8/jump.1.11)
	 id CAA29574;  for <connolly@w3.org> Fri, 27 Feb 1998 02:24:00 -0600 (CST)
Message-ID: <34F677DA.9AB@w3.org>
Date: Fri, 27 Feb 1998 02:22:50 -0600
From: Dan Connolly <connolly@w3.org>
Organization: World Wide Web Consortium (http://www.w3.org/)
X-Mailer: Mozilla 3.01Gold (WinNT; I)
MIME-Version: 1.0
Newsgroups: comp.specification.larch
CC: connolly@w3.org
Subject: ZF Replacement schema in larch?
Content-Type: multipart/mixed; boundary="------------1DB9750A2299"

This is a multi-part message in MIME format.

--------------1DB9750A2299
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I'm having trouble tranlating the idiom
	{ x: P(x) }
into larch. Any clues? I checked the larch FAQ
http://www.cs.iastate.edu/~leavens/larch-faq.html#SEC39
with no luck.

The actual example I'm after is more like:

  Dom(u) = {1} \U {1.v1.v2...vk : v1.v2...vk \in Dom(v)}


This seems to provide a clue:

===============
http://www.realtime-info.be/encyc/techno/terms/26/102.htm

		the
                     Replacement axiom
                     schema: 

                     If F(x,y) is a formula
                     such that for any x,
                     there is a unique y
                     making F true, and X is
                     a set, then 

                             {F x : x in X}

                     is a set. In other words,
                     if you do something to
                     each element of a set,
                     the result is a set. 
===============

But I can't figure out how to use that clue.

I might be avoiding the use of \E for no good reason.
Maybe that solves the problem.

I'm trying to absorb a paper about Forest autonoma
http://www.geocities.com/ResearchTriangle/Lab/6259/prelim1.pdf
by transcribing it into larch. I made it through
the basic definitions, but I'm having trouble with
some of the more complex ones. I've attached
my work so far. Any clues?

Also... is there any hope of running the larch prover
on a linux box?

-- 
Dan Connolly
http://www.w3.org/People/Connolly/

--------------1DB9750A2299
Content-Type: text/plain; charset=us-ascii; name="forest.lsl"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="forest.lsl"

forest(E, C): trait
%
% Transcribed by Dan Connolly <connolly@w3.org> from:
% Forest-regular Languages and Tree-regular Languages
% MURATA Makoto
% May 26, 1995
% http://www.geocities.com/ResearchTriangle/Lab/6259/prelim1.pdf
%
% $Id$

includes
  Integer,
  List(Int, IntList),
  Set(IntList, IntListSet)

introduces

  empty: -> C
  __{__} : E, C -> C
  __ || __ : C, C -> C

  tree: C -> Bool

  width : C -> Int

  Dom: C -> IntListSet
  Domaux: C -> IntListSet
  Domaux2: C, C -> IntListSet

  forest: C, IntList -> E

  __ / __ : C, IntList -> C

asserts
  C generated by empty, __{__}, || % 2.1 Forest

  forall a: E, u,v,w: C, i, w1, u1: Int, v1k, w2k, u1k, u2k, d: IntList

% 2.2 tree
  tree(a{u}) == true; % 2.2. Tree
  tree(empty) == false;
  tree(u||v) == false;

% 2.3 forest width
  width(empty) == 0;
  width(a{u}) == 1;
  width(u||v) == width(u) + width(v);

% 2.4 forest domain
  Dom(empty) = {};

  Dom(a{v}) == {{1}} \U Domaux(a{v});
  (1 \precat v1k) \in Domaux(a{v}) == v1k \in Dom(v);
  % @@ and nothing else is in Domaux(a{v})

  Dom(v || w) == Dom(v) \U Domaux2(v,w);
  ((w1 + width(v)) \precat w2k) \in Domaux2(v,w) ==  w2k \in Dom(w);
  % @@ and nothing else

% 2.6 forest function
  forest(a{v}, {1}) == a;
  forest(a{v}, 1 \precat v1k) == forest(v, v1k);
  u1k \in Dom(v) => forest(v || w, u1k) = forest(v, u1k);
  (\not ((u1 \precat u2k) \in Dom(v))) =>
	 forest(v || w, u1 \precat u2k) =
		forest(w, (u1 + width(v)) \precat u2k)

% 2.8 subtree
%  Dom(u/d) == { 1 \precat v1k
--------------1DB9750A2299--

--------------36F969537A95--