Re: DTD of natural language

	From ricko@gate.sinica.edu.tw Wed Nov  8 23:22:10 2000
	Date: Wed, 8 Nov 2000 18:21:54 +0800 (CST)
	From: Rick Jelliffe
	X-Sender: ricko@gate
	To: "Richard A. O'Keefe" <ok@atlas.otago.ac.nz>
	cc: html-tidy@w3.org, johnspeter@hotmail.com
	Subject: Re: DTD of natural language
	In-Reply-To: <200011080311.QAA30870@atlas.otago.ac.nz>
	MIME-Version: 1.0
	Content-Type: TEXT/PLAIN; charset=US-ASCII
	
I wrote:
	
	> SGML and XML just aren't *useful* for expressing the structure of English
	> or any other natural language.  Lisp data values, yes.  Prolog data values,
	> yes.  Either would be considerably more compact than XML.  You might raise
	> RDF as a counter-example, but RDF is inexpressibly clumsy and bulky, and
	> the structural constraints could not be expressed as a DTD.
	
Rick Jelliffe  <ricko@gate.sinica.edu.tw> replied

	You mean SGML and XML DTDs:  you can express any kind of
	structure (that can be captured as directed, cyclic graph) in
	SGML and DTD.  A grammar abstracts away certain aspects to
	reduce the amount of markup needed.

I expressely wrote "could not be expressed as a DTD".  That was, after
all, what this thread is *about*:  a DTD or DTDs for English.

Yes, you can express any digraph in SGML.  For that matter, you can
express any digraph in Lisp (which is far more compact) and people do
exactly that.  If you get right down to it, you can express any digraph
as an integer, using some kind of Goedel coding.

The question is whether one can encode the constraints in a DTD (or
Schema or whatever) so that *only* valid digraphs in some set will be
allowed by that schema.

The practical point is that using SGML or XML to express the structure
of English would have no practical advantages over expressing it as
Lisp or Prolog data values, and many disadvantages.

	People involved in this area may be interested in exploring the
	Schematron toolkit, which is a schema language for XML based on
	making assertions about the presence or absense of patterns
	(guarded XPaths) in an XML document.  This allows
	non-regular-expression-based patterns in a document to be
	reported or required.

	http://www.ascc.net/xml/resource/schematron/schematron.html
	
This sounds like a very useful reference.  Thank you very much.

Received on Wednesday, 8 November 2000 19:25:55 UTC