- From: Richard A. O'Keefe <ok@atlas.otago.ac.nz>
- Date: Thu, 9 Nov 2000 13:25:40 +1300 (NZDT)
- To: ok@atlas.otago.ac.nz, ricko@gate.sinica.edu.tw
- Cc: html-tidy@w3.org, johnspeter@hotmail.com
    From ricko@gate.sinica.edu.tw Wed Nov 8 23:22:10 2000
    Date: Wed, 8 Nov 2000 18:21:54 +0800 (CST)
    From: Rick Jelliffe
    X-Sender: ricko@gate
    To: "Richard A. O'Keefe" <ok@atlas.otago.ac.nz>
    cc: html-tidy@w3.org, johnspeter@hotmail.com
    Subject: Re: DTD of natural language
    In-Reply-To: <200011080311.QAA30870@atlas.otago.ac.nz>
    MIME-Version: 1.0
    Content-Type: TEXT/PLAIN; charset=US-ASCII

I wrote:
> SGML and XML just aren't *useful* for expressing the structure of English
> or any other natural language. Lisp data values, yes. Prolog data values,
> yes. Either would be considerably more compact than XML. You might raise
> RDF as a counter-example, but RDF is inexpressibly clumsy and bulky, and
> the structural constraints could not be expressed as a DTD.

Rick Jelliffe <ricko@gate.sinica.edu.tw> replied:

    You mean SGML and XML DTDs: you can express any kind of structure
    (that can be captured as a directed, cyclic graph) in SGML and DTD.
    A grammar abstracts away certain aspects to reduce the amount of
    markup needed.

I expressly wrote "could not be expressed as a DTD". That is, after all,
what this thread is *about*: a DTD or DTDs for English.

Yes, you can express any digraph in SGML. For that matter, you can express
any digraph in Lisp (which is far more compact) and people do exactly that.
If you get right down to it, you can express any digraph as an integer,
using some kind of Goedel coding.

The question is whether one can encode the constraints in a DTD (or Schema
or whatever) so that *only* valid digraphs in some set will be allowed by
that schema.

The practical point is that using SGML or XML to express the structure of
English would have no practical advantages over expressing it as Lisp or
Prolog data values, and many disadvantages.

    People involved in this area may be interested in exploring the
    Schematron toolkit, which is a schema language for XML based on
    making assertions about the presence or absence of patterns
    (guarded XPaths) in an XML document. This allows
    non-regular-expression-based patterns in a document to be reported
    or required.

    http://www.ascc.net/xml/resource/schematron/schematron.html

This sounds like a very useful reference. Thank you very much.
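
To make the distinction between *representing* a structure and *constraining* it concrete, here is a minimal sketch in Python of an assertion-style check in the spirit of what is described above. It is not Schematron itself, and it uses only the standard library. The element names (s, np, vp), the num attribute, and the function check_agreement are all invented for this illustration. A DTD content model can require that an np be followed by a vp, but it has no direct way to say that two attribute values must match; an assertion over the parsed document can.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for one sentence; the element and attribute
# names are assumptions made up for this illustration.
SENTENCE = """
<s>
  <np num="sg"><w>The</w><w>dog</w></np>
  <vp num="pl"><w>bark</w></vp>
</s>
"""


def check_agreement(xml_text):
    """Return one message per <s> whose <np> and <vp> disagree in number."""
    root = ET.fromstring(xml_text)
    failures = []
    # Rule context: every <s> element, including the root itself.
    for s in root.iter("s"):
        np = s.find("np")
        vp = s.find("vp")
        # Guard: the assertion only applies when both parts are present.
        if np is None or vp is None:
            continue
        # Assertion: the two num attributes must be equal.
        if np.get("num") != vp.get("num"):
            failures.append(
                "np is %r but vp is %r" % (np.get("num"), vp.get("num"))
            )
    return failures


if __name__ == "__main__":
    for message in check_agreement(SENTENCE):
        print("assertion failed:", message)
```

The same check would be just as short over Lisp or Prolog terms; the sketch only shows that the constraint lives outside the grammar, whichever notation carries the data.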
Received on Wednesday, 8 November 2000 19:25:55 UTC