RE: Surface vs. Abstract Syntax, was: RE: What do the ontologists want from pat hayes on 2001-05-19 (www-rdf-logic@w3.org from May 2001)

From: pat hayes <phayes@ai.uwf.edu>
Date: Sat, 19 May 2001 00:03:33 -0500
To: "Jonathan Borden" <jborden@mediaone.net>
Cc: www-rdf-logic@w3.org
Message-Id: <v0421013ab72ba780526c@[205.160.76.183]>
>pat, earlier you wrote:
>
> >
> > >2.  RDF is not necessarily verbose: the RDF syntax in the current W3C
> > >    spec is verbose, but other RDF syntaxes are much less so (eg n3,
> > >    as Jos de Roo pointed out).
> >
> > I agree: the verbosity arises chiefly from XML rather than RDF
> > itself. Has anyone suggested Lexical_XML?
> > <capletter>I</capletter><letter>t</letter><space>
> > </space><letter>l</letter><double-letter>oo</double-letter><letter>k</
> > letter><letter>s</letter> like that. It's a really neat universal
> > notation: you can describe it in itself!  (The proof is too long to
> > fit in this message, however.)
> >
>
>complaining about XML's verbosity is directly along the lines of complaining
>that LISP uses too many parens.

I disagree. The point is that parens are informationally quite dense. 
The only way of indicating applicative structure in fewer symbols 
would be some form of Polish notation, and that can only be used for 
fixed-arity (usually binary) operators. (Some LISPs use a special 
abbreviation for 'as many close-parens as you need here to get back 
to the top level' (usually a close-square-paren), which improves the 
density somewhat. XML might consider adopting a similar 'big slash'; 
you could write it as <</*>> .) On the other hand, the 
<label>...</label> format of XML is almost comically redundant: since 
the entire expression is nested, the slash-labellings are not needed. 
Also, why does one need to bracket the *label* ?? (And who chose 
characters to do so that already have widespread use in mathematics? 
Sigh.) A notation like (label/  ....) would indicate the same 
structure using half the number of characters, more readably.

>perhaps the greatest benefit of XML is that its surface syntax directly
>represents its abstract syntax,

So does LISP. In fact, so do almost all mathematical and formal notations.

> and for someone familiar with XML, this
>means that one can look at a document, even in the absence of a schema, and
>get a pretty good idea of its structure.

This is true of any explicit syntax. You can look at a page of 
mathematics and do that, even if you don't know the math very well.

I think that what makes XML so longwinded is not that its surface 
syntax *represents* its abstract syntax, but that it explicitly 
*describes* it, which is like writing English by prefixing (and 
postfixing!) every word and phrase by a label describing its 
syntactic category. This seems to me to be based on a 
misunderstanding of the very nature of syntax. Languages (almost all 
of them, natural and artificial) work by *displaying* their syntactic 
structure, not by *describing* it. If you do both, you are pretty much 
guaranteed to use more symbols than you need to convey the same 
information. I've never seen any XML that didn't seem obviously, 
wildly redundant, with useless information repeated over and over 
again. It's almost impossible to write the stuff: one has to invent 
editor shorthands to avoid going crazy.

> > >Another would be to bite
> > >the bullet and make containment and ordering a natural feature of the RDF
> > >abstract syntax. The current container mechanism _is_ painful.
> >
> > But this goes beyond just adding containers. LISP lists are used to
> > implement everything: expressions, in particular, are encoded as
> > lists. That is why I prefer to say "S-expressions", to emphasise that
> > this is a general-purpose datastructuring technique, not merely one
> > kind of datatype among many. (It may be that I have misunderstood
> > you, and that you mean 'containment' in a much more comprehensive
> > sense than simply an additional datatype; in which case I would agree
> > with you, I think.)
>
>A couple of points. First the RDF XML syntax does not make full use of XML's
>abilities to represent structured data. Every XML element node already _has_
>a list of child nodes (e.g. the DOM NodeList). However the RDF abstract
>syntax does not naturally maintain order so the syntactic hack is introduced
>converting <rdf:li> into <rdf:_1> <rdf:_2> etc. This is _painful_ to my XML
>accustomed eye.

It's painful even to my non-XML-accustomed eye.
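
For anyone who has not seen the conversion in question, here is a 
minimal Python sketch of it, using the standard xml.etree.ElementTree 
module. The sample container and the function name number_members 
are invented for illustration, and this is not how any particular 
RDF parser does it; it only shows that the order is already sitting 
in the element's child list before it gets re-encoded as numbered 
properties.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Invented sample container, for illustration only.
SAMPLE = """<rdf:Seq xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:li>first</rdf:li>
  <rdf:li>second</rdf:li>
  <rdf:li>third</rdf:li>
</rdf:Seq>"""


def number_members(container_xml):
    """Return (property, value) pairs, renaming each rdf:li to rdf:_n.

    n is the position of the rdf:li among its siblings, counted from 1.
    """
    root = ET.fromstring(container_xml)
    li_tag = "{%s}li" % RDF_NS
    pairs = []
    position = 0
    for child in root:  # children are iterated in document order
        if child.tag == li_tag:
            position += 1
            pairs.append(("rdf:_%d" % position, child.text))
    return pairs


if __name__ == "__main__":
    for prop, value in number_members(SAMPLE):
        print(prop, value)
```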

>Second: When I did a lot of LISP programming, I recall it often took a bit of
>work to create complex datatypes out of LISP. If you recall Dan Corkill
>and John Lowrance's GRASPER language (UMass c. 1981), one can indeed
>represent nodes, arcs, spaces etc in LISP - on the other hand adding trees
>and maps as native datatypes _tremendously_ increases the speed of such a
>language. XML naturally represents trees and somewhat naturally handles
>maps.

OK, I agree that such savings are very handy when the work has 
already been done.

> > This proposal for thinking about RDF would make
> > the RDF into an implementation language in which to write expressions
> > in some other language (which would have a semantics), rather than
> > just adding a handy datatype to RDF, in the way that DAML adds lists
> > as a shorthand for nested-triple structures of a certain kind (which
> > would be too painful to spell out in detail).
> >
>
>again, the data structures naturally represented in an RDF abstract syntax
>are orthogonal to what semantics are assigned to the expressions. put
>another way, whether we code these using s-expressions or XML is not
>relevant. but _having_ XML already provides an entirely natural way to
>represent lists. what i am suggesting is that this concept could be
>reflected in the RDF abstract syntax -- a trivial way to do this might be
>s := <p,s,o,i> where i is the index of the statement as reflected by document
>order.

I wonder what the RDF-ish reason for not doing this will be? I could 
guess, but it would probably not be appropriate.
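
For concreteness, the quoted s := <p,s,o,i> suggestion might be 
sketched in Python as follows. The names Statement, index_statements 
and members_in_order are hypothetical, the sample triples are 
invented, and counting i from 1 is an assumption (the suggestion 
only says that i reflects document order).

```python
from collections import namedtuple

# Hypothetical record for s := <p,s,o,i>: predicate, subject, object, index.
Statement = namedtuple("Statement", ["p", "s", "o", "i"])


def index_statements(triples):
    """Attach a document-order index i to each (p, s, o) triple.

    Assumption: the triples arrive in document order; i starts at 1.
    """
    return [Statement(p, s, o, i)
            for i, (p, s, o) in enumerate(triples, start=1)]


def members_in_order(statements, subject, predicate):
    """Recover an ordered list of objects from the indexed statements."""
    matching = [st for st in statements
                if st.s == subject and st.p == predicate]
    return [st.o for st in sorted(matching, key=lambda st: st.i)]


if __name__ == "__main__":
    # Invented sample data.
    triples = [
        ("member", "bag", "a"),
        ("label", "bag", "B"),
        ("member", "bag", "b"),
    ]
    statements = index_statements(triples)
    print(members_in_order(statements, "bag", "member"))
```

With the index carried on every statement, order survives even if 
the statements are stored as an unordered set, and no numbered 
properties are needed.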

Pat

---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Saturday, 19 May 2001 01:03:54 UTC