RE: Surface vs. Abstract Syntax, was: RE: What do the ontologists want from Jonathan Borden on 2001-05-19 (www-rdf-logic@w3.org from May 2001)

From: Jonathan Borden <jborden@mediaone.net>
Date: Sat, 19 May 2001 08:04:23 -0400
To: "pat hayes" <phayes@ai.uwf.edu>
Cc: <www-rdf-logic@w3.org>
Message-ID: <001401c0e05b$d82f5400$0201a8c0@ne.mediaone.net>
> >complaining about XML's verbosity is directly along the lines of complaining
> >that LISP uses too many parens.
>

pat hayes wrote:

> I disagree. The point is that parens are informationally quite dense.
> The only way of indicating applicative structure in fewer symbols
> would be some form of Polish notation, and that can only be used for
> fixed-arity (usually binary) operators.

Minimizing tokens was apparently not the most significant criterion for
development of the XML syntax. Interestingly, the forerunner of XML, SGML,
provided facilities that would, for example, allow the s-expression syntax
to be declared as a valid SGML application, and I recall seeing just that at
some point.

All these issues were argued out and in the end a compromise was reached.
Like all compromises, not everyone is happy with each feature. What _has_
happened is that a large number of people have found XML easy to work with
and adequate for their particular needs, even if not the most efficient
representation for their particular application. I can assure you that these
issues have been beaten to an infinite set of deaths on the SGML/XML lists.

I can't argue with any of these points except to say: XML parsers have been
spread like viruses to all reaches of the digital world. XML applications
can run on platforms ranging from cell phones to mainframes. Perhaps if the
LISP community could have resisted the temptation to fragment, and had
provided ubiquitous open source software, we might be using parens rather
than angle brackets, but that is a different battle to fight. I am more
concerned with developing a semantic "infrastructure" because I believe
there are some fundamental issues that need to be addressed*

* of course if you insist on using parens, I can provide a piece of software
(or tell you how to write one) that will parse s-expressions as XML.
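
To make the footnote concrete, here is a minimal sketch of such a converter in Python. It is not the software I have in mind above, just an illustration under one assumed mapping: the first atom of each list becomes the element name, nested lists become child elements, and the remaining atoms become text. The function names are made up, and string literals and attributes are not handled.

```python
import xml.etree.ElementTree as ET


def tokenize(text):
    """Split an s-expression into parens and atoms (no string literals)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Consume tokens from the front; return a nested list or an atom."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # drop the closing paren
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token


def to_element(expr):
    """Assumed mapping: (name item ...) -> <name>item ...</name>."""
    elem = ET.Element(expr[0])
    last = None  # most recently appended child element, if any
    for item in expr[1:]:
        if isinstance(item, list):
            last = to_element(item)
            elem.append(last)
        elif last is None:
            # Atoms before the first child are the element's leading text.
            elem.text = f"{elem.text} {item}" if elem.text else item
        else:
            # Atoms after a child are stored as that child's trailing text.
            last.tail = f"{last.tail} {item}" if last.tail else item
    return elem


def sexp_to_xml(text):
    expr = read(tokenize(text))
    if not isinstance(expr, list) or not expr:
        raise ValueError("expected a non-empty list at the top level")
    return ET.tostring(to_element(expr), encoding="unicode")


print(sexp_to_xml("(person.name (given John) (family Roberts))"))
```

The parens and the angle brackets carry the same tree; only the choice of delimiters and the repeated closing name differ.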

>
> >perhaps the greatest benefit of XML is that its surface syntax directly
> >represents its abstract syntax,
>
> So does LISP. In fact, so do almost all mathematical and formal notations.
>
> > and for someone familiar with XML, this
> >means that one can look at a document, even in the absence of a schema, and
> >get a pretty good idea of its structure.
>
> This is true of any explicit syntax. You can look at a page of
> mathematics and do that, even if you don't know the math very well.
>
> I think that what makes XML so longwinded is not that its surface
> syntax *represents* its abstract syntax, but that it explicitly
> *describes* it, which is like writing English by prefixing (and
> postfixing!) every word and phrase by a label describing its
> syntactic category.

This is a good analogy. Suppose we could transport ourselves 1 million years
into the future (and suppose that the world still uses Unicode). If you were
assigned the task of translating documents written in some completely
unknown language into English, I dare say that you would _greatly_
appreciate it if each word were marked up in such fashion.

Succinctness was not a criterion for XML. The ability to archive information
for very long-term use was.

> This seems to me to be based on a
> misunderstanding of the very nature of syntax.

Nah, the point of 'well-formedness' as opposed to 'schema validity' is that
documents stand alone better: you don't need to continually refer to
a schema, which can get misplaced. You can parse math because of the schema
built into your head.
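
A small sketch of what well-formedness alone buys you, using Python's standard xml.etree.ElementTree parser. The fragment and the helper names (outline, is_well_formed) are made up for illustration; no DTD or schema is consulted at any point.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; no DTD or schema accompanies it.
SAMPLE = (
    "<patient><person.name><given>John</given>"
    "<family>Roberts</family></person.name></patient>"
)


def outline(elem, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


def is_well_formed(text):
    """True if the text parses as XML; validity is never checked."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print("\n".join(outline(ET.fromstring(SAMPLE))))
```

The element tree is recovered from the document itself. An unclosed tag makes is_well_formed return False, but nothing here says whether a given element is allowed where it appears; that is the part a schema would add.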

suppose this very practical real world example:

"
Dear Dr. Smith;

I have had the great pleasure of seeing your wonderful 42 year old patient
Rev. John Roberts III, who presents with back pain radiating down the right
leg for 2 weeks. He has weakness in the left gastrocnemius. An MRI
demonstrates a Left L5-S1 disc herniation. I recommend surgery.

Best Regards,

Jonathan Borden, M.D.
"

Marked up, it contains new tokens and no new information to me:

<office.note>
Dear <referring.md><person.name><title>Dr.</title>
<family.name>Smith</family.name></person.name></referring.md>;

I have had the great pleasure of seeing your wonderful <patient.age>42
year</patient.age> old patient <patient><person.name><prefix>Rev.</prefix>
<given>John</given> <family>Roberts</family>
<suffix>III</suffix></person.name></patient>, who presents with
<chief.complaint>back pain radiating down the <laterality>right</laterality>
leg for <duration>2 weeks</duration></chief.complaint>. He has
<physical.exam>weakness in the left gastrocnemius</physical.exam>. An MRI
demonstrates a Left L5-S1 disc herniation. I recommend
<procedure><coded.value type="cpt"
code="63030">surgery</coded.value></procedure>.

Best Regards,

<attending.surgeon><person.name>Jonathan Borden,
M.D.</person.name></attending.surgeon>
</office.note>


The point is that successive applications on a processing chain can add
information and apply transformations akin to "knowledge sources".
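
As a rough sketch of such a chain in Python: each stage takes the parsed note and returns it with something added. The stage names, the lookup table and the "side" attribute are all hypothetical; only the element names and the CPT code are taken from the note above.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table; a real coder would consult a maintained code set.
CPT_CODES = {"surgery": "63030"}


def code_procedures(root):
    """Stage 1: wrap the text of each bare <procedure> in a <coded.value>."""
    for proc in list(root.iter("procedure")):
        if len(proc) == 0 and proc.text in CPT_CODES:
            coded = ET.SubElement(
                proc, "coded.value", type="cpt", code=CPT_CODES[proc.text]
            )
            coded.text, proc.text = proc.text, None
    return root


def flag_laterality(root):
    """Stage 2: copy a <laterality> value onto its <chief.complaint>."""
    for complaint in root.iter("chief.complaint"):
        side = complaint.find("laterality")
        if side is not None:
            complaint.set("side", side.text)
    return root


def run_chain(xml_text, stages):
    """Parse once, then let each stage add its own information in turn."""
    root = ET.fromstring(xml_text)
    for stage in stages:
        root = stage(root)
    return root


note = (
    "<office.note><chief.complaint>back pain radiating down the "
    "<laterality>right</laterality> leg</chief.complaint>. I recommend "
    "<procedure>surgery</procedure>.</office.note>"
)
result = run_chain(note, [code_procedures, flag_laterality])
print(ET.tostring(result, encoding="unicode"))
```

Neither stage needs to know about the other. Each one only looks for the elements it understands and leaves the rest of the text untouched.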

> Languages (almost all
> of them, natural and artificial) work by *displaying* their syntactic
> structure, not by *describing* it.

The above XML still displays on a browser as text, despite being heavily
marked up.

> If you do both, you pretty much
> guarantee to be using more symbols than you need to be using to
> convey the same information.

yep. but these symbols are meaningful to applications that process the
information.

> I've never seen any XML that didn't seem
> obviously wildly redundant with useless information, repeated over
> and over again. It's almost impossible to write the stuff: one has to
> invent editor shorthands to avoid going crazy.

called the browser.

> >XML naturally represents trees and somewhat naturally handles
> >maps.
>
> OK, I agree that such savings are very handy when the work has
> already been done.


that's really the entire point, we don't need to constantly reinvent
browsers, transform engines (e.g. XSLT), database glue, query languages etc.
etc.

> >what i am suggesting is that this concept could be
> >reflected in the RDF abstract syntax -- a trivial way to do this might be
> >s := <p,s,o,i> where i is the index of the statement as reflected by
> >document order.
>
> I wonder what the RDF-ish reason for not doing this will be? I could
> guess, but it would probably not be appropriate.
>
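
For what it is worth, the quadruple quoted above is trivial to sketch in Python. The class and function names are made up, and starting the index at 0 is an arbitrary assumption; nothing here comes from an RDF specification.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """The quoted s := <p,s,o,i>: a triple plus its document-order index."""
    p: str
    s: str
    o: str
    i: int


def index_statements(triples):
    """Attach a 0-based document-order index to each (p, s, o) triple."""
    return [Statement(p, s, o, i) for i, (p, s, o) in enumerate(triples)]


# Hypothetical triples, listed in the order they appear in a document.
triples = [
    ("given", "patient", "John"),
    ("family", "patient", "Roberts"),
    ("given", "patient", "John"),  # repeated on purpose
]
statements = index_statements(triples)
print(statements)
```

With the index attached, two otherwise identical statements stay distinct, and document order can be recovered by sorting on i.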


IMHO RDF is really too young to consider itself a legacy. Especially taking
on a task as monumental as becoming a "semantic platform" for the "semantic
web". RDF needs to adapt, and adapt well.

Jonathan Borden
The Open Healthcare Group
http://www.openhealth.org
Received on Saturday, 19 May 2001 08:05:15 UTC