Re: Syntax and semantics from W. E. Perry on 2000-05-16 (xml-uri@w3.org from May 2000)

From: W. E. Perry <wperry@fiduciary.com>
Date: Tue, 16 May 2000 16:53:33 -0400
To: Paul Prescod <paul@prescod.net>, xml-uri@w3.org
Message-ID: <3921B54C.3A2B86F8@fiduciary.com>
Paul Prescod wrote:

> A few questions:

Exactly the right questions!

>  * can you define semantics?

In the terms of late-20th-century critical theory: the signified; or, more fully, the body
of content which is the product of the signifier's inherent function. In Aristotelian terms:
the substance or stuff of meaning, beyond the mere arrangement (syntattica) of its
expression. The two views are actually not that far apart, particularly if you accept my
opinion that in both of them it is a function, a process (see below on behavior) which
elaborates semantics from syntax.

>  * In what sense does XML not have semantics?

The specification of XML 1.0 stays admirably focussed on delineating the syntax. With few
exceptions (and in my reading most of those are to support 'legacy' features and SGML
compatibility, most notably the DTD itself), the spec avoids assigning particular meanings
to the various syntactically permissible constructions.

> Isn't the interpretation
> of less-than symbols and ampersands as an annotated, tree-structured,
> information set the "semantic content" of XML?

As an 'information set' (and most especially as the Infoset) it most certainly is. Nothing in
the XML 1.0 spec mandates a canonical infoset or a tree structure. As I have argued at length
elsewhere, imposing either the infoset or the tree structure view upon XML syntax, as
specified, greatly and gratuitously curtails the expressive and functional possibilities
inherent in the syntax alone.

> Can any useful language,
> or meta-language, or meta-meta-language be entirely devoid of semantics?

As your example above indicates, there will always be some small semantic content in the
choice of, for example, less-than symbols rather than curly braces, or ampersands rather than
hashmarks. Those choices can be limited to the minimum logistic requirements of orthography,
particularly when a syntactical specification is restrained, or silent, on assigning them any
significance beyond the orthographic. (de minimis non curat [spec])

>  * if semantics are entirely local, then does Microsoft have the right
> to interpret the "a" element type in xhtml as meaning "archive" and the
> "b" as meaning "Beethoven"?

Absolutely, if they can implement desirable behavior from a process by so assuming.

> If they write a web browser that archives
> any link you click on and plays music for bold, will you defend them on
> the basis that semantics are local?

No. If I want that behavior I will buy their software or use their process (or to use Tim
Bray's terminology, dispatch to it from my local node).

> I think that behavior is local, but
> semantics absolutely must be shared.

Not shared a priori. That is the single great lesson to be drawn from the Internet topology
of autonomous, largely anonymous nodes which, when they act, must treat one another as peers
because they do not know enough about each other to infer any other relationship. Semantics
are effectively negotiated in the instance, and the ability of two nodes to negotiate a
successful transaction, understanding, or other disposition of given content on one occasion
implies NOTHING about their likelihood of reaching a similar conclusion, or any conclusion at
all, with analogous content on a subsequent occasion (this really is a Heraclitean cosmos).

The vertical industry data vocabularies (FpML, ESteel, etc., etc.) which have been the
shining demonstration of XML's acceptance in the past year are predicated on a closed-world
view which is anathema to the real potential of XML as syntax. All of these vocabularies are
designed to convey intent, with the expectation that intent will be correctly interpreted and
result in the execution of a desired process. It is only in a closed world (a cartel, to put
it bluntly) that those expected processes could be assumed to be generally known and
generally considered desirable activities. In the Internet topology we simply do not have
enough knowledge of our fellow nodes to make such assumptions, but we may well have the
desire to do business with or otherwise communicate with them. In order to do so, we could
first attempt to indoctrinate them in the shared assumptions and accepted premises of our own
milieu, but that might not work: they may prove recalcitrant; it may turn out they are more
influential than we within the larger Internet universe; or they may simply not give us their
attention, or not understand what we are trying to communicate.

That is where the value of the Semantic Web becomes apparent. One pair at a time, autonomous
nodes can build the semantic context within which they come to understand one another. I will
not elaborate on this process now; I have done so at considerable length elsewhere. The only
point to make now is that the negotiation must be based on each node handing the other
semantically neutral content. If the receiver can do something useful with the content
received, then by definition it has an interest (expressible through process) in that
message; otherwise, it has none and cannot be compelled to have one, except (to the extent
both sender and receiver are patient and willing) by serially providing it with the pieces
from which it can build, in its own context, the ability to perform some useful process upon
particular instance content.

> But behavior and semantics are
> separate. You can read the Catcher in the Rye and start a fund for
> wayward teens or decide to shoot John Lennon.

Precisely! Semantics are the outcome of, or more exactly are expressed in, the behavior
applied to syntactically understood content.
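To make that concrete, here is a small hypothetical sketch in Python (not part of the original exchange; the handler tables BROWSER and ARCHIVER and the function process are invented for illustration). The same XHTML fragment is parsed once as syntax, and each local processor decides for itself what "a" and "b" mean; a processor with no handler for an element simply has no interest in it.

```python
import xml.etree.ElementTree as ET

# The same syntactically well-formed content, handed to every node.
DOC = '<p>See <a href="http://example.org/">this</a> and <b>that</b>.</p>'

# Hypothetical local processors: each maps element names to its own behavior.
BROWSER = {
    "a": lambda e: f"link to {e.get('href')}",
    "b": lambda e: f"render bold: {e.text}",
}
ARCHIVER = {
    "a": lambda e: f"archive {e.get('href')}",
    "b": lambda e: f"play Beethoven for: {e.text}",
}

def process(xml_text, handlers):
    """Parse the syntax once, then let the local handler table supply the behavior."""
    root = ET.fromstring(xml_text)
    actions = []
    for elem in root.iter():  # document order: p, a, b
        handler = handlers.get(elem.tag)
        if handler is not None:  # elements with no local handler are ignored
            actions.append(handler(elem))
    return actions

print(process(DOC, BROWSER))
print(process(DOC, ARCHIVER))
```

Nothing in the parsed tree changes between the two calls; only the locally chosen process differs, which is the sense in which the outcome, and not the syntax, carries the meaning.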

> > In an Internet topology, the effective definition of a
> > process is the form of its execution at a particular occasion on a
> > 'client-side' ....

> You speak of behavior and semantics as if they are interchangeable. I
> don't feel that they are.

No. Semantics are elaborated from syntax through process (behavior).

Respectfully,

Walter Perry
Received on Tuesday, 16 May 2000 16:53:35 UTC