Re: Syntax and semantics from W. E. Perry on 2000-05-17 (xml-uri@w3.org from May 2000)

From: W. E. Perry <wperry@fiduciary.com>
Date: Wed, 17 May 2000 11:10:10 -0400
To: Tim Berners-Lee <timbl@w3.org>, xml-uri@w3.org, xml-dev@xml.org
Message-ID: <3922B651.437EDE6B@fiduciary.com>
May I respectfully disagree with these conclusions and, in so doing, remind the
Director of some of his own better ideas:

Tim Berners-Lee wrote:

> In a distributed system, the semantics must be carried by the message.

No. Content is carried by the message, expressed in the agreed syntax. The only
case in which a message may be said to carry semantics is where there is
agreement (if only implicitly) beforehand that particular syntactic constructs
shall be treated as conveying particular (agreed and fixed) semantics. This
abuse of agreement over syntax is what I refer to as 'intent'. Intent is
predicative, not nominative, and thereby violates the expected neutrality, or
disinterest, of XML markup with respect to function. The purpose of conveying intent, after
all, is to shape the receiving node's local interpretation of a message to a
predictable semantic outcome. My objection to SOAP, for example, is that it is
premised on conveying precisely this sort of intent.

Permitting particular syntactic constructs to be hijacked to convey specific
semantics not only disregards the inherent anonymity and autonomy of the
receiving node, it greatly curtails the purely syntactic possibilities, and by
implication the extensibility itself, of XML. The eXtensible Markup Language is
expected to be extensible through markup, not through pre-ordained assignment
of syntactic constructs to semantic outcomes. In a world of such pre-ordained
vocabularies (abundant examples available in vertical industry markup
languages, not to mention SOAP) XML is beggared in being permitted a few
syntactic constructs of defined semantic intent, while the infinite remaining
possibilities are reduced to NOPs.

Realize that the expected or intended semantic resolution of defined syntactic
constructs is a particularly pernicious form of presentation insinuated into
what should be ontological markup.

> True, in most cases today semantics are best defined by what program slurps
> it up to the right effect.

Yes. A quibble:  semantics, thus understood, are not strictly speaking
'defined', but are realized through the operation of a process.

> Hence "a quicken input file" defines the semantics of a bank statement file.

No. A quicken input (i.e., data) file exhibits a syntactic form expected by the
quicken executable and when processed by that executable yields (the quicken
program's understanding of) the semantics of a bank statement (the 'file' as a
concrete expression of those semantics is functionally otiose).

>  However, on the internet, the semantics of messages are defined in the
> specifications of the languages.

No. The particular language processing (natural or otherwise) implemented at a
particular Internet node algorithmically determines the semantic outcome of
whatever syntactic input is run through those processes.

> They are not arbitrary.

Arbitrariness is neither a useful description nor an easily measured characteristic
of the semantic understanding local to a particular node. Those semantics are
certainly idiosyncratic; they may be entirely private; they may or may not be
useful, to that node or any other, depending on the availability of further
processing capable of dealing with them in their locally-elaborated and
locally-expressed form.

>  The message conveys a meaning between two agents operating on behalf of two
> social entities.

No. A message is not, nor does it convey, meaning. Meaning is the product of
each interpretation of the message content. I must insist on this point:  you
cannot simply wish it away. It is why we must write processing code to perform
that interpretation. Let us be very clear about what we are trying to do. Does
anyone seriously believe that functional code is, or will soon become,
unnecessary for processing the instance data of each XML document or message?
If not, how can so many apparently accept that we will somehow come to wield
markup so well that it will of itself provide in every conceivable instance
both the semantic outcome which the sender intends and that which is
specific to the unique and private capabilities and environment of the
receiver? That result is possible only if we curtail the syntactic
possibilities to those few to which we have assigned or mapped specific
semantic outcomes beforehand.

> The semantics of HTML tags are not defined in a mathematical way but the
> semantics of a bank transfer are.

The semantics of HTML 'tags' are realized in each instance through local
processing by the browser on the receiving node. That processing is
algorithmic, which presumably qualifies as 'in a mathematical way'. Whether my
reading of XML prevails in the end, or whether the premises of SOAP and the
vertical industry data vocabularies do, there will also be algorithmic
processing performed locally on each receiving node against each XML instance.
The question is whether that process simply elaborates each syntactic structure
encountered into an instance of its pre-assigned semantics, or whether that
processing is truly local to the capabilities of the receiving node, and its
unique understanding of the instance data, as well as to the unique content
supplied in the instance.
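To make the two alternatives concrete, here is a minimal Python sketch. The `<check>` vocabulary, the `PRE_ASSIGNED` table and the `budget_node` function are all hypothetical, invented for illustration. Style (a) elaborates each known construct into a fixed, pre-assigned meaning, so anything absent from the table (here `<memo>`) becomes a NOP; style (b) reads the same instance in terms of the receiving node's own job.

```python
import xml.etree.ElementTree as ET

# Hypothetical <check> instance; the vocabulary is invented for illustration.
MESSAGE = """<check number="1042">
  <payee>Acme Hardware</payee>
  <amount currency="USD">125.00</amount>
  <memo>shelving, invoice 77</memo>
</check>"""

# (a) Pre-assigned semantics: each known tag maps to one fixed meaning.
# Any construct missing from the table is silently treated as a NOP.
PRE_ASSIGNED = {"payee": "credit party", "amount": "debit sum"}

def elaborate(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    return [PRE_ASSIGNED[child.tag] for child in root if child.tag in PRE_ASSIGNED]

# (b) Local semantics: a hypothetical household-budget node interprets the
# same syntax for its own purpose, filing the expense by its memo text.
def budget_node(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    memo = root.findtext("memo", default="")
    category = "home repair" if "shelving" in memo else "uncategorized"
    return {"category": category, "spent": float(root.findtext("amount"))}

print(elaborate(MESSAGE))    # ['credit party', 'debit sum']
print(budget_node(MESSAGE))  # {'category': 'home repair', 'spent': 125.0}
```

Nothing in the markup tells the budget node what `<memo>` is for; that outcome exists only in the node's own code.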

> In the future, we will be able to define the semantics of a new language by
> relating it to things like quicken input files,
> and also by specifying mathematical properties of the protocols - such as the
> relationship between a check and a bank statement.  In the meantime, we still
> use English in specifications.  But the crucial thing is to recognize that
> the namespace identifier identifies the language of the message and so
> indirectly its meaning. The namespace identifier has to be the hook onto
> which I can hang semantic information.  I don't see any other philosophical
> basis for XML messages having any meaning.  I don't see how any alternative
> world would work, how you would prove that anyone owes you money or that the
> weather in Amsterdam is rainy.

This is the crux. There is no single relationship between a check and a bank
statement (I have been writing code to process both since 1982). There are only
relationships of an instance of one to an instance of the other, within the
understanding of the entity which processes that relationship at a given
moment. Each step of any such process must directly model the role--the
function--of whoever or whatever is doing the processing. As the account
holder, you see a different relationship between an instance of a check and a
statement than the debit-processing procedure at your bank does; than the
clearing house does; or than the loan officer considering your transactional
history with the bank does. None of these viewpoints is canonical:  they are
all instance interpretations of the instance correspondence between instances
of checks and instances of statements. From these instance relationships can we
infer a class, to define and code appropriate processing in each case? Yes, but
we must predicate that processing on the role, the capabilities and the
viewpoint of the node which is to perform it. If you are not a checking debit
processing routine, what business do you have telling a node which is one how
to perform its own unique job--expressing, that is, a processing intent to that
node? And if you are a checking debit processing routine, why are you handing
off this work instead of doing it yourself? This is the essence of a
distributed system. It can be harnessed into a pipeline of processing, but only
by allowing each node to perform its own particular task in its own unique way.
At each node the outcome of each such task reflects the semantic understanding
of that particular node--what other understanding could it reflect? There will
be other nodes--some known to that node and some not--which will want to take
the output of that node's process and perform further processing upon
it--attach themselves, that is, to a pipeline of processing at that point. That
pipeline of process is forked by the action of new nodes attaching themselves
and consuming completed work. It cannot be the responsibility of the node whose
task is completed to restate and present that work product in the form each of
those nodes would find best suited for its own unique processes. In the first
place, the prior node may not know which nodes are consuming its product, nor
for what purpose. In the second place, the prior node cannot--because it is not
uniquely specialized in the tasks of the later nodes--know which of its outputs
they might require or in what form. Generally, those later nodes will require
other data to be combined into their processes, from sources of which the prior
node likely knows nothing.
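A small Python sketch of the point about viewpoints, assuming hypothetical `<check>` and `<statement>` instances whose element and attribute names are invented for illustration. Each role function is a stand-in for one node's local processing: all three read the same pair of instances, and each derives a different relationship between them, none of which is canonical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instances; element and attribute names are invented for illustration.
CHECK = ET.fromstring('<check number="1042" account="A-17"><amount>125.00</amount></check>')
STATEMENT = ET.fromstring("""<statement account="A-17">
  <debit check="1042" amount="125.00"/>
  <debit check="1043" amount="60.00"/>
</statement>""")

def account_holder(check, statement) -> bool:
    # Account holder's question: did the check I wrote clear for the amount I wrote?
    number, amount = check.get("number"), check.findtext("amount")
    return any(d.get("check") == number and d.get("amount") == amount
               for d in statement.iter("debit"))

def debit_processor(check, statement) -> str:
    # Bank's question: has this check number already been posted to the account?
    posted = any(d.get("check") == check.get("number") for d in statement.iter("debit"))
    return "reject duplicate" if posted else "post debit"

def loan_officer(check, statement) -> float:
    # Lender's question: what is the total outflow in this period?
    # The individual check hardly matters from this viewpoint.
    return sum(float(d.get("amount")) for d in statement.iter("debit"))

for role in (account_holder, debit_processor, loan_officer):
    print(role.__name__, role(CHECK, STATEMENT))
```

The same two instances yield `True`, `reject duplicate` and `185.0`, depending only on which node is doing the processing.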

The point is that this pipeline of process *is* the Semantic Web. The semantics
are local and presumed unique to each node. The web is the exchange of messages
in a syntactically agreed form. Extensibility is unlimited and achieved through
new constructions of the accepted syntax. Processing is local, where the
specific unique expertise for the locally unique understanding of the problem
is to be found.
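A minimal Python sketch of such a forking pipeline, assuming an in-process callback model. The `Node` class, the node names and the `WATCHLIST` data are hypothetical. Messages are plain dicts for brevity, where in practice they would be XML instances. Later nodes attach themselves to a producer; the producer hands every consumer the same completed output without restating it, and each consumer may combine it with local data the producer never sees.

```python
from typing import Callable

class Node:
    """A processing node. It does not know, or choose, who consumes its output."""

    def __init__(self, name: str, process: Callable[[dict], dict]):
        self.name = name
        self.process = process
        self.consumers: list["Node"] = []

    def attach_to(self, producer: "Node") -> None:
        # A later node attaches itself to a producer, forking the pipeline there.
        producer.consumers.append(self)

    def receive(self, message: dict, log: list) -> None:
        result = self.process(message)  # purely local interpretation
        log.append((self.name, result))
        for consumer in self.consumers:
            consumer.receive(result, log)  # same output to every consumer, not restated

# Hypothetical local data held by one consumer; the producer knows nothing of it.
WATCHLIST = {"A-99"}

clearing = Node("clearing", lambda m: {**m, "cleared": True})
ledger = Node("ledger", lambda m: {"account": m["account"], "posted": m["amount"]})
fraud = Node("fraud", lambda m: {"flagged": m["account"] in WATCHLIST})

ledger.attach_to(clearing)
fraud.attach_to(clearing)

log = []
clearing.receive({"account": "A-17", "amount": 125.0}, log)
for name, result in log:
    print(name, result)
```

Attaching another consumer requires no change to `clearing`, which is the property the paragraph above describes.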

Respectfully,

Walter Perry
Received on Wednesday, 17 May 2000 11:10:12 UTC