RDF11 Concepts - conflation of syntax and semantics from Gregg Reynolds on 2013-06-14 (public-rdf-comments@w3.org from June 2013)

From: Gregg Reynolds <dev@mobileink.com>
Date: Thu, 13 Jun 2013 22:54:49 -0500
To: public-rdf-comments <public-rdf-comments@w3.org>
Message-ID: <CAO40Mi=3dmonH6qX15R=tkYWk+hURhuoKm3=RA4s=4ghrqLi-g@mail.gmail.com>
With reference to
http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/#section-rdf-graph:

One problem (in my view) with RDF is that the official docs often fail
to distinguish clearly between syntax and semantics and sometimes fail
to make structure explicit.  Here's a stab at addressing what I see as
conceptual confusion; feel free to ignore if it's already been
addressed or is deemed unimportant or practically infeasible.

Section 3.1 of RDF Concepts and Abstract Syntax:

"An RDF triple consists of three components:

- the subject, which is an IRI or a blank node
- the predicate, which is an IRI
- the object, which is an IRI, a literal or a blank node
An RDF triple is conventionally written in the order subject,
predicate, object."

It isn't clear to me whether this is supposed to be a semantic or a
syntactic definition.  If it is a semantic definition, it has two
problems.  One is that it does not explicitly define the structure of
an RDF triple.  It's not enough to say a triple consists of three
components; an unordered set of three elements satisfies that
criterion.  The second problem is that by referring to IRIs and
mentioning a writing convention, it conflates syntax and semantics.
If it's supposed to be a syntactic definition, then it should start
"An RDF triple expression" and "conventionally" should be dropped.
But since the concept of a blank node is essentially semantic, it
seems this cannot work as a definition of syntax.

A better semantic definition might be something like:

"Semantically, an RDF triple is a sequence of three typed graph nodes
<s, p, o>, where s and p are nodes of type Identifier, and o may be of
type Identifier or literal.  By convention s is called the subject, p
the predicate, and o the object of the triple."
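
To make the proposed definition concrete, here is a rough Python sketch
of a triple as an ordered, typed sequence.  The names Identifier,
Literal, Triple and make_triple are illustrative assumptions, not
anything defined by the RDF documents, and the Literal type only models
a lexical form.

```python
from dataclasses import dataclass
from typing import NamedTuple, Union


@dataclass(frozen=True)
class Identifier:
    """Semantic value that an IRI denotes (type name taken from the proposal)."""
    name: str


@dataclass(frozen=True)
class Literal:
    """Semantic value of a literal; only the lexical form is modelled here."""
    lexical_form: str


class Triple(NamedTuple):
    """Ordered sequence <s, p, o>; position determines the role."""
    s: Identifier
    p: Identifier
    o: Union[Identifier, Literal]


def make_triple(s, p, o):
    """Build a Triple, rejecting nodes of the wrong type."""
    if not isinstance(s, Identifier):
        raise TypeError("subject must be an Identifier")
    if not isinstance(p, Identifier):
        raise TypeError("predicate must be an Identifier")
    if not isinstance(o, (Identifier, Literal)):
        raise TypeError("object must be an Identifier or a Literal")
    return Triple(s, p, o)


alice = Identifier("alice")
knows = Identifier("knows")
bob = Identifier("bob")

forward = make_triple(alice, knows, bob)
backward = make_triple(bob, knows, alice)

# As sequences the two triples differ; as sets of components they collapse.
sequences_differ = forward != backward
sets_collapse = set(forward) == set(backward)
```

Both flags come out true, which is the point about structure: "three
components" alone does not separate the two statements, whereas an
explicit sequence does.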

To make this work, type Identifier (a semantic value type) must be
defined, e.g. something like "An IRI denotes an identifier".  Compare:
a sentence denotes a proposition.  Or use "expresses" instead of
"denotes".  Note that the assertion in section 1.2, that "[a]ny IRI or
literal denotes some thing in the universe of discourse", is strictly
speaking untrue.  Or rather it's insufficiently detailed and
misleading.  An IRI (bit of syntax) first denotes a "value" (for lack
of a better term; I called it an "Identifier" above); any reference to
"some thing in the universe of discourse" must involve that value as
an intermediary.  Compare any proper noun, e.g. "New York": it cannot
refer directly to New York, it has to go through some kind of
intermediary in the mind of the interpreter.  Or compare the use of a
number to refer to something in the world.  '9' might be said to refer
to the number of planets, but what does the referring is the number 9,
not the figure '9'.  I don't see any rational way to get from IRI
syntax to graph node to real-world reference without making this sort
of distinction.  It's basic semiotics.
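
The two-step picture can be sketched in Python as a composition of two
mappings.  This is only an illustration under assumed names: denote,
interpretation and refer are hypothetical, and the example.org IRI and
its referent are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """Intermediate semantic value denoted by an IRI (hypothetical type)."""
    key: str


def denote(iri: str) -> Identifier:
    """Step 1: syntax -> value.  Each IRI string maps to one Identifier."""
    return Identifier(iri)


# Step 2: value -> thing in the universe of discourse.
# The interpretation is keyed by Identifier values, not by IRI strings.
interpretation = {
    Identifier("http://example.org/NewYork"): "the city of New York",
}


def refer(iri: str) -> str:
    """Reference is the composition of the two steps."""
    return interpretation[denote(iri)]
```

Here refer never consults the IRI string directly; it always goes
through the Identifier value, which is the intermediary argued for
above.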

Blank nodes need not be distinguished at this point.  The whole blank
node mess is such a hairball I'm not going to touch it now, but for
what it's worth, much of the problem again stems from failure to
distinguish between syntax and semantics.

A better approach might be to follow the example of logicians, who
usually start with syntax, by defining syntactic primitives (alphabet,
words, whatever) and formation rules and leaving semantics for later.
The syntactic primitives of RDF are easily defined - IRIs, literals,
etc.  Formation rules would also be easy to define, but you'd have to
settle on a canonical syntax.  (I note that currently no such syntax
is defined, in spite of the title and many references to "abstract
syntax".  You can't write down abstractions anyway.)

You could avoid commitment to the details of a concrete syntax by
replacing the notion of abstract syntax with the notion of a syntactic
schema.  Then you'd get something like:  "RDF statements have the
schematic form <S P O>, where S, P, and O are (meta) variables
substitutable by IRIs or (in the case of O) literals.  (NOTE: concrete
syntaxes express this schematic form in a variety of ways.)"
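
As a rough sketch of the schema idea, the Python below treats a
statement as one instance of <S P O> and renders it in two different
concrete forms.  All names are assumptions for illustration.  The first
renderer is only N-Triples-like (no escaping, datatypes or language
tags), and the second is an invented s-expression form, not an existing
RDF syntax.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str


Term = Union[IRI, Literal]


@dataclass(frozen=True)
class Statement:
    """Instance of the schema <S P O> with the metavariables filled in."""
    s: IRI
    p: IRI
    o: Term


def to_line_syntax(st: Statement) -> str:
    """N-Triples-like rendering (simplified, see the caveats above)."""
    def term(t: Term) -> str:
        return f"<{t.value}>" if isinstance(t, IRI) else f'"{t.lexical_form}"'
    return f"{term(st.s)} {term(st.p)} {term(st.o)} ."


def to_sexpr_syntax(st: Statement) -> str:
    """Invented s-expression rendering of the same schema instance."""
    def term(t: Term) -> str:
        return f"(iri {t.value})" if isinstance(t, IRI) else f'(lit "{t.lexical_form}")'
    return f"(triple {term(st.s)} {term(st.p)} {term(st.o)})"


st = Statement(
    IRI("http://example.org/a"),
    IRI("http://example.org/name"),
    Literal("Alice"),
)
line_form = to_line_syntax(st)
sexpr_form = to_sexpr_syntax(st)
```

The two output strings look nothing alike, yet both are substitution
instances of the same <S P O> schema, which is all the schema commits
to.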

Obviously doing something like this would entail some other, possibly
major, changes to the text.  That may not be practically feasible, but
still I think something should be done to address the points I've
raised.

Sincerely,

Gregg Reynolds
Received on Friday, 14 June 2013 03:55:16 UTC