RDF specifications (was RE: Cutting the Patrician datatype knot)

I think that we are in the midst of a disagreement over what (an
implementation of) RDF is.

My view is that an implementation of RDF, or RDF Schema or RDF plus
datatypes, is supposed to implement the specification of RDF, or RDF Schema
or RDF plus datatypes.  The implementation is free to do this in any
effective way that it chooses, but it is not free to deviate from the
specification either by removal or by addition.

That's it.  Simple, no?  What could be the problem?


Well, the problem is that the RDF and RDFS documents are silent on what the
specification of RDF or RDFS is!  This is very surprising, but let's see
what they do say.  The RDF Model and Syntax Specification does define a
formal grammar for RDF and does provide some indication of a mapping from
this formal grammar into RDF graphs.  However, there is no interface
defined for accessing RDF graphs, nor is there any interface for an RDF
implementation to indicate whether the input it is given is actually
syntactically valid RDF.

In the absence of such indications it is permissible (permissible but not
reasonable, by the way) to implement RDF as a sink that accepts any
input and produces no output at all.

Of course any reasonable implementor of RDF goes further than this and
any reasonable reader of the RDF MSS reads more than this into the
specification.   A typical response is to believe that the RDF MSS also
specifies full access to the RDF graph and an out-of-band indication of
syntactic errors.  Under this reading, an RDF implementation is required to
parse RDF syntax, to construct the RDF graph that corresponds to that
input, and to provide access to the graph for use by applications.
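
To make the graph reading concrete, here is a minimal Python sketch.  It
assumes a hypothetical toy syntax modelled on the <John> <age> "10".
notation used later in this note (not RDF/XML), and the names parse and
RDFSyntaxError are illustrative only.  It parses the input, constructs the
graph as a set of triples, and signals syntax errors out of band by raising
an exception.

```python
import re
from typing import FrozenSet, Tuple

# A triple is (subject, predicate, object).  URIs keep their angle brackets
# and literals keep their quotes, so the two kinds of object stay distinct.
Triple = Tuple[str, str, str]

# Hypothetical toy syntax: one statement per line, <s> <p> <o>. or <s> <p> "lit".
_STATEMENT = re.compile(
    r'\s*(<[^<>\s]*>)\s+(<[^<>\s]*>)\s+(<[^<>\s]*>|"[^"]*")\s*\.\s*'
)


class RDFSyntaxError(ValueError):
    """Out-of-band indication that the input is not syntactically valid."""


def parse(text: str) -> FrozenSet[Triple]:
    """Parse the toy syntax and construct the corresponding graph."""
    graph = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no statements
        match = _STATEMENT.fullmatch(line)
        if match is None:
            raise RDFSyntaxError(f"line {lineno}: cannot parse {line.strip()!r}")
        graph.add(match.groups())
    # Applications get read access to the graph as an immutable set of triples.
    return frozenset(graph)
```

An application would call parse on the input, catch RDFSyntaxError for
invalid input, and otherwise inspect the returned triples directly.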


So far so good.  There is now a reasonable specification and a reasonable
thing for implementations to do.


However, now along comes the RDF Core Working Group and they, perhaps
inadvertently, provide a different specification of what an RDF
implementation is supposed to do.

What is this specification?  It is the model theory.  

The model theory provides the meaning for RDF and RDF Schema and, moreover,
it provides an interface to this meaning, via entailment.

Now reasonable implementors and reasonable readers have a different and---
some would claim, myself among them---much better specification.  An RDF
implementation is supposed to accept RDF syntax and answer entailment
questions, nothing more, and nothing less.
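
Under this reading the whole interface reduces to one question.  A minimal
sketch, assuming ground graphs (no blank nodes) and triples that have
already been parsed; the class and method names are hypothetical.  For
ground graphs, simple entailment is just a subgraph test; blank nodes would
instead require searching for a mapping that makes the query an instance of
part of the accepted graph.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class EntailmentInterface:
    """Hypothetical interface: accept input, then answer entailment questions."""

    def __init__(self) -> None:
        self._graph: Set[Triple] = set()

    def accept(self, triples: Iterable[Triple]) -> None:
        """Add (already parsed) input triples to what the implementation knows."""
        self._graph.update(triples)

    def entails(self, query: Iterable[Triple]) -> bool:
        """Ground case only: G entails E iff every triple of E is in G."""
        return set(query) <= self._graph
```

With no datatype knowledge, "10" and "010" are simply different literals,
so accepting <John> <age> "10" does not entail <John> <age> "010" here.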

There now seems to be an impasse.  There are two very different, competing
specifications for RDF.  What is an implementor supposed to do?

All is *not* lost, provided that the RDF Core Working Group does its job
correctly.  It should turn out that the two specifications are the same,
or, more precisely, that the graph specification and interface is just a
more-concrete description of what is happening in the model theory.  That
is, an implementation that constructs a graph and allows access to this
graph is just providing an alternative interface to the model theory and
entailment.   (Of course, this has not yet been proven.)



Now along come datatypes, and the whole point of this note.


The datatype model theory is going to end up saying quite a lot about
datatypes.  It will provide a meaning for the datatype constructions,
including which datatype constructions make sense and which don't.  This
will mean that entailment has to take into account the meaning of such
constructions.

For example, the datatype model theory is going to have to answer under
what conditions
	<John> <age> "10".
entails
	<John> <age> "010".
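
One plausible way to state the condition, assuming the object of <age> is
read through some datatype d whose lexical-to-value mapping is written
L2V_d (this notation is assumed here, not taken from any draft): the
entailment holds exactly when both lexical forms are valid for d and map to
the same value.  If the literals are read as plain strings instead, it
holds only when the strings are identical, which "10" and "010" are not.

```latex
\{\langle \mathit{John}, \mathit{age}, \texttt{"10"} \rangle\}
  \models
\{\langle \mathit{John}, \mathit{age}, \texttt{"010"} \rangle\}
\iff
\mathrm{L2V}_d(\texttt{"10"}) = \mathrm{L2V}_d(\texttt{"010"})
```

For d an integer datatype, both sides map to 10, so the entailment holds.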

Any RDF implementation that does not produce the answers demanded by the
model theory will not be in compliance with the model theory's
specification of RDF.  (Note that an RDF implementation is free to use any
means to implement this entailment, such as passing all its input through
an XML Schema validator that produces native, canonical representations for
typed literals.)
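
The validator route can be sketched as a preprocessing phase.  This is a
hedged illustration only: canonicalize stands in for an XML Schema
validator, only an integer datatype modelled on xsd:integer is handled, and
typed literals are represented as hypothetical (lexical form, datatype)
pairs.  Once every typed literal is in canonical form, entailment for
ground graphs is again a subgraph test.

```python
import re
from typing import Iterable, Set, Tuple, Union

TypedLiteral = Tuple[str, str]           # hypothetical (lexical form, datatype name)
Node = Union[str, TypedLiteral]          # URIs and plain literals stay strings
Triple = Tuple[str, str, Node]

_INTEGER = re.compile(r"[+-]?[0-9]+")    # lexical space modelled on xsd:integer


def canonicalize(node: Node) -> Node:
    """Stand-in for a schema validator: reject invalid integer literals and
    rewrite valid ones into canonical form (no '+', no leading zeros)."""
    if isinstance(node, tuple) and node[1] == "integer":
        lexical = node[0]
        if _INTEGER.fullmatch(lexical) is None:
            raise ValueError(f"{lexical!r} is not a valid integer literal")
        return (str(int(lexical)), "integer")
    return node


def canonical_graph(triples: Iterable[Triple]) -> Set[Triple]:
    """Syntax phase: canonicalize every object before RDF-specific processing."""
    return {(s, p, canonicalize(o)) for s, p, o in triples}


def entails(premises: Iterable[Triple], conclusion: Iterable[Triple]) -> bool:
    # With canonical literals, ground entailment is a plain subgraph test.
    return canonical_graph(conclusion) <= canonical_graph(premises)
```

Here an integer-typed "10" entails an integer-typed "010", because both
canonicalize to "10", while the untyped literals "10" and "010" still do
not entail each other.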

Now what about the RDF graph specification for RDF?  Well, it either has to
comply with the model theory or there will be two differing specifications
for RDF plus datatypes, a very unhappy state of affairs.


So the disagreement here appears to be that I am looking at a model theory,
suitably extended for datatypes, and inferring what RDF has to do based on
this specification.  You appear to be looking at a graph specification that
does not correspond to the model theory.  I claim that your graph
specification, aside from not matching the model theory, is not capturing
a reasonable specification of RDF plus datatypes.  

I further claim that there are ways of extending the graph specification
that put all or almost all datatype syntax issues, including the
lexical-to-value mapping, in a syntax phase that precedes any RDF-specific
processing.  (Note that this does not work for all datatype
specifications: some specifications need access to a black-box
lexical-to-value mapping at a later phase.)  Even further, I claim that
there is no way to implement a reasonable view of the RDF specification
without some processing of the datatype syntax within an RDF implementation
itself, if only to determine what is syntactically valid.


Given that an RDF plus datatypes implementation will have to process the
datatypes, why not then provide a native interface to the underlying data?
This interface will be much easier for applications than requiring them to
accept pairs consisting of a lexical form and a type.   
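
A minimal sketch of such a native interface, assuming a hypothetical
registry of lexical-to-value mappings for a few datatypes modelled on their
XML Schema namesakes; native_value and LEXICAL_TO_VALUE are illustrative
names, not an existing API.  The application receives an int or a bool and
never has to interpret "010" itself.

```python
import re
from typing import Callable, Dict, Tuple


def _integer(lexical: str) -> int:
    # Lexical space modelled on xsd:integer: optional sign, then digits.
    if re.fullmatch(r"[+-]?[0-9]+", lexical) is None:
        raise ValueError(f"{lexical!r} is not a valid integer")
    return int(lexical)


def _boolean(lexical: str) -> bool:
    # Lexical space modelled on xsd:boolean: true, false, 1, 0.
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"{lexical!r} is not a valid boolean")


# Hypothetical registry: datatype name -> lexical-to-value mapping.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], object]] = {
    "integer": _integer,
    "boolean": _boolean,
    "string": str,
}


def native_value(literal: Tuple[str, str]) -> object:
    """Hand the application a native value instead of a (lexical, type) pair."""
    lexical, datatype = literal
    try:
        to_value = LEXICAL_TO_VALUE[datatype]
    except KeyError:
        raise ValueError(f"unsupported datatype {datatype!r}") from None
    return to_value(lexical)
```

Because the implementation already has to validate the lexical form, doing
the conversion in the same place costs little and spares every application
from repeating it.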

There is nothing in the above that requires XML Schema, by the way.  A
datatype extension for RDF that uses another datatype scheme could be
devised.  It would also be possible to parameterize the RDF specification
so that any compatible datatype scheme could be used.  It would also be
possible, but somewhat harder, to parameterize a native interface.  It
would be somewhat easier, and I think probably the most reasonable path, to
provide a parameterized interface in terms of a subset of the datatype
scheme.

For example, for XML Schema, the interface could pass a pair like
<integer,10> or even <integer,"10"> instead of <decimal with 0
fractionDigits union string,"010">.  This would be much easier for
applications to handle than requiring them to understand all of XML Schema
constructed datatypes.
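
The reduction can be sketched as follows, assuming the XML Schema rule that
a literal of a union type takes its value from the first member type, in
order, that accepts it.  The member functions are simplified stand-ins (for
instance, whitespace handling is ignored), and reporting the first member
as "integer" is the kind of simplification proposed above, not something
XML Schema itself does.

```python
import re
from decimal import Decimal
from typing import Callable, List, Tuple

# Lexical space of a decimal datatype modelled on xsd:decimal.
_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def decimal_fraction_digits_0(lexical: str) -> int:
    """Member 1: decimal restricted to 0 fractionDigits.  The facet constrains
    the value, so "10.0" is accepted but "10.5" is not."""
    if _DECIMAL.fullmatch(lexical) is None:
        raise ValueError(f"{lexical!r} is not a decimal")
    value = Decimal(lexical)
    if value != value.to_integral_value():
        raise ValueError(f"{lexical!r} has a nonzero fractional part")
    return int(value)


def string_member(lexical: str) -> str:
    """Member 2: string accepts every lexical form unchanged."""
    return lexical


# Member types in union order, each paired with the simple type name that the
# hypothetical application-facing interface reports for it.
MEMBERS: List[Tuple[str, Callable[[str], object]]] = [
    ("integer", decimal_fraction_digits_0),
    ("string", string_member),
]


def simplify(lexical: str) -> Tuple[str, object]:
    """Return a simple (type, value) pair instead of the constructed union type."""
    for name, to_value in MEMBERS:
        try:
            return name, to_value(lexical)
        except ValueError:
            continue  # rejected by this member; try the next one in order
    raise ValueError(f"{lexical!r} matches no member type")
```

So "010" comes back as ("integer", 10), while "10.5" and "ten" fall through
to the string member and come back unchanged as strings.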

Peter F. Patel-Schneider
Bell Labs Research

Received on Monday, 3 December 2001 09:42:37 UTC