Re: RDF specifications (was RE: Cutting the Patrician datatype knot)

From: "Jonathan Borden" <jborden@mediaone.net>
Subject: Re: RDF specifications (was RE: Cutting the Patrician datatype knot)
Date: Mon, 3 Dec 2001 10:58:11 -0500

> Peter F. Patel-Schneider wrote:
> 
> > I think that we are in the midst of a disagreement over what (an
> > implementation of) RDF is.
> >
> > My view is that an implementation of RDF, or RDF Schema or RDF plus
> > datatypes, is supposed to implement the specification of RDF, or RDF Schema
> > or RDF plus datatypes.  The implementation is free to do this in any
> > effective way that it chooses, but it is not free to deviate from the
> > specification either by removal or by addition.
> >
> 
> Well that depends on what you mean by "addition"

How?  What sort of additions could be possible?

Maybe you mean that the implementation could provide additional interfaces.
Yes, I suppose that this would be allowable in a fully-conforming
implementation, provided that the additional interfaces could be reduced to
the specified ones.  However, if the additional interfaces allow
applications to make finer-grained distinctions than the specification
supports, then the implementation is *not* fully conforming.

> ...
> >
> > Now along comes datatypes, and the whole point of this note.
> 
> I think we all understand the issue, but we need to realize that RDF as it
> currently stands is effectively silent on the issue of datatypes. The only
> core "types" appear to be "resource" (whatever that really is) and
> "literal". One can compare literals by string comparison operators but there
> is _nothing_ which describes an _RDF_ mechanism to compare literal values
> using other/overloaded comparison operators, nor what such operators would
> be useful for. Of note, realize that the XML Namespaces recommendation
> states that _XML Namespace names_ (which are URI references according to RFC
> 2396) must be compared as _literal strings_, so that even reasonable URI
> comparison operators (such as those that expand relative URI references into
> absolute URIs before string comparison) may not be used to equate XML
> Namespace names. For example:
> 
> http://www.w3.org/foo and
> http://WWW.W3.ORG/foo
> 
> name _different_ XML Namespaces.

Sure, and so, maybe, RDF is not in compliance with XML.  If so then this is
a problem that should be worked out in the Semantic Web Coordination Group.
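
To make the quoted namespace example concrete, here is a minimal Python
sketch.  The function names are hypothetical, and the "looser" comparison
is only an illustration of the kind of URI-aware operator that the XML
Namespaces rule excludes; it is not taken from any specification.

```python
from urllib.parse import urlsplit, urlunsplit


def same_namespace_name(a: str, b: str) -> bool:
    """Namespace names match only if they are the same string, character for character."""
    return a == b


def same_after_host_normalization(a: str, b: str) -> bool:
    """Hypothetical looser comparison: lowercase the scheme and host before comparing."""
    def normalize(uri: str) -> str:
        parts = urlsplit(uri)
        return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                           parts.path, parts.query, parts.fragment))
    return normalize(a) == normalize(b)


lower = "http://www.w3.org/foo"
upper = "http://WWW.W3.ORG/foo"
print(same_namespace_name(lower, upper))            # different namespace names
print(same_after_host_normalization(lower, upper))  # equal only under the looser rule
```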

> [G]iven this draconian (yet well specified) definition what makes anyone think
> that "10" and "010" would be equated under RDF itself?

Under RDF itself, there is, of course, no way of determining that "10" and
"010" denote the same value.  However, a datatype extension may end up
requiring that "10" and "010" denote the same value in some circumstances.
Under these circumstances an implementation of the extended RDF *must*
behave accordingly.  To claim that an implementation of the non-datatype
RDF is in conformance with the extended RDF if it has different behavior
is just not correct.

> > The datatype model theory is going to end up saying quite a lot about
> > datatypes.  It will provide a meaning for the datatype constructions,
> > including which datatype constructions make sense and which don't.  This
> > will mean that entailment has to take into account the meaning of such
> > constructions.
> >
> > For example, the datatype model theory is going to have to answer under
> > what conditions
> > <John> <age> "10".
> > entails
> > <John> <age> "010".
> >
> > Any RDF implementation that does not produce the answers demanded by the
> > model theory will not be in compliance with the model theory's
> > specification of RDF.  (Note that an RDF implementation is free to use any
> > means to implement this entailment, such as passing all its input through
> > an XML Schema validator that produces native, canonical representations for
> > typed literals.)
> 
> He he. But suppose I create my own datatypes using XML Schema, including
> myxsd:integer which derives entirely from the lexical space (i.e. is defined
> by which characters are allowed in each position in the string). Now I
> submit that there is a clear and unambiguous 1:1 mapping from
> 
> myxsd:integer <=> xsd:integer
> 
> that is that the instance sets of tokens that form each type are identical.
> XML Schema provides no way to define this (I choose in this case _not_ to
> derive myxsd:integer from the builtin, i.e. value space derived,
> xsd:integer)

If you define myxsd:integer as something that defines a lexical-to-value
mapping in a different way than xsd:integer does, then, of course any
implementation of RDF plus datatypes that understands whatever method you
used to define myxsd:integer must follow your definition.  If the value
space of myxsd:integer is the same as the value space of xsd:integer and
the lexical-to-value mapping is the usual one, then, again of course, the
RDF plus datatypes implementation is *required* to answer affirmatively
that <John> <age> <myxsd:integer:010> entails <John> <age> <xsd:integer:10>.
Otherwise the implementation is non-conforming.  How can it be otherwise?
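
The difference between plain RDF and RDF plus datatypes can be sketched
in Python as follows.  Everything here is an assumption made for
illustration: the DATATYPES registry, the triple layout, and the
reduction of entailment to a comparison of two single triples.  Python's
int() also stands in for the lexical-to-value mapping, although it
accepts some strings that xsd:integer would not.

```python
# Hypothetical registry: datatype name -> lexical-to-value mapping.
# myxsd:integer is ASSUMED to have the same value space and the usual
# mapping, which is exactly the case discussed above.
DATATYPES = {
    "xsd:integer": int,
    "myxsd:integer": int,
}


def plain_entails(t1, t2):
    """Plain RDF: literals are opaque strings, so only identical triples match."""
    return t1 == t2


def datatype_entails(t1, t2):
    """Datatype-aware check for single triples (subject, predicate, (datatype, lexical))."""
    s1, p1, (d1, lex1) = t1
    s2, p2, (d2, lex2) = t2
    same_value = DATATYPES[d1](lex1) == DATATYPES[d2](lex2)
    return s1 == s2 and p1 == p2 and same_value


a = ("John", "age", ("myxsd:integer", "010"))
b = ("John", "age", ("xsd:integer", "10"))
print(plain_entails(a, b))     # the lexical forms and type names differ
print(datatype_entails(a, b))  # both literals denote the same value
```

An implementation that answers like plain_entails here would not
conform to the extended specification, whatever internal method it uses.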

> In such cases an XML Schema validator will correctly validate the _XML_
> input, but derive a unique post-Schema validation Infoset (psvi) according
> to which schema was used to validate. Are you suggesting that the
> datatype model theory operate on the "psvi" graph rather than the input XML
> character stream?

The behavior of RDF plus datatypes must be fully specified in the RDF plus
datatypes specification.  If it is possible to implement a significant
portion of the specification using an XML Schema validator, either because
the specification explicitly references XML Schema datatypes or in the
extraordinarily-unlikely event that the independently-specified RDF plus
datatypes specification happens to have a close semantic relationship to
XML Schema datatypes, then that is a (very) happy occurrence.  I happen,
moreover, to believe that it should be the goal of the RDF Core Working
Group to build any RDF plus datatypes specification so that this is the
case.

> > For example, for XML Schema, the interface could pass a pair like
> > <integer,10> or even <integer,"10"> instead of <decimal with 0
> > fractionDigits union string,"010">.  This would be much easier for
> > applications to handle than requiring them to understand all of XML Schema
> > constructed datatypes.
> 
> This is exactly the problem. It turns out that for XML Schema, whose
> formalism operates on lexical tokens, passing the pair <integer, 10> is not
> the same as <decimal with 0 fractionDigits union string, "010">.

It may be that XML Schema differentiates between <integer, "10"> and 
<decimal with 0 fractionDigits union string, "010">.  However, if the
datatype extension of the RDF model theory does not, then that is all that
matters.  This is in exactly the same way that <foo /> and <foo></foo> are
lexically different, but XML Infoset treats them the same way, or that the
XQuery data model ignores non-significant white space (under some
conditions).

> Of course
> you may wish to limit your model theory to operate on the specific set of
> builtin XML Schema datatypes, but I suspect that you will eventually find
> that limiting, i.e. I would _like to_ be able to state:
> 
> <xsd:integer, 10> == <myxsd:integer, "010">
> 
> which might require something like daml:EquivalentTo unless the XML Schema
> formalism is changed to base _all_ the builtin datatypes on the lexical
> rather than the value space.

I'm not requiring that the model theory operate on only the built-in XML
Schema datatypes.  Where did I say that?  All I said is that the interface
between an RDF implementation and an application could be specified in a
way that passes the type integer instead of [decimal with 0 fractionDigits
union string].  
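
As a rough Python sketch of such an interface: the function name
canonical_pair is hypothetical, and the lexical pattern used for
"decimal with 0 fractionDigits" (an optional sign followed by digits)
is a simplifying assumption, not the XML Schema definition.

```python
import re

# ASSUMPTION: the integer-like member of the union accepts an optional
# sign followed by ASCII digits; anything else falls through to string.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def canonical_pair(lexical: str):
    """Map a literal typed [decimal with 0 fractionDigits union string] to a simple pair."""
    if _INTEGER_LEXICAL.fullmatch(lexical):
        return ("integer", int(lexical))
    return ("string", lexical)


print(canonical_pair("010"))
print(canonical_pair("10"))
print(canonical_pair("ten"))
```

With this shape of interface the application sees the same pair for
"10" and "010" and never has to interpret the constructed union type.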


> Jonathan

Peter F. Patel-Schneider
Bell Labs Research

Received on Monday, 3 December 2001 11:38:36 UTC