Re: Datatyping - abstract syntax - test case from Patrick Stickler on 2002-09-11 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 11 Sep 2002 15:21:29 +0300
To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <002201c2598d$c20f7950$864416ac@NOE.Nokia.com>
[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]


----- Original Message ----- 
From: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>
To: "Patrick Stickler" <patrick.stickler@nokia.com>; "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>; <w3c-rdfcore-wg@w3.org>
Sent: 11 September, 2002 14:32
Subject: RE: Datatyping - abstract syntax - test case


> >
> > As far as I understood it, last Friday's concensus vote
> > was based on labeling typed literal nodes with a pairing
> > of datatype designation and lexical form, not with values.
> >
> > If you are proposing that we pull that stake out of the
> > ground and do something different, I think you need to
> > argue in terms of fatal flaws in how it is specified
> > at present.
> 
> Different interpretations of the level of commitment we made to Part I.
> 
> I didn't see what I was saying as trying to radically change Part I, more
> just articulate a detail with greater clarity.
> 
> I think the <xsd:decimal>"2.00" and <xsd:decimal>"2.0" case is interesting.

Is it as interesting as http://foo.com/blarg and HTTP://FOO.COM/blarg ?

> If, as you're proposing, we keep both the datatype URI and the string
> around, then we have ruled out a database implementations that stores the
> number in a native format (since the two cases would be identical).

Not at all. The abstract node <xsd:integer>"10" denotes the integer
value ten, and if an application wishes to represent that natively
in its internal datastore, fine. But that does *not* mean that the
node in the abstract graph bears the able ten rather than <xsd:integer>"10".

And if two distinct nodes in the abstract syntax, say <xsd:integer>"10" 
and <xsd:integer>"00010" have identical representation internally
in some application as the explicit value ten, fine. That does not
mean that they are not distinct nodes in the abstract syntax.

All the abstract syntax is meant to do is to communicate explicitly
and consistently the assertions that are being made in the RDF
knowledge base. Right? The application-specific realization of 
the abstract graph need not be identitical to the abstract graph,
though it must preserve the truth of the abstract graph. No?

Above all, the abstract graph must be consistent, explicit, and
*portable*. Trying to label typed literal nodes with the actual values
is directly contrary to that last quality.

> If, as I am proposing, we say both map to the value, then we get slightly
> different behaviour from implementations that are aware of any specific
> datatype (i.e. xsd:decimal in this case) and those that don't know it (which
> would have to fall back to your proposed solution).

Applications which are unnable to determine the actual value denoted by
a typed literal node are not able to infer anything about it. Your
argument could be equally well directed at applications which are
aware that case in the URI scheme prefix and domain name portion of
an http: URI are not relevant and any two URIrefs differing only in
case for those portions denote the very same resource -- and such
applications might behave differently from those that stupidly
muddle along with fully opaque URIrefs.

The RDF MT should not be able to say whether or not two string-inequal
lexical forms map to the same value for the same datatype, no matter
what that datatype is. That is completely outside the scope of RDF,
just as URI schemes are.

So, just as it is common practice to use, and encourage the use of,
canonicalized representations for URIrefs (lowercase prefixes and
domain names, etc.) likewise will it likely be common practice to
use and encourage the use of canonicalized lexical forms for typed
literals -- the goal in both cases to derive the benefit of graph
tidying of string-equal node labels, clarifying equality of reference.

Those who use non-canonical representations, for URIrefs or lexical
forms, will be at the mercy of applications with the special knowledge
to deduce the equality that otherwise would be provided by the RDF MT.

Cest la vie. I see no major show stoppers here. No more so that
already exist in RDF.

> I think it's more important to allow efficient database implementations, 

I agree. Though I see no implementation efficiency problems by not 
having actual value labels in the *abstract* graph. Having the
datatype URIref + lexical form pair in the abstract graph does
not prohibit applications from augmenting or replacing that
information with the actual value in its internal representation.

I think we are letting the tail (implementation) wag the dog (abstract
model) a bit here.

> and
> so prefer to live with the lack of clarity and small interoperability issues
> depending on how many datatypes are known by an implementation.

I do not.

Please let's keep in mind that RDF is a model for knowledge *INTERCHANGE*
and not for database design and implementation. Though we certainly want
it to be as easy as possible to map that abstract model to real
implementations, it is of *paramount* importance that we do not
sacrifice genericity, portability and interoperability for short term
ease of implementation, especially if that ease of implementation is 
possibly only percieved and not demonstrated.

Patrick
Received on Wednesday, 11 September 2002 08:21:31 UTC