Datatyping (was: Requirements for a possible "RDF 2.0") from Graham Klyne on 2010-01-20 (semantic-web@w3.org from January 2010)

From: Graham Klyne <GK-lists@ninebynine.org>
Date: Wed, 20 Jan 2010 10:14:20 +0000
To: Harry Halpin <hhalpin@ibiblio.org>
CC: semantic-web@w3.org
Message-ID: <4B56D77C.1070605@ninebynine.org>

Harry Halpin wrote:
> I think pretty strongly an extensible data-typing mechanism should be
> built into RDF++. Again, the XML community has long been complaining
> about the data-typing facilities of XML Schema not being extensible,
> see this post [1].

I tend to the opposite view.  One of the things I failed to realize in time to 
put my weight behind it was that an approach to datatyping based on 
interpretation properties, which was proposed by Dan Connolly, could be as 
convenient to use, if not more so, than the current datatyping scheme, and would 
keep the core of RDF very much simpler.  This led to us adding a complex feature 
to RDF that in hindsight I'm not convinced was really needed.

But of course we are where we are, and that's not going away.

For me, I think one of the key arguments that swayed me to support the current 
datatyping approach (back when it was being added to RDF) was a comment (I think 
it was from R V Guha) that if we failed to provide a mechanism that could be used by 
OWL to capture cardinality constraints (i.e. to be able to represent at least 
integer literals) then we would have failed to provide a basis upon which other 
groups could build effectively.  (I'm not defending the decision to layer OWL on 
RDF, but at that time it was a stated requirement.)  So, if we support integers, 
why not other XSD datatypes?  The rest is history (which others may recall 
differently).

In some ways, the current scheme is arguably the least attractive option:  we have 
the complexity of datatyping machinery, but it is not capable of capturing 
arbitrary datatypes.  But 
I really worry about where things end up if we try to design a fully extensible 
datatyping framework.  Where does it end?  Do we want polymorphism support, 
generics, templating mechanisms, complex aggregates of different types, mixing 
literals and non-literal nodes, ...?  Looking at the work on datatypes in some 
functional programming languages, it becomes clear that this is a very complex 
and subtle area.

So when you talk about an extensible datatyping mechanism, how extensible should 
it be?  My inclination would be to try and restrict the mechanisms to simple 
primitive datatypes, and then develop a set of patterns that build upon the 
basic RDF framework for more complex data.  (I think some have argued against 
using RDF to build data structures on efficiency grounds, but if there are 
well-defined patterns for this then I think it becomes possible for 
implementations to optimize their handling of these.)
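
As a rough illustration of a "well-defined pattern" that an implementation 
could recognize and optimize, here is a minimal sketch using the existing RDF 
collection vocabulary (rdf:first, rdf:rest, rdf:nil).  Triples are modelled as 
plain Python tuples; the helper names and the "_:l" blank node labels are 
assumptions made for this example, not part of any RDF library.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_to_triples(items, prefix="_:l"):
    """Encode a Python list as an rdf:first/rdf:rest chain of triples.

    Returns (head, triples); head is rdf:nil for an empty list.
    """
    triples = []
    head = RDF + "nil"
    # Build the chain back to front so each cell can point at its successor.
    for i in reversed(range(len(items))):
        node = f"{prefix}{i}"
        triples.append((node, RDF + "first", items[i]))
        triples.append((node, RDF + "rest", head))
        head = node
    return head, triples


def triples_to_list(head, triples):
    """Recognize the pattern and collapse it back into a native list."""
    index = {(s, p): o for s, p, o in triples}
    items = []
    while head != RDF + "nil":
        items.append(index[(head, RDF + "first")])
        head = index[(head, RDF + "rest")]
    return items


head, triples = list_to_triples(["a", "b", "c"])
restored = triples_to_list(head, triples)
```

Because the shape of the chain is fixed, a store that spots it could keep the 
items in a native array internally and expand them to triples only on demand.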

If I were (hypothetically) trying to design an alternative approach to RDF 
datatyping, given where we are now, I think I'd like to try and design a way to 
capture most if not all of the present datatyped literals capabilities and 
semantics using basic RDF triples and plain literals only (datatype URIs and 
language URIs as interpretation properties, anyone?), so we can simplify the 
base level on which other stuff is built.  Then we'd have room to experiment and 
try different patterns, without having to uproot the standards that people are 
currently using.
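
To make the idea concrete, here is a minimal sketch of one way such a pattern 
might look, with a datatype URI used as an interpretation property that links 
a value node to its plain-literal lexical form.  The direction of that 
property, the blank node labels, the example.org URIs, and the helper names 
are all assumptions made for illustration; they are not taken from any 
specification or from the original proposal.

```python
from itertools import count

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed lexical-to-value mappings for two datatypes (illustrative only).
LEXICAL_TO_VALUE = {
    XSD + "integer": int,
    XSD + "boolean": lambda s: {"true": True, "1": True,
                                "false": False, "0": False}[s],
}

_blank_ids = count()


def typed_literal_as_triples(subject, predicate, lexical, datatype):
    """Rewrite  subject predicate "lexical"^^datatype  as two plain triples.

    The datatype URI is used as a property from a fresh value node to the
    plain literal holding the lexical form.
    """
    node = f"_:v{next(_blank_ids)}"
    return [
        (subject, predicate, node),
        (node, datatype, lexical),
    ]


def value_of(triples, subject, predicate):
    """Follow the pattern back to a native value, if the datatype is known."""
    for s, p, node in triples:
        if s == subject and p == predicate:
            for s2, datatype, lexical in triples:
                if s2 == node and datatype in LEXICAL_TO_VALUE:
                    return LEXICAL_TO_VALUE[datatype](lexical)
    return None


triples = typed_literal_as_triples(
    "http://example.org/alice", "http://example.org/age",
    "42", XSD + "integer",
)
age = value_of(triples, "http://example.org/alice", "http://example.org/age")
```

In this sketch the core model needs only URIs, blank nodes and plain literals; 
knowledge of what xsd:integer means lives entirely in the optional lookup 
table, so an application that does not understand a datatype still sees 
well-formed triples.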

#g

Received on Wednesday, 20 January 2010 10:16:10 UTC