Datatypes and Java (was Re: datatypes: new information - email delayed awaiting WG response)

It occurred to me that our discussion of datatypes and tidiness has a 
clear parallel with the Java language. Java has built-in datatypes (int, 
double etc.), whose instances do not have object identity. For each such 
datatype, there exist a "wrapper" class (Integer, Double, etc.), whose 
instances do have object identity.

It seems what we are struggling with is whether we want RDF datatypes to 
  be "int"s or "Integer"s. In fact, the stack-in-the-ground draft 
attempts to allow both ints and Integers to be used in such a way that 
one can be magically equivalent to another. The basic question about 
literals - tidy or untidy - can also be restated in a simplified form: 
do we want literals to have identity (be Strings) or not (be some kind 
of immutable char arrays).

In RDF, the inline idiom corresponds to using ints, whereas bNodes can 
be seen as instances of Integer. In Java, ints and Integers are, of 
course, different things. I doubt that it is beneficial do glue these 
different concepts together in RDF.

The motivation for using ints in Java is execution efficiency. The 
motivation for using the inline idiom in RDF is encoding efficiency. RDF 
is an information exchange language. I don't think that encoding 
efficiency is critical for RDF.

There seem to be two paths from where we stand:
(1) use XSD datatypes as ints. Extend the graph model to include the 
primitive datatypes in the graph syntax. Introduce "wrapper" datatypes 
as RDFDT classes if needed.
(2) use XSD datatypes directly as Integers. Forget about ints and 
wrapper datatypes.

Alternative (1) does not work if we have untidy literals. In this case, 
each literal has identity, including ints. Alternative (2) works both 
with tidy and untidy literals.

As to tidiness, I'm confident that both tidy and untidy literals can be 
made to work and that the problems with reification and subproperties in 
the untidy case can be resolved. If RDF is based on tidy literals, 
applications that need untidiness can utilize bNodes. If RDF is based on 
untidy literals, equality for literals can somehow be hooked into 
applications where needed. In this case, however, even simple 
containment tests (e.g., is [Bob age "10"] contained in the graph?) 
would require some kind of basic reasoning, lookups of property 
definitions, support of RDF Schema and RDF Datatyping etc. This may be a 
too long stick to swallow for a desperate Perl hacker. Still, I would 
agree with Pat that untidy literals could of course appear in the 
subject position - they became identity, why not?

Brian, if you are going to rephrase your message, the distinction 
between ints and Integers may be a helpful intuition.

Sergey



Brian McBride wrote:

> 
> I was about to send out the datatypes message to rdf interest, but I'm 
> concerned that we have new information that means I need to change it.
> 
> Have we agreed that test A does not hold for untidy literals?  Patrick 
> has, and I think Jeremy has.  If so then I need to modify (simplify) the 
> message - tests B and C can be dropped.
> 
> The reasons we might decide that test A does not hold are:
> 
>   o it requires a special case in the reification machinery
>   o it requires a special case in the container machinery
>   o it interacts badly with rdfs:subPropertyOf
> 
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jul/0004.html
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jul/0009.html
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jul/0011.html
> 
> I need some feedback from the WG on this.
> 
> Brian

Received on Thursday, 4 July 2002 11:11:56 UTC