Concrete datatype use case

Let's imagine a simple example in which resources of type Foobar have a unique
property, size, whose value is an integer.

The use case is to:

   o merge two graphs using different lexical representations of an int


Approach 1: concrete types are represented by literals, each a pair consisting
of a type and a lexical representation.  The result of a dumb merge is:

_:foobar rdf:type util:Foobar .
_:foobar util:size xsd:int-"10" .
_:foobar util:size xsd:int-"010" .

Unless the software is smart enough to understand that "010" and "10" denote the
same integer, this is pretty unsatisfactory, since there really is only one size
property.

Conclusion: require a canonical representation of integers, or the software has
to understand how to process the xsd:int type.
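
A minimal sketch of the canonical-representation option, in Python.  The tuple
layout and the names canonical_int and merge are assumptions made for
illustration, not part of any RDF API.  Literals are modelled as (type, lexical
representation) pairs as in approach 1, and _:foobar is assumed to denote the
same node in both graphs, as in the example above.

```python
def canonical_int(lexical):
    """Return a canonical lexical form for an integer, e.g. "010" -> "10"."""
    return str(int(lexical))


def merge(*graphs):
    """Merge graphs (iterables of triples), canonicalising xsd:int literals."""
    merged = set()
    for graph in graphs:
        for subj, prop, obj in graph:
            # Literals are (type, lexical) pairs; resources are plain strings.
            if isinstance(obj, tuple) and obj[0] == "xsd:int":
                obj = ("xsd:int", canonical_int(obj[1]))
            merged.add((subj, prop, obj))
    return merged


g1 = [("_:foobar", "rdf:type", "util:Foobar"),
      ("_:foobar", "util:size", ("xsd:int", "10"))]
g2 = [("_:foobar", "rdf:type", "util:Foobar"),
      ("_:foobar", "util:size", ("xsd:int", "010"))]

merged = merge(g1, g2)
```

With the literals canonicalised on the way in, the two size triples collapse
into one and the merge itself can stay dumb.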

Approach 2: the DAML+OIL approach - a dumb merge results in:

_:foobar rdf:type  util:Foobar .
_:foobar util:size _:size1 .
_:size1   rdf:type  xsd:int .
_:size1   rdf:value "10" .
_:foobar util:size _:size2 .
_:size2   rdf:type  xsd:int .
_:size2   rdf:value "010" .

Smart software can recognise that size1 and size2 are really the same thing.
I'm worried, though, that doing so will call for a comparison against all
anonymous resources in the graph.  If I'm adding _:size2 to the graph, would I
be able to restrict the software's attention to checking only whether it is
equal to size1?  Yes, it could; it need only consider arcs with the same blunt
end and property.

Conclusion: same as approach 1; to do the merge properly requires either a
canonical representation of ints, or software that understands ints and their
lexical representation.  It takes more triples this way.  I assert, without
proof, that this is harder to implement than approach 1, since it involves
multiple triples.
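
The restricted comparison can be sketched as follows, in Python.  The function
names and the list-of-triples graph are hypothetical, and the sketch assumes
both size nodes hang off _:foobar through a util:size arc.  Only nodes reached
by an arc with the same blunt end and property are compared with the new node.

```python
def node_value(graph, node):
    """Return (type, integer value) for a typed node, or None if incomplete."""
    dtype = lexical = None
    # Linear scan for brevity; a real store would index triples by subject.
    for s, p, o in graph:
        if s == node and p == "rdf:type":
            dtype = o
        elif s == node and p == "rdf:value":
            lexical = o
    if dtype == "xsd:int" and lexical is not None:
        return (dtype, int(lexical))
    return None


def find_equivalent(graph, subj, prop, new_node):
    """Find an existing node on the same (subj, prop) arc equal to new_node."""
    target = node_value(graph, new_node)
    if target is None:
        return None
    for s, p, o in graph:
        # Only arcs with the same blunt end and property are candidates.
        if s == subj and p == prop and o != new_node:
            if node_value(graph, o) == target:
                return o
    return None


graph = [
    ("_:foobar", "rdf:type", "util:Foobar"),
    ("_:foobar", "util:size", "_:size1"),
    ("_:size1", "rdf:type", "xsd:int"),
    ("_:size1", "rdf:value", "10"),
    ("_:foobar", "util:size", "_:size2"),
    ("_:size2", "rdf:type", "xsd:int"),
    ("_:size2", "rdf:value", "010"),
]

match = find_equivalent(graph, "_:foobar", "util:size", "_:size2")
```

Here match is _:size1, so the three triples involving _:size2 could be dropped.
Even in this small form the check touches several triples per node, which is
the extra work the conclusion above refers to.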

Approach 3: DanC's approach

_:foobar rdf:type      util:Foobar .
_:foobar util:size     "10" .
_:foobar util:intSize  _:size1 .
_:size1  rdf:type      xsd:int .
_:size1  rdf:value     "10" .
_:foobar util:intSize  _:size2 .
_:size2  rdf:type      xsd:int .
_:size2  rdf:value     "010" .

Conclusion: same as approach 2.

Approach 4: Pat's approach (?)

_:foobar      rdf:type    util:Foobar .
_:foobar      util:size   size1:"10" .
_:size1:"10"  rdf:type    xsd:int .
_:foobar      util:size   size2:"010" .
_:size2:"010" rdf:type    xsd:int .

Software that's aware of how to process lexical representations of ints reduces
this to three triples.  Less triple bloat than above.  Debatably extends M&S's
model, but compatible with current RDF in that types are added as an extension.
I worry about the DAML+OIL requirement that concrete types and resources are
disjoint.  Specifically, the rdf:type in the above is illegal in DAML+OIL; no
property can have both a resource and a concrete type in its domain.

I guess what's bothering me here is RDF's ability to separate out individual
statements: if the original set was true, each should be true independently.  So:

_:foobar      util:size   size1:"10" .

Is this really true?  Is it the same as:

_:foobar      util:size   "10" .

The value of the size should be an integer, shouldn't it?


There may be other approaches I should cover; sorry, I'm running out of time and
patience (as probably are you).


   o query a graph where the graph and the query contain different lexical 
representations of an int.

It seems to me that it's likely that implementations will want to store a
canonical representation of an int, and will convert the query to that canonical
representation.  So this isn't a problem.  What bothers me, though, is this: if
the graph implementation reads in a representation of an int, say "010", but
changes it internally to some canonical representation which loses the fact that
it was originally represented as "010", have we lost anything?  Is there any
distinction amongst the different approaches?

So if I've got:

_:foobar rdf:type  util:Foobar .
_:foobar util:size _:size1 .
_:size1   rdf:type  xsd:int .
_:size1   rdf:value "10" .
_:foobar util:size _:size2 .
_:size2   rdf:type  xsd:int .
_:size2   rdf:value "010" .


   o query a graph for all Foobars whose size is less than 12.

In a large scale database, it would be crazy, from an implementation point of 
view, not to store the underlying data in a way that represented the ordering of 
integers so that the query can be done efficiently.  This implies using an 
internal canonical representation.
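
As a rough illustration of why an ordered internal representation helps, here
is a small Python sketch.  IntIndex is a hypothetical class, not taken from any
real RDF store, and the subject names are made up.  It keeps (value, subject)
pairs sorted by integer value so that a "less than" query becomes a binary
search followed by a slice.

```python
import bisect


class IntIndex:
    """Hypothetical index: sorted (value, subject) pairs for one property."""

    def __init__(self):
        self._entries = []

    def add(self, subject, lexical):
        # Store the integer itself, not the lexical form, so order is numeric.
        bisect.insort(self._entries, (int(lexical), subject))

    def less_than(self, bound):
        # (bound,) sorts before every (bound, subject) pair, so all entries
        # before this position have a value strictly below bound.
        pos = bisect.bisect_left(self._entries, (bound,))
        return [subject for _, subject in self._entries[:pos]]


index = IntIndex()
index.add("_:a", "010")
index.add("_:b", "12")
index.add("_:c", "7")
index.add("_:d", "0011")

result = index.less_than(12)
```

Comparing the lexical forms as strings would instead sort "010" before "7", so
the index has to hold the converted value; the original lexical form is not
kept here, which is exactly the loss questioned earlier.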

I'm left with the feeling that representing a concrete type as a pair will be
easier to implement.  Your mileage may vary.

Brian

Received on Tuesday, 23 October 2001 12:57:57 UTC