- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: Tue, 23 Oct 2001 17:53:23 +0100
- To: rdf core <w3c-rdfcore-wg@w3.org>
Let's imagine we have a simple example where we have resources of type Foobar that have a unique property, size, which is an integer.

The use case is to:

o merge two graphs using different lexical representations of an int

Approach 1: concrete types are represented by literals which are a pair consisting of a type and a lexical representation. The result of a dumb merge is:

  _:foobar rdf:type util:Foobar .
  _:foobar util:size xsd:int-"10" .
  _:foobar util:size xsd:int-"010" .

Unless the software is smart enough to understand that "05" and "5" denote the same integer, this is pretty unsatisfactory, since there really is only one size property.

Conclusion: either require a canonical representation of integers, or the software has to understand how to process the xsd:int type.

Approach 2: the DAML+OIL approach - a dumb merge results in:

  _:foobar rdf:type util:Foobar .
  _:foobar rdf:size _:size1 .
  _:size1 rdf:type xsd:int .
  _:size1 rdf:value "10" .
  _:size2 rdf:type xsd:int .
  _:size2 rdf:value "010" .

Smart software can recognise that size1 and size2 are really the same thing. I'm worried, though, that doing that will call for a compare against all anon resources in the graph. If I'm adding _:size2 to the graph, would I be able to restrict the software's attention to checking only whether it was equal to size1? Yes, it could; it need only consider arcs with the same blunt end and property.

Conclusion: same as 1; to do the merge properly, either require a canonical representation of ints, or the software has to understand ints and their lexical representation. It takes more triples this way. I assert, without proof, that this is harder to implement than approach 1, since it involves multiple triples.

Approach 3: DanC's approach

  _:foobar rdf:type util:Foobar .
  _:foobar util:size "10" .
  _:foobar util:intSize _:size1 .
  _:size1 rdf:type xsd:int .
  _:size1 rdf:value "10" .
  _:foobar util:intSize _:size2 .
  _:size2 rdf:type xsd:int .
  _:size2 rdf:value "010" .

Conclusion: same as approach 2.

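To make the Approach 1 conclusion concrete, here is a minimal sketch of a merge that canonicalises xsd:int literals before comparing triples. It is illustrative only, not anyone's actual implementation: the names TypedLiteral, canonical and merge are hypothetical, a graph is assumed to be a plain Python set of (subject, predicate, object) tuples, and _:foobar is assumed to label the same node in both graphs, as in the examples above.

```python
from typing import NamedTuple, Set, Tuple, Union


class TypedLiteral(NamedTuple):
    """Approach 1 literal: a (datatype, lexical form) pair. Hypothetical name."""
    datatype: str
    lexical: str


Node = Union[str, TypedLiteral]
Triple = Tuple[str, str, Node]


def canonical(node: Node) -> Node:
    """Rewrite xsd:int literals to one lexical form; leave everything else alone."""
    if isinstance(node, TypedLiteral) and node.datatype == "xsd:int":
        # int() accepts leading zeros and an optional sign; str() gives one form back.
        return TypedLiteral(node.datatype, str(int(node.lexical)))
    return node


def merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Union the graphs after canonicalising each object, so equal ints collapse."""
    merged: Set[Triple] = set()
    for graph in graphs:
        for s, p, o in graph:
            merged.add((s, p, canonical(o)))
    return merged


g1 = {
    ("_:foobar", "rdf:type", "util:Foobar"),
    ("_:foobar", "util:size", TypedLiteral("xsd:int", "10")),
}
g2 = {
    ("_:foobar", "rdf:type", "util:Foobar"),
    ("_:foobar", "util:size", TypedLiteral("xsd:int", "010")),
}

dumb = g1 | g2          # plain set union: keeps both size triples
smart = merge(g1, g2)   # canonicalising merge: one size triple survives
```

Here the dumb merge holds three triples and the canonicalising merge holds two. The only datatype knowledge needed is confined to canonical(), which is the "software has to understand xsd:int" cost named in the conclusion.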
Approach 4: Pat's approach (?)

  _:foobar rdf:type util:Foobar .
  _:foobar util:size size1:"10" .
  _:size1:"10" rdf:type xsd:int .
  _:foobar util:size size2:"010" .
  _:size2:"010" rdf:type xsd:int .

Software that's aware of how to process lexical representations of ints reduces this to three triples. Less triple bloat than above. Debatably extends M&S's model, but compatible with current RDF in that types are added as an extension.

I worry about the DAML+OIL requirement that concrete types and resources are disjoint. Specifically, the rdf:type in the above is illegal in DAML+OIL; no property can have both a resource and a concrete type in its domain.

I guess what's bothering me here is RDF's ability to separate out individual statements: if the original set was true, they should each be true independently. So

  _:foobar util:size size1:"10" .

Is this really true? Is it the same as:

  _:foobar util:size "10" .

The value of the size should be an integer, shouldn't it?

There may be other approaches I should cover; sorry, I'm running out of time and patience (as probably are you).

o query a graph where the graph and the query contain different lexical representations of an int.

It seems to me that it's likely that implementations will want to store a canonical representation of an int, and will convert the query to that canonical representation. So this isn't a problem.

What bothers me, though, is this: if the graph implementation reads in a representation of an int as, say, "010" but changes it internally to some canonical representation which loses the fact that it was originally represented as "010", have we lost anything? Is there any distinction amongst the different approaches? So if I've got:

  _:foobar rdf:type util:Foobar .
  _:foobar rdf:size _:size1 .
  _:size1 rdf:type xsd:int .
  _:size1 rdf:value "10" .
  _:size2 rdf:type xsd:int .
  _:size2 rdf:value "010" .

o query a graph for all Foobars whose size is less than 12.

In a large-scale database, it would be crazy, from an implementation point of view, not to store the underlying data in a way that represents the ordering of integers, so that the query can be done efficiently. This implies using an internal canonical representation.

I'm left with the feeling that representing a concrete type as a pair will be easier to implement. Your mileage may vary.

Brian

Received on Tuesday, 23 October 2001 12:57:57 UTC