Re: Concrete datatype use case from Pat Hayes on 2001-10-24 (w3c-rdfcore-wg@w3.org from October 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Tue, 23 Oct 2001 23:11:19 -0500
To: Brian McBride <bwm@hplb.hpl.hp.com>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101053b7fbec95adff@[205.160.76.193]>
>Let's imagine we have a simple example where we have resources of 
>type Foobar that have a unique property size which is an integer.

Help already. Do you mean that Foobar is a datatype?
Which datatypes are you going to use to refer to integers?

>
>The use case is to:
>
>   o merge two graphs using different lexical representations of an int

That doesn't make sense to me. A datatyping specifies the value for a 
lexical representation. Do you mean that there are two different 
datatypes with the same value space, or one datatype which maps two 
different lexical forms (e.g. 0023 and 23) into the same value?

>
>
>Approach 1: concrete types are represented by literals which are a 
>pair consisting of a type and a lexical representation, the result 
>of a dumb merge is:
>
>_:foobar rdf:type util:Foobar .
>_:foobar util:size xsd:int-"10" .
>_:foobar util:size xsd:int-"010" .
>
>Unless the software is smart enough to understand that "05" and "5" 
>denote the same integer, this is pretty unsatisfactory, since there 
>really is only one size property.
>
>Conclusion: require a canonical representation of integers, or sw 
>has to understand how to process the xsd:int type.

??? Surely the point of using xsd: is that it refers one to a 
datatyping spec and *that spec* establishes that 10 and 010 have the 
same value.  If RDF has to do this itself, what's the point of using 
XSD?
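
A minimal sketch of that point, assuming literals are held as (datatype, lexical form) pairs as in Approach 1. The names `TypedLiteral`, `value_of` and `merge_sizes` are hypothetical, and only xsd:int is handled; the lexical-to-value mapping is what the datatype spec supplies, so the merge itself needs no integer-specific logic beyond calling it.

```python
import re
from typing import NamedTuple

# Lexical space assumed for xsd:int: optional sign, then decimal digits.
_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")
_INT_MIN, _INT_MAX = -2**31, 2**31 - 1  # xsd:int is a 32-bit signed integer


class TypedLiteral(NamedTuple):
    """Hypothetical (datatype, lexical form) pair, as in Approach 1."""
    datatype: str
    lexical: str


def value_of(lit: TypedLiteral) -> int:
    """Map a lexical form to its value, as the datatype spec would."""
    if lit.datatype != "xsd:int":
        raise ValueError(f"unsupported datatype: {lit.datatype}")
    if not _INT_LEXICAL.fullmatch(lit.lexical):
        raise ValueError(f"not an xsd:int lexical form: {lit.lexical!r}")
    value = int(lit.lexical)
    if not _INT_MIN <= value <= _INT_MAX:
        raise ValueError(f"out of range for xsd:int: {lit.lexical!r}")
    return value


def merge_sizes(literals):
    """Keep one literal per distinct (datatype, value); first form seen wins."""
    seen = {}
    for lit in literals:
        seen.setdefault((lit.datatype, value_of(lit)), lit)
    return list(seen.values())


merged = merge_sizes([
    TypedLiteral("xsd:int", "10"),
    TypedLiteral("xsd:int", "010"),
])
print(merged)
```

Under these assumptions the two util:size literals from the dumb merge collapse to one, because equality is decided on values rather than on strings.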

>
>Approach 2: the DAML+OIL approach - a dumb merge results in:
>
>_:foobar rdf:type  util:Foobar .
>_:foobar rdf:size  _:size1 .
>_:size1   rdf:type  xsd:int .
>_:size1   rdf:value "10" .
>_:size2   rdf:type  xsd:int .
>_:size2   rdf:value "010" .
>
>Smart sw can recognise that size1 and size2 are really the same 
>thing.  I'm worried, though, that doing that will call for a 
>compare against all anon resources in the graph.  If I'm adding 
>_:size2 to the graph, would I be able to restrict the sw's attention 
>to checking only whether it was equal to size1?  Yes, it could; it 
>need only consider arcs with the same blunt end and property.
>
>Conclusion: Same as 1; to do the merge properly requires either a 
>canonical representation of ints, or the software has to understand 
>ints and their lexical representation.  Takes more triples this way. 
>I assert, without proof, that this is harder to implement than approach 
>1, since it involves multiple triples.
>
>Approach 3: DanC's approach
>
>_:foobar rdf:type      util:Foobar .
>_:foobar util:size     "10" .
>_:foobar util:intSize  _:size1 .
>_:size1  rdf:type      xsd:int .
>_:size1  rdf:value     "10" .
>_:foobar util:intSize  _:size2 .
>_:size2  rdf:type      xsd:int .
>_:size2  rdf:value     "010" .
>
>Conclusion: same as approach 2.
>
>Approach 4: Pat's approach (?)
>
>_:foobar      rdf:type    util:Foobar .
>_:foobar      util:size   size1:"10" .
>_:size1:"10"  rdf:type    xsd:int .
>_:foobar      util:size   size2:"010" .
>_:size2:"010" rdf:type    xsd:int .
>
>Software that's aware of how to process lexical representations of 
>ints reduces this to three triples.  Less triple bloat than above. 
>Debatably extends M&S's model, but compatible with current RDF 
>in that types are added as an extension.
>Worry about DAML+OIL requirement that concrete types and resources 
>are disjoint.  Specifically, rdf:type in above is illegal in 
>DAML+OIL; no property can have both a resource and a concrete type 
>in its domain.

They would use [rdfs:range xsd:integer] to convey the required typing 
information. We could do that also, by the way. The point is that 
however the information is conveyed, the literals get interpreted 
according to the datatype conventions that are required by the 
rdf:type info, whether that is expressed explicitly (why not?) or 
implicitly (eg via rdfs:range).
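
As a rough illustration of that last point, here is a sketch in which the datatype for a literal is taken either from an explicit type or, failing that, from a range declared on the property. Everything named here (`PARSERS`, `literal_value`, the `ranges` mapping, the prefixed names used as plain strings) is an assumption made for the example, not an RDF or DAML+OIL API.

```python
import re

_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def parse_int(lexical: str) -> int:
    """Lexical-to-value mapping assumed for xsd:int (range check omitted)."""
    if not _INT_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not an integer lexical form: {lexical!r}")
    return int(lexical)


# Hypothetical registry: datatype name -> lexical-to-value function.
PARSERS = {"xsd:int": parse_int}


def literal_value(prop, lexical, ranges, explicit_type=None):
    """Interpret a literal using explicit typing if given, else the
    property's declared range; untyped literals stay plain strings."""
    datatype = explicit_type or ranges.get(prop)
    if datatype is None:
        return lexical
    return PARSERS[datatype](lexical)


# Implicit typing: the range of util:size is declared once.
ranges = {"util:size": "xsd:int"}
implicit = literal_value("util:size", "010", ranges)

# Explicit typing: no range declaration, type attached to the literal.
explicit = literal_value("util:size", "10", {}, explicit_type="xsd:int")

print(implicit, explicit, implicit == explicit)
```

Either route ends in the same lexical-to-value function, which is why the choice between explicit and implicit typing does not change the value the literal denotes.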

>
>I guess what's bothering me here is RDF's ability to separate out 
>individual statements: if the original set was true, they should 
>be true independently.  So
>
>_:foobar      util:size   size1:"10" .
>
>Is this really true?  Is it the same as:
>
>_:foobar      util:size   "10" .

Yes.

>
>The value of the size should be an integer, shouldn't it?

It is. (?? I'm not following your worry here.)

>
>
>There may be other approaches I should cover; sorry, I'm running out 
>of time and patience (as probably are you).
>
>
>   o query a graph where the graph and the query contain different 
>lexical representations of an int.
>
>It seems to me that it's likely that implementations will want to 
>store a canonical representation of an int, and will convert the 
>query to that canonical representation.  So this isn't a problem. 
>What bothers me, though, is: if the graph implementation reads in 
>a representation of an int, say from "010", but changes it 
>internally to some canonical representation which loses the fact it 
>was originally represented as "010", have we lost anything?

Not if we know that it was in a datatype which maps 010 and 10 to the 
same value; and if we didn't know that, then we wouldn't be justified 
in making the inference in the first place.
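
A small sketch of a store that canonicalizes on the way in, assuming the property is already known to take xsd:int-style integers. `CanonicalStore` and `canonical_int` are hypothetical names, and the canonical form chosen here (no leading zeros, no "+" sign) is an assumption of the example.

```python
import re

_INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def canonical_int(lexical: str) -> str:
    """Return an assumed canonical lexical form: no leading zeros, no '+'."""
    if not _INT_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not an integer lexical form: {lexical!r}")
    return str(int(lexical))


class CanonicalStore:
    """Hypothetical triple store that keeps only canonical integer forms."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, prop, lexical):
        self._triples.add((subject, prop, canonical_int(lexical)))

    def contains(self, subject, prop, lexical):
        # The query is canonicalized the same way before lookup.
        return (subject, prop, canonical_int(lexical)) in self._triples

    def __len__(self):
        return len(self._triples)


store = CanonicalStore()
store.add("_:foobar", "util:size", "010")
store.add("_:foobar", "util:size", "10")
print(len(store), store.contains("_:foobar", "util:size", "0010"))
```

The original spelling "010" is gone after `add`, but every query that denotes the same value still matches, which is all the datatype licenses us to care about.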

>  Is there any distinction amongst the different approaches?
>
>So if I've got:
>
>_:foobar rdf:type  util:Foobar .
>_:foobar rdf:size  _:size1 .
>_:size1   rdf:type  xsd:int .
>_:size1   rdf:value "10" .
>_:size2   rdf:type  xsd:int .
>_:size2   rdf:value "010" .
>
>
>   o query a graph for all Foobar's whose size is less than 12.
>
>In a large scale database, it would be crazy, from an implementation 
>point of view, not to store the underlying data in a way that 
>represented the ordering of integers so that the query can be done 
>efficiently.  This implies using an internal canonical 
>representation.
>
>I'm left with the feeling that representing a concrete type as a 
>pair will be easier to implement.

It probably would be, but it would ruffle a lot of XML feathers. I 
think we can allow it if people want to use it, and also allow more 
flexible but expensive schemes if people want to use those.
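
For the "size less than 12" query quoted above, a sketch of what storing values in order might look like, using Python's standard `bisect` module. `SizeIndex` is a hypothetical class, and validation of the lexical forms is omitted to keep the example short.

```python
import bisect


class SizeIndex:
    """Hypothetical index of util:size values, kept sorted by integer value."""

    def __init__(self):
        self._entries = []  # sorted list of (value, subject) tuples

    def add(self, subject, lexical):
        # Convert the lexical form to its value once, at insert time.
        bisect.insort(self._entries, (int(lexical), subject))

    def less_than(self, bound):
        # (bound,) sorts before every (bound, subject), so the cut point
        # excludes entries whose value equals the bound.
        cut = bisect.bisect_left(self._entries, (bound,))
        return [subject for _, subject in self._entries[:cut]]


index = SizeIndex()
index.add("_:a", "010")
index.add("_:b", "12")
index.add("_:c", "7")
index.add("_:d", "0011")
print(index.less_than(12))
```

The lookup is a binary search over values, so the different spellings "010" and "0011" play no part once the literals have been read in.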

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Wednesday, 24 October 2001 00:11:25 UTC