Re: heading toward datatyping telecon

Pat Hayes wrote:

[...]


> 
> Well, not all RDF graphs can be represented in RDF/XML, so why is that a 
> serious constraint?


Because we are chartered to explain to folks how to use XML datatypes with RDF. 
  I'd rather not say, do it this way, but wait till the next round of specs 
before you do.  I'll repeat however, we will not let the charter get in the way 
of doing the "right thing" on this issue.  I'm just exploring whether there is a 
way we can avoid tripping up over the charter.


> 
>> However, in Pat's proposal bnodes can match literals as well as 
>> resources, so we could use bnodes for now and extend later.  Yes? If 
>> so we can focus our discussions now on the bnode representations.
> 
> 
> Not sure I follow this. (What do you mean by 'match' here?)


Don't those logicians keep you honest?  I was struggling for the right word 
here.  I mean we can use a bnode instead of a literal in Pat's scheme and it 
still works; we just have to add the rdf:value property on the bnode which 
wouldn't be needed if we had literals as subjects.


> 
>>
>> Pat's proposal defines a type to be a mapping from a lexical space to 
>> a value space.  That means that a hexadecimal integer is a different 
>> type from a decimal integer.
> 
> 
> Obviously the datatype mappings are not the same, but the value spaces 
> can be overlap or even be the same. We can make them rdfs:subclasses of 
> one another if you like.


Hmmm.  Making them subclasses of one another would make them equivalent.  It's 
bending my head a little, but they are not equivalent, so that sounds like trouble.


> 
>> That's going to be confusing to programmers.
> 
> 
> Really?? I know some programmers who like things very strongly typed and 
> want to distinguish integers that happen to be positive from positive 
> integers. It all depends on what typing discipline you are comfortable 
> with.


Quite.  Do you know of any that prefer to have different types for integers 
represented by hexadecimal numerals from those represented by decimal numerals? 
  There is quite a difference between typing based on the value space (in this 
example integers, or subsets thereof) and typing based on the lexical space.  
Programmers are used to typing based on the value space.  Your example is of 
typing based on the value space.


> 
>> I suppose we could define integer to be a super-class of hexadecimal 
>> integer and decimal interger, i.e. for each value space define a class 
>> that all types mapping to that value space are subclasses of.
> 
> 
> Right, that would be a 'safe' way to do it for almost everyone.
> 
>> Still somehow that does not sit right with my intuitions.
> 
> 
> Why not?


I think the graph merging example is the closest I can come to a concrete 
example of why not at the moment.


> 
>> In rdf schema I want to  say that the value of a property is an 
>> integer; after parsing I don't much care whether is was represented in 
>> decimal, binary or hieroglyphics.  rdf schema is about describing the 
>> data model, not the syntactic representation.
> 
> 
> I agree, but you might want to allow those who feel like doing so, to 
> import some distinctions into their data models that to you seem like 
> syntactic matters.


That is something we could ask the community whether they want/need it.


> 
>> How would I write a schema that would allow the value of a property to 
>> be either a decimal or a hexadecimal integer?
> 
> 
> If that distinction is going to 'do' datatyping (in *any* datatyping 
> scheme) then you need to retain it somehow. The safe way would be to 
> have three classes: the decimals and hexadecimals, which are datatype 
> classes, and the integers, which is a superclass of those two and is not 
> a datatype class (because there is no way to know , from knowing only 
> that "23" is an integer, whether it means 23 or 19 or even 203.)
> This really is reasonable, since there really are no such things as 
> decimal *integers*; decimals are (one kind of) *numeral*.


Right, it's that difference between numeral and integer I'm trying to get clear 
in my head.  Given the above approach:

   o I'd have to declare the range of the property to be integer
   o type inferencing does not work: if I get a literal "12", we can't tell
     whether it's decimal or hex, so we must include the representation info
     directly in the RDF/XML - which is where it belongs - it's syntactic.
   o How do I tell that the integer is in fact a concrete datatype and that I
     should handle it specially?  Hmmm, I guess we declare it as a subclass
     of ConcreteType, or something like that.
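
To make the numeral/integer distinction concrete, here is a minimal Python 
sketch.  It assumes a datatype is nothing more than a function from a lexical 
form to a value; the DATATYPES table and the value_of helper are hypothetical 
names for illustration, not part of any proposal.

```python
# Hypothetical table: each datatype maps a lexical form (a numeral)
# to a value (an integer).  The names are illustrative only.
DATATYPES = {
    "xsd:integer": lambda lexical: int(lexical, 10),  # decimal numerals
    "eg:hexint": lambda lexical: int(lexical, 16),    # hexadecimal numerals
}


def value_of(datatype, lexical):
    """Apply the datatype's lexical-to-value mapping to a literal string."""
    return DATATYPES[datatype](lexical)


# The same numeral denotes different integers under different datatypes...
print(value_of("xsd:integer", "12"))  # 12
print(value_of("eg:hexint", "12"))    # 18

# ...and different numerals can denote the same integer.
print(value_of("eg:hexint", "C"))     # 12
```

Knowing only that "12" is "an integer" does not pick out one of these 
mappings, which is why the representation has to be stated somewhere.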


> 
>> How do the different approaches handle merging of graphs, as in:
>>
>> Consider RDF/XML serializations of two graphs each describing 
>> http://example/thingy.  Each has an eg:size property for 
>> http://example/thingy.  In one graph the size is represented by a 
>> decimal integer "12".  In the other the size is represented by the 
>> hexadecimal integer "C".
> 
> 
> In my simplest MT extension, you just merge the graphs. But be careful: 
> how is the datatyping information supplied in your two graphs? I would 
> need to know in order to answer the question fully.


It's your proposal.  How would you recommend doing it?  For example, you can't do 
this:

graph1:

   <http://example/thingy> <eg:size>   _:size .
   _:size                  <rdf:type>  <xsd:integer> .
   _:size                  <rdf:value> "12" .

graph2:

   <http://example/thingy> <eg:size>   _:size .
   _:size                  <rdf:type>  <eg:hexint> .
   _:size                  <rdf:value> "C" .

merge:

   <http://example/thingy> <eg:size>   _:size .
   _:size                  <rdf:type>  <xsd:integer> .
   _:size                  <rdf:value> "12" .
   <http://example/thingy> <eg:size>   _:size .
   _:size                  <rdf:type>  <eg:hexint> .
   _:size                  <rdf:value> "C" .

because you don't know which rdf:type goes with which rdf:value.  And I don't 
want to do:

merge:

  <http://example/thingy> <eg:size>   _:size1 .
  _:size1                 <rdf:type>  <xsd:integer> .
  _:size1                 <rdf:value> "12" .
  <http://example/thingy> <eg:size>   _:size2 .
  _:size2                 <rdf:type>  <eg:hexint> .
  _:size2                 <rdf:value> "C" .


because I know there is only one eg:size property.  Knowing that, I suppose I 
could throw one away, if I trust both, but that is throwing away information, 
which feels bad.  And what if I want to store provenance information, and I've 
got conflicting data?
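
The ambiguity in the shared-bnode merge can be sketched in Python.  This 
assumes a graph is just a set of (subject, predicate, object) tuples and that 
both graphs keep the bnode label _:size; it is an illustration, not an RDF 
implementation.

```python
# Each graph is modelled as a set of (subject, predicate, object) tuples.
graph1 = {
    ("http://example/thingy", "eg:size", "_:size"),
    ("_:size", "rdf:type", "xsd:integer"),
    ("_:size", "rdf:value", "12"),
}
graph2 = {
    ("http://example/thingy", "eg:size", "_:size"),
    ("_:size", "rdf:type", "eg:hexint"),
    ("_:size", "rdf:value", "C"),
}

# Merge while keeping the shared bnode label: a plain set union.
merged = graph1 | graph2

# Collect every type and every lexical form now hanging off _:size.
types = sorted(o for s, p, o in merged if s == "_:size" and p == "rdf:type")
values = sorted(o for s, p, o in merged if s == "_:size" and p == "rdf:value")

# Nothing in the merged graph records which type belongs to which value,
# so every combination is a candidate reading.
candidates = [(t, v) for t in types for v in values]
for datatype, lexical in candidates:
    print(datatype, lexical)
```

Four pairings come out, including xsd:integer with "C" and eg:hexint with 
"12", and the merged graph gives no way to rule those out.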

If, however, we have:

graph1:

   <http://example/thingy> <eg:size>      _:size .
   _:size                  <xsd:integer>  "12" .
   _:size                  <rdf:type>     <eg:integer> .

graph2:

   <http://example/thingy> <eg:size>      _:size .
   _:size                  <eg:hexint>    "C" .
   _:size                  <rdf:type>     <eg:integer> .

merge:

   <http://example/thingy> <eg:size>      _:size .
   _:size                  <xsd:integer>  "12" .
   _:size                  <eg:hexint>    "C" .
   _:size                  <rdf:type>     <eg:integer> .

A simple merge works fine (knowing that there is only one value of the eg:size 
property, which allows the bnodes to be smushed).
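
A minimal Python sketch of that merge, assuming graphs are sets of (subject, 
predicate, object) tuples, that the bnodes have already been smushed to the 
single label _:size, and that a hypothetical PARSERS table supplies the 
lexical-to-value mapping for each datatype property:

```python
# Hypothetical lexical-to-value mappings, keyed by datatype property.
PARSERS = {
    "xsd:integer": lambda lexical: int(lexical, 10),
    "eg:hexint": lambda lexical: int(lexical, 16),
}

graph1 = {
    ("http://example/thingy", "eg:size", "_:size"),
    ("_:size", "xsd:integer", "12"),
    ("_:size", "rdf:type", "eg:integer"),
}
graph2 = {
    ("http://example/thingy", "eg:size", "_:size"),
    ("_:size", "eg:hexint", "C"),
    ("_:size", "rdf:type", "eg:integer"),
}

# With the bnodes smushed, the merge is a plain set union.
merged = graph1 | graph2

# Each lexical form travels with its own datatype property, so every
# literal can be interpreted without guessing.
values = {
    PARSERS[p](o) for s, p, o in merged if s == "_:size" and p in PARSERS
}

print(len(merged))  # 4 triples: the duplicated ones collapse
print(values)       # {12}: both representations denote the same integer
```

If the two graphs disagreed, values would contain more than one integer, so 
the conflict would be visible rather than silently thrown away.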

So the intuitions I am having trouble with are:

  o that rdf:type is about value spaces, not lexical spaces
  o that the node which represents a value in a graph should be independent
    of the lexical representation of the value, so that graphs with different
    lexical representations can be "smushed", to use Libby's wonderful term.

Brian

Received on Friday, 2 November 2001 04:40:06 UTC