incomplete datatyping (was: Re: datatypes and MT) from Pat Hayes on 2001-11-05 (w3c-rdfcore-wg@w3.org from November 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Mon, 5 Nov 2001 14:30:09 -0600
To: Sergey Melnik <melnik@db.stanford.edu>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101027b80ca1bd6d4d@[65.212.118.166]>
>Pat Hayes wrote:
>>  ...
>>  >>Now, this seems to me to have a fatal flaw, which arises from the
>>  >>fact that the value spaces of two different datatypes might
>>  >>overlap. For example, suppose that there are datatypes xxd:octal
>>  >>and xxd:decimal, then the following would seem to be perfectly true:
>>  >>
>>  >>_:1 rdf:type xxd:octal
>>  >>_:1 rdf:type xxd:decimal
>>  >>_:1 rdf:value "32"
>>  >>_:1 rdf:value "26"
>>  >
>>  >
>>  >But that is not how Sergey would write it.  He is proposing:
>>  >
>>  >   _:1 xxd:octal   "32" .
>>  >   _:1 xxd:decimal "26" .
>>
>>  Oh, I see. That does indeed avoid this problem, but it also throws
>>  away the advantages of the bnode way of doing things, since now it is
>>  impossible to be neutral about datatypes.
>
>Why? If you want to be neutral, use a bNode w/o any arcs attached to it
>(if I understand what you mean by neutral).

No, what I mean by 'neutral' is writing, say, that my shoe size is 10 
without giving the datatyping of the literal. That is what is 
impossible in this scheme: to use a literal before, or independently 
of, giving that literal a datatype (because in this scheme, as I 
understand it, the ONLY places where literal labels are allowed are 
the object ends of triples whose predicate is a datatype name). That 
is why I say it is only a notational variation on the simple idea of 
incorporating datatyping information into the literal label itself. 
(BTW, I agree that this simple idea has its merits; but I think that 
if we are going to insist that literals *must* be explicitly 
datatyped, then we should impose this as an explicit syntactic 
constraint in the very syntax of the language.)
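
To make the octal/decimal example and the 'neutral' shoe-size case concrete, here is a minimal sketch. It assumes the two hypothetical datatypes xxd:octal and xxd:decimal are plain integer encodings; the DATATYPES table and candidate_values helper are illustrative names, not part of any proposal in this thread.

```python
# Hypothetical lexical-to-value mappings for the two example datatypes.
# Assumption: both are simple integer encodings in base 8 and base 10.
DATATYPES = {
    "xxd:octal": lambda s: int(s, 8),
    "xxd:decimal": lambda s: int(s, 10),
}


def candidate_values(lexical):
    """Return the value the literal would denote under each known datatype."""
    values = {}
    for name, parse in DATATYPES.items():
        try:
            values[name] = parse(lexical)
        except ValueError:
            # The string is not in this datatype's lexical space; skip it.
            continue
    return values


# Overlapping value spaces: octal "32" and decimal "26" denote the same number.
same_value = DATATYPES["xxd:octal"]("32") == DATATYPES["xxd:decimal"]("26")

# A 'neutral' literal: "10" stays open between readings until a datatype is known.
shoe_size = candidate_values("10")  # {'xxd:octal': 8, 'xxd:decimal': 10}
```

The untyped "10" is still usable information here: it narrows the value to a small set of candidates, and a later statement about the datatype can pick one.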

>  > This forces the datatyping
>>  information to be attached directly to the literal;
>
>Right.
>
>>  the only place
>>  literals can occur in an RDF graph is at the object end of links
>>  labelled with a datatype.
>
>True, although I personally do not see any problem with allowing
>
>"cat"  base64  "Y2F0"
>
>either.
>
>>  This seems to me to be simply a variation
>>  on the idea of incorporating the datatype label into the literal
>>  itself, eg by having literals be pairs of a datatype and a string.
>
>Not quite. Notice that we can refer both to the literal "typed" in such
>a way, and its type by means of arcs in RDF graphs (and, for example,
>provide additional information about the datatype or describe how the
>literal/string is represented using the base64 encoding, if that's all
>we know).

Suppose I know that some property is represented by a literal "Y2F0", 
but have (as yet) no information about the appropriate datatyping to 
be used to interpret that literal. How would you represent that state 
of information? And now suppose that I discover, perhaps from a 
different source, that the property in question is stated in terms of 
a base-64 integer encoding: how would you encode that information? 
And then how would I be able to put these two pieces of information 
together into a single graph, and be able to draw the obvious 
conclusion? Remember that it is not valid to merge bnodes from two 
different RDF graphs.
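
The merging difficulty can be sketched as follows. This is only an illustration under stated assumptions: triples are modelled as Python tuples, literals are written with embedded double quotes, the rdf:value/rdf:type encoding is borrowed from the bnode example quoted earlier, and ex:item, ex:prop and xxd:base64 are made-up names.

```python
def merge(*graphs):
    """Merge graphs of (s, p, o) triples, keeping each graph's bnodes distinct."""
    merged = set()
    for index, graph in enumerate(graphs):

        def rename(term, index=index):
            # Blank nodes are local to their graph, so give them a
            # graph-specific name instead of identifying them by label.
            if term.startswith("_:"):
                return f"_:g{index}_{term[2:]}"
            return term

        for s, p, o in graph:
            merged.add((rename(s), p, rename(o)))
    return merged


# Source A knows the literal but not its datatype.
graph_a = {("ex:item", "ex:prop", "_:x"), ("_:x", "rdf:value", '"Y2F0"')}
# Source B knows the datatype but not the literal.
graph_b = {("ex:item", "ex:prop", "_:x"), ("_:x", "rdf:type", "xxd:base64")}

merged = merge(graph_a, graph_b)
valued = {s for s, p, o in merged if p == "rdf:value"}
typed = {s for s, p, o in merged if p == "rdf:type"}
```

After the merge, valued and typed contain different bnodes, so no single node carries both the literal and the datatype, and the obvious conclusion cannot be read off the merged graph.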

>  > Like that proposal, it forces datatype information to be given
>>  explicitly and locally,
>
>Yes, and I believe that's a strength rather than a weakness. I think it
>is essential for the snippets of information dispersed on the Semantic
>Web to be as precise as possible and as self-contained as possible.

I think that is completely wrong-headed. The whole point of using an 
assertional language is to be able to put together pieces of 
information from various sources and draw reasonable conclusions from 
them.  If we start declaring that certain kinds of information 
*must* be linked to others, or *can only* be inferred in a certain 
sort of way, we might as well use Java.

I would agree that IF the information is available locally then of 
course it should be possible to use it locally. But the reasoners 
should also be able to function even when not all the information is 
available locally; they should not barf just because the information 
provided is incomplete.

>  > and makes it impossible to infer datatyping
>>  information from other information in the graph, eg range information.
>
>I think it is still possible, although to a limited extent. You are
>right in that we'd always need to link typed thingies to literals in
>some form. However, range information and inference could still be quite
>useful. For example, take a built-in xxd:decimal datatyping property, as
>above. You could say:
>
>xxd:decimal rdfs:domain MyReal
>
>thereby naming the value space of xxd:decimal explicitly. ...

Oh, sure, we can always do things like that. (Well, this wouldn't 
actually *name* the value space, it would just require it to be a 
subclass of an undefined rdfs:Class.)
But what I had in mind was knowing that the range of some property 
like shoeSize was a certain datatype. I don't see how that kind of 
information is going to help your reasoner.
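
The kind of range-based inference I have in mind could be sketched like this. It is a rough illustration, not a description of any existing reasoner: the RANGES and PARSERS tables, the ex: names, and the infer_value helper are all assumptions made for the example.

```python
# Hypothetical range declarations: the range of ex:shoeSize is a datatype.
RANGES = {"ex:shoeSize": "xxd:decimal"}

# Hypothetical lexical-to-value mappings, assuming plain integer encodings.
PARSERS = {
    "xxd:decimal": lambda s: int(s, 10),
    "xxd:octal": lambda s: int(s, 8),
}


def infer_value(triple, ranges=RANGES):
    """Use the property's declared range to interpret an untyped literal."""
    subject, prop, literal = triple
    datatype = ranges.get(prop)
    if datatype is None:
        # Incomplete information: draw no conclusion, but do not fail either.
        return None
    return datatype, PARSERS[datatype](literal)


known = infer_value(("ex:pat", "ex:shoeSize", "10"))   # ('xxd:decimal', 10)
unknown = infer_value(("ex:pat", "ex:hatSize", "10"))  # None
```

The literal "10" is asserted without any datatype, and the datatype is recovered from range information stated elsewhere; when no range is known, the triple simply stays uninterpreted.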

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Monday, 5 November 2001 15:30:13 UTC