Re: incomplete datatyping (was: Re: datatypes and MT) from Pat Hayes on 2001-11-07 (w3c-rdfcore-wg@w3.org from November 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Tue, 6 Nov 2001 22:32:31 -0600
To: Sergey Melnik <melnik@db.stanford.edu>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101054b80e5e1c73c0@[65.212.118.166]>
.....
>_y firstName "Sergey"
>_y lastName "Melnik"
>_y monthOfBirth "July"
>_y dayOfBirth "Tuesday"
>
>Above, the interpretation of _y is (with a high probability) uniquely
>determined by the four statements. Would one call any of the four
>properties datatyping mappings?

Nope.

I'd see that as only meaningful if you also said how to interpret the 
literals. So for example I presume it might be kosher to assume
firstName rdfs:range xsd:string .
lastName rdfs:range xsd:string .
but that we might also need something like
monthOfBirth rdfs:range xxd:EnglishCalendarMonth

>....
>>  >....>
>>  >>  Suppose I know that some property is represented by a literal "Y2F0",
>>  >>  but have (as yet) no information about the appropriate datatyping to
>>  >>  be used to interpret that literal. How would you represent that state
>>  >>  of information?
>  > >
>>  >Oh, I think this leads us into inferencing, which seems to work just
>>  >fine in both approaches. In the above case, you might assert:
>>  >
>>  >_n propertyIDontKnowAnythingAbout "Y2F0"
>>
>>  ??But now you seem to have broken your own rules. The proposal, as I
>>  understood it, was that this would be syntactically illegal: the only
>>  place a literal label can occur is at the sharp end of an arc
>>  labelled with a datatype name.
>
>Oh, that's a misunderstanding. From your problem description I thought
>_n would stand for a literal like "cat". So the starting point is
>
>_tom label _:1
>_:1  propertyIDontKnowAnythingAbout "Y2F0"
>
>where _:1 would be "cat", right?

?? No. You really have lost me. The proposal, as I understand it, is 
that any triple of the form

aaa eg:prop <literal> .

will be rewritten, or required to be transcribed into, a different 
form, where the literal occurs only in the place indicated:

aaa eg:prop _:1 .
_:1 <datatypeproperty> <literal> .

So that first triple would simply be illegal; it would never occur in 
a well-formed RDF graph, and if anyone were to write it, it would 
either be rejected or automatically transformed into the second 
graph.  So now, my question is, under these circumstances, HOW does 
one say the equivalent of the first graph in the second form, if one 
does not know the datatype mapping?

If this understanding is incorrect, could you please say what would 
happen, in your proposal, if a triple of the first form were input? 
If it is legal RDF, what does it mean?

>
>Then, still the same rule (see comment about rules below) applies:
>
>(X propertyIDontKnowAnythingAbout Y)
>   -> (X base64 Y)
>
>when you "discover" that propertyIDontKnowAnythingAbout is in fact
>base64.
>
>[...]
>
>>  You have to write it in RDF. (If we could write Prolog then all our
>>  problems would be solved. :-)
>
>That's what we do when we define model-theoretic interpretations anyway,
>don't we? What about Sec. 6 "RDFS-entailment and RDFS closures" in the
>MT draft? To quote: "Apply the following rules recursively to generate
>all legal RDF triples ..."

Oh sure, but those 'rules' are just a way to define a set of 
expressions. (One could define the closure more mathematically, but I 
chose this way as I thought it would be more acceptable to the likely 
audience.) They aren't inference rules (though I guess you could use 
them that way.)

But this is different. Your 'rule' has to be applied at input; it is 
a syntactic constraint on wellformedness, not a semantic closure.

>....
>(BTW, would not it be worth the pain to have an RDF-based rule language
>with model-theoretic semantics?

Ask Stefan Decker about this. He has such an RDF rule language 
implemented, I believe.

>Then, no special model-theoretic
>definitions for subClassOf and subPropertyOf would be required, you'd
>just use rules to describe them.

One certainly could do that, indeed. Then one could prove a single 
completeness theorem for the rule language, instead of re-stating it 
for each closure.


.....

>  >... bnode to which the two kinds of information must be attached in a
>>  certain way. The MT extension, in contrast, allows them to be
>  > separated and treated as separate assertions, and uses RDFS inference
>>  machinery to make the connection between them.
>
>Objection, your honor! ;) Let's wait until we've seen a couple of
>examples of defining datatypes in the MT extension. My feeling is more
>that RDFS inference machinery would be required, i.e., you wouldn't be
>done by defining some denotational semantics for a property or a class,
>but a more substantial tweaking of the MT would be needed each time you
>introduce a new built-in datatype.

I don't think so.  What one needs to know is the uri of the scheme, 
is all, then the MT should be able to handle it. But lets try it out 
on some examples, to be sure.

>.....
>>  What if I write a graph and refuse to apply the rule: have I
>>  made an error of some kind? What kind?
>
>What if I write a graph and refuse to make a transitive closure on
>subClassOf: have I made an error? The answer to this question is the
>answer to your question ;)

OK, so in your example, the rule that converts

aaa eg:prop "10" .

into

aaa eg:prop _:x .
_:x xsd:integer "10" .

is an *inference* rule? If so, is it valid? If so, it must preserve 
truth. If so, that first triple must have a truthvalue in an 
interpretation. How is that truthvalue determined in your scheme?

....

>  > >
>>  >I think that both approaches on the table are equivalent in the sense
>>  >that they can ultimately provide a very similar high-level
>>  >interpretation to any given piece of RDF instance data, although using
>>  >quite different schemas and a different perspective. My feeling is,
>>  >however, that by giving a reasonable (not straightforward)
>>  >model-theoretic interpretation to (_x size "10") you finesse the fact
>>  >that this statement is "syntactic matter" that needs further
>>  >explanation, i.e., my means of rules.
>>
>>  I don't finesse it, I deny it outright. It is not syntactic in any
>>  way: it says that _x's shoe size is a literal value. Until we know
>>  what datatyping scheme to use, we do not know which literal value,
>
>You are saying that as long as there is no schema it's a literal value.
>Once a schema is there, it becomes a data value. To satisfy
>monotonicity, all data values must then necessarily be literal values,
>right?

I don't know what a data value is. By literal value I meant the 
semantic value of the literal, ie the value of the XL mapping applied 
to the literal. So a literal always denotes the same thing in a given 
datatype interpretation, but when no schema information is present, 
there are several alternative such interpretations which satisfy the 
graph, and it might denote something different in each one.

Nothing *changes* when schema information is added, exactly, but the 
set of satisfying datatype interpretations is reduced, in the usual 
way, and that may be enough to fix the interpretation of the literal 
in all the interpretations that satisfy the graph.

>....
>In other words, given an arbitrarily confusing RDF representation, you
>can make sense of it, leveraging the fact that literals can potentially
>denote anything.

They can't. I don't know where you got this strange idea. I'm sure I 
didn't say it, and the MT doesn't say this. If anything, literals 
have much more restricted interpretations than urirefs: they must 
denote according to datatype mappings.

>Thus, for the example below
>
>_b dc:Creator "Melnik"
>_x name       "Sergey"
>
>you could have an interpretation in which literal "Melnik" and bNode _x
>denote the same thing, i.e. myself,

Only if there was a datatyping scheme in which "Melnik" in the 
lexical space mapped to you in the value space, and this datatyping 
scheme had a URI. Otherwise there is *no way* for that literal to 
denote you; unlike a uriref or a blank node, its interpretation 
cannot be assigned freely in an interpretation.

>It's our job to make this idea more clear by working through some test
>cases of defining a couple of built-in datatypes using the MT extension.

Well, it is designed to handle (things like) XML schema, not to 
define new datatypes.

>  > It doesn't change any of the rest of the model
>>  theory, by the way: the extra machinery only comes into play when
>  > literals are around.
>
>Don't forget the change in the abstract syntax - the MT extension relies
>on untidy graphs, right?

Untidy on literal nodes, yes. But I think that any datatyping scheme 
(other than one in which all literals have globally fixed 
denotations) will have to allow this much freedom in the graph. When 
I wrote the MT I was assuming, naively, that literal labels would 
include their datatyping information in their syntax somehow, rather 
like Patrick's URVs, and hence be globally fixed in their 
interpretations.

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Tuesday, 6 November 2001 23:32:51 UTC