Re: a test case for "literals must be self-evident" from Pat Hayes on 2001-12-10 (w3c-rdfcore-wg@w3.org from December 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Mon, 10 Dec 2001 17:50:03 -0600
To: Dan Connolly <connolly@w3.org>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101039b83ad342849c@[65.212.118.193]>
>Pat Hayes wrote:
>>
>>  Sorry this reply is delayed.
>>
>>  >OK, I blathered on about this requirement in...
>>  >
>>  >   literals must be self-evident
>>  >   Dan Connolly (Wed, Oct 17 2001)
>>  >   http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0338.html
>>  >
>>  >but recent discussion with Peter S. and Jeremy made me realize
>>  >I can reduce this to a real simple entailment test:
>>  >
>>  >Does dte-blunt.nt entail dte-pointy.nt?
>>  >
>>  >dte-blunt.nt:
>>  >
>>  >   <http://example/x> <http://example/y> "abc".
>>  >
>>  >dte-pointy.nt:
>>  >
>>  >   <http://example/x> <http://example/y> "abc".
>>  >
>>  >i.e. does an RDF document entail itself?
>>  >Surely the answer is yes, right?
>>  >I suggest that P/P++ do not guarantee this entailment;
>>  >they fail to specify that the answer to this
>>  >test is "yes".
>>  >
>>
>>  Well wait a minute. Are those the SAME graph, or two different but
>>  isomorphic graphs?
>
>I wasn't very clear; they're supposed to be the same RDF/xml
>document. The problem with the P++ scheme (as I understand it)
>is that an RDF/xml document doesn't pin down the graph.

Ah, if you put it that way, then indeed that is the case. But I see 
that as a general point about the relationship of RDF/XML to the RDF 
graph model. The same document might produce a new graph every time 
it is parsed.
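
A minimal sketch of that point, assuming a toy one-statement-per-line syntax rather than real RDF/XML (the parse function here is hypothetical, not any actual RDF parser): each parse builds a fresh graph, so two parses agree in content without being the same object.

```python
# Hypothetical toy parser: one "subject predicate object ." statement per line.
# This is an illustration only; it is not an RDF/XML or N-Triples parser.
def parse(document):
    """Return a fresh set of (subject, predicate, object) triples."""
    graph = set()
    for line in document.strip().splitlines():
        line = line.strip().rstrip(".").strip()
        if line:
            subject, predicate, obj = line.split(None, 2)
            graph.add((subject, predicate, obj))
    return graph

DOC = ('<http://www.w3.org/> <http://purl.org/dc/elements/1.1/title> '
       '"World Wide Web Consortium" .')

first = parse(DOC)
second = parse(DOC)

same_content = first == second   # both graphs hold the same triples
same_object = first is second    # but they are separate graph objects

# Editing one graph does nothing to the other.
second.add(("<http://example/x>", "<http://example/y>", '"abc"'))
first_unchanged = len(first) == 1
```

Here same_content is true while same_object is false, which is the sense in which a document does not pin down "the" graph.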

>Let's use this as the test input:
>
><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>             xmlns:dc="http://purl.org/dc/elements/1.1/">
>   <rdf:Description rdf:about="http://www.w3.org/">
>     <dc:title>World Wide Web Consortium</dc:title>
>   </rdf:Description>
></rdf:RDF>
>
>
>>  Do you mean, does a document entail *itself*, or
>>  does it entail any other document with the same lexical form?
>
>I think I mean
>	does a document entail itself?

Ah. OK, I would say: meaningless question. The model theory is 
defined on the graph, not on the XML.

>
>or, in other words:
>
>	does a document completely determine the graph?
>
>>  In the
>>  P++ scheme, distinct literal nodes are treated as syntactically
>>  distinct entities, so the answer matters.
>>
>>  For example, suppose that we were to merge these two graphs. Would
>>  the result contain two triples or one?
>
>Yes, that's another way to phrase the question. If I parse
>the RDF/xml document above, and then parse it again,
>will I ever get more than just the one triple?
>
>This self-evident idea requires that you get just one.

Well, tough. That seems to be clearly impossible. What if you parse 
the document and then someone in Uzbekistan parses it a few 
milliseconds later? You surely aren't going to say that his RDF graph 
is the *same* graph as yours (are you?). After all, he can edit his 
without that doing anything to yours.
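
Whether merging two parses gives one triple or two comes down to how literal nodes are identified. A rough sketch, assuming graphs are plain Python sets and using a hypothetical LiteralOccurrence class to stand in for the "every occurrence is a distinct node" reading:

```python
class LiteralOccurrence:
    """A literal node identified by its occurrence, not by its characters.

    No __eq__/__hash__ override, so Python's default identity comparison
    applies: two occurrences of "abc" are two different nodes.
    """
    def __init__(self, lexical_form):
        self.lexical_form = lexical_form

def triple_by_value():
    # Literal compared by its characters (the "self-evident" reading).
    return ("http://example/x", "http://example/y", "abc")

def triple_by_occurrence():
    # Literal compared by occurrence (the assumed P++-style reading).
    return ("http://example/x", "http://example/y", LiteralOccurrence("abc"))

# Merge two "parses" of the same statement under each reading.
merged_by_value = {triple_by_value()} | {triple_by_value()}
merged_by_occurrence = {triple_by_occurrence()} | {triple_by_occurrence()}

count_by_value = len(merged_by_value)            # one triple
count_by_occurrence = len(merged_by_occurrence)  # two triples
```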

>
>>  If the answer is one, then
>>  they are the same document and this document entails itself (of
>>  course). If the answer is two, then they are two distinct but similar
>>  documents, and the answer then is, indeed, no in the P++ scheme,
>>  since those two different literal occurrences might be typed
>>  differently.
>
>Then perhaps I misunderstood. I understood that the
>very same document could end up with different graphs in P++.
>
>For example, take:
>
><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>             xmlns:dc="http://purl.org/dc/elements/1.1/"
>	xmlns:ex="http://example/vocab#">
>   <rdf:Description rdf:about="http://www.w3.org/">
>     <ex:orgDirectorShoeSize>10</ex:orgDirectorShoeSize>
>   </rdf:Description>
></rdf:RDF>
>
>now one minute, I get some schema info from
>http://example/vocab that says the range
>of ex:orgDirectorShoeSize is a numeral.

Did you mean 'integer' there? (Case A) Or not? (Case B)

>I parse that into my knowledge base, then I parse
>the document above. In the P++ scheme, the
>object of the triple I get from parsing is an integer,
>if I understand correctly.

In case B, you ought to get a numeral rather than an integer, unless 
I'm missing some XML subtlety. (Graham recently suggested that XML 
schema shouldn't be thought of as making semantic references at all, 
which rather bruises my fragile XML intuitions.)

>The next minute, somebody edits the example vocabulary;
>I restart my program and grab the schema; now
>it says the range of ex:orgDirectorShoeSize is numeral
>(a string constrained to [0-9]+, say). Then I parse
>the above document again. Now the object is a string.
>
>So I've parsed the same document twice,

But you didn't parse the graphs. You got two different RDF graphs by 
adding two different lots of datatyping information to them. In the 
P(++) proposals, the datatyping information is expressed in the graph 
itself, in the form of things like rdfs:range triples, right? (You 
seem here to be absorbing this information into a parsing operation 
that doesn't put the information into the graph at all (??))

>and the two
>formulas I got don't entail each other.

In case B, seems to me that the RDF graphs will be the same. And in 
case A, you added contradictory information in the two cases, so what 
did you expect? Obviously, (P and Q ) might not entail (P and R) if R 
and Q contradict one another.
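
That last remark can be checked by brute force over truth assignments. A small sketch, assuming the contradiction is modelled in the simplest way, with R taken to be (not Q):

```python
from itertools import product

def entails(premise, conclusion, variables):
    """True if every assignment satisfying premise also satisfies conclusion."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if premise(env) and not conclusion(env):
            return False
    return True

# Assumption for illustration: R contradicts Q, modelled here as R = (not Q).
p_and_q = lambda env: env["P"] and env["Q"]
p_and_r = lambda env: env["P"] and not env["Q"]
p_only = lambda env: env["P"]

contradictory_case = entails(p_and_q, p_and_r, ["P", "Q"])  # fails
shared_part = entails(p_and_q, p_only, ["P", "Q"])          # holds
```

The shared part P is still entailed; it is only the conflicting additions that break the entailment.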

>
>Another related test case, using RDFS: does this
>
>   _:somebody ex:leftShoeSize "10".
>
>   ex:leftShoeSize s:subPropertyOf ex:shoeSize.
>
>RDFS-entail this?
>
>	_:somebody ex:shoeSize "10".
>

Right, good question. I can see that there is a reading in which it 
doesn't, and that seems bad. On the other hand it doesn't seem so bad 
if you think of the graphs. Hmmmm, maybe I have been hiding my head 
in graph-syntax sand, as it were. [Later, no, I don't think this is 
really to do with the graphs after all, see end of message.]
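
For comparison, here is a sketch of the reading in which the entailment does go through. It assumes literals are compared purely by their characters, the blank node is treated as a fixed label, the super-property is ex:shoeSize as in the conclusion, and only the sub-property rule is applied; all names are plain strings chosen for illustration.

```python
SUBPROP = "s:subPropertyOf"

def subproperty_closure(graph):
    """Apply 'x p y' and 'p subPropertyOf q' => 'x q y' until nothing new appears."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        for (p, rel, q) in list(closed):
            if rel != SUBPROP:
                continue
            for (s, pred, o) in list(closed):
                if pred == p and (s, q, o) not in closed:
                    closed.add((s, q, o))
                    changed = True
    return closed

premise = {
    ("_:somebody", "ex:leftShoeSize", '"10"'),
    ("ex:leftShoeSize", SUBPROP, "ex:shoeSize"),
}
conclusion = {("_:somebody", "ex:shoeSize", '"10"')}

# With literals compared by value, the conclusion is in the closure.
holds = conclusion <= subproperty_closure(premise)
```

Under the occurrence-based reading the two "10" nodes would not be equal, and this subset check is exactly where the entailment would be lost.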

>since there are relevant issues related to the syntax
>of n-triples (and since I've used N3 QNames to abbreviate
>full n-triples), here are the documents in RDF/xml:
>
>premise:
>
><rdf:RDF xmlns="http://example/vocab#"
>     xmlns:log="http://www.w3.org/2000/10/swap/log#"
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>     xmlns:s="http://www.w3.org/2000/01/rdf-schema#">
>
>     <rdf:Description>
>         <leftShoeSize>10</leftShoeSize>
>     </rdf:Description>
>
>     <rdf:Description rdf:about="http://example/vocab#leftShoeSize">
>         <s:subPropertyOf rdf:resource="http://example/vocab#shoeSize"/>
>     </rdf:Description>
></rdf:RDF>
>
>conclusion:
>
><rdf:RDF xmlns="http://example/vocab#"
>     xmlns:log="http://www.w3.org/2000/10/swap/log#"
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>     xmlns:s="http://www.w3.org/2000/01/rdf-schema#">
>
>     <rdf:Description>
>         <shoeSize>10</shoeSize>
>     </rdf:Description>
></rdf:RDF>

I will confess to not knowing how to evaluate claims about entailment 
between RDF/XML documents, as I don't know the exact rules for 
getting an RDF graph from the XML.

>
>>  But this, seems to me, does not violate the guidelines you enunciated
>>  in
>>  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0338.html
>>  since there you talk about an interpretation being CHANGED -
>>  redefined - by the addition of information. Here, nothing is being
>>  changed; if you add datatyping information, you are simply
>>  disambiguating the bare literal by adding more information about it,
>>  by removing some of its (datatyped) interpretations. This is just
>>  like normal RDF inference, right? The only difference is that every
>>  occurrence of a bare literal has to be treated as a separate
>>  syntactic entity. That gives inference a slightly unusual 'feel' on
>>  bare literals, perhaps, but it isn't anything disastrous. There is no
>>  nonmonotonicity, if you stick to the rules.
>
>Maybe I don't understand how P++ works, then. I got the impression
>that in P++, the RDF/xml form of a document didn't completely
>nail down the graph; that a parser decided between different
>graphs based on information from other places/documents.

I try as far as possible not to even think about RDF/XML, I have to 
admit. I see the MT work as starting with the RDF graph. Questions of 
literal identity are relatively clear there.

But maybe this is being too idealistic, and this issue of the 
relationship between documents and graphs needs to be attacked 
directly. However, since the P proposal (ie Peter's suggestion to use 
rdfs:range to state datatypes) was apparently inspired by XML usage, 
and since it requires that literal occurrences be treated as 
syntactically distinct, the question of the proper syntactic 
description of literals in XML seems itself to be rather delicate.

It really does seem to me that literals, in the presence of a 
datatyping scheme, are intrinsically indexical in nature (in XML or 
RDF graphs or N-triples or any other syntax; they would be in KIF as 
well): they can't be treated as simple referring names, but have to 
be thought of in a way that treats each literal occurrence as 
potentially needing its own context of use. The only alternative is 
to just flatly deny this sensitivity, declare them all to be strings, 
and then treat datatyping as another inference process. You need 
different syntaxes in the two cases, but inference works, er, 
'properly' with respect to the right syntax in either case.

The odd cases you have pointed out are where two pieces of syntax 
look the same on the page but in fact might be different. That is a 
syntactic oddity about indexicals, rather like the English word 'now' 
meaning different things every time it is uttered. So of course 'it 
is raining now' may not entail 'it is raining now' if the context is 
allowed to change between the two sentence-tokens, since the 
semantics is attached to the token rather than the word; and in the 
P(++)-style proposals, any two literal occurrences have to be allowed 
to be in different literal contexts.
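
A toy illustration of the indexical reading, assuming a context is nothing more than the name of a datatype and that the hypothetical DATATYPES table below stands in for whatever lexical-to-value mappings a real datatyping scheme would supply:

```python
# Hypothetical datatype mappings from lexical form to value.
DATATYPES = {
    "integer": int,
    "string": str,
}

def denote(lexical_form, context):
    """Value of one literal occurrence under the datatype its context supplies."""
    return DATATYPES[context](lexical_form)

# Two tokens that look the same on the page, used in different contexts.
first_token = denote("10", "integer")   # the number ten
second_token = denote("10", "string")   # the two-character string

same_denotation = first_token == second_token
```

The semantics attaches to the token together with its context, so same_denotation comes out false even though both tokens are written "10".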

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Monday, 10 December 2001 18:50:16 UTC