Re: RDF Semantics: rdfs entailment lemma / terHorst-proof-review from pat hayes on 2003-11-20 (www-rdf-comments@w3.org from October to December 2003)

From: pat hayes <phayes@ihmc.us>
Date: Thu, 20 Nov 2003 07:44:40 -0800
To: herman.ter.horst@philips.com
Cc: Brian_McBride <bwm@hplb.hpl.hp.com>, Dan Connolly <connolly@w3.org>, www-rdf-comments@w3.org
Message-Id: <p06001f0ebbe173e97cb5@[192.168.1.6]>
>When I went again over the proof of the rdfs entailment
>lemma, to double-check all the details, I noticed that
>there still seems to be an error in the lemma.
>Perhaps I didn't see this earlier because the location
>of the new error is extremely close to another, more
>conspicuous error that I noticed last week and that
>was solved with a change to the treatment of literals [1].
>
>The new error also involves literals.
>It can be described in terms of a testcase,
>and there is a clear solution to the problem.
>
>Testcase:
>graph G containing only the triples
>   v p l
>   p range C
>where l is a literal.
>Triple T:
>   b subClassOf l
>where b is a blank node.
>
>This gives a counterexample to the rdfs entailment lemma.
>According to the semantics, G rdfs-entails T.
>    (Proof: Let I be an arbitrary rdfs-interpretation that
>    satisfies G.  Then <I(v),I(l)> in IE(I(p))
>    and <I(p),I(C)> in IE(I(range)).  By the semantic
>    condition on range, we get I(l) in CEXT(I(C))=IC.
>    Since IEXT(I(subClassOf)) is reflexive on IC, it follows
>    that <I(l),I(l)> in IEXT(I(subClassOf)).
>    Therefore, I satisfies T.)
>However, the rdfs entailment rules do not derive T from G.
>    (ls derives        v p b    -blank node b allocated to l
>     rdfs3 derives     b t C
>     rdfs10 derives    b subClassOf b
>     Here it stops.)

Damn. Yes, you are right.

>Proposed solution:
>add new RDFS entailment rule:
>"rdfs14" if E contains
>    uuu aaa _:nnn where _:nnn is allocated to lll by lg
>then add:
>    uuu aaa lll
>In the construction of the RDFS closure, rule "rdfs14"
>should be freely applicable at all times.

It only needs to be done at the end of the closure-building process, 
once per new allocated-bnode-in-object triple.

I wonder if there is a less intrusive way of handling this. Adding an 
entire new rule seems rather a large cost for such a marginal case. 
This case can only arise from the idempotence of subClass and 
subProperty (replace C with P in the above example), since these are 
the only cases that can move a bnode from subject to object position. 
So we could instead add special-case additions to rules rdfs 6 and 
10. In any case it would be appropriate to remark that the only cases 
of rule rdfs14 that were not automatically redundant are cases where 
it is applied after an application of rdfs 6 or 10; and that the role 
of the rule (together with lg) is to guarantee that the closure 
contains a triple
a p l .
if and only if it contains the triple
a p _:b .
where _:b is allocated to l. As you note, the 'if' case is not 
currently guaranteed.

Another way to handle this would be to re-state the lemma so that it 
allowed two-way instantiation between a literal and its allocated 
bnode, for example by applying rule lg to all literals in both the 
antecedent *and consequent* to derive a 'literal-free' entailment 
relation.  Applying lg to the conclusion achieves the same effect as 
your proposed new rule which inverts lg. This means that lg is 
thought of less as a forward inference rule and more as a kind of 
normalization operation, like skolemizing FOL quantifiers to prepare 
them for resolution.  So define S' as the result of applying rule lg 
to S in all possible ways. Then S rdfs-entails E iff one can derive a 
graph from S' by the rdfs axioms and rules which either contains an 
XML clash or simply entails E' (not E, note). Then the proof can be 
stated referring only to literal-free graphs, using a more 
conventional Herbrand construction (and being easier to follow :-) 
This has the merit of not requiring any changes to the proof rules, 
and only a minimal change to the document text; though it would be 
appropriate to add a short explanatory paragraph.

I will find the least intrusive way to make the correction, under 
advice from the other RDF editors and system implementors.  Either 
way, it will be better with an explanatory paragraph added somewhere 
in the text to explain the intuition that allocated bnodes 'replace' 
literals to all intents and purposes.

>This new rule would be a kind of counterpart to rule lg.

Yes, a kind of inverse.  (If only we had the possibility of using 
literals as subjects, how straightforward this would all be!  Sigh.)

>It would also solve the problem with the proof.
>The problem with the proof is the following step, near
>the end:
>    given "D contains a triple sur(SH+A(s)) p sur(SH+A(o))"
>    conclude "this is an instance of" the triple spo
>    "under the instantiation mapping x ->sur(A(x))".
>The problem with this conclusion is that if o is a literal
>and if b is the blank node allocated to o by rule lg,
>then  sur(SH(o)) is b instead of o.
>So this is *not* an instance in this case.  Calling the
>instantiation mapping M, it can only be concluded that
>the triple
>    M(s) p b
>is in D.  When rule "rdfs14" would be present, we have
>    M(s) p o
>in D, as desired.
>
>I confirm that I consider the rdfs entailment lemma to be
>proved when this change would be made.
>
>===
>
>I am afraid that with the comment above I express the
>need to re-open the LC2 topic 'terHorst-proof-review'.
>I hope that this error could be corrected before the
>PR publication.  Fortunately, it could easily be
>handled with the indicated solution.

I am not sure of the exact procedure which would be appropriate at 
this stage, but expect that we can achieve an appropriate correction 
during the final editing process, provided the textual change can be 
kept to a minimum.

>===
>
>I should also mention that the change to the treatment of
>literals that was introduced last week seems to have made
>the Skolemization lemma false.

Yes, and the correction you note below should be made: specifically, 
to insert the words "containing only URI references" after "with 
respect to V" in line 2802.  (The text in the preceding paragraph 
makes clear that "new name" refers only to URI references. )

>Since plain literals now also count as vocabulary, and
>since plain literals cannot be "freely interpreted",
>the interpretation I' cannot be constructed in general
>in the way indicated in the proof.
>
>There is a straightforward solution to this problem, by
>sharpening the definition of Skolemization, for example
>in the following way:
>A Skolemization of an RDF graph G is a ground instance of
>G with respect to an 1-1 instance mapping that maps
>each blank node in G to a URI reference that does not appear
>in G.
>
>The Skolemization lemma plays only a peripheral role
>in the RDF Semantics document, since it is not on the
>path to the main completeness results.
>Still, the lemma is relevant since, as the document notes
>"Skolemization is routinely used in automatic inference
>systems".
>
>
>Herman ter Horst
>
>[1] http://lists.w3.org/Archives/Public/www-rdf-comments/2003OctDec/0161.html


-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 20 November 2003 10:44:58 UTC