# Re: RDF Semantics: rdfs entailment lemma / terHorst-proof-review

From: pat hayes <phayes@ihmc.us>
Date: Thu, 20 Nov 2003 07:44:40 -0800
Message-Id: <p06001f0ebbe173e97cb5@[192.168.1.6]>

Cc: Brian_McBride <bwm@hplb.hpl.hp.com>, Dan Connolly <connolly@w3.org>, www-rdf-comments@w3.org
```
>When I went again over the proof of the rdfs entailment
>lemma, to double-check all the details, I noticed that
>there still seems to be an error in the lemma.
>Perhaps I didn't see this earlier because the location
>of the new error is extremely close to another, more
>conspicuous error that I noticed last week and that
>was solved with a change to the treatment of literals [1].
>
>The new error also involves literals.
>It can be described in terms of a testcase,
>and there is a clear solution to the problem.
>
>Testcase:
>graph G containing only the triples
>   v p l
>   p range C
>where l is a literal.
>Triple T:
>   b subClassOf l
>where b is a blank node.
>
>This gives a counterexample to the rdfs entailment lemma.
>According to the semantics, G rdfs-entails T.
>    (Proof: Let I be an arbitrary rdfs-interpretation that
>    satisfies G.  Then <I(v),I(l)> in IE(I(p))
>    and <I(p),I(C)> in IE(I(range)).  By the semantic
>    condition on range, we get I(l) in CEXT(I(C))=IC.
>    Since IEXT(I(subClassOf)) is reflexive on IC, it follows
>    that <I(l),I(l)> in IEXT(I(subClassOf)).
>    Therefore, I satisfies T.)
>However, the rdfs entailment rules do not derive T from G.
>    (ls derives        v p b    -blank node b allocated to l
>     rdfs3 derives     b t C
>     rdfs10 derives    b subClassOf b
>     Here it stops.)

Damn. Yes, you are right.

>Proposed solution:
>"rdfs14" if E contains
>    uuu aaa _:nnn where _:nnn is allocated to lll by lg
>    uuu aaa lll
>In the construction of the RDFS closure, rule "rdfs14"
>should be freely applicable at all times.

It only needs to be done at the end of the closure-building process,
once per new allocated-bnode-in-object triple.

I wonder if there is a less intrusive way of handling this. Adding an
entire new rule seems rather a large cost for such a marginal case.
This case can only arise from the idempotence of subClass and
subProperty (replace C with P in the above example), since these are
the only cases that can move a bnode from subject to object position.
10. In any case it would be appropriate to remark that the only cases
of rule rdfs14 that were not automatically redundant are cases where
it is applied after an application of rdfs 6 or 10; and that the role
of the rule (together with lg) is to guarantee that the closure
contains a triple
a p l .
if and only if it contains the triple
a p _:b .
where _:b is allocated to l. As you note, the 'if' case is not
currently guaranteed.

Another way to handle this would be to re-state the lemma so that it
allowed two-way instantiation between a literal and its allocated
bnode, for example by applying rule lg to all literals in both the
antecedent *and consequent* to derive a 'literal-free' entailment
relation.  Applying lg to the conclusion achieves the same effect as
your proposed new rule which inverts lg. This means that lg is
thought of less as a forward inference rule and more as a kind of
normalization operation, like skolemizing FOL quantifiers to prepare
them for resolution.  So define S' as the result of applying rule lg
to S in all possible ways. Then S rdfs-entails E iff one can derive a
graph from S' by the rdfs axioms and rules which either contains an
XML clash or simply entails E' (not E, note). Then the proof can be
stated referring only to literal-free graphs, using a more
conventional Herbrand construction (and being easier to follow :-)
This has the merit of not requiring any changes to the proof rules,
and only a minimal change to the document text; though it would be
appropriate to add a short explanatory paragraph.

I will find the least intrusive way to make the correction, under
advice from the other RDF editors and system implementors.  Either
way, it will be better with an explanatory paragraph added somewhere
in the text to explain the intuition that allocated bnodes 'replace'
literals to all intents and purposes.

>This new rule would be a kind of counterpart to rule lg.

Yes, a kind of inverse.  (If only we had the possibility of using
literals as subjects, how straightforward this would all be!  Sigh.)

>It would also solve the problem with the proof.
>The problem with the proof is the following step, near
>the end:
>    given "D contains a triple sur(SH+A(s)) p sur(SH+A(o))"
>    conclude "this is an instance of" the triple spo
>    "under the instantiation mapping x ->sur(A(x))".
>The problem with this conclusion is that if o is a literal
>and if b is the blank node allocated to o by rule lg,
>then  sur(SH(o)) is b instead of o.
>So this is *not* an instance in this case.  Calling the
>instantiation mapping M, it can only be concluded that
>the triple
>    M(s) p b
>is in D.  When rule "rdfs14" would be present, we have
>    M(s) p o
>in D, as desired.
>
>I confirm that I consider the rdfs entailment lemma to be
>proved when this change would be made.
>
>===
>
>I am afraid that with the comment above I express the
>need to re-open the LC2 topic 'terHorst-proof-review'.
>I hope that this error could be corrected before the
>PR publication.  Fortunately, it could easily be
>handled with the indicated solution.

I am not sure of the exact procedure which would be appropriate at
this stage, but expect that we can achieve an appropriate correction
during the final editing process, provided the textual change can be
kept to a minimum.

>===
>
>I should also mention that the change to the treatment of
>literals that was introduced last week seems to have made
>the Skolemization lemma false.

Yes, and the correction you note below should be made: specifically,
to insert the words "containing only URI references" after "with
respect to V" in line 2802.  (The text in the preceding paragraph
makes clear that "new name" refers only to URI references. )

>Since plain literals now also count as vocabulary, and
>since plain literals cannot be "freely interpreted",
>the interpretation I' cannot be constructed in general
>in the way indicated in the proof.
>
>There is a straightforward solution to this problem, by
>sharpening the definition of Skolemization, for example
>in the following way:
>A Skolemization of an RDF graph G is a ground instance of
>G with respect to an 1-1 instance mapping that maps
>each blank node in G to a URI reference that does not appear
>in G.
>
>The Skolemization lemma plays only a peripheral role
>in the RDF Semantics document, since it is not on the
>path to the main completeness results.
>Still, the lemma is relevant since, as the document notes
>"Skolemization is routinely used in automatic inference
>systems".
>
>
>Herman ter Horst
>