Re: RDF Semantics: rdfs entailment lemma / terHorst-proof-review from herman.ter.horst@philips.com on 2003-11-21 (www-rdf-comments@w3.org from October to December 2003)

From: <herman.ter.horst@philips.com>
Date: Fri, 21 Nov 2003 18:58:40 +0100
To: pat hayes <phayes@ihmc.us>
Cc: Brian_McBride <bwm@hplb.hpl.hp.com>, Dan Connolly <connolly@w3.org>, www-rdf-comments@w3.org
Message-ID: <OFDC3DEB97.6585E2E3-ONC1256DE5.004E2EE6-C1256DE5.0062CFE8@diamond.philips.com>
I can understand the desire to arrive at a minimal-edit
solution to the problem.

Moreover, I assume that the primary purpose remains what
is stated in the first sentence of the document:
"This is a specification of a precise semantics, and
corresponding complete systems of inference rules, for
the Resource Description Framework (RDF) and RDF Schema
(RDFS)."

Completeness has not been met.  It has been proved that 
the system of RDFS rules now on the table is not complete.
The system would become complete by the addition of one rule.
It seems to me that this is the only minimal-edit change
that achieves the purpose of the document.
The document is still in the stage before PR, isn't it?
For the alternatives you describe below, it seems that
either completeness is unclear, or completeness may be
provable but then at the cost of much more than minimal
edits.
See below for more detailed comments.

>>When I went again over the proof of the rdfs entailment
>>lemma, to double-check all the details, I noticed that
>>there still seems to be an error in the lemma.
>>Perhaps I didn't see this earlier because the location
>>of the new error is extremely close to another, more
>>conspicuous error that I noticed last week and that
>>was solved with a change to the treatment of literals [1].
>>
>>The new error also involves literals.
>>It can be described in terms of a testcase,
>>and there is a clear solution to the problem.
>>
>>Testcase:
>>graph G containing only the triples
>>   v p l
>>   p range C
>>where l is a literal.
>>Triple T:
>>   b subClassOf l
>>where b is a blank node.
>>
>>This gives a counterexample to the rdfs entailment lemma.
>>According to the semantics, G rdfs-entails T.
>>    (Proof: Let I be an arbitrary rdfs-interpretation that
>>    satisfies G.  Then <I(v),I(l)> in IE(I(p))
>>    and <I(p),I(C)> in IE(I(range)).  By the semantic
>>    condition on range, we get I(l) in CEXT(I(C))=IC.
>>    Since IEXT(I(subClassOf)) is reflexive on IC, it follows
>>    that <I(l),I(l)> in IEXT(I(subClassOf)).
>>    Therefore, I satisfies T.)
>>However, the rdfs entailment rules do not derive T from G.
>>    (ls derives        v p b    -blank node b allocated to l
>>     rdfs3 derives     b t C
>>     rdfs10 derives    b subClassOf b
>>     Here it stops.)
>
>Damn. Yes, you are right.
>
>>Proposed solution:
>>add new RDFS entailment rule:
>>"rdfs14" if E contains
>>    uuu aaa _:nnn where _:nnn is allocated to lll by lg
>>then add:
>>    uuu aaa lll
>>In the construction of the RDFS closure, rule "rdfs14"
>>should be freely applicable at all times.
>
>It only needs to be done at the end of the closure-building process, 
>once per new allocated-bnode-in-object triple.

This may be true, but the completeness does not depend on it. 
This is in itself already a somewhat complicated statement.
In order to prove it, it is necessary to show that none of the
other entailment rules (RDF, RDFS) can generate a new triple
when applied to an RDFS closure to which rdfs14 has only been
applied at the end.  For this, you need to look in detail at
how vocabulary items and blank nodes move between subject,
predicate, object positions, for each of the 16 entailment rules.
This doesn't seem to be a minimal edit.  It is an optimization.
Moreover, you have said earlier that the RDFS closure is now
only an abstraction in the proof.

>
>I wonder if there is a less intrusive way of handling this. Adding an 
>entire new rule seems rather a large cost for such a marginal case. 
>This case can only arise from the idempotence of subClass and 
>subProperty (replace C with P in the above example), since these are 
>the only cases that can move a bnode from subject to object position. 
>So we could instead add special-case additions to rules rdfs 6 and 
>10. In any case it would be appropriate to remark that the only cases 
>of rule rdfs14 that were not automatically redundant are cases where 
>it is applied after an application of rdfs 6 or 10; and that the role 
>of the rule (together with lg) is to guarantee that the closure 
>contains a triple
>a p l .
>if and only if it contains the triple
>a p _:b .
>where _:b is allocated to l. As you note, the 'if' case is not 
>currently guaranteed.

It seems to me that rdfs14 is not much less marginal than rdfs6 and
rdfs10.  Each application of rdfs6 or rdfs10 to a blank node that
is allocated to a literal leads to an application of rdfs14.
I do not believe that this case is unlikely to arise in practice.
The rule for range moves items from object to subject position. 
When the author of an RDF graph consistently often uses literals,
this leads to allocated blank nodes in subject position.
The purpose is completeness, isn't it?

Moreover, you introduce here some (meta-)reasoning about the order
of rule applications.  This can easily lead to errors.
Ever since I wrestled with rule application orders in the context 
of the question whether the earlier version of the semantics 
(iff for domain and range) is complete [1], I dislike the complexity 
they introduce.
A very attractive aspect of the proof structure that you have found 
for the rdfs entailment lemma is that the details can be worked out to 
give a complete proof, without considering things like rule application
orders and the way in which items move between the three positions of 
a triple.

>
>Another way to handle this would be to re-state the lemma so that it 
>allowed two-way instantiation between a literal and its allocated 
>bnode, for example by applying rule lg to all literals in both the 
>antecedent *and consequent* to derive a 'literal-free' entailment 
>relation.  Applying lg to the conclusion achieves the same effect as 
>your proposed new rule which inverts lg. This means that lg is 
>thought of less as a forward inference rule and more as a kind of 
>normalization operation, like skolemizing FOL quantifiers to prepare 
>them for resolution.  So define S' as the result of applying rule lg 
>to S in all possible ways. Then S rdfs-entails E iff one can derive a 
>graph from S' by the rdfs axioms and rules which either contains an 
>XML clash or simply entails E' (not E, note). Then the proof can be 
>stated referring only to literal-free graphs, using a more 
>conventional Herbrand construction (and being easier to follow :-) 
>This has the merit of not requiring any changes to the proof rules, 
>and only a minimal change to the document text; though it would be 
>appropriate to add a short explanatory paragraph.

This could indeed be an interesting way to modularize the RDFS entailment
lemma, by "factoring out literals from the equation".
It would need to be worked out: the statement of the lemma would
be changed, the proof would be changed, including the RDFS Herbrand
construction.
This doesn't look like a minimal edit, however.

>
>I will find the least intrusive way to make the correction, under 
>advice from the other RDF editors and system implementors. 

And I will read it.  As I said, I don't share your expectation that
this can be done without the new rule.

>Either 
>way, it will be better with an explanatory paragraph added somewhere 
>in the text to explain the intuition that allocated bnodes 'replace' 
>literals to all intents and purposes.

This is indeed a very important point, which could indeed be strenghtened
in the document.  Rule rdfs14 amplifies this, by showing
that literals and allocated bnodes are totally 'interchangable' on the
object position of triples.

>
>>This new rule would be a kind of counterpart to rule lg.
>
>Yes, a kind of inverse.  (If only we had the possibility of using 
>literals as subjects, how straightforward this would all be!  Sigh.)

Doesn't the combination of lg and rdfs14 realize the use of literals
as subjects, more or less, in the 'form' of their allocated bnodes?
And couldn't that be used to advantage in implementations?

>
>>It would also solve the problem with the proof.
>>The problem with the proof is the following step, near
>>the end:
>>    given "D contains a triple sur(SH+A(s)) p sur(SH+A(o))"
>>    conclude "this is an instance of" the triple spo
>>    "under the instantiation mapping x ->sur(A(x))".
>>The problem with this conclusion is that if o is a literal
>>and if b is the blank node allocated to o by rule lg,
>>then  sur(SH(o)) is b instead of o.
>>So this is *not* an instance in this case.  Calling the
>>instantiation mapping M, it can only be concluded that
>>the triple
>>    M(s) p b
>>is in D.  When rule "rdfs14" would be present, we have
>>    M(s) p o
>>in D, as desired.

I would like to draw attention to the very simple way in which
*the proof of the RDFS entailment lemma is made complete* in
this way.
No special assumptions necessary.  A direct conclusion.

>>
>>I confirm that I consider the rdfs entailment lemma to be
>>proved when this change would be made.
>>
>>===
>>
>>I am afraid that with the comment above I express the
>>need to re-open the LC2 topic 'terHorst-proof-review'.
>>I hope that this error could be corrected before the
>>PR publication.  Fortunately, it could easily be
>>handled with the indicated solution.
>
>I am not sure of the exact procedure which would be appropriate at 
>this stage, but expect that we can achieve an appropriate correction 
>during the final editing process, provided the textual change can be 
>kept to a minimum.

The current document does not achieve completeness.
I cannot imagine that when the new rule is not included, and when only 
minimal edits are made to the document, that an understandably complete 
version is obtained.
I have not seen the textual change, so I cannot say that I am satisfied.

==

This is not a completely regular situation, I am sorry about that.
I send approval, and five days later, I report a new error in the
proof and thereby withdraw the approval of the proof-review.
In between there was a weekend in which I also thought that
this was finished.
But there were also other things about this review that were perhaps 
not completely regular.
The object of review underwent changes during the review.
With each change, the reviewer needed, in principle, to re-review 
everything in the light of the change.  There were non-trivial
changes.
When I reconsidered the material afterward, in the light of all the 
changes, I found two points that needed to be mentioned: the 
point about Skolemization and the problem that remains with 
RDFS completeness...


[1]  http://lists.w3.org/Archives/Public/www-rdf-comments/2003AprJun/0042.html
Received on Friday, 21 November 2003 12:59:24 UTC