- From: <herman.ter.horst@philips.com>
- Date: Fri, 21 Nov 2003 18:58:40 +0100
- To: pat hayes <phayes@ihmc.us>
- Cc: Brian_McBride <bwm@hplb.hpl.hp.com>, Dan Connolly <connolly@w3.org>, www-rdf-comments@w3.org
I can understand the desire to arrive at a minimal-edit solution to the problem. Moreover, I assume that the primary purpose remains what is stated in the first sentence of the document: "This is a specification of a precise semantics, and corresponding complete systems of inference rules, for the Resource Description Framework (RDF) and RDF Schema (RDFS)." Completeness has not been met. It has been proved that the system of RDFS rules now on the table is not complete. The system would become complete by the addition of one rule. It seems to me that this is the only minimal-edit change that achieves the purpose of the document. The document is still in the stage before PR, isn't it? For the alternatives you describe below, it seems that either completeness is unclear, or completeness may be provable but then at the cost of much more than minimal edits. See below for more detailed comments. >>When I went again over the proof of the rdfs entailment >>lemma, to double-check all the details, I noticed that >>there still seems to be an error in the lemma. >>Perhaps I didn't see this earlier because the location >>of the new error is extremely close to another, more >>conspicuous error that I noticed last week and that >>was solved with a change to the treatment of literals [1]. >> >>The new error also involves literals. >>It can be described in terms of a testcase, >>and there is a clear solution to the problem. >> >>Testcase: >>graph G containing only the triples >> v p l >> p range C >>where l is a literal. >>Triple T: >> b subClassOf l >>where b is a blank node. >> >>This gives a counterexample to the rdfs entailment lemma. >>According to the semantics, G rdfs-entails T. >> (Proof: Let I be an arbitrary rdfs-interpretation that >> satisfies G. Then <I(v),I(l)> in IE(I(p)) >> and <I(p),I(C)> in IE(I(range)). By the semantic >> condition on range, we get I(l) in CEXT(I(C))=IC. >> Since IEXT(I(subClassOf)) is reflexive on IC, it follows >> that <I(l),I(l)> in IEXT(I(subClassOf)). >> Therefore, I satisfies T.) >>However, the rdfs entailment rules do not derive T from G. >> (ls derives v p b -blank node b allocated to l >> rdfs3 derives b t C >> rdfs10 derives b subClassOf b >> Here it stops.) > >Damn. Yes, you are right. > >>Proposed solution: >>add new RDFS entailment rule: >>"rdfs14" if E contains >> uuu aaa _:nnn where _:nnn is allocated to lll by lg >>then add: >> uuu aaa lll >>In the construction of the RDFS closure, rule "rdfs14" >>should be freely applicable at all times. > >It only needs to be done at the end of the closure-building process, >once per new allocated-bnode-in-object triple. This may be true, but the completeness does not depend on it. This is in itself already a somewhat complicated statement. In order to prove it, it is necessary to show that none of the other entailment rules (RDF, RDFS) can generate a new triple when applied to an RDFS closure to which rdfs14 has only been applied at the end. For this, you need to look in detail at how vocabulary items and blank nodes move between subject, predicate, object positions, for each of the 16 entailment rules. This doesn't seem to be a minimal edit. It is an optimization. Moreover, you have said earlier that the RDFS closure is now only an abstraction in the proof. > >I wonder if there is a less intrusive way of handling this. Adding an >entire new rule seems rather a large cost for such a marginal case. >This case can only arise from the idempotence of subClass and >subProperty (replace C with P in the above example), since these are >the only cases that can move a bnode from subject to object position. >So we could instead add special-case additions to rules rdfs 6 and >10. In any case it would be appropriate to remark that the only cases >of rule rdfs14 that were not automatically redundant are cases where >it is applied after an application of rdfs 6 or 10; and that the role >of the rule (together with lg) is to guarantee that the closure >contains a triple >a p l . >if and only if it contains the triple >a p _:b . >where _:b is allocated to l. As you note, the 'if' case is not >currently guaranteed. It seems to me that rdfs14 is not much less marginal than rdfs6 and rdfs10. Each application of rdfs6 or rdfs10 to a blank node that is allocated to a literal leads to an application of rdfs14. I do not believe that this case is unlikely to arise in practice. The rule for range moves items from object to subject position. When the author of an RDF graph consistently often uses literals, this leads to allocated blank nodes in subject position. The purpose is completeness, isn't it? Moreover, you introduce here some (meta-)reasoning about the order of rule applications. This can easily lead to errors. Ever since I wrestled with rule application orders in the context of the question whether the earlier version of the semantics (iff for domain and range) is complete [1], I dislike the complexity they introduce. A very attractive aspect of the proof structure that you have found for the rdfs entailment lemma is that the details can be worked out to give a complete proof, without considering things like rule application orders and the way in which items move between the three positions of a triple. > >Another way to handle this would be to re-state the lemma so that it >allowed two-way instantiation between a literal and its allocated >bnode, for example by applying rule lg to all literals in both the >antecedent *and consequent* to derive a 'literal-free' entailment >relation. Applying lg to the conclusion achieves the same effect as >your proposed new rule which inverts lg. This means that lg is >thought of less as a forward inference rule and more as a kind of >normalization operation, like skolemizing FOL quantifiers to prepare >them for resolution. So define S' as the result of applying rule lg >to S in all possible ways. Then S rdfs-entails E iff one can derive a >graph from S' by the rdfs axioms and rules which either contains an >XML clash or simply entails E' (not E, note). Then the proof can be >stated referring only to literal-free graphs, using a more >conventional Herbrand construction (and being easier to follow :-) >This has the merit of not requiring any changes to the proof rules, >and only a minimal change to the document text; though it would be >appropriate to add a short explanatory paragraph. This could indeed be an interesting way to modularize the RDFS entailment lemma, by "factoring out literals from the equation". It would need to be worked out: the statement of the lemma would be changed, the proof would be changed, including the RDFS Herbrand construction. This doesn't look like a minimal edit, however. > >I will find the least intrusive way to make the correction, under >advice from the other RDF editors and system implementors. And I will read it. As I said, I don't share your expectation that this can be done without the new rule. >Either >way, it will be better with an explanatory paragraph added somewhere >in the text to explain the intuition that allocated bnodes 'replace' >literals to all intents and purposes. This is indeed a very important point, which could indeed be strenghtened in the document. Rule rdfs14 amplifies this, by showing that literals and allocated bnodes are totally 'interchangable' on the object position of triples. > >>This new rule would be a kind of counterpart to rule lg. > >Yes, a kind of inverse. (If only we had the possibility of using >literals as subjects, how straightforward this would all be! Sigh.) Doesn't the combination of lg and rdfs14 realize the use of literals as subjects, more or less, in the 'form' of their allocated bnodes? And couldn't that be used to advantage in implementations? > >>It would also solve the problem with the proof. >>The problem with the proof is the following step, near >>the end: >> given "D contains a triple sur(SH+A(s)) p sur(SH+A(o))" >> conclude "this is an instance of" the triple spo >> "under the instantiation mapping x ->sur(A(x))". >>The problem with this conclusion is that if o is a literal >>and if b is the blank node allocated to o by rule lg, >>then sur(SH(o)) is b instead of o. >>So this is *not* an instance in this case. Calling the >>instantiation mapping M, it can only be concluded that >>the triple >> M(s) p b >>is in D. When rule "rdfs14" would be present, we have >> M(s) p o >>in D, as desired. I would like to draw attention to the very simple way in which *the proof of the RDFS entailment lemma is made complete* in this way. No special assumptions necessary. A direct conclusion. >> >>I confirm that I consider the rdfs entailment lemma to be >>proved when this change would be made. >> >>=== >> >>I am afraid that with the comment above I express the >>need to re-open the LC2 topic 'terHorst-proof-review'. >>I hope that this error could be corrected before the >>PR publication. Fortunately, it could easily be >>handled with the indicated solution. > >I am not sure of the exact procedure which would be appropriate at >this stage, but expect that we can achieve an appropriate correction >during the final editing process, provided the textual change can be >kept to a minimum. The current document does not achieve completeness. I cannot imagine that when the new rule is not included, and when only minimal edits are made to the document, that an understandably complete version is obtained. I have not seen the textual change, so I cannot say that I am satisfied. == This is not a completely regular situation, I am sorry about that. I send approval, and five days later, I report a new error in the proof and thereby withdraw the approval of the proof-review. In between there was a weekend in which I also thought that this was finished. But there were also other things about this review that were perhaps not completely regular. The object of review underwent changes during the review. With each change, the reviewer needed, in principle, to re-review everything in the light of the change. There were non-trivial changes. When I reconsidered the material afterward, in the light of all the changes, I found two points that needed to be mentioned: the point about Skolemization and the problem that remains with RDFS completeness... [1] http://lists.w3.org/Archives/Public/www-rdf-comments/2003AprJun/0042.html
Received on Friday, 21 November 2003 12:59:24 UTC