
Re: The entailment rules (was: Re: RDF Semantics Editors Draft?)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 21 May 2012 12:33:14 -0500
Cc: public-rdf-wg WG <public-rdf-wg@w3.org>
Message-Id: <F0AE1996-7425-4098-B62F-3582284E8376@ihmc.us>
To: Richard Cyganiak <richard@cyganiak.de>

On May 21, 2012, at 7:49 AM, Richard Cyganiak wrote:

> Hi Pat,
> 
> Reviving this thread from last week…
> 
> On 14 May 2012, at 06:11, Pat Hayes wrote:
>> Then let us have a wholly new document which just describes the rules, or perhaps in an appendix to the Primer. I would like to separate the rules from the semantics: they really are separate topics.
> 
> The rules are important because they make the Semantics stuff accessible to readers without a background in model theory.

But that is exactly what I am afraid they do not do. They might make readers FEEL that they understand the semantics, but that seems to be a (dangerous) illusion. 

> If we cut this connection, e.g. by not claiming completeness of the rules

I am against claiming completeness for several reasons.

1. Proving completeness is really an arcane business which even people who know basic model theory might well not want to see. I very much doubt that more than three or four people, all of whom I know personally, have even attempted to understand the details of the proofs in the 2004 appendices. Minor but logically peculiar aspects of the RDF specs (such as the presence of rdf:XMLLiteral in the basic RDF machinery, and the irrational prohibition against literals in subject position) meant that these proofs had to introduce a lot of non-standard and confusing extra machinery, none of which is really important for understanding RDF itself.

2. It is also quite hard to get right. There are errors in the proofs in the 2004 appendices, and those results are not in fact correct. The errors were not found until months after publication, and after a great deal of extremely careful checking. I still don't know how Herman ter Horst was able to figure out the bug in the RDFS rules. 

3. In order to understand the very idea of completeness and what it means, one has to have already understood enough basic model theory that the model-theoretic description would be readable in any case.

4. If one is really relying on the rules, then completeness is kind of irrelevant. Most reasoners will not use those exact rules in any case, so one would have to re-establish completeness for the rules one was actually using, if completeness were in fact important. (Or for the tableaux reasoner or the resolution reasoner, etc..) And this would, as I say, require a far deeper grasp of the model theory than just understanding it in the first place. 

> , or hiding it away in a document that no one reads like the test cases, we achieve two things:
> 
> 1. Fewer people will pay attention to the model theory because fewer readers can make sense of it.

I am confident that people who need to understand it will be able to make sense of it. It really isn't very hard, it's just odd if you aren't used to reading things in this way. Some model theories really are tricky to follow, e.g. the whole topic of finite model satisfiability, but compared to this, RDF is really noddy stuff.

> 
> 2. We increase demand for *someone else*, *outside of W3C*, to write up a more easily-digestible version of the semantics, and chances are that this will not get the same level of scrutiny and review.
> 
>> The rules are a kind of introduction to implementing reasoners, although they are pretty awful when viewed in this light. 
> 
> You may have intended the rules as an introduction to implementing reasoners. But this is *not* why I and Ivan and others care about them.
> 
> We like the rules because they are easy to understand. Reading the rules is a great way of understanding, for example, how rdfs:domain or rdfs:subPropertyOf really works. Trying to work out how, say, rdfs:subPropertyOf works just based on the model theory takes a good few hours of work and study because one first has to understand the document's formalisms and get a good part of the entailment regime into one's head. Each of the rules, on the other hand, can be read and understood pretty much in isolation.

That's exactly why they are dangerous. You get the nice impression that you can isolate each nugget of meaning in a rule or two, when in fact these rules interact in all kinds of ways, not all of them easy to see. I think it is actually harder to understand these interaction patterns than to understand the basic model-theoretic conditions. (For me, it is *much* harder.) Also, relying on this style gives the impression that all semantics can be described this way, which is completely false. OWL entailment cannot be reduced to any such set of rules, for example. 
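To make the interaction point concrete, here is a minimal sketch (illustrative only: plain Python tuples stand in for triples, and the vocabulary names are invented; the rule names follow the 2004 rdfs2/rdfs7 rules) in which a conclusion only becomes derivable once two rules are chained:

```python
# A minimal forward-chaining sketch (illustrative only: plain tuples
# stand in for triples, and the vocabulary names are invented).
# rdfs7: (p subPropertyOf q) & (x p y)  =>  (x q y)
# rdfs2: (p domain c)        & (x p y)  =>  (x type c)

def closure(triples):
    """Apply rdfs7 and rdfs2 until no new triples appear."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            if p == "subPropertyOf":                        # rdfs7
                new |= {(x, o, y) for (x, q, y) in graph if q == s}
            elif p == "domain":                             # rdfs2
                new |= {(x, "type", o) for (x, q, y) in graph if q == s}
        if new <= graph:
            return graph
        graph |= new

g = closure({
    ("hasMother", "subPropertyOf", "hasParent"),
    ("hasParent", "domain", "Person"),
    ("alice", "hasMother", "carol"),
})
# rdfs2 can only fire after rdfs7 has added (alice, hasParent, carol),
# so this conclusion is invisible if each rule is read in isolation:
assert ("alice", "type", "Person") in g
```

Neither rule, read on its own, tells you that a subproperty assertion plus a domain assertion yields a type assertion; that only falls out of the chaining.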

Don't misunderstand me. I'm all for providing a set of rules which is as complete as we can make it, in a document that people will actually read. (Perhaps in an appendix to the primer?) But I'm arguing against expressing the actual semantics using this kind of formalism. I'm also reluctant to claim, and strongly averse to publishing a proof, that these rules are formally complete. 

> That makes the relevant rule a great addition to the less formal language that defines rdfs:subPropertyOf in RDF Schema.
> 
>>>> We will try to make this as complete as we can, but will not claim nor set out to prove that they are indeed complete. (That alone will get rid of pages of impenetrable maths.) 
>>> 
>>> Well, we have a form that many readers reportedly find easier to understand, and one that they find harder. As long as they're known to be equivalent,
>> 
>> They are not equivalent.
> 
> The RDF Semantics document contains proofs of equivalence between the various rulesets and entailment regimes.

Well, those proofs have bugs in them. But more seriously, completeness is not equivalence. Two rule sets might both be complete but still not be equivalent, and there are complete inference methods which do not use if-then rules at all, but instead grow trees of partial interpretation structures. 

>>> I think it doesn't matter too much which one is informative and which one is normative.
>> 
>> I think it matters hugely.
> 
> Why? You have proven that they are equivalent.

See above. But the main point is that making a given set of rules normative means, as I understand it, that the use of other sets of rules, and other inference mechanisms, violates the specifications. (If you don't think it means this, perhaps you can say, briefly, what it means to say that a set of rules is normative.) 

>>>> I think they should be in a separate document entirely, perhaps as part of the test cases,
>>> 
>>> This would be inappropriate. The test cases are supposed to be machine-processable so that one can have a test harness that automatically verifies an implementation.
>> 
>> I don't see why this would prohibit the rules being there. All a rule is, is a pattern of an entailment. The test cases have entailments in them now. It would be easy to rephrase the rules in a test-case-like format, using tables of patterns. They would be used to test reasoners, which should be able to demonstrate the conclusion when told to assume the antecedents. 
> 
> The rules are good for teaching the semantics. This means they should be someplace where they will be read and where people that look for them will find them. Test case documents are read by vendors who want to certify their implementation, not by general users trying to work out what rdfs:subPropertyOf does.

OK, then let's put the rules in an appendix to the primer. Or even a separate (new) document altogether. 
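The test-case-like format suggested above could be sketched like this (a hypothetical format: the "?x" variable syntax and the helper names are invented for this illustration):

```python
# A sketch of a rule rephrased in a test-case-like format. The
# pattern syntax ("?x" for variables) and the helper names are
# hypothetical, invented for this illustration.

# rdfs9: (c subClassOf d) & (x type c)  =>  (x type d)
RULE_RDFS9 = {
    "antecedents": [("?c", "subClassOf", "?d"), ("?x", "type", "?c")],
    "conclusion":  ("?x", "type", "?d"),
}

def instantiate(rule, bindings):
    """Fill in the pattern to get one concrete entailment test case."""
    fill = lambda t: tuple(bindings.get(part, part) for part in t)
    return ([fill(a) for a in rule["antecedents"]],
            fill(rule["conclusion"]))

premise, expected = instantiate(
    RULE_RDFS9, {"?c": "Dog", "?d": "Animal", "?x": "rex"})
# A reasoner under test, told to assume `premise`, should be able to
# demonstrate `expected`:
assert premise == [("Dog", "subClassOf", "Animal"), ("rex", "type", "Dog")]
assert expected == ("rex", "type", "Animal")
```

Each rule becomes a table row of antecedent patterns plus a conclusion pattern, and each instantiation is an entailment test of exactly the kind the test-cases document already contains.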

> 
>>>> and no claims should be made as to their completeness, and no long and extremely opaque (and flawed) completeness proofs should be included, even in an appendix. Nobody gives a damn about completeness in any case.) 
>>> 
>>> -1. I do give a damn about the completeness of the rules.
>> 
>> I really do not understand your position. How can you care about completeness when you don't want us to even have the content which allows us to define the very concept of completeness? 
> 
> 1. I believe that we should present all content in the most digestible form possible. This is why I like the rules.
> 
> 2. If we present the same content in multiple forms (which can be a good idea), then it is important to characterize the relationship between the different representations. Stating that they are equivalent is a strong characterization.
> 
> 3. The form that's most readily accessible should be normative.

Accessible in what sense? For example, the OWL specs have some intricate documents establishing the semantic relationships between RDF and various flavors of OWL. If RDF semantics were normatively expressed using rules, I don't even know how this kind of semantic relationship to another language could be stated. I also don't know how one would set out to relate a set of such rules to a consistency-checking engine or a tableaux reasoner. 

> This doesn't diminish the value of the model theory as a means for proving that the rules are complete.
> 
> 4. Ultimately, correctness is not what's written in some dusty tome sitting on W3C's webserver. Correctness is what's actually interoperably implemented. In that regard, RDF Semantics is, I have to say, a failure. I want to help make RDF Semantics easier to implement, and more closely connected to what's actually being implemented out there.

I really don't know what you mean by "implementing" a model theory. The semantics defines the notions of truth, consistency and entailment. If these definitions are in a dusty tome, that is fine by me. Implementations do things like transform graphs, draw conclusions, and have graphs interact with other formalisms which might also have a semantics. You only have to check the implementation against the semantics once. It can then go off and do its thing without opening the dusty tome every time, which is fine. Think of society and the law. Most of the time we all get along great without checking the actual laws, but when they do need to be checked, it's important to get them exactly right, even if it takes a lawyer to follow what they are supposed to be saying. 

> 
>>> And making the model theory normative instead of the rules saves us from typos in the normative parts how?
>> 
>> The actual statement of the truth conditions can be checked fairly easily. Small errors in rules, and especially missing cases needed for completeness, are MUCH harder to spot. There are errors in the 2004 rules, in fact, which were only discovered months after publication and after the most thorough vetting by some extremely careful people (try reading the email logs shortly before LC to see how anal this got at times). Even the early implementers (Jos De Roo had a rule engine in 2003 which we used often to check completeness) did not notice them. 
> 
> So you are saying that transforming the semantics into a form that is easy to implement is next to impossible

I think it's meaningless, as stated. 

> , and can't even be done with the considerable community attention that comes with the W3C standardization process. So implementations are basically guaranteed to be broken.

No. Broken means making inferences that are invalid, incorrect. I'm confident that almost all implementations are correct in this sense (or when they are not, there is a good reason why). Completeness means not missing any correct inference at all, no matter how piddling or unimportant it is. Completeness is hard to prove and IMO not worth having, most of the time. Hardly any applications in the linked data world will be complete in this sense. (Exceptions might include validity checkers for conformance, and engines designed to find inconsistencies in large complicated knowledge bases.) 
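The soundness/completeness distinction can be seen in miniature (an illustrative sketch only; the reasoner and vocabulary names are invented): a "reasoner" that applies only rdfs9 never draws an invalid conclusion, but it silently misses valid ones.

```python
# Soundness vs. completeness in miniature (names invented for this
# sketch). A "reasoner" that applies only rdfs9 never draws an
# invalid conclusion -- it is sound -- but it misses valid ones.

def rdfs9_only(graph):
    """One pass of rdfs9: (c subClassOf d) & (x type c) => (x type d)."""
    graph = set(graph)
    for (c, p, d) in set(graph):
        if p == "subClassOf":
            graph |= {(x, "type", d) for (x, q, cc) in set(graph)
                      if q == "type" and cc == c}
    return graph

g = rdfs9_only({
    ("Dog", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
    ("hasMother", "subPropertyOf", "hasParent"),
    ("rex", "hasMother", "fifi"),
})
assert ("rex", "type", "Animal") in g         # sound: a valid inference
assert ("rex", "hasParent", "fifi") not in g  # incomplete: a valid rdfs7
                                              # inference it never makes
```

Such a reasoner is not broken in any useful sense; it simply does less than a complete one would, which is exactly the situation of most deployed systems.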

> This strengthens the case that making RDF Semantics a normative recommendation in the first place was a big mistake.

Possibly, although if it had not been, then OWL and RIF would not even be connected to RDF by this time. 

> 
>> Semantics does not describe or even mandate ANY behavior. An "engine" which inputs RDF and does exactly nothing to it except print it back out unchanged, is a perfectly correct and conformant RDF engine.
> 
> That is nonsense. There is no such thing as a conformant RDF engine.

I meant, it would not violate the 2004 Semantics specs. 

> For such a thing to exist, there would have to be a spec that defines conformance criteria for such a beast. I would be *thrilled* if this existed, and I would be *thrilled* if RDF 1.1 Semantics defined some conformance criteria.
> 
> (Also, if all one has to do to conform to RDF Semantics is sit there and print back the triples unchanged, then the spec could have said that in a lot less words.)

My point was that a semantics does not mandate that any inferences actually get made. It simply tells you how to determine which inferences preserve truth and which do not. Behavior of any engine can be tested against this, but that does not make the engine into an implementation of the semantics. 

> 
>>>> What would it mean to make those rules normative? Would an efficient tableaux-based reasoner be then illegal?
>>> 
>>> No, why would it?
>> 
>> Because it would not be using rules at all; but the rules are normative. Which I take to mean, if you aren't using the rules, you are doing it wrong.
> 
> If you get the same results by whatever means, you aren't.

If you follow this through and try to make it precise (e.g. what is a "result"?), you will gradually re-invent a theory of truth, which will probably be model theory. 
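For simple RDF entailment, "getting the same results" can indeed be made precise, and it lands squarely in model-theoretic territory: by the interpolation lemma, G simply entails E iff some instance of E (blank nodes mapped to terms of G) is a subgraph of G. A brute-force sketch, with conventions invented for this illustration (triples as tuples, blank nodes as strings starting with "_:"):

```python
from itertools import product

# Simple entailment made precise (a sketch; the triple encoding is
# invented): G simply entails E iff some instance of E -- blank
# nodes mapped to terms of G -- is a subgraph of G. Blank nodes
# here are strings starting with "_:".

def simply_entails(g, e):
    g, e = set(g), set(e)
    bnodes = sorted({t for triple in e for t in triple
                     if t.startswith("_:")})
    if not bnodes:
        return e <= g                 # ground case: plain subgraph test
    terms = sorted({t for triple in g for t in triple})
    for choice in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, choice))
        if {tuple(m.get(t, t) for t in triple) for triple in e} <= g:
            return True
    return False

g = {("alice", "knows", "bob")}
assert simply_entails(g, {("alice", "knows", "_:x")})    # valid
assert not simply_entails(g, {("_:x", "knows", "_:x")})  # no mapping works
```

Note that the definition being checked here is semantic (truth preserved in every interpretation); the subgraph characterization is a theorem about it, not the definition itself.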

> I said this several times already in this thread.
> 
> And still, the (vast?) majority of RDFS “reasoners” are rule-based.
> 
>>>>>> I think we all agree that this is *not* the document you should be looking at in your first encounter with RDF.
>>>> 
>>>> If you know nothing about logical methods, inference engines, machine inference? Yes, it might not be a good starting point if you are this ignorant, indeed.
>>> 
>>> Inappropriate display of arrogance.
>> 
>> Why? There seems to be a ground assumption here that our typical reader knows absolutely nothing about semantics, logic, reasoners, reasoning, or indeed almost anything about the technical field in which these specifications are situated
> 
> For RDF to be successful, many people — domain modellers, implementers, technical writers — need to understand, for example, what can be inferred from a handful of triples under RDFS-entailment. This doesn't mean we need to write for idiots. But it means we need to write for non-logicians.

Having enough of a grasp of semantics to be able to understand RDF and RDFS hardly amounts to being a logician. 

But even if I acknowledge your point, it is about writing tutorials, not about writing the specs. It's not the job of a technical specification to teach the basics of a subject to newbies. I confess to having allowed myself to be persuaded to try to make the 2004 Semantics document have something of an introductory flavor, which compromises the document by blurring tutorial with technical specification. As often happens when one tries to do two things at once, one does neither of them well. I want to get these two aspects separated back out. Let us have a tight, crisp semantics document which specifies the actual normative semantics as exactly as possible with minimal fuss, and a separate semantics-primer document which sets out to explain the ideas intuitively and which can provide rules and so on for, as you say, non-logicians. 

(You know, you and Ivan are the only people who have resisted the idea of model-theoretic semantics with such vehemence. Most of the feedback I have seen regarding the RDF Semantics has been about the details (like, why is the range of a datatype L2V mapping IR rather than LV?), questions which clearly can only arise in the mind of someone who is at least following the main ideas.)

> 
>>>>> This problem could be solved easily with the insertion of some language in the introduction pointing to the Primer in the first paragraph (instead of just the Vocab and Concepts, as it does now).
>>>> 
>>>> See above. It refers to the primer in the second paragraph of the document. 
>>> 
>>> In the section that nobody reads. The Introduction is silent on the topic of which other documents you should already be familiar with before starting this one.
>> 
> 
>> By all means let us copy paragraph 2 of the Status section into the Introduction, where people might possibly see it.  I presume this editorial decision will apply to all the technical documents.
> 
> All documents except RDF Semantics already spend quite a bit of their introductions explaining their role in the big picture of RDF and their relationships to the other documents. RDF Semantics is the odd one out. But ok, we can easily fix that.

True, and that needs fixing. 

> 
>>>> I could try to write a short explanation of how to test a proposed inference for validity by constructing a formal counterexample. This might give naive readers a better grasp of how to connect interpretations with rules, in fact. It could also go into the 'test cases' document, or even into the primer somewhere (?) 
>>> 
>>> Not keen on having this in the test cases. Covering more of the semantics in the Primer is an intriguing idea.
>> 
>> It might help. It can be expressed in non-mathematical terms. I will take a stab at producing some text for this.
> 
> If you need a clueless non-mathematician to test this on, I'm happy to read and comment.

OK, you're on. :-)

Pat

> 
> Best,
> Richard

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes