Re: [jena-dev] Re: Use cases for Reification in RDF Triplestores from pat hayes on 2003-01-10 (www-rdf-interest@w3.org from January 2003)

From: pat hayes <phayes@ai.uwf.edu>
Date: Fri, 10 Jan 2003 12:34:00 -0800
To: Bob MacGregor <macgregor@ISI.EDU>
Cc: seth@robustai.net, Dave Reynolds <der@hplb.hpl.hp.com>, www-rdf-interest@w3.org
Message-Id: <p05111b21ba44ca05fc94@[10.0.100.253]>
(I should probably say that all the comments below are made by me 
personally, not in the name of the RDF WG, which would be much more 
diplomatic in its responses than I am.)

>At 04:25 PM 1/9/2003 -0800, pat hayes wrote:
>>>(3) If you want to explicitly refer to a statement in RDF (which 
>>>many KR systems
>>>do on a routine basis
>>Evidence for this? I do not know of any.
>
>There seems to be an amazing disconnect here, and I'm not sure where the
>divergence first occurs.  I've been assuming that the terms "statement" and
>"logical sentence" are synonymous.  If that's a mistake, please let me know.

A statement is made by using a sentence, and the connection is so 
close that most logical systems do not bother with the distinction. 
The critical word here is 'use', however.

>So, let me stick with 'sentence', in the KIF sense.  Let S1 be a sentence.
>I might want to assert that "S1 is true", "S1 is false", "S1 is true 
>in context
>C1".  In each of these cases, I'm explicitly referring to the
>sentence/statement S1.

No, you aren't. At least, not usually. To say that S1 is true, you 
(usually) simply assert S1, ie you USE the sentence; you make it into 
an axiom. That is not the same as referring to it. You do not 
(usually) *refer to* S1 by name, and then use a truth-predicate to 
say that it is true. You *can* do that, but it is a very weird (and 
dangerous) way to proceed, and can't be done in most KR systems since 
they don't have truth predicates, for good reasons. Similarly, to say 
that S1 is false, you assert (not S1), and to say that something is 
true in a context you presumably use some logic of contexts.
In KIF-3 (which does have a truth-predicate) the difference is 
between writing, say:

(forall (?x)(implies (P ?x)(Q a)))

as an axiom, ie asserting it is true, ie using it, on the one hand, and writing

(wtr (quote (forall (?x)(implies (P ?x)(Q a)))))

which REFERS to the sentence by using its quoted form, then says 
explicitly that it is true by using the truth predicate 'wtr', whose 
truth conditions by the way are rather subtle. Like I say, one CAN do 
that, but it's not the normal way to proceed.
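
The use/mention contrast above can be sketched outside KIF. The following is only an illustration under stated assumptions: sentences are modelled as nested Python tuples, and the names Quote, used_kb, mentioned_kb and count_quantifiers are hypothetical, not part of KIF or of any KR system. The 'wtr' tuple is just an uninterpreted symbol here; nothing in the sketch gives it truth conditions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quote:
    """A term that NAMES a sentence (mention); it carries no assertion."""
    sentence: tuple

# (forall (?x) (implies (P ?x) (Q a))) as a nested tuple.
SENTENCE = ("forall", ("?x",), ("implies", ("P", "?x"), ("Q", "a")))

# Use: the sentence itself is placed among the axioms.
used_kb = {SENTENCE}

# Mention: the axiom is a different sentence that talks ABOUT the first one.
mentioned_kb = {("wtr", Quote(SENTENCE))}

def count_quantifiers(quoted: Quote) -> int:
    """A syntactic property of the quoted expression, not of the world."""
    def walk(term):
        if not isinstance(term, tuple):
            return 0
        here = 1 if term and term[0] in ("forall", "exists") else 0
        return here + sum(walk(t) for t in term[1:])
    return walk(quoted.sentence)

print(SENTENCE in used_kb)        # the sentence is asserted
print(SENTENCE in mentioned_kb)   # the sentence is only named
print(count_quantifiers(Quote(SENTENCE)))
```

In the second knowledge base the quantified sentence is never an axiom; getting from the quoted form back to an assertion would need exactly the dereferencing step that a truth predicate provides.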

>
>Or, I might want to assert that "The likelihood of S1 being true has 
>probability P".  Again,
>I'm explicitly referring to a sentence.

That one, maybe, I will concede you: but even there, one usually 
simply asserts the sentence with an associated probability, rather 
than *describing* the sentence. That is, you usually say, "the 
probability of S1 is P", in effect.

>
>Or, I might want to assert that "S1 is true at time T1".  Once again, a rather
>ordinary reference to a sentence.

Again, you usually use some temporal logic (eg a modal logic) or use 
explicitly time-sensitive predicates (location at a time, temperature 
at a time) to describe this, rather than REFERRING to sentences and 
relating them to times. At least, everyone else I know does. You may 
be an exception, of course :-).

>
>Now, all of these examples can be handled by the nesting of 
>sentences/statements
>within other sentences/statements.  To me, a statement S2 that contains
>S1 as an argument is explicitly referring to S1.

That is just flat wrong. Sentences do not refer to their components. 
For example, the previous sentence in this paragraph has a 
subcomponent which is a verb phrase "refer to their components"; the 
sentence however does not REFER to that verb phrase. It USES it, one 
might say, but not by reference.

>
>Hence, we have another possible source of disconnect.  Pat claims not to know
>of any cases where one would want to refer to a statement.  To me, all of the
>above cases are evidence of this.  Again, maybe some other terminology should
>be used, so if someone can suggest the appropriate restatement of
>the above examples, I'd be glad to see it.

Check out a textbook on the topic of use versus mention. Quine and 
Tarski have both said quite a bit about this topic.

Look, suppose I write an axiom, say

(P and Q) or (R and S)

By treating it as an axiom, I'm claiming it is true: I'm asserting 
it. (Treating it as a goal or a query would be a different thing to 
do with it.) But I'm not *mentioning* it or *referring* to it. If I 
were to say, "that axiom has no quantifiers in it" then I would be 
referring to it, ie talking ABOUT it. To refer to something, you USE 
an expression which REFERS to the thing. Propositional expressions 
refer to truth-values, not expressions. They certainly do not refer 
to themselves.

When I use a sentence, I (automatically) use its sub-expressions, eg 
the "(R and S)" part of the above axiom. But I don't REFER to them, 
since the

>
>Some languages, including RDF, forbid
>nested statements.  In that case, if we still want to make any of the above
>statements, we need some other mechanism.  Some systems use quotation
>for this.

That is just a basic error, in my view, ie to use quotation to do 
nesting. It doesn't, unless you invoke an implicit dereferencing step 
which amounts to an implicit truth-predicate, which is a VERY 
dangerous thing to do since it is almost impossible to avoid the Liar 
paradox if you allow truth-predicates into the language freely. This 
is a very old and well-investigated topic, by the way, and there are 
libraries of stuff written about it.

>  However, when you quote a statement, the resulting object loses
>a portion of its semantics,

It loses ALL of its semantics. The object is simply an expression. 
That's why you need the truth-predicate to get the semantics back.

>so quoting seems to be less prevalent than it once
>was.
>
>Another possible out is reification.  To 'reify' in the dictionary 
>sense means 'to
>make concrete', which is a somewhat abstract notion that is open to more
>than one interpretation.  In the past, many of us have used 'reify' to mean
>'to convert a tuple into an instance'.  The intent of this operation 
>was to permit
>the appearance of a (representative of a) tuple in locations where 
>the tuple was
>syntactically illegal.
>However, there was no intent to subvert the semantics of the tuple, say from
>being a statement into being an event/occurrence of the assertion of a
>statement (i.e., a stating).  Instead, reification was considered as 
>a functional
>operation, yielding exactly one result per application.

I cannot make any sense of the above. The usual meaning of 
'reification' in this context amounts to it being a form of 
quotation, in effect.

>
>So, the question is, how should I represent any of the above examples
>in RDF?

You probably can't. RDF is a VERY simple language, with very limited 
expressivity.

>   I can't use nesting (except in Jena, which apparently will be
>phasing it out).  I can't use quotation; I can't use reification as I have
>just described it.  There is no notion of 'event' or 'occurrence' in any
>of my above examples, except for the temporal one, so the use of a
>stating would be odd indeed.  But let's investigate it:
>
>Now, it's true that if I observe the existence of a triple in an RDF triple
>store, there must have been some agent that caused that triple to have
>appeared there.  That means that there must have been a stating

Perhaps we have not been communicating. My sense of 'stating' does 
not require any agency; it just amounts to the distinction between a 
particular token of a triple in a document, and an abstract, Platonic 
triple considered as a grammatical form.

>(perhaps even more than one) that caused that triple to appear.  And each
>of those statings implicitly refers to my triple.

No, they USE (instances of) your triple. They do not (usually) refer 
to triples (ie unless they use reification).

>  So if I were to locate
>a resource representing one of the statings, then I could pretend that
>the stating event

I have no notion of a 'stating event'.

>was not an event at all, but was a stand-in for the
>statement, and just reference it.  Or, I could invent a whole new class
>of predicates, so instead of saying "S1 is false", I could find a
>stating for S3, and say "the statement referred to by S3 is false".  Or
>I could say "the statement referred to by S3 is true with probability P",
>or "the statement referred to by S3 is true at time T".
>
>I could do that, but it would be wrong (to quote Richard M.).  And it would
>be relatively inefficient.

I have noticed that people often argue that one or two extra triples 
will lead to inefficiency and triple-bloat when they are perfectly 
happy to use reification, which increases the number of triples by a 
multiplicative factor of 4, or RDF collection syntax (ie LISP lists 
encoded as triples) which uses 3 triples and one new bnode for every 
item. I have therefore stopped listening to such cries unless they 
are adequately documented.
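
For readers who want to check the counts just mentioned, here is a rough sketch. It assumes the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) and the collection vocabulary (rdf:first, rdf:rest, rdf:nil), and it assumes each list cell is also typed as rdf:List, which is what gives three triples per item. The helper names reify and as_collection and the bnode labels are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(triple, node):
    """Return the four triples that describe `triple` via the bnode `node`."""
    s, p, o = triple
    return [
        (node, RDF + "type", RDF + "Statement"),
        (node, RDF + "subject", s),
        (node, RDF + "predicate", p),
        (node, RDF + "object", o),
    ]

def as_collection(items):
    """Encode a list LISP-style: one bnode and three triples per item."""
    triples = []
    for i, item in enumerate(items):
        cell = f"_:l{i}"
        rest = f"_:l{i + 1}" if i + 1 < len(items) else RDF + "nil"
        triples += [
            (cell, RDF + "type", RDF + "List"),  # assumed explicit typing
            (cell, RDF + "first", item),
            (cell, RDF + "rest", rest),
        ]
    return triples

described = reify(("ex:a", "ex:p", "ex:b"), "_:s1")
listed = as_collection(["ex:a", "ex:b", "ex:c"])
print(len(described), len(listed))
```

Note that the four reification triples describe the original triple without asserting it, so a store that wants both keeps the original as well.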

>
>>>(6) Jena and other RDF-compliant triple stores will likely not 
>>>give us direct
>>>support for reasoning about statements in the future.
>>
>>If you are willing to use a bit of OWL expressivity (see above) you 
>>will probably be OK.
>
>OWL is a language.  Expressing things in OWL buys me nothing unless there
>is a software tool that efficiently implements whatever that construct is.
>At present, therefore, you are not offering me anything that I could use.

OWL is in exactly the same category as RDF, so I fail to see what 
distinction you are making here.

>
>>>  That means that we will
>>>have to work harder to get the preferred semantics from a triple store, and
>>>will have to work harder to achieve good performance.
>>I am honestly mystified as to what kind of performance you have in 
>>mind here that will be so hard to get.
>
>I believe you.  We would be happy to send you the source code for 
>Loom or PowerLoom if you
>want to see what an efficient (and very clever) context mechanism 
>looks like (the
>cleverness is primarily due to Austin Tate's group in Edinburgh).

Don't send me the code, but I'd like to know what you think a 
'context mechanism' actually is. I have come across about 40 or 50 
distinct senses of 'context' over the years, so I have no idea what 
you mean by that over-used word. I'd also like to know what contexts, 
in any sense, have got to do with reification or nesting. They seem 
like at least 3 distinct topics to me, so you seem to be dancing all 
over the landscape here.

>
>But the fact is, if the "shortcut" reification that now exists in Jena
>were modified only slightly, we could get significant performance improvements
>for reasoning with the kinds of examples listed above.  That 
>shortcut lies outside
>of the bounds of RDF, since it involves nested statements, but with 
>the correct
>interpretation of reification, the shortcut could be justified as 
>being isomorphic to
>a much clumsier and slower Reified Statement construct.  Now, that opportunity
>for efficiency seems to have gone out the window.

OK, to sum up: as one group using RDF, you have noticed that a change 
to RDF would provide you with some significant benefit, due to some 
idiosyncratic aspects of some code that you have. Many RDF users and 
user groups have been in a similar position, and some of them have 
been reviewing the RDF WG proceedings and responding to our very 
public requests for comments. (All of the WG's activities are about 
as open as it is possible to make them, including email archives and 
IRC logs of all the many protracted disputes and arguments we have 
had about every little issue: we practically publish our toilet 
paper.) To the best of my recollection, you have not expressed any 
comments on this issue before, certainly not when we were taking the 
relevant decisions. The decisions were taken, therefore, without any 
knowledge of the benefits that you would acquire from our taking a 
different decision. Had we known of this we might have been 
influenced to decide differently, but it does not seem appropriate 
for you to announce that our decision is a 'blunder' approximately 
one week before final call on the RDF document suite, just because it 
causes your group to have to rewrite some software. There are 
companies in the W3C who would have had to rewrite millions of RDF 
triples if we had taken a different decision.

>
>We may be able to invent
>caching techniques to compensate for the loss in performance -- I'm 
>not sure yet.
>
>>>thus widening the gap between what RDF provides and what a KR system needs.
>>
>>There are two separate issues here: reification on the one hand, 
>>and allowing nested expression structure on the other.  RDF does 
>>not allow nested structures, and it is beyond the WG's charter to 
>>extend it to do so, but that has got nothing to do with reification.
>>It is indeed a pity that RDF is so propositionally weak and 
>>inexpressive, if one wants to think of it as a full KR language. 
>>Clearly it is not a full KR language.  To extend it to allow full 
>>propositional expressivity would have gone beyond the WG's charter. 
>>I would suggest that you think of RDF as a useful way of encoding 
>>simple atomic assertions and use some extension of RDF to encode 
>>more elaborate propositional content. The new OWL-full standard 
>>shows that this is indeed possible, with some care, and a bunch of 
>>us are working on an extension which will provide full FOL 
>>expressivity, and you can choose your own subset.
>
>The really neat thing about RDF is the emergence of the notion of a 
>triple store
>(e.g., Jena).  RDF itself is just an unpleasant-looking 
>serialization syntax, which
>we don't actually use very often.  OWL is just another syntax.

I think you need to get up to speed on what is going on. OWL (well, 
strictly, OWL-full) is an RDF semantic extension, which is to say 
that it also can be encoded in an RDF triple store. It is, however, 
MUCH more expressive than simple RDF.

>Perhaps you are
>saying that a ground-up implementation for an FOL version of OWL would 
>give me the kind of
>performance I'm looking for.  That may well be true, but (1) that 
>implementation
>will likely be a long time in coming,

There are many FOL inference engines you could be using right now, 
some of the best ones being open source (SNARK, GANDALF).

>  and (2) it probably won't be able to exploit today's
>triple stores, since by following RDF dictates they are probably 
>insuring that their performance
>for full FOL will be suboptimal.

I do not follow your reasoning here. How do you plan to get full FOL 
into a triple store and still retain optimality? The question has 
nothing to do with RDF, and still less to do with reification. Just 
take something like
(forall (?x ?y ?z)(implies (R ?x ?y ?z) (exists (?u)(Q ?y ?x ?u ?z))))
and try mapping it into triples.
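
One naive way to attempt that mapping is sketched below, purely as an illustration. The ex:head and ex:argN properties, the "vars" marker for variable lists, and the encode helper are all made up for this sketch; no RDF or OWL vocabulary is implied.

```python
from itertools import count

def encode(term, triples, fresh):
    """Describe a nested-tuple expression as triples; return the node naming it."""
    if not isinstance(term, tuple):
        return term  # variables and constants stand for themselves
    node = f"_:e{next(fresh)}"
    triples.append((node, "ex:head", term[0]))
    for position, arg in enumerate(term[1:], start=1):
        triples.append((node, f"ex:arg{position}", encode(arg, triples, fresh)))
    return node

# (forall (?x ?y ?z) (implies (R ?x ?y ?z) (exists (?u) (Q ?y ?x ?u ?z))))
FORMULA = (
    "forall", ("vars", "?x", "?y", "?z"),
    ("implies",
     ("R", "?x", "?y", "?z"),
     ("exists", ("vars", "?u"), ("Q", "?y", "?x", "?u", "?z"))),
)

triples = []
root = encode(FORMULA, triples, count())
for t in triples:
    print(t)
```

Every triple produced this way only describes the shape of the expression. None of them asserts R or Q of anything, so an inference engine would have to rebuild the formula from its description before it could use it.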

>
>>Many people have suggested using reification to simulate expression 
>>nesting in recursive syntax, but  this kind of usage for 
>>reification was a mistake from the start. A complex expression like
>>(A and B ) or (C and D)
>>does not *mention* its disjunctive components,
>
>I disagree.

Then you are just plain wrong. Sorry, but this isn't a matter for 
discussion. Read a book about it.

>
>>so to use reification to encode complex propositional expressions 
>>in a triple store was a semantic disaster waiting to happen. (This 
>>point has been labored to death in many email forums.)  If the WG 
>>decision on reification has rescued you from doing that, be 
>>thankful.
>
>Certainly I'm not thankful, since I don't know what I might have 
>been rescued from.
>I could imagine that there are esoteric problems (e.g., paradox) 
>that could have been
>easily legislated away without eliminating this form of reification.

Well, it took some of the best minds of the 20th century several 
lifetimes to discover that this 'legislating away' is a damn sight 
harder than it looks, and that the 'esoteric' problems are in fact 
central.

>You seem to have thrown the baby out with the bathwater.

I'm trying to get you to see that there never was a baby in the water here.

>
>Bizarre thought:  It occurs to me that maybe you don't want me to be 
>able to express any of
>my above examples in RDF.

I have no particular agenda on this one way or the other. I wish that 
RDF were better than it is. I've been working hard to try to make it 
more of a 'foundation' for the SW, along the lines of Tim B-L's 
layer-cake diagram. But the fact is that triple stores are a limited 
tool, and RDF is committed to the use of this very restricted tool. I 
think it is a really bad mistake to *pretend* it is better than it 
is. Triple stores are very good for some things, and probably will be 
fine for 95% of the SW traffic: why not be happy with that, and stop 
thinking that you have found the philosopher's stone?

>I'm supposed to wait for OWL superheavy, or something.
>Is that the problem?

Well, the OWL and RDF standards are going to appear almost at the 
same time, and certainly before April 2003.  And there already are 
OWL implementations out there in use.

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola              			(850)202 4440   fax
FL 32501           				(850)291 0667    cell
phayes@ai.uwf.edu	          http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu   for spam
Received on Friday, 10 January 2003 16:03:45 UTC