Re: RIF-in-RDF: Requirement 4 from Dave Reynolds on 2010-07-27 (public-rif-wg@w3.org from July 2010)

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Tue, 27 Jul 2010 23:36:39 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: public-rif-wg <public-rif-wg@w3.org>
Message-ID: <1280270199.2281.62.camel@dave-desktop>
Hi Sandro,

On Tue, 2010-07-27 at 11:09 -0400, Sandro Hawke wrote: 
> Thanks for all your work on this;

You've done a lot more than I have.

> I understand and respect it if you
> don't want to continue this discussion, but in case you do, here's my
> response.

It's sort of moot now that we've resolved but here's some responses ...

The response to your final ASIDE is, however, not moot. 
Jos, what's your opinion on that bit (final example from Sandro and my
comment, you might have to scroll down a bit :) )?

> 
> On Mon, 2010-07-26 at 23:06 +0100, Dave Reynolds wrote:
> > On Mon, 2010-07-26 at 12:58 -0400, Sandro Hawke wrote: 
> > > On Mon, 2010-07-26 at 08:38 +0100, Dave Reynolds wrote:
> > > > On Sun, 2010-07-25 at 18:59 -0400, Sandro Hawke wrote: 
> > > > > On Sun, 2010-07-25 at 23:16 +0100, Dave Reynolds wrote:
> > > > > > Hi Sandro,
> > > > > > 
> > > > > > Thanks for the detailed description.
> > > > > > 
> > > > > > First, note that I very much agree about the value of RIF for vocabulary
> > > > > > translation - I seem to recall proposing & drafting the original
> > > > > > vocabulary translation example for the requirements document :)
> > > > > > 
> > > > > > Second, I agree that RIF can be used to transform RIF. Though that's a
> > > > > > less compelling case because the RIF syntax is so verbose. In practice I
> > > > > > would expect to translate RIF rules to/from a compact, normalized
> > > > > > abstract syntax form and perform transformations on that abstract syntax
> > > > > > - using RIF. 
> > > > > > 
> > > > > > Now, turning to your example ...
> > > > > > 
> > > > > > > Let's try (1) first, since it's more terse.  Our input looks like
> > > > > > > this:
> > > > > > > 
> > > > > > >       ...
> > > > > > >       <if>       <!-- or something else that can have an And in it -->
> > > > > > >          <my:Conjunction>
> > > > > > >              <my:conjunct>$1</my:conjunct>
> > > > > > >              <my:conjunct>$2</my:conjunct>
> > > > > > >              ...
> > > > > > >          </my:Conjunction>
> > > > > > >       </if>
> > > > > > >       ...
> > > > > > > 
> > > > > > > and we'll just "replace" the element names.
> > > > > > > 
> > > > > > > However, since we don't have a way to "replace" things in this
> > > > > > > "overlapping" style, we'll just add a second <if> property, and the
> > > > > > > serializer or consumer will discard this one, since it contains an
> > > > > > > element not allowed by the dialect syntax.   
> > > > > > > 
> > > > > > > So, the rule will add new triples, but leave the old ones intact.
> > > > > > > The rule will leave us with this:
> > > > > > > 
> > > > > > > 
> > > > > > >       ...
> > > > > > >       <if>       <!-- or something else that can have an And in it -->
> > > > > > >          <my:Conjunction>
> > > > > > >              <my:conjunct>$1</my:conjunct>
> > > > > > >              <my:conjunct>$2</my:conjunct>
> > > > > > >              ...
> > > > > > >          </my:Conjunction>
> > > > > > >       </if>
> > > > > > >       <if>      <!-- the same property, whatever it was -->
> > > > > > >          <And>
> > > > > > >              <formula>$1</formula>
> > > > > > >              <formula>$2</formula>
> > > > > > >              ...
> > > > > > >          </And>
> > > > > > >       </if>
> > > > > > >       ...
> > > > > > > 
> > > > > > > Here's the rule:
> > > > > > > 
> > > > > > >  forall ?parent ?prop ?old ?conjunct ?new
> > > > > > >  if And( 
> > > > > > >    ?parent[?prop->?old]
> > > > > > >    my:Conjunction#?old[my:conjunct->?conjunct]
> > > > > > >    ?new = wrapped(?old)  <!-- use a logic function to create a new node -->
> > > > > > >  ) then And (
> > > > > > >    ?parent[?prop->?new]
> > > > > > >    rif:And#?new[rif:formula->?conjunct]
> > > > > > >  )
> > > > > > > 
> > > > > > > This works fine, as long as the reasoning is complete.  However, if
> > > > > > > the reasoning is ever incomplete, we end up with undetectably
> > > > > > > incorrect results.  Rules that were "if and(a b c) then d" might get
> > > > > > > turned into "if and(a b) then d"!   
> > > > > > > 
> > > > > > > I don't think it's sensible to expect reasoners to be complete.  It's
> > > > > > > great to have termination conditions arise from the rules; it's not
> > > > > > > good to require the reasoner to run until it knows all possible
> > > > > > > inferences have been made.  With the above approach, there's no
> > > > > > > termination condition other than "make all the inferences possible".
> > > > > > 
> > > > > > Here we differ.
> > > > > > 
> > > > > > There is a difference between completeness, termination and premature
> > > > > > termination. You seem to be wanting robustness against the latter and I
> > > > > > don't think that is either possible or desirable.
> > > > > > 
> > > > > > A rule system (or indeed other proof procedure) may reliably terminate
> > > > > > but still be incomplete.
> > > > > > 
> > > > > > To me any admissible vocabulary transformation rule set has to be
> > > > > > terminating (reaches a (fixed) point where no new derivations are
> > > > > > possible). If you run a transformation rule set and it doesn't terminate
> > > > > > within the resources you can devote to it then you can do nothing with
> > > > > > the result. There is no reason to expect an aborted transformation to be
> > > > > > well-formed or usable. That's the nature of such transformations.
> > > > > > 
> > > > > > > Alternatively, if we use the list encoding, the rule is very similar:
> > > > > > > 
> > > > > > >  forall ?parent ?prop ?old ?conjuncts ?new
> > > > > > >  if And( 
> > > > > > >    ?parent[?prop->?old]
> > > > > > >    my:Conjunction#?old[my:conjuncts->?conjuncts]
> > > > > > >    ?new = wrapped(?old)
> > > > > > >  ) then And (
> > > > > > >    ?parent[?prop->?new]
> > > > > > >    rif:And#?new[rif:formulas->?conjuncts]
> > > > > > >  )
> > > > > > 
> > > > > > That is an accident of this example. I can easily imagine more complex
> > > > > > transformations needing e.g. to recursively walk the list constructing a
> > > > > > new transformed list. Imaging a transformation to reverse the terms in
> > > > > > the AND for example, which didn't use the DTB built-in. If you abort
> > > > > > such a rule set part way then you will get an incomplete transformation,
> > > > > > it may even look like a well-formed list but it will be missing values.
> > > > > 
> > > > > I don't think that's the case.   As I mentioned in my message a minute
> > > > > ago to Michael, I'm thinking about this via backward chaining not
> > > > > forward chaining, and perhaps that's leading us to different
> > > > > conclusions.
> > > > 
> > > > Direction of chaining makes no difference, in the backward chaining case
> > > > either it delivers a solution to your entire query or not. For syntactic
> > > > transformations then your query has to be for all entailments. You can't
> > > > just query for one triple and expect it to mean anything. 
> > > > 
> > > > > What I'm used to from Prolog is having rules which generate many correct
> > > > > results.   (in this case, each correct result is a transformation of a
> > > > > rif:Document which is implied by some translation rules.)  You can
> > > > > abort while it's looking for the first solution, and then you simply
> > > > > have no result (and you know that).   Or you can get the first solution,
> > > > > and then decide not to look for any more.  Then you have one result (and
> > > > > you know it).   You can keep looking as long as you want, getting more
> > > > > results, or abort when you like.  Eventually, the system may tell you
> > > > > it's found all the solutions.
> > > > > 
> > > > > With the list-style, each solution is a correct transformation.  With
> > > > > the repeated-property style, you need to let the system run until it's
> > > > > found all solutions (and it has to be a complete reasoner).
> > > > 
> > > > You are confusing a triple result and a solution. For a syntactic
> > > > transformation the solution has to be a complete derived graph. Even in
> > > > the list case not all transformations will be restricted to a single
> > > > list.
> > > 
> > > What I mean by "result" and "solution" might be more precisely called an
> > > "extraction".    (Section 6, defining the extraction mapping XTr isn't
> > > quite right; it was a first attempt.)
> > > 
> > > Procedurally: imagine we are implementing the "extract" function which
> > > creates a RIF XML tree, given an RDF Graph G, a focus-node N, and an
> > > XML indication of the dialect we want (schema+root).  
> > > 
> > > With the repeated-properties style, extract() would be doing a lot of
> > > searches of the entire graph, finding all matches to each graph
> > > pattern.  For example, when it's working on a rif:And node, it would be
> > > searching for all the rif:formula arcs off that node.  It needs to find
> > > all the matches, in each case, before it can return the extraction.  
> > > If it missed a triple, it would end up extracting things like a&c=>d
> > > when it was supposed to return a&b&c=>d.
> > 
> > Correct but that is non issue, you query for all the results you get
> > them all. I see no reason you should "miss a triple". I routinely
> > transform things by materializing entailed graphs.
> 
> Note that there are kinds of reasoning that cannot be done this way. 
> Specifically, anything with a disjunction in the head, as you might get
> with FOL or OWL DL.  

Sure, but as I said before, I wouldn't count such a thing as an
admissible transformation rule. 

The only reasoning going on here is at most RDF simple entailment, RDFS
closure with respect to the RIF-in-RDF vocabulary and the RIF
transformation rules you are applying. If you write transformation rules
(using some future RIF extension) which are either non-deterministic,
not decidable or not tractable then the RIF syntax is the least of your
worries :)

> > > In contrast, with the list style, extract() needs to find one match to
> > > each of its triple patterns.  From the rif:And node, it needs to find
> > > one rif:allTrue arc.  If no match can be found, then no extract is
> > > possible.  If additional matches can be found, then there are multiple
> > > extractions possible; that's what what I meant by multiple
> > > results/solutions, above.  
> > 
> > For a well-designed transformation there should not be multiple
> > extractions possible.
> > 
> > Simple transformations should be deterministic.
> > 
> > If you really want to write a non-deterministic generate/test type
> > transformer then you need to ensure that each solution is a distinct
> > structure - mint new nodes for each transformed term. Transformation in
> > place (other than rewrite transformations where the old form is deleted)
> > would be a nightmare. [All of which means you need to be outside Core.]
> > 
> > Furthermore, as I've said, your example is particularly simple since all
> > the changes are within the one list.  Consider more complex
> > transformations, such as where you are transferring terms from one list
> > to another (e.g. to map to a normal form). Now your extraction needs to
> > make sure it gets compatible versions of an arbitrary number of
> > different lists. Or consider transformations which involve multiple
> > rules (e.g. Lloyd-Topor type transforms). Simply having well formed
> > lists is not a general termination criteria or guarantee you are
> > extracting a sensible rule set.
> 
> [addressed below]
> 
> > > If the graph G is a plain old graph, a collection of RDF triples stored
> > > in a multiply-indexed Set data structure in memory, then the
> > > repeated-properties style is certainly simpler.   Searching the whole
> > > graph is quick and easy.    Certainly easier than traversing rdf:Lists.
> > > 
> > > But, in my world of foaf:name == foaf:firstName + " " + foaf:lastName,
> > > we're querying G under inference. 
> > 
> > Transforming FOAF is a different case from transforming RIF.
> > 
> > In the FOAF case each triple is independently a valid conclusion, FOAF
> > is an application of RDF as a KR language.
> > 
> > In RIF-in-RDF this is not the case, we are not doing KR here but using
> > RDF to syntactically encode a data structure. 
> 
> This may be the heart of the matter: when is RDF a KR?  The OWL WG
> concluded, with great frustration: always, when using OWL.  The
> SPARQL WG said each server could decide for itself.  At the recent RDF
> Next Steps workshop, Peter Patel-Schneider (who has probably done the
> most serious work on the subject) presented an argument for changing
> RDF, to make it not be a KR.  People were intrigued, but it still polled
> fairly low in the priorities for RDF work, for what that's worth.
> 
> When I'm thinking about RIF, I definitely think of RDF as a KR.  In
> particular, I don't want RIF-in-RDF to force RDF to not be a KR.  Since
> the repeated-property encoding is incompatible with RDF being a KR, it
> seems to me we can't use that encoding.

The question is not the nature of RDF but the nature of the RIF-in-RDF
proposal.

You need to do a *lot* more to make RIF-in-RDF a use of RDF as a KR than
just use lists!!

The RDF/OWL issues arise because OWL is layered on top of RDF, it is
trying to be a semantic extension and the semantics of OWL is
expressible as a RDF level semantics. That was a major undertaking, of
course.

None of that is true for RIF. The current proposed encoding is not a
layering of RIF on top of RDF, there is no proposal (and no chance) to
give the RDF terms in RIF-in-RDF a semantics which would correspond to
RIF. Even if it could be done for RIF-Core it definitely couldn't be
done for RIF-PR in any reasonable manner. Lists are just not the central
issue in that, they might be necessary but they are a long way from
being sufficient.

> > Any triple level results
> > of a rule essentially mean "this triple is part of the final syntactic
> > transform" and is not meaningfully interpretable on its own without all
> > the rest of entailed triples.
> 
> To me, each triple in the RIF-in-RDF encoding is, indeed, a fact. 
> They're facts about the syntactic structures of that particular
> (imagined) RIF document.

Exactly.

> > >   There will typically be one
> > > extraction which comes out of ground facts (the plain old graph), but
> > > there may be more that appear as reasoning is done.  
> > 
> > I just don't buy this as an approach to transformation.
> > 
> > If you are going to take a rule set and transform it you need a clean
> > way to separate the original from the transformed rule set. I don't see
> > how you can to do triple level inference and expect each possible
> > extraction from the result graph to be an intended rule set, you'll end
> > up with a mix of transformed and untransformed rules and any
> > transformations which are not sufficiently localized will be broken.
> > 
> > Possible ways to ensure you have clean separation or original and
> > transformed rules include:
> >   - use rewrite rules
> >   - only do an extraction from the entailed graph, not the source graph
> > (how we do it in Jena)
> >   - construct a new data structure to contain the transformed rules
> 
> I believe it works, as in my very-brief example, to branch the structure
> wherever there is an alternative, as in having two "if" properties --
> one leading to the old (non-core) subtree, the other leading to the new
> (core) subtree.   Obviously this is the kind of thing that needs to be
> fully implemented and pass test cases before being fully accepted, but
> I've gone through the code in my head several times.

To be clean the transformed rule needs to be a complete new structure
with a different URI, and the new "if" property attached to that. At the
very least the metadata on the source rule is not applicable to
transformed rule.

> I wanted to have a running implementation to show you before having this
> discussion, but decided it was unlikely I'd find time for that as
> quickly.  If I were to proceed with that implementation, do you know
> what you'd want to see (or hear about) in it, to reassure you this
> approach was workable, or even nice?

I've no doubt your example problem is implementable. I don't think
exhibiting the implementation of the simplified example would help much.

> > None of these are possible in RIF-CORE running over the RIF-in-RDF
> > encoding. You either need a recursive data structure (e.g. lists),
> > deletion (RIF-PR) or the ability to mint new resources. Personally I
> > expect most RIF-CORE implementations targeted at RDF to include the sort
> > of Skolem bNode allocation function we nearly got into RIF and they will
> > use those for transformations.
> 
> Agreed, this approach needs the ability to mint new nodes.
> 
> So my example used BLD.
> 
> <ASIDE>
> 
> However, I'm pretty sure we did add new-node-construction into RIF
> Core, although we may not have realized it at the time.  Specifically,
> the list constructor (or at least the list builtins) act like a Skolem
> function. 

See my "e.g. lists" comment. You could design a complete S-expression
type encoding of RIF as lists and then RIF-Core could do a lot more
interesting transformations. That would be a much more fun
representation.

>    Of course, the slightly sneaky way this is done may mean
> it wont be correctly implemented.  Someone should provide some test
> cases on this...
> 
> Let's see:
> 
>    if x uncle y then exists z: x father z and y brother z
> 
> in BLD, with a Skolem function sk1 replacing the existential:
> 
>    forall ?X ?Y (
>       And( ?X[parent->sk1(?X ?Y)] 
>            sk1(?X ?Y)[brother->?Y] ) :- ?X[uncle->?Y]
> 
> and in Core, using a list constructor, and a per-application skolem
> constant my:sk1 (which should be unique per rule):
> 
>    forall ?X ?Y (
>       And( ?X[parent->func:make-list(my:sk1 ?X ?Y)] 
>            func:make-list(my:sk1 ?X ?Y)][brother->?Y] ) :- ?X[uncle->?Y]
> 
> and fact:
>    my:sandro[uncle->my:martin]
>    
> entails:
>    exists ?P And(my:sandro[parent->?P and ?P[brother->my:martin])
> 
> 
> Any comments on that now, or should I put it on the wiki?   (cleaning
> up the syntax/qname a bit.)

Hmmm. 

I guess that's correct but you are right it won't be correctly
implemented. Most RDF systems essentially use skolem constants for
bNodes, including list elements, so identifying two lists because they
have the same elements would lead to problems, especially with SPARQL. 

A SPARQL query of the form:

    SELECT ?P {my:sandro my:parent ?P. ?P my:brother my:martin. }

over a plain graph:

   my:sandro my:parent (my:sk1 my:sandro my:martin) .
   (my:sk1 my:sandro my:martin)  my:brother my:martin .

returns no results.

This may be an issue for the SPARQL RIF entailment regime. Does this
mean that the entailment regime should specify that the RDF graph
closure entailed by RIF should be "leaned" (as in "made lean") before
applying the SPARQL graph pattern? 

That way the behaviour of the SPARQL query would match the corresponding
"exists ?P .." RIF formula.

At the cost of introducing an NP complete operation in the middle of
what would otherwise be a polynomial problem. Ugh.

Dave
Received on Tuesday, 27 July 2010 22:37:18 UTC