Re: RIF-in-RDF: Requirement 4

Thanks for all your work on this; I understand and respect it if you
don't want to continue this discussion, but in case you do, here's my
response.

On Mon, 2010-07-26 at 23:06 +0100, Dave Reynolds wrote:
> On Mon, 2010-07-26 at 12:58 -0400, Sandro Hawke wrote: 
> > On Mon, 2010-07-26 at 08:38 +0100, Dave Reynolds wrote:
> > > On Sun, 2010-07-25 at 18:59 -0400, Sandro Hawke wrote: 
> > > > On Sun, 2010-07-25 at 23:16 +0100, Dave Reynolds wrote:
> > > > > Hi Sandro,
> > > > > 
> > > > > Thanks for the detailed description.
> > > > > 
> > > > > First, note that I very much agree about the value of RIF for vocabulary
> > > > > translation - I seem to recall proposing & drafting the original
> > > > > vocabulary translation example for the requirements document :)
> > > > > 
> > > > > Second, I agree that RIF can be used to transform RIF. Though that's a
> > > > > less compelling case because the RIF syntax is so verbose. In practice I
> > > > > would expect to translate RIF rules to/from a compact, normalized
> > > > > abstract syntax form and perform transformations on that abstract syntax
> > > > > - using RIF. 
> > > > > 
> > > > > Now, turning to your example ...
> > > > > 
> > > > > > Let's try (1) first, since it's more terse.  Our input looks like
> > > > > > this:
> > > > > > 
> > > > > >       ...
> > > > > >       <if>       <!-- or something else that can have an And in it -->
> > > > > >          <my:Conjunction>
> > > > > >              <my:conjunct>$1</my:conjunct>
> > > > > >              <my:conjunct>$2</my:conjunct>
> > > > > >              ...
> > > > > >          </my:Conjunction>
> > > > > >       </if>
> > > > > >       ...
> > > > > > 
> > > > > > and we'll just "replace" the element names.
> > > > > > 
> > > > > > However, since we don't have a way to "replace" things in this
> > > > > > "overlapping" style, we'll just add a second <if> property, and the
> > > > > > serializer or consumer will discard this one, since it contains an
> > > > > > element not allowed by the dialect syntax.   
> > > > > > 
> > > > > > So, the rule will add new triples, but leave the old ones intact.
> > > > > > The rule will leave us with this:
> > > > > > 
> > > > > > 
> > > > > >       ...
> > > > > >       <if>       <!-- or something else that can have an And in it -->
> > > > > >          <my:Conjunction>
> > > > > >              <my:conjunct>$1</my:conjunct>
> > > > > >              <my:conjunct>$2</my:conjunct>
> > > > > >              ...
> > > > > >          </my:Conjunction>
> > > > > >       </if>
> > > > > >       <if>      <!-- the same property, whatever it was -->
> > > > > >          <And>
> > > > > >              <formula>$1</formula>
> > > > > >              <formula>$2</formula>
> > > > > >              ...
> > > > > >          </And>
> > > > > >       </if>
> > > > > >       ...
> > > > > > 
> > > > > > Here's the rule:
> > > > > > 
> > > > > >  forall ?parent ?prop ?old ?conjunct ?new
> > > > > >  if And( 
> > > > > >    ?parent[?prop->?old]
> > > > > >    my:Conjunction#?old[my:conjunct->?conjunct]
> > > > > >    ?new = wrapped(?old)  <!-- use a logic function to create a new node -->
> > > > > >  ) then And (
> > > > > >    ?parent[?prop->?new]
> > > > > >    rif:And#?new[rif:formula->?conjunct]
> > > > > >  )
> > > > > > 
> > > > > > This works fine, as long as the reasoning is complete.  However, if
> > > > > > the reasoning is ever incomplete, we end up with undetectably
> > > > > > incorrect results.  Rules that were "if and(a b c) then d" might get
> > > > > > turned into "if and(a b) then d"!   
> > > > > > 
> > > > > > I don't think it's sensible to expect reasoners to be complete.  It's
> > > > > > great to have termination conditions arise from the rules; it's not
> > > > > > good to require the reasoner to run until it knows all possible
> > > > > > inferences have been made.  With the above approach, there's no
> > > > > > termination condition other than "make all the inferences possible".
> > > > > 
> > > > > Here we differ.
> > > > > 
> > > > > There is a difference between completeness, termination and premature
> > > > > termination. You seem to be wanting robustness against the latter and I
> > > > > don't think that is either possible or desirable.
> > > > > 
> > > > > A rule system (or indeed other proof procedure) may reliably terminate
> > > > > but still be incomplete.
> > > > > 
> > > > > To me any admissible vocabulary transformation rule set has to be
> > > > > terminating (reaches a (fixed) point where no new derivations are
> > > > > possible). If you run a transformation rule set and it doesn't terminate
> > > > > within the resources you can devote to it then you can do nothing with
> > > > > the result. There is no reason to expect an aborted transformation to be
> > > > > well-formed or usable. That's the nature of such transformations.
> > > > > 
> > > > > > Alternatively, if we use the list encoding, the rule is very similar:
> > > > > > 
> > > > > >  forall ?parent ?prop ?old ?conjuncts ?new
> > > > > >  if And( 
> > > > > >    ?parent[?prop->?old]
> > > > > >    my:Conjunction#?old[my:conjuncts->?conjuncts]
> > > > > >    ?new = wrapped(?old)
> > > > > >  ) then And (
> > > > > >    ?parent[?prop->?new]
> > > > > >    rif:And#?new[rif:formulas->?conjuncts]
> > > > > >  )
> > > > > 
> > > > > That is an accident of this example. I can easily imagine more complex
> > > > > transformations needing e.g. to recursively walk the list constructing a
> > > > > new transformed list. Imaging a transformation to reverse the terms in
> > > > > the AND for example, which didn't use the DTB built-in. If you abort
> > > > > such a rule set part way then you will get an incomplete transformation,
> > > > > it may even look like a well-formed list but it will be missing values.
> > > > 
> > > > I don't think that's the case.   As I mentioned in my message a minute
> > > > ago to Michael, I'm thinking about this via backward chaining not
> > > > forward chaining, and perhaps that's leading us to different
> > > > conclusions.
> > > 
> > > Direction of chaining makes no difference, in the backward chaining case
> > > either it delivers a solution to your entire query or not. For syntactic
> > > transformations then your query has to be for all entailments. You can't
> > > just query for one triple and expect it to mean anything. 
> > > 
> > > > What I'm used to from Prolog is having rules which generate many correct
> > > > results.   (in this case, each correct result is a transformation of a
> > > > rif:Document which is implied by some translation rules.)  You can
> > > > abort while it's looking for the first solution, and then you simply
> > > > have no result (and you know that).   Or you can get the first solution,
> > > > and then decide not to look for any more.  Then you have one result (and
> > > > you know it).   You can keep looking as long as you want, getting more
> > > > results, or abort when you like.  Eventually, the system may tell you
> > > > it's found all the solutions.
> > > > 
> > > > With the list-style, each solution is a correct transformation.  With
> > > > the repeated-property style, you need to let the system run until it's
> > > > found all solutions (and it has to be a complete reasoner).
> > > 
> > > You are confusing a triple result and a solution. For a syntactic
> > > transformation the solution has to be a complete derived graph. Even in
> > > the list case not all transformations will be restricted to a single
> > > list.
> > 
> > What I mean by "result" and "solution" might be more precisely called an
> > "extraction".    (Section 6, defining the extraction mapping XTr isn't
> > quite right; it was a first attempt.)
> > 
> > Procedurally: imagine we are implementing the "extract" function which
> > creates a RIF XML tree, given an RDF Graph G, a focus-node N, and an
> > XML indication of the dialect we want (schema+root).  
> > 
> > With the repeated-properties style, extract() would be doing a lot of
> > searches of the entire graph, finding all matches to each graph
> > pattern.  For example, when it's working on a rif:And node, it would be
> > searching for all the rif:formula arcs off that node.  It needs to find
> > all the matches, in each case, before it can return the extraction.  
> > If it missed a triple, it would end up extracting things like a&c=>d
> > when it was supposed to return a&b&c=>d.
> 
> Correct but that is non issue, you query for all the results you get
> them all. I see no reason you should "miss a triple". I routinely
> transform things by materializing entailed graphs.

Note that there are kinds of reasoning that cannot be done this way. 
Specifically, anything with a disjunction in the head, as you might get
with FOL or OWL DL.  Well, you could materialize a particular entailed
graph, when it's finite, but there will be multiple distinct entailed
graphs, each of which might give you a different extraction, giving you
non-determinism even if there's only one extraction per graph.

> > In contrast, with the list style, extract() needs to find one match to
> > each of its triple patterns.  From the rif:And node, it needs to find
> > one rif:allTrue arc.  If no match can be found, then no extract is
> > possible.  If additional matches can be found, then there are multiple
> > extractions possible; that's what what I meant by multiple
> > results/solutions, above.  
> 
> For a well-designed transformation there should not be multiple
> extractions possible.
> 
> Simple transformations should be deterministic.
> 
> If you really want to write a non-deterministic generate/test type
> transformer then you need to ensure that each solution is a distinct
> structure - mint new nodes for each transformed term. Transformation in
> place (other than rewrite transformations where the old form is deleted)
> would be a nightmare. [All of which means you need to be outside Core.]
> 
> Furthermore, as I've said, your example is particularly simple since all
> the changes are within the one list.  Consider more complex
> transformations, such as where you are transferring terms from one list
> to another (e.g. to map to a normal form). Now your extraction needs to
> make sure it gets compatible versions of an arbitrary number of
> different lists. Or consider transformations which involve multiple
> rules (e.g. Lloyd-Topor type transforms). Simply having well formed
> lists is not a general termination criteria or guarantee you are
> extracting a sensible rule set.

[addressed below]

> > If the graph G is a plain old graph, a collection of RDF triples stored
> > in a multiply-indexed Set data structure in memory, then the
> > repeated-properties style is certainly simpler.   Searching the whole
> > graph is quick and easy.    Certainly easier than traversing rdf:Lists.
> > 
> > But, in my world of foaf:name == foaf:firstName + " " + foaf:lastName,
> > we're querying G under inference. 
> 
> Transforming FOAF is a different case from transforming RIF.
> 
> In the FOAF case each triple is independently a valid conclusion, FOAF
> is an application of RDF as a KR language.
> 
> In RIF-in-RDF this is not the case, we are not doing KR here but using
> RDF to syntactically encode a data structure. 

This may be the heart of the matter: when is RDF a KR?  The OWL WG
concluded, with great frustration: always, when using OWL.  The
SPARQL WG said each server could decide for itself.  At the recent RDF
Next Steps workshop, Peter Patel-Schneider (who has probably done the
most serious work on the subject) presented an argument for changing
RDF, to make it not be a KR.  People were intrigued, but it still polled
fairly low in the priorities for RDF work, for what that's worth.

When I'm thinking about RIF, I definitely think of RDF as a KR.  In
particular, I don't want RIF-in-RDF to force RDF to not be a KR.  Since
the repeated-property encoding is incompatible with RDF being a KR, it
seems to me we can't use that encoding.


> Any triple level results
> of a rule essentially mean "this triple is part of the final syntactic
> transform" and is not meaningfully interpretable on its own without all
> the rest of entailed triples.

To me, each triple in the RIF-in-RDF encoding is, indeed, a fact. 
They're facts about the syntactic structures of that particular
(imagined) RIF document.

> >   There will typically be one
> > extraction which comes out of ground facts (the plain old graph), but
> > there may be more that appear as reasoning is done.  
> 
> I just don't buy this as an approach to transformation.
> 
> If you are going to take a rule set and transform it you need a clean
> way to separate the original from the transformed rule set. I don't see
> how you can to do triple level inference and expect each possible
> extraction from the result graph to be an intended rule set, you'll end
> up with a mix of transformed and untransformed rules and any
> transformations which are not sufficiently localized will be broken.
> 
> Possible ways to ensure you have clean separation or original and
> transformed rules include:
>   - use rewrite rules
>   - only do an extraction from the entailed graph, not the source graph
> (how we do it in Jena)
>   - construct a new data structure to contain the transformed rules

I believe it works, as in my very-brief example, to branch the structure
wherever there is an alternative, as in having two "if" properties --
one leading to the old (non-core) subtree, the other leading to the new
(core) subtree.   Obviously this is the kind of thing that needs to be
fully implemented and pass test cases before being fully accepted, but
I've gone through the code in my head several times.

I wanted to have a running implementation to show you before having this
discussion, but decided it was unlikely I'd find time for that as
quickly.  If I were to proceed with that implementation, do you know
what you'd want to see (or hear about) in it, to reassure you this
approach was workable, or even nice?

> None of these are possible in RIF-CORE running over the RIF-in-RDF
> encoding. You either need a recursive data structure (e.g. lists),
> deletion (RIF-PR) or the ability to mint new resources. Personally I
> expect most RIF-CORE implementations targeted at RDF to include the sort
> of Skolem bNode allocation function we nearly got into RIF and they will
> use those for transformations.

Agreed, this approach needs the ability to mint new nodes.

So my example used BLD.

<ASIDE>

However, I'm pretty sure we did add new-node-construction into RIF
Core, although we may not have realized it at the time.  Specifically,
the list constructor (or at least the list builtins) act like a Skolem
function.    Of course, the slightly sneaky way this is done may mean
it wont be correctly implemented.  Someone should provide some test
cases on this...

Let's see:

   if x uncle y then exists z: x father z and y brother z

in BLD, with a Skolem function sk1 replacing the existential:

   forall ?X ?Y (
      And( ?X[parent->sk1(?X ?Y)] 
           sk1(?X ?Y)[brother->?Y] ) :- ?X[uncle->?Y]

and in Core, using a list constructor, and a per-application skolem
constant my:sk1 (which should be unique per rule):

   forall ?X ?Y (
      And( ?X[parent->func:make-list(my:sk1 ?X ?Y)] 
           func:make-list(my:sk1 ?X ?Y)][brother->?Y] ) :- ?X[uncle->?Y]

and fact:
   my:sandro[uncle->my:martin]
   
entails:
   exists ?P And(my:sandro[parent->?P and ?P[brother->my:martin])


Any comments on that now, or should I put it on the wiki?   (cleaning
up the syntax/qname a bit.)

</ASIDE>

> > This gives us RIF fallback. The extractor is actually trying to extract
> > a RIF document that matches a given dialect (eg Core).  But in this
> > scenario, what's in the plain-old-graph includes some extensions, so
> > the extractor will fail in its attempt to find something that matches
> > RIF Core syntax, using only the plain-old-graph.  But it will keep
> > looking, and if the fallback rules are run (using exactly the same
> > mechanism as the foaf:name rules are run), they'll produce additional
> > triples, eventually allowing a RIF Core extraction to be found.
> > 
> > It may be that multiple RIF Core extractions can be found, because
> > several fallbacks are available.  Ideally, they can include data which
> > helps fallback processors decide which extraction to use, and when it
> > can stop looking for better ones.  
> 
> Hmm.
> 
> > I think there's a level at which this argument is theoretically
> > equivalent to the KR argument in the parallel thread, having to do with
> > RDF being a KR not a data structure, but I can't think of how to say
> > that clearly right now. 
> >  I guess the point is that even if we're not
> > doing RIF fallback processing, we should understand that querying RDF
> > graphs may involve reasoning, and querying for all triple-matches can
> > be vastly harder than querying for triple-matches one at a time as you
> > need them. 
> 
> I completely disagree with "vastly harder".
> 
> >   Of course, lots of RDF applications are being built which
> > assume they can do complete queries; I think the ones which rely
> > strongly on that assumption -- producing harmful results when some
> > triples aren't returned -- will need to be redesigned as RDF inference
> > becomes commonplace.  
> 
> If you assume, for a specific application, some form of negation as
> failure then you need to ensure the inference procedure you are building
> over is complete. Sure.
> 
> That's one of the nice things about RIF we can now define precise
> inference profiles to rely on, as Ivan has argued. 
> 
> 
> Bottom line for me remains that I won't object, certainly not formally,
> to using lists in the encoding. I don't see it as harmful and may even
> be helpful.  
> 
> I do object somewhat to justifying it on the grounds of this sort of
> non-deterministic transformation approach. That's not an approach I'd
> like to see recommended.
> 
> Dave
> 
> P.S. I see there is a RIF call tomorrow. I've registered my regrets but
> may possibly make it for the last 30mins, in which case you can tell me
> how this one got resolved :)

And we're about to discuss it right now.  :-)

    -- Sandro

Received on Tuesday, 27 July 2010 15:09:26 UTC