Re: RIF-in-RDF: Requirement 4

On Mon, 2010-07-26 at 12:58 -0400, Sandro Hawke wrote: 
> On Mon, 2010-07-26 at 08:38 +0100, Dave Reynolds wrote:
> > On Sun, 2010-07-25 at 18:59 -0400, Sandro Hawke wrote: 
> > > On Sun, 2010-07-25 at 23:16 +0100, Dave Reynolds wrote:
> > > > Hi Sandro,
> > > > 
> > > > Thanks for the detailed description.
> > > > 
> > > > First, note that I very much agree about the value of RIF for vocabulary
> > > > translation - I seem to recall proposing & drafting the original
> > > > vocabulary translation example for the requirements document :)
> > > > 
> > > > Second, I agree that RIF can be used to transform RIF. Though that's a
> > > > less compelling case because the RIF syntax is so verbose. In practice I
> > > > would expect to translate RIF rules to/from a compact, normalized
> > > > abstract syntax form and perform transformations on that abstract syntax
> > > > - using RIF. 
> > > > 
> > > > Now, turning to your example ...
> > > > 
> > > > > Let's try (1) first, since it's more terse.  Our input looks like
> > > > > this:
> > > > > 
> > > > >       ...
> > > > >       <if>       <!-- or something else that can have an And in it -->
> > > > >          <my:Conjunction>
> > > > >              <my:conjunct>$1</my:conjunct>
> > > > >              <my:conjunct>$2</my:conjunct>
> > > > >              ...
> > > > >          </my:Conjunction>
> > > > >       </if>
> > > > >       ...
> > > > > 
> > > > > and we'll just "replace" the element names.
> > > > > 
> > > > > However, since we don't have a way to "replace" things in this
> > > > > "overlapping" style, we'll just add a second <if> property, and the
> > > > > serializer or consumer will discard this one, since it contains an
> > > > > element not allowed by the dialect syntax.   
> > > > > 
> > > > > So, the rule will add new triples, but leave the old ones intact.
> > > > > The rule will leave us with this:
> > > > > 
> > > > > 
> > > > >       ...
> > > > >       <if>       <!-- or something else that can have an And in it -->
> > > > >          <my:Conjunction>
> > > > >              <my:conjunct>$1</my:conjunct>
> > > > >              <my:conjunct>$2</my:conjunct>
> > > > >              ...
> > > > >          </my:Conjunction>
> > > > >       </if>
> > > > >       <if>      <!-- the same property, whatever it was -->
> > > > >          <And>
> > > > >              <formula>$1</formula>
> > > > >              <formula>$2</formula>
> > > > >              ...
> > > > >          </And>
> > > > >       </if>
> > > > >       ...
> > > > > 
> > > > > Here's the rule:
> > > > > 
> > > > >  forall ?parent ?prop ?old ?conjunct ?new
> > > > >  if And( 
> > > > >    ?parent[?prop->?old]
> > > > >    my:Conjunction#?old[my:conjunct->?conjunct]
> > > > >    ?new = wrapped(?old)  <!-- use a logic function to create a new node -->
> > > > >  ) then And (
> > > > >    ?parent[?prop->?new]
> > > > >    rif:And#?new[rif:formula->?conjunct]
> > > > >  )
> > > > > 
> > > > > This works fine, as long as the reasoning is complete.  However, if
> > > > > the reasoning is ever incomplete, we end up with undetectably
> > > > > incorrect results.  Rules that were "if and(a b c) then d" might get
> > > > > turned into "if and(a b) then d"!   
> > > > > 
> > > > > I don't think it's sensible to expect reasoners to be complete.  It's
> > > > > great to have termination conditions arise from the rules; it's not
> > > > > good to require the reasoner to run until it knows all possible
> > > > > inferences have been made.  With the above approach, there's no
> > > > > termination condition other than "make all the inferences possible".
> > > > 
> > > > Here we differ.
> > > > 
> > > > There is a difference between completeness, termination and premature
> > > > termination. You seem to be wanting robustness against the latter and I
> > > > don't think that is either possible or desirable.
> > > > 
> > > > A rule system (or indeed other proof procedure) may reliably terminate
> > > > but still be incomplete.
> > > > 
> > > > To me any admissible vocabulary transformation rule set has to be
> > > > terminating (reaches a (fixed) point where no new derivations are
> > > > possible). If you run a transformation rule set and it doesn't terminate
> > > > within the resources you can devote to it then you can do nothing with
> > > > the result. There is no reason to expect an aborted transformation to be
> > > > well-formed or usable. That's the nature of such transformations.
> > > > 
> > > > > Alternatively, if we use the list encoding, the rule is very similar:
> > > > > 
> > > > >  forall ?parent ?prop ?old ?conjuncts ?new
> > > > >  if And( 
> > > > >    ?parent[?prop->?old]
> > > > >    my:Conjunction#?old[my:conjuncts->?conjuncts]
> > > > >    ?new = wrapped(?old)
> > > > >  ) then And (
> > > > >    ?parent[?prop->?new]
> > > > >    rif:And#?new[rif:formulas->?conjuncts]
> > > > >  )
> > > > 
> > > > That is an accident of this example. I can easily imagine more complex
> > > > transformations needing e.g. to recursively walk the list constructing a
> > > > new transformed list. Imaging a transformation to reverse the terms in
> > > > the AND for example, which didn't use the DTB built-in. If you abort
> > > > such a rule set part way then you will get an incomplete transformation,
> > > > it may even look like a well-formed list but it will be missing values.
> > > 
> > > I don't think that's the case.   As I mentioned in my message a minute
> > > ago to Michael, I'm thinking about this via backward chaining not
> > > forward chaining, and perhaps that's leading us to different
> > > conclusions.
> > 
> > Direction of chaining makes no difference, in the backward chaining case
> > either it delivers a solution to your entire query or not. For syntactic
> > transformations then your query has to be for all entailments. You can't
> > just query for one triple and expect it to mean anything. 
> > 
> > > What I'm used to from Prolog is having rules which generate many correct
> > > results.   (in this case, each correct result is a transformation of a
> > > rif:Document which is implied by some translation rules.)  You can
> > > abort while it's looking for the first solution, and then you simply
> > > have no result (and you know that).   Or you can get the first solution,
> > > and then decide not to look for any more.  Then you have one result (and
> > > you know it).   You can keep looking as long as you want, getting more
> > > results, or abort when you like.  Eventually, the system may tell you
> > > it's found all the solutions.
> > > 
> > > With the list-style, each solution is a correct transformation.  With
> > > the repeated-property style, you need to let the system run until it's
> > > found all solutions (and it has to be a complete reasoner).
> > 
> > You are confusing a triple result and a solution. For a syntactic
> > transformation the solution has to be a complete derived graph. Even in
> > the list case not all transformations will be restricted to a single
> > list.
> 
> What I mean by "result" and "solution" might be more precisely called an
> "extraction".    (Section 6, defining the extraction mapping XTr isn't
> quite right; it was a first attempt.)
> 
> Procedurally: imagine we are implementing the "extract" function which
> creates a RIF XML tree, given an RDF Graph G, a focus-node N, and an
> XML indication of the dialect we want (schema+root).  
> 
> With the repeated-properties style, extract() would be doing a lot of
> searches of the entire graph, finding all matches to each graph
> pattern.  For example, when it's working on a rif:And node, it would be
> searching for all the rif:formula arcs off that node.  It needs to find
> all the matches, in each case, before it can return the extraction.  
> If it missed a triple, it would end up extracting things like a&c=>d
> when it was supposed to return a&b&c=>d.

Correct but that is non issue, you query for all the results you get
them all. I see no reason you should "miss a triple". I routinely
transform things by materializing entailed graphs.

> In contrast, with the list style, extract() needs to find one match to
> each of its triple patterns.  From the rif:And node, it needs to find
> one rif:allTrue arc.  If no match can be found, then no extract is
> possible.  If additional matches can be found, then there are multiple
> extractions possible; that's what what I meant by multiple
> results/solutions, above.  

For a well-designed transformation there should not be multiple
extractions possible.

Simple transformations should be deterministic.

If you really want to write a non-deterministic generate/test type
transformer then you need to ensure that each solution is a distinct
structure - mint new nodes for each transformed term. Transformation in
place (other than rewrite transformations where the old form is deleted)
would be a nightmare. [All of which means you need to be outside Core.]

Furthermore, as I've said, your example is particularly simple since all
the changes are within the one list.  Consider more complex
transformations, such as where you are transferring terms from one list
to another (e.g. to map to a normal form). Now your extraction needs to
make sure it gets compatible versions of an arbitrary number of
different lists. Or consider transformations which involve multiple
rules (e.g. Lloyd-Topor type transforms). Simply having well formed
lists is not a general termination criteria or guarantee you are
extracting a sensible rule set.

> If the graph G is a plain old graph, a collection of RDF triples stored
> in a multiply-indexed Set data structure in memory, then the
> repeated-properties style is certainly simpler.   Searching the whole
> graph is quick and easy.    Certainly easier than traversing rdf:Lists.
> 
> But, in my world of foaf:name == foaf:firstName + " " + foaf:lastName,
> we're querying G under inference. 

Transforming FOAF is a different case from transforming RIF.

In the FOAF case each triple is independently a valid conclusion, FOAF
is an application of RDF as a KR language.

In RIF-in-RDF this is not the case, we are not doing KR here but using
RDF to syntactically encode a data structure. Any triple level results
of a rule essentially mean "this triple is part of the final syntactic
transform" and is not meaningfully interpretable on its own without all
the rest of entailed triples.

>   There will typically be one
> extraction which comes out of ground facts (the plain old graph), but
> there may be more that appear as reasoning is done.  

I just don't buy this as an approach to transformation.

If you are going to take a rule set and transform it you need a clean
way to separate the original from the transformed rule set. I don't see
how you can to do triple level inference and expect each possible
extraction from the result graph to be an intended rule set, you'll end
up with a mix of transformed and untransformed rules and any
transformations which are not sufficiently localized will be broken.

Possible ways to ensure you have clean separation or original and
transformed rules include:
  - use rewrite rules
  - only do an extraction from the entailed graph, not the source graph
(how we do it in Jena)
  - construct a new data structure to contain the transformed rules

None of these are possible in RIF-CORE running over the RIF-in-RDF
encoding. You either need a recursive data structure (e.g. lists),
deletion (RIF-PR) or the ability to mint new resources. Personally I
expect most RIF-CORE implementations targeted at RDF to include the sort
of Skolem bNode allocation function we nearly got into RIF and they will
use those for transformations.

> This gives us RIF fallback. The extractor is actually trying to extract
> a RIF document that matches a given dialect (eg Core).  But in this
> scenario, what's in the plain-old-graph includes some extensions, so
> the extractor will fail in its attempt to find something that matches
> RIF Core syntax, using only the plain-old-graph.  But it will keep
> looking, and if the fallback rules are run (using exactly the same
> mechanism as the foaf:name rules are run), they'll produce additional
> triples, eventually allowing a RIF Core extraction to be found.
> 
> It may be that multiple RIF Core extractions can be found, because
> several fallbacks are available.  Ideally, they can include data which
> helps fallback processors decide which extraction to use, and when it
> can stop looking for better ones.  

Hmm.

> I think there's a level at which this argument is theoretically
> equivalent to the KR argument in the parallel thread, having to do with
> RDF being a KR not a data structure, but I can't think of how to say
> that clearly right now. 
>  I guess the point is that even if we're not
> doing RIF fallback processing, we should understand that querying RDF
> graphs may involve reasoning, and querying for all triple-matches can
> be vastly harder than querying for triple-matches one at a time as you
> need them. 

I completely disagree with "vastly harder".

>   Of course, lots of RDF applications are being built which
> assume they can do complete queries; I think the ones which rely
> strongly on that assumption -- producing harmful results when some
> triples aren't returned -- will need to be redesigned as RDF inference
> becomes commonplace.  

If you assume, for a specific application, some form of negation as
failure then you need to ensure the inference procedure you are building
over is complete. Sure.

That's one of the nice things about RIF we can now define precise
inference profiles to rely on, as Ivan has argued. 


Bottom line for me remains that I won't object, certainly not formally,
to using lists in the encoding. I don't see it as harmful and may even
be helpful.  

I do object somewhat to justifying it on the grounds of this sort of
non-deterministic transformation approach. That's not an approach I'd
like to see recommended.

Dave

P.S. I see there is a RIF call tomorrow. I've registered my regrets but
may possibly make it for the last 30mins, in which case you can tell me
how this one got resolved :)

Received on Monday, 26 July 2010 22:06:51 UTC