Re: RIF-in-RDF: Requirement 4 from Sandro Hawke on 2010-07-26 (public-rif-wg@w3.org from July 2010)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 26 Jul 2010 12:58:45 -0400
To: Dave Reynolds <dave.e.reynolds@gmail.com>
Cc: public-rif-wg <public-rif-wg@w3.org>
Message-ID: <1280163525.1863.347.camel@waldron>
On Mon, 2010-07-26 at 08:38 +0100, Dave Reynolds wrote:
> On Sun, 2010-07-25 at 18:59 -0400, Sandro Hawke wrote: 
> > On Sun, 2010-07-25 at 23:16 +0100, Dave Reynolds wrote:
> > > Hi Sandro,
> > > 
> > > Thanks for the detailed description.
> > > 
> > > First, note that I very much agree about the value of RIF for vocabulary
> > > translation - I seem to recall proposing & drafting the original
> > > vocabulary translation example for the requirements document :)
> > > 
> > > Second, I agree that RIF can be used to transform RIF. Though that's a
> > > less compelling case because the RIF syntax is so verbose. In practice I
> > > would expect to translate RIF rules to/from a compact, normalized
> > > abstract syntax form and perform transformations on that abstract syntax
> > > - using RIF. 
> > > 
> > > Now, turning to your example ...
> > > 
> > > > Let's try (1) first, since it's more terse.  Our input looks like
> > > > this:
> > > > 
> > > >       ...
> > > >       <if>       <!-- or something else that can have an And in it -->
> > > >          <my:Conjunction>
> > > >              <my:conjunct>$1</my:conjunct>
> > > >              <my:conjunct>$2</my:conjunct>
> > > >              ...
> > > >          </my:Conjunction>
> > > >       </if>
> > > >       ...
> > > > 
> > > > and we'll just "replace" the element names.
> > > > 
> > > > However, since we don't have a way to "replace" things in this
> > > > "overlapping" style, we'll just add a second <if> property, and the
> > > > serializer or consumer will discard this one, since it contains an
> > > > element not allowed by the dialect syntax.   
> > > > 
> > > > So, the rule will add new triples, but leave the old ones intact.
> > > > The rule will leave us with this:
> > > > 
> > > > 
> > > >       ...
> > > >       <if>       <!-- or something else that can have an And in it -->
> > > >          <my:Conjunction>
> > > >              <my:conjunct>$1</my:conjunct>
> > > >              <my:conjunct>$2</my:conjunct>
> > > >              ...
> > > >          </my:Conjunction>
> > > >       </if>
> > > >       <if>      <!-- the same property, whatever it was -->
> > > >          <And>
> > > >              <formula>$1</formula>
> > > >              <formula>$2</formula>
> > > >              ...
> > > >          </And>
> > > >       </if>
> > > >       ...
> > > > 
> > > > Here's the rule:
> > > > 
> > > >  forall ?parent ?prop ?old ?conjunct ?new
> > > >  if And( 
> > > >    ?parent[?prop->?old]
> > > >    my:Conjunction#?old[my:conjunct->?conjunct]
> > > >    ?new = wrapped(?old)  <!-- use a logic function to create a new node -->
> > > >  ) then And (
> > > >    ?parent[?prop->?new]
> > > >    rif:And#?new[rif:formula->?conjunct]
> > > >  )
> > > > 
> > > > This works fine, as long as the reasoning is complete.  However, if
> > > > the reasoning is ever incomplete, we end up with undetectably
> > > > incorrect results.  Rules that were "if and(a b c) then d" might get
> > > > turned into "if and(a b) then d"!   
> > > > 
> > > > I don't think it's sensible to expect reasoners to be complete.  It's
> > > > great to have termination conditions arise from the rules; it's not
> > > > good to require the reasoner to run until it knows all possible
> > > > inferences have been made.  With the above approach, there's no
> > > > termination condition other than "make all the inferences possible".
> > > 
> > > Here we differ.
> > > 
> > > There is a difference between completeness, termination and premature
> > > termination. You seem to be wanting robustness against the latter and I
> > > don't think that is either possible or desirable.
> > > 
> > > A rule system (or indeed other proof procedure) may reliably terminate
> > > but still be incomplete.
> > > 
> > > To me any admissible vocabulary transformation rule set has to be
> > > terminating (reaches a (fixed) point where no new derivations are
> > > possible). If you run a transformation rule set and it doesn't terminate
> > > within the resources you can devote to it then you can do nothing with
> > > the result. There is no reason to expect an aborted transformation to be
> > > well-formed or usable. That's the nature of such transformations.
> > > 
> > > > Alternatively, if we use the list encoding, the rule is very similar:
> > > > 
> > > >  forall ?parent ?prop ?old ?conjuncts ?new
> > > >  if And( 
> > > >    ?parent[?prop->?old]
> > > >    my:Conjunction#?old[my:conjuncts->?conjuncts]
> > > >    ?new = wrapped(?old)
> > > >  ) then And (
> > > >    ?parent[?prop->?new]
> > > >    rif:And#?new[rif:formulas->?conjuncts]
> > > >  )
> > > 
> > > That is an accident of this example. I can easily imagine more complex
> > > transformations needing e.g. to recursively walk the list constructing a
> > > new transformed list. Imaging a transformation to reverse the terms in
> > > the AND for example, which didn't use the DTB built-in. If you abort
> > > such a rule set part way then you will get an incomplete transformation,
> > > it may even look like a well-formed list but it will be missing values.
> > 
> > I don't think that's the case.   As I mentioned in my message a minute
> > ago to Michael, I'm thinking about this via backward chaining not
> > forward chaining, and perhaps that's leading us to different
> > conclusions.
> 
> Direction of chaining makes no difference, in the backward chaining case
> either it delivers a solution to your entire query or not. For syntactic
> transformations then your query has to be for all entailments. You can't
> just query for one triple and expect it to mean anything. 
> 
> > What I'm used to from Prolog is having rules which generate many correct
> > results.   (in this case, each correct result is a transformation of a
> > rif:Document which is implied by some translation rules.)  You can
> > abort while it's looking for the first solution, and then you simply
> > have no result (and you know that).   Or you can get the first solution,
> > and then decide not to look for any more.  Then you have one result (and
> > you know it).   You can keep looking as long as you want, getting more
> > results, or abort when you like.  Eventually, the system may tell you
> > it's found all the solutions.
> > 
> > With the list-style, each solution is a correct transformation.  With
> > the repeated-property style, you need to let the system run until it's
> > found all solutions (and it has to be a complete reasoner).
> 
> You are confusing a triple result and a solution. For a syntactic
> transformation the solution has to be a complete derived graph. Even in
> the list case not all transformations will be restricted to a single
> list.

What I mean by "result" and "solution" might be more precisely called an
"extraction".    (Section 6, defining the extraction mapping XTr isn't
quite right; it was a first attempt.)

Procedurally: imagine we are implementing the "extract" function which
creates a RIF XML tree, given an RDF Graph G, a focus-node N, and an
XML indication of the dialect we want (schema+root).  

With the repeated-properties style, extract() would be doing a lot of
searches of the entire graph, finding all matches to each graph
pattern.  For example, when it's working on a rif:And node, it would be
searching for all the rif:formula arcs off that node.  It needs to find
all the matches, in each case, before it can return the extraction.  
If it missed a triple, it would end up extracting things like a&c=>d
when it was supposed to return a&b&c=>d.

In contrast, with the list style, extract() needs to find one match to
each of its triple patterns.  From the rif:And node, it needs to find
one rif:allTrue arc.  If no match can be found, then no extract is
possible.  If additional matches can be found, then there are multiple
extractions possible; that's what what I meant by multiple
results/solutions, above.  With this approach, extract() is
non-deterministic; in Python I would implement it as a generator; in my
1990s Java I would implement it as an iterator.  In Prolog I could just
use a non-deterministic predicate.

If the graph G is a plain old graph, a collection of RDF triples stored
in a multiply-indexed Set data structure in memory, then the
repeated-properties style is certainly simpler.   Searching the whole
graph is quick and easy.    Certainly easier than traversing rdf:Lists.

But, in my world of foaf:name == foaf:firstName + " " + foaf:lastName,
we're querying G under inference.   There will typically be one
extraction which comes out of ground facts (the plain old graph), but
there may be more that appear as reasoning is done.  

This gives us RIF fallback. The extractor is actually trying to extract
a RIF document that matches a given dialect (eg Core).  But in this
scenario, what's in the plain-old-graph includes some extensions, so
the extractor will fail in its attempt to find something that matches
RIF Core syntax, using only the plain-old-graph.  But it will keep
looking, and if the fallback rules are run (using exactly the same
mechanism as the foaf:name rules are run), they'll produce additional
triples, eventually allowing a RIF Core extraction to be found.

It may be that multiple RIF Core extractions can be found, because
several fallbacks are available.  Ideally, they can include data which
helps fallback processors decide which extraction to use, and when it
can stop looking for better ones.  

I think there's a level at which this argument is theoretically
equivalent to the KR argument in the parallel thread, having to do with
RDF being a KR not a data structure, but I can't think of how to say
that clearly right now.  I guess the point is that even if we're not
doing RIF fallback processing, we should understand that querying RDF
graphs may involve reasoning, and querying for all triple-matches can
be vastly harder than querying for triple-matches one at a time as you
need them.   Of course, lots of RDF applications are being built which
assume they can do complete queries; I think the ones which rely
strongly on that assumption -- producing harmful results when some
triples aren't returned -- will need to be redesigned as RDF inference
becomes commonplace.   I realize I'm in foggy future territory now;
feel free to ignore this paragraph.  :-)

     -- Sandro
Received on Monday, 26 July 2010 16:58:57 UTC