Re: RIF-in-RDF: Requirement 4

On Wed, 2010-07-28 at 10:16 -0400, Sandro Hawke wrote: 
> On Tue, 2010-07-27 at 23:36 +0100, Dave Reynolds wrote:
> > Hi Sandro,
> > 
> > On Tue, 2010-07-27 at 11:09 -0400, Sandro Hawke wrote: 
> > > Thanks for all your work on this;
> > 
> > You've done a lot more than I have.
> > 
> > > I understand and respect it if you
> > > don't want to continue this discussion, but in case you do, here's my
> > > response.
> > 
> > It's sort of moot now that we've resolved but here's some responses ...
> > 
> > The response to your final ASIDE is, however, not moot. 
> > Jos, what's your opinion on that bit (final example from Sandro and my
> > comment, you might have to scroll down a bit :) )?
> 
> [moving that to separate thread]
> 
> > > 
> > > On Mon, 2010-07-26 at 23:06 +0100, Dave Reynolds wrote:
> > > > On Mon, 2010-07-26 at 12:58 -0400, Sandro Hawke wrote: 
> > > > > On Mon, 2010-07-26 at 08:38 +0100, Dave Reynolds wrote:
> > > > > > On Sun, 2010-07-25 at 18:59 -0400, Sandro Hawke wrote: 
> > > > > > > On Sun, 2010-07-25 at 23:16 +0100, Dave Reynolds wrote:
> > > > > > > > Hi Sandro,
> > > > > > > > 
> > > > > > > > Thanks for the detailed description.
> > > > > > > > 
> > > > > > > > First, note that I very much agree about the value of RIF for vocabulary
> > > > > > > > translation - I seem to recall proposing & drafting the original
> > > > > > > > vocabulary translation example for the requirements document :)
> > > > > > > > 
> > > > > > > > Second, I agree that RIF can be used to transform RIF. Though that's a
> > > > > > > > less compelling case because the RIF syntax is so verbose. In practice I
> > > > > > > > would expect to translate RIF rules to/from a compact, normalized
> > > > > > > > abstract syntax form and perform transformations on that abstract syntax
> > > > > > > > - using RIF. 
> > > > > > > > 
> > > > > > > > Now, turning to your example ...
> > > > > > > > 
> > > > > > > > > Let's try (1) first, since it's more terse.  Our input looks like
> > > > > > > > > this:
> > > > > > > > > 
> > > > > > > > >       ...
> > > > > > > > >       <if>       <!-- or something else that can have an And in it -->
> > > > > > > > >          <my:Conjunction>
> > > > > > > > >              <my:conjunct>$1</my:conjunct>
> > > > > > > > >              <my:conjunct>$2</my:conjunct>
> > > > > > > > >              ...
> > > > > > > > >          </my:Conjunction>
> > > > > > > > >       </if>
> > > > > > > > >       ...
> > > > > > > > > 
> > > > > > > > > and we'll just "replace" the element names.
> > > > > > > > > 
> > > > > > > > > However, since we don't have a way to "replace" things in this
> > > > > > > > > "overlapping" style, we'll just add a second <if> property, and the
> > > > > > > > > serializer or consumer will discard this one, since it contains an
> > > > > > > > > element not allowed by the dialect syntax.   
> > > > > > > > > 
> > > > > > > > > So, the rule will add new triples, but leave the old ones intact.
> > > > > > > > > The rule will leave us with this:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >       ...
> > > > > > > > >       <if>       <!-- or something else that can have an And in it -->
> > > > > > > > >          <my:Conjunction>
> > > > > > > > >              <my:conjunct>$1</my:conjunct>
> > > > > > > > >              <my:conjunct>$2</my:conjunct>
> > > > > > > > >              ...
> > > > > > > > >          </my:Conjunction>
> > > > > > > > >       </if>
> > > > > > > > >       <if>      <!-- the same property, whatever it was -->
> > > > > > > > >          <And>
> > > > > > > > >              <formula>$1</formula>
> > > > > > > > >              <formula>$2</formula>
> > > > > > > > >              ...
> > > > > > > > >          </And>
> > > > > > > > >       </if>
> > > > > > > > >       ...
> > > > > > > > > 
> > > > > > > > > Here's the rule:
> > > > > > > > > 
> > > > > > > > >  forall ?parent ?prop ?old ?conjunct ?new
> > > > > > > > >  if And( 
> > > > > > > > >    ?parent[?prop->?old]
> > > > > > > > >    my:Conjunction#?old[my:conjunct->?conjunct]
> > > > > > > > >    ?new = wrapped(?old)  <!-- use a logic function to create a new node -->
> > > > > > > > >  ) then And (
> > > > > > > > >    ?parent[?prop->?new]
> > > > > > > > >    rif:And#?new[rif:formula->?conjunct]
> > > > > > > > >  )
> > > > > > > > > 
> > > > > > > > > This works fine, as long as the reasoning is complete.  However, if
> > > > > > > > > the reasoning is ever incomplete, we end up with undetectably
> > > > > > > > > incorrect results.  Rules that were "if and(a b c) then d" might get
> > > > > > > > > turned into "if and(a b) then d"!   
> > > > > > > > > 
> > > > > > > > > I don't think it's sensible to expect reasoners to be complete.  It's
> > > > > > > > > great to have termination conditions arise from the rules; it's not
> > > > > > > > > good to require the reasoner to run until it knows all possible
> > > > > > > > > inferences have been made.  With the above approach, there's no
> > > > > > > > > termination condition other than "make all the inferences possible".
> > > > > > > > 
> > > > > > > > Here we differ.
> > > > > > > > 
> > > > > > > > There is a difference between completeness, termination and premature
> > > > > > > > termination. You seem to be wanting robustness against the latter and I
> > > > > > > > don't think that is either possible or desirable.
> > > > > > > > 
> > > > > > > > A rule system (or indeed other proof procedure) may reliably terminate
> > > > > > > > but still be incomplete.
> > > > > > > > 
> > > > > > > > To me any admissible vocabulary transformation rule set has to be
> > > > > > > > terminating (reaches a (fixed) point where no new derivations are
> > > > > > > > possible). If you run a transformation rule set and it doesn't terminate
> > > > > > > > within the resources you can devote to it then you can do nothing with
> > > > > > > > the result. There is no reason to expect an aborted transformation to be
> > > > > > > > well-formed or usable. That's the nature of such transformations.
> > > > > > > > 
> > > > > > > > > Alternatively, if we use the list encoding, the rule is very similar:
> > > > > > > > > 
> > > > > > > > >  forall ?parent ?prop ?old ?conjuncts ?new
> > > > > > > > >  if And( 
> > > > > > > > >    ?parent[?prop->?old]
> > > > > > > > >    my:Conjunction#?old[my:conjuncts->?conjuncts]
> > > > > > > > >    ?new = wrapped(?old)
> > > > > > > > >  ) then And (
> > > > > > > > >    ?parent[?prop->?new]
> > > > > > > > >    rif:And#?new[rif:formulas->?conjuncts]
> > > > > > > > >  )
> > > > > > > > 
> > > > > > > > That is an accident of this example. I can easily imagine more complex
> > > > > > > > transformations needing e.g. to recursively walk the list constructing a
> > > > > > > > new transformed list. Imaging a transformation to reverse the terms in
> > > > > > > > the AND for example, which didn't use the DTB built-in. If you abort
> > > > > > > > such a rule set part way then you will get an incomplete transformation,
> > > > > > > > it may even look like a well-formed list but it will be missing values.
> > > > > > > 
> > > > > > > I don't think that's the case.   As I mentioned in my message a minute
> > > > > > > ago to Michael, I'm thinking about this via backward chaining not
> > > > > > > forward chaining, and perhaps that's leading us to different
> > > > > > > conclusions.
> > > > > > 
> > > > > > Direction of chaining makes no difference, in the backward chaining case
> > > > > > either it delivers a solution to your entire query or not. For syntactic
> > > > > > transformations then your query has to be for all entailments. You can't
> > > > > > just query for one triple and expect it to mean anything. 
> > > > > > 
> > > > > > > What I'm used to from Prolog is having rules which generate many correct
> > > > > > > results.   (in this case, each correct result is a transformation of a
> > > > > > > rif:Document which is implied by some translation rules.)  You can
> > > > > > > abort while it's looking for the first solution, and then you simply
> > > > > > > have no result (and you know that).   Or you can get the first solution,
> > > > > > > and then decide not to look for any more.  Then you have one result (and
> > > > > > > you know it).   You can keep looking as long as you want, getting more
> > > > > > > results, or abort when you like.  Eventually, the system may tell you
> > > > > > > it's found all the solutions.
> > > > > > > 
> > > > > > > With the list-style, each solution is a correct transformation.  With
> > > > > > > the repeated-property style, you need to let the system run until it's
> > > > > > > found all solutions (and it has to be a complete reasoner).
> > > > > > 
> > > > > > You are confusing a triple result and a solution. For a syntactic
> > > > > > transformation the solution has to be a complete derived graph. Even in
> > > > > > the list case not all transformations will be restricted to a single
> > > > > > list.
> > > > > 
> > > > > What I mean by "result" and "solution" might be more precisely called an
> > > > > "extraction".    (Section 6, defining the extraction mapping XTr isn't
> > > > > quite right; it was a first attempt.)
> > > > > 
> > > > > Procedurally: imagine we are implementing the "extract" function which
> > > > > creates a RIF XML tree, given an RDF Graph G, a focus-node N, and an
> > > > > XML indication of the dialect we want (schema+root).  
> > > > > 
> > > > > With the repeated-properties style, extract() would be doing a lot of
> > > > > searches of the entire graph, finding all matches to each graph
> > > > > pattern.  For example, when it's working on a rif:And node, it would be
> > > > > searching for all the rif:formula arcs off that node.  It needs to find
> > > > > all the matches, in each case, before it can return the extraction.  
> > > > > If it missed a triple, it would end up extracting things like a&c=>d
> > > > > when it was supposed to return a&b&c=>d.
> > > > 
> > > > Correct but that is non issue, you query for all the results you get
> > > > them all. I see no reason you should "miss a triple". I routinely
> > > > transform things by materializing entailed graphs.
> > > 
> > > Note that there are kinds of reasoning that cannot be done this way. 
> > > Specifically, anything with a disjunction in the head, as you might get
> > > with FOL or OWL DL.  
> > 
> > Sure, but as I said before, I wouldn't count such a thing as an
> > admissible transformation rule. 
> > 
> > The only reasoning going on here is at most RDF simple entailment, RDFS
> > closure with respect to the RIF-in-RDF vocabulary and the RIF
> > transformation rules you are applying. If you write transformation rules
> > (using some future RIF extension) which are either non-deterministic,
> > not decidable or not tractable then the RIF syntax is the least of your
> > worries :)
> 
> I think this may come down to a philosophical difference about how much
> we should be constraining things at the RDF interface.   Taken to
> extremes, I'm aiming for a very loose interface where folks assert facts
> to the global database, and later ask it for facts (stated or implied by
> trusted sources).   My guess is you put more emphasis on making
> something that works today, without worrying so much about the hazy
> future.

Are you talking specifically about RIF here?

In terms of RDF in general we are on the same page. 

The problem is RIF. To me there are two clean ways to do RIF in RDF:

One is the create a vocabulary for expressing rules in RDF which can be
given an RDF level semantics. Analogous to OWL. This seems to what you'd
prefer but I think we agree that this is well out of scope, if possible
at all.  

The other is to do a really pure embedding, encode individual rules as
XMLLiterals and have an RDF vocabulary which just provides rule set
grouping and metadata (including predicate usage metadata to support
search without having to descend into the literals). I remember
proposing this early on and spending some time on it at one of the first
RIF workshops.

With either of those options you can adopt the loose fact-by-fact
approach you are talking about.

The current option of syntactic encoding of RIF within RDF is the one
which is the pragmatic compromise. I'm okay with it because in practice
rules are easier to query and transform that way (since our semantic web
standards are so embarrassingly bad at handling XML literals).

> > > > > In contrast, with the list style, extract() needs to find one match to
> > > > > each of its triple patterns.  From the rif:And node, it needs to find
> > > > > one rif:allTrue arc.  If no match can be found, then no extract is
> > > > > possible.  If additional matches can be found, then there are multiple
> > > > > extractions possible; that's what what I meant by multiple
> > > > > results/solutions, above.  
> > > > 
> > > > For a well-designed transformation there should not be multiple
> > > > extractions possible.
> > > > 
> > > > Simple transformations should be deterministic.
> > > > 
> > > > If you really want to write a non-deterministic generate/test type
> > > > transformer then you need to ensure that each solution is a distinct
> > > > structure - mint new nodes for each transformed term. Transformation in
> > > > place (other than rewrite transformations where the old form is deleted)
> > > > would be a nightmare. [All of which means you need to be outside Core.]
> > > > 
> > > > Furthermore, as I've said, your example is particularly simple since all
> > > > the changes are within the one list.  Consider more complex
> > > > transformations, such as where you are transferring terms from one list
> > > > to another (e.g. to map to a normal form). Now your extraction needs to
> > > > make sure it gets compatible versions of an arbitrary number of
> > > > different lists. Or consider transformations which involve multiple
> > > > rules (e.g. Lloyd-Topor type transforms). Simply having well formed
> > > > lists is not a general termination criteria or guarantee you are
> > > > extracting a sensible rule set.
> > > 
> > > [addressed below]
> > > 
> > > > > If the graph G is a plain old graph, a collection of RDF triples stored
> > > > > in a multiply-indexed Set data structure in memory, then the
> > > > > repeated-properties style is certainly simpler.   Searching the whole
> > > > > graph is quick and easy.    Certainly easier than traversing rdf:Lists.
> > > > > 
> > > > > But, in my world of foaf:name == foaf:firstName + " " + foaf:lastName,
> > > > > we're querying G under inference. 
> > > > 
> > > > Transforming FOAF is a different case from transforming RIF.
> > > > 
> > > > In the FOAF case each triple is independently a valid conclusion, FOAF
> > > > is an application of RDF as a KR language.
> > > > 
> > > > In RIF-in-RDF this is not the case, we are not doing KR here but using
> > > > RDF to syntactically encode a data structure. 
> > > 
> > > This may be the heart of the matter: when is RDF a KR?  The OWL WG
> > > concluded, with great frustration: always, when using OWL.  The
> > > SPARQL WG said each server could decide for itself.  At the recent RDF
> > > Next Steps workshop, Peter Patel-Schneider (who has probably done the
> > > most serious work on the subject) presented an argument for changing
> > > RDF, to make it not be a KR.  People were intrigued, but it still polled
> > > fairly low in the priorities for RDF work, for what that's worth.
> > > 
> > > When I'm thinking about RIF, I definitely think of RDF as a KR.  In
> > > particular, I don't want RIF-in-RDF to force RDF to not be a KR.  Since
> > > the repeated-property encoding is incompatible with RDF being a KR, it
> > > seems to me we can't use that encoding.
> > 
> > The question is not the nature of RDF but the nature of the RIF-in-RDF
> > proposal.
> > 
> > You need to do a *lot* more to make RIF-in-RDF a use of RDF as a KR than
> > just use lists!!
> > 
> > The RDF/OWL issues arise because OWL is layered on top of RDF, it is
> > trying to be a semantic extension and the semantics of OWL is
> > expressible as a RDF level semantics. That was a major undertaking, of
> > course.
> > 
> > None of that is true for RIF. The current proposed encoding is not a
> > layering of RIF on top of RDF, there is no proposal (and no chance) to
> > give the RDF terms in RIF-in-RDF a semantics which would correspond to
> > RIF. Even if it could be done for RIF-Core it definitely couldn't be
> > done for RIF-PR in any reasonable manner. Lists are just not the central
> > issue in that, they might be necessary but they are a long way from
> > being sufficient.
> 
> This may be.  This is probably why I retreat to talking about running
> code.  For instance, some challenges certainly arise once we define
> import-from-rdf and can create self-referential graphs/rulesets.  My
> approach for now is to say "don't do that", and if the functionality
> turns out to be useful, I expect some theory folks will come along and
> hash out the nasty details.  I have in the past looked into some of the
> technical solutions, eg Perlis' technique of moving all negation inward,
> used in KIF's wtr assertion operator.  I'm not proposing RIF say
> anything about that stuff now; I think leaving it unspecified or
> forbidden is good enough for now. 

Certainly not a problem to worry about right now.

> What I don't want to do is adopt a solution which rules out the KR
> approach (like using repeated properties).
> 
> > > > Any triple level results
> > > > of a rule essentially mean "this triple is part of the final syntactic
> > > > transform" and is not meaningfully interpretable on its own without all
> > > > the rest of entailed triples.
> > > 
> > > To me, each triple in the RIF-in-RDF encoding is, indeed, a fact. 
> > > They're facts about the syntactic structures of that particular
> > > (imagined) RIF document.
> > 
> > Exactly.
> > 
> > > > >   There will typically be one
> > > > > extraction which comes out of ground facts (the plain old graph), but
> > > > > there may be more that appear as reasoning is done.  
> > > > 
> > > > I just don't buy this as an approach to transformation.
> > > > 
> > > > If you are going to take a rule set and transform it you need a clean
> > > > way to separate the original from the transformed rule set. I don't see
> > > > how you can to do triple level inference and expect each possible
> > > > extraction from the result graph to be an intended rule set, you'll end
> > > > up with a mix of transformed and untransformed rules and any
> > > > transformations which are not sufficiently localized will be broken.
> > > > 
> > > > Possible ways to ensure you have clean separation or original and
> > > > transformed rules include:
> > > >   - use rewrite rules
> > > >   - only do an extraction from the entailed graph, not the source graph
> > > > (how we do it in Jena)
> > > >   - construct a new data structure to contain the transformed rules
> > > 
> > > I believe it works, as in my very-brief example, to branch the structure
> > > wherever there is an alternative, as in having two "if" properties --
> > > one leading to the old (non-core) subtree, the other leading to the new
> > > (core) subtree.   Obviously this is the kind of thing that needs to be
> > > fully implemented and pass test cases before being fully accepted, but
> > > I've gone through the code in my head several times.
> > 
> > To be clean the transformed rule needs to be a complete new structure
> > with a different URI, and the new "if" property attached to that. At the
> > very least the metadata on the source rule is not applicable to
> > transformed rule.
> 
> Going back to the fallback discussions, my observation was that every
> fallback transformation had some kind of impact. The key, I suggested,
> was to indicate that impact and keep track of it so downstream software
> could decide how to act differently.  You probably remember this table:
> 
> http://www.w3.org/2005/rules/wg/wiki/Arch/Extensibility_Design_Choices#line-497

Yes.

> So, I don't think all transforms mean we have to discard all metadata,
> but there is some balancing there, with a lot of UI work to do.   I can
> imagine a rule editor that offers a reduce-to-RIF-Core button, and that
> operation would provide lots of metadata around the replaced bits, but
> leave the rest of the document alone.  Maybe there's some indication at
> the top, too.

At some level if you transform the rule it is not the same rule and
should not have the same URI. Sometimes it might be a different
representation of the same rule but not alway.

In terms of metadata you might copy across some metadata (e.g.
copyright), copy and augment some (e.g. the provenance chain) and
omit/replace some (e.g. the dialect). I agree you don't discard *all*,
but the point is that you must not force identity otherwise you can't
say *anything* different about the before and after rules.

If you are publishing a rule within some UI then this is all irrelevant,
the URI for the rule is what you say it is when you publish it, any
internal transformations along the way are within your system.

However, if you are transforming some else's published rule then you
ought to  give it a different URI and must do so if you are going to
republish.

> > > I wanted to have a running implementation to show you before having this
> > > discussion, but decided it was unlikely I'd find time for that as
> > > quickly.  If I were to proceed with that implementation, do you know
> > > what you'd want to see (or hear about) in it, to reassure you this
> > > approach was workable, or even nice?
> > 
> > I've no doubt your example problem is implementable. I don't think
> > exhibiting the implementation of the simplified example would help much.
> 
> I didn't mean the simplified example.  What if I implemented some of the
> useful things on that wishlist, like Lloyd-Topor, or rewriting-away
> named-argument uniterms?    Would that make you significantly more
> comfortable with this approach?   Maybe a shell-command version of that
> reduce-to-Core button....  Not that I necessarily have the time, but
> your answer will shift my priorities a bit.

I don't see how this would make a difference. 

I perfectly believe you can do all the transformations in a
representation with lists. It would make more sense for me to do that
with the non-list representation to show how easy it is. Except that it
would be pointless since we have already passed the resolution.

I do claim that writing such rules in a non-deterministic way with the
pieces of original and transformed rules overlaid on top of each other
is a bad approach and not one to encourage. I don't dispute you
personally could write such a beast and get it working but I don't think
the implementation would reassure me; I'd still worry the style would
make my life a misery on jena-dev if other people copied you.

Dave

Received on Wednesday, 28 July 2010 15:47:00 UTC