Re: RIF-in-RDF: Requirement 4 from Michael Kifer on 2010-07-25 (public-rif-wg@w3.org from July 2010)

From: Michael Kifer <kifer@cs.stonybrook.edu>
Date: Sun, 25 Jul 2010 18:52:45 -0400
To: Sandro Hawke <sandro@w3.org>
CC: public-rif-wg <public-rif-wg@w3.org>
Message-ID: <20100725185245.6cad37a0@kiferserv>
pls see within.

On Sun, 25 Jul 2010 18:46:02 -0400
Sandro Hawke <sandro@w3.org> wrote:

> On Sun, 2010-07-25 at 17:57 -0400, Michael Kifer wrote:
> > Sandro,
> > I don't understand your argument. So, you are proposing that somebody would
> > explicitly write
> >
> > foo[my:conjuncts->list(bar1 bar2 bar3)]
> >
> > If so, why can't the same person write this:
> >
> > foo[my:conjuncts->bar1]
> > foo[my:conjuncts->bar2]
> > foo[my:conjuncts->bar3]
> >
> > ?
> 
> The key point is that after the translation rules are done, we need to
> have inferred something like:
> 
>    foo[rif:formulas->list(bar1 bar2 bar3)]
> 
> instead of
> 
>    foo[rif:formula->bar1]
>    foo[rif:formula->bar2]
>    foo[rif:formula->bar3]
> 
> And we need this, because otherwise we don't know when to stop looking
> for more solutions.   With the first form, we can stop as soon as we get
> the first match (in the context of a large matching process, trying to
> find a match for rif:Document).
> 
> In the second form, however, it's not clear when to stop.  We will only
> know we have all the appropriate values for rif:formula when a complete
> reasoner has run until termination.  But I think many RIF reasoners will
> not be complete, and I think many sets of fallback rules will produce
> lots of unwanted solutions, perhaps never terminating.
> 
> If we can stop after finding the first solution, or the first few good
> ones (as in the list style), we're okay -- if we have to find them all
> before knowing what a correct translation looks like, I think that will
> be a problem.

If you can infer foo[rif:formulas->list(bar1 bar2 bar3)] at all then you can
infer 

   foo[rif:formula->bar1]
   foo[rif:formula->bar2]
   foo[rif:formula->bar3]

AND terminate. You are chasing a non-issue, I think.
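
To make the claim concrete, here is a minimal Python sketch (not RIF). It
assumes the list-valued fact has already been inferred in full; the function
name expand_list_fact and the tuple representation of frames are hypothetical,
chosen only for this illustration.

```python
from typing import Sequence, Set, Tuple

Triple = Tuple[str, str, str]

def expand_list_fact(subject: str, members: Sequence[str]) -> Set[Triple]:
    """Derive one rif:formula fact per member of an already-inferred list.

    The loop is bounded by the length of the list, so it always terminates.
    """
    return {(subject, "rif:formula", member) for member in members}

# foo[rif:formulas->list(bar1 bar2 bar3)] as a plain Python list
facts = expand_list_fact("foo", ["bar1", "bar2", "bar3"])
```

The sketch only covers the case where the whole list is available; it says
nothing about a reasoner that never derives the list in the first place.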

michael


> 
> (Obviously, I'm thinking in terms of a backward-chaining BLD system
> here, trying to extract a rif:Document.   I don't understand termination
> conditions in PRD well enough to know how to handle this, there, or if
> it's even possible.)
> 
>     -- Sandro
> 
> > michael
> >
> >
> >
> > On Sun, 25 Jul 2010 16:49:52 -0400
> > Sandro Hawke <sandro@w3.org> wrote:
> >
> > > Dave [1], Harold [2], and Michael [3] have all expressed a desire to
> > > have the RIF-in-RDF mapping more closely follow the XML syntax.  In
> > > particular, they suggest it use repeated properties instead of
> > > gathering all the values of the properties into a list.
> > >
> > > I'm extremely sympathetic to this desire.  If you look back at the
> > > history of the web page, you'll see this is what my first version did,
> > > and then I stalled out for months as I realized it wouldn't work.
> > > Eventually I decided I just had to go ahead with the list-based
> > > approach that's currently in the document.
> > >
> > > The compelling problem for me is that using repeated properties, as
> > > far as I know, it is not possible to reliably transform a RIF document
> > > using an incomplete reasoner.  I've called this "Requirement 4" in
> > > RIF-in-RDF [4].
> > >
> > > Let me back up and explain what I'm trying to do and why I think it's
> > > important.
> > >
> > > In my talks and writing about RIF to Semantic Web audiences, I explain
> > > that where I think RIF is essential is in data transformation.  With
> > > RIF, we can allow interoperation between vocabularies.  My standard
> > > example is that FOAF has a foaf:name property, and it also has
> > > foaf:firstName and foaf:lastName.  When you're producing FOAF data,
> > > which should you use?  When you're consuming FOAF data, which should
> > > you look for?  In both cases, if you want interoperability, you have
> > > to do both.  When there are only two options, and everyone knows about
> > > them, that's okay.  But what happens when the third, fourth, and fifth
> > > "standard" properties for representing names come along?  It's a
> > > nightmare; the fact that the producer and consumer are both using RDF
> > > ends up not buying you very much at all.
> > >
> > > But RIF can solve this problem.  By having the ontology documents for
> > > each of the terms include some RIF (via rif:importWithProfile), the folks
> > > deploying new properties can express how they map data to alternative
> > > properties.  (In this case, with some string operations.)  Now,
> > > data-consuming systems which implement RIF can automatically get the
> > > data in exactly the vocabulary they want.
> > >
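
A rough Python sketch (not RIF) of the kind of name mapping described in the
quoted paragraph: deriving foaf:name from foaf:firstName and foaf:lastName.
The function name add_full_names, the tuple representation of triples, and
the choice to join the two parts with a single space are all assumptions made
for this example; real name data will often need more care.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

def add_full_names(triples: Iterable[Triple]) -> Set[Triple]:
    """Return the input triples plus a derived foaf:name where possible."""
    triples = set(triples)
    first = {s: o for s, p, o in triples if p == "foaf:firstName"}
    last = {s: o for s, p, o in triples if p == "foaf:lastName"}
    # Only subjects that have both parts get a derived full name.
    derived = {
        (s, "foaf:name", f"{first[s]} {last[s]}")
        for s in first.keys() & last.keys()
    }
    return triples | derived

data = {
    ("ex:alice", "foaf:firstName", "Alice"),
    ("ex:alice", "foaf:lastName", "Smith"),
    ("ex:bob", "foaf:firstName", "Bob"),  # no last name, so no foaf:name
}
result = add_full_names(data)
```

A consumer that only looks for foaf:name can then read it from result without
knowing which properties the producer used.
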
> > > I think this is a very compelling use case.  In fact, without this
> > > mechanism (or an equivalent one) I don't see how the Semantic Web can
> > > work at all.  More recently, I've started using another example (which
> > > I mentioned on a recent telecon), where Facebook's Open Graph Protocol
> > > uses RDF with a different style of modeling than most of the Semantic
> > > Web; here, again, RIF can provide interoperability via translation
> > > rules.
> > >
> > > Now, imagine we have this all in place.  Lots of RDF data out there,
> > > using various vocabularies.  When you dereference the terms you find
> > > some RIF that lets you translate between them, so it's all roughly
> > > interoperable.  Of course, not every vocabulary can be mapped; some
> > > aren't well understood enough to formalize, etc.  But many can be
> > > translated.  This allows new vocabularies to be deployed, and the
> > > overall system to grow and evolve in place.
> > >
> > > Now, remember the RIF extensibility requirement?  In the current
> > > design, we met it by providing may-ignore and must-understand
> > > extensions via annotations and new XML elements.  This works, but only
> > > in very broad strokes.  We have no "graceful" fallback.  Extensions
> > > can't offer syntactic sugar, and they certainly can't offer features
> > > which can be approximated.  This mechanism may not be good enough to
> > > allow extensions to really be deployed on the open Web.  We talked
> > > about all this years ago, but decided we didn't have time to work out
> > > all the details, and that it could wait.
> > >
> > > So, as you may have guessed by now, I want to provide RIF
> > > extensibility the same way I want to provide FOAF name extensibility:
> > > with RIF translation (fallback) rules.
> > >
> > > I'll walk through this, below, but here's the punchline: I think it
> > > works fine with the list-style of RIF-in-RDF, but I don't think it can
> > > be done with the repeated-properties style.  This is why I need the
> > > lists.
> > >
> > > I have a few ideas of transformations I want right now...
> > >
> > >   - automatically add universal quantification to free variables
> > >   - extend frames to allow for context/named-graphs (cf Decker's TRIPLE)
> > >   - convert some kinds of rules between PRD and BLD (trading off
> > >     between new() and logic functions)
> > >   - convert logic functions to builtin list operations (I think this
> > >     can be done; not sure) getting more of BLD into Core
> > >   - standard rewritings: get rid of conjunction in rule heads, disjunction
> > >     in rule bodies, Skolemize
> > >   - re-write out named-argument-uniterms
> > >
> > > ... but they're all too complex to use as first illustrations.  For
> > > that I'll use something that is ridiculous, but pleasantly simple:
> > >
> > >   - Allow people to use the term my:Conjunction instead of rif:And.   Also,
> > >     use my:conjunct instead of rif:formula inside it.
> > >
> > > Before actually writing the transformation rule, we have to decide
> > > what the transformations are going to look like in RIF.   Some options:
> > >
> > >    1.  in place, new and old, overlapping; the new data (the output)
> > >        is distinguished by using different properties and/or classes.
> > >    2.  copy the whole document, with changes
> > >    3.  ...   maybe some other approaches?
> > >
> > > Let's try (1) first, since it's more terse.  Our input looks like
> > > this:
> > >
> > >       ...
> > >       <if>       <!-- or something else that can have an And in it -->
> > >          <my:Conjunction>
> > >              <my:conjunct>$1</my:conjunct>
> > >              <my:conjunct>$2</my:conjunct>
> > >              ...
> > >          </my:Conjunction>
> > >       </if>
> > >       ...
> > >
> > > and we'll just "replace" the element names.
> > >
> > > However, since we don't have a way to "replace" things in this
> > > "overlapping" style, we'll just add a second <if> property, and the
> > > serializer or consumer will discard this one, since it contains an
> > > element not allowed by the dialect syntax.
> > >
> > > So, the rule will add new triples, but leave the old ones intact.
> > > The rule will leave us with this:
> > >
> > >
> > >       ...
> > >       <if>       <!-- or something else that can have an And in it -->
> > >          <my:Conjunction>
> > >              <my:conjunct>$1</my:conjunct>
> > >              <my:conjunct>$2</my:conjunct>
> > >              ...
> > >          </my:Conjunction>
> > >       </if>
> > >       <if>      <!-- the same property, whatever it was -->
> > >          <And>
> > >              <formula>$1</formula>
> > >              <formula>$2</formula>
> > >              ...
> > >          </And>
> > >       </if>
> > >       ...
> > >
> > > Here's the rule:
> > >
> > >  forall ?parent ?prop ?old ?conjunct ?new
> > >  if And(
> > >    ?parent[?prop->?old]
> > >    my:Conjunction#?old[my:conjunct->?conjunct]
> > >    ?new = wrapped(?old)  <!-- use a logic function to create a new node -->
> > >  ) then And (
> > >    ?parent[?prop->?new]
> > >    rif:And#?new[rif:formula->?conjunct]
> > >  )
> > >
> > > This works fine, as long as the reasoning is complete.  However, if
> > > the reasoning is ever incomplete, we end up with undetectably
> > > incorrect results.  Rules that were "if and(a b c) then d" might get
> > > turned into "if and(a b) then d"!
> > >
> > > I don't think it's sensible to expect reasoners to be complete.  It's
> > > great to have termination conditions arise from the rules; it's not
> > > good to require the reasoner to run until it knows all possible
> > > inferences have been made.  With the above approach, there's no
> > > termination condition other than "make all the inferences possible".
> > >
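
The failure mode described in the quoted text can be sketched in Python (not
RIF). This is only a toy model: the step budget max_steps stands in for an
incomplete reasoner, and translate_repeated and the tuple form of the triples
are hypothetical names introduced for the illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

def translate_repeated(triples: Iterable[Triple], max_steps: int) -> Set[Triple]:
    """Apply the my:conjunct -> rif:formula rule, one inference per step.

    max_steps models a reasoner that may stop before all inferences are made.
    """
    derived: Set[Triple] = set()
    steps = 0
    for s, p, o in sorted(triples):
        if p != "my:conjunct":
            continue
        if steps >= max_steps:
            break  # the reasoner gives up; nothing records that it did
        derived.add((f"wrapped({s})", "rif:formula", o))
        steps += 1
    return derived

source = {
    ("c1", "my:conjunct", "a"),
    ("c1", "my:conjunct", "b"),
    ("c1", "my:conjunct", "c"),
}
complete = translate_repeated(source, max_steps=10)  # and(a b c)
partial = translate_repeated(source, max_steps=2)    # and(a b)
```

Both complete and partial describe a syntactically valid conjunction, and
nothing in partial itself shows that a conjunct is missing.
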
> > > Alternatively, if we use the list encoding, the rule is very similar:
> > >
> > >  forall ?parent ?prop ?old ?conjuncts ?new
> > >  if And(
> > >    ?parent[?prop->?old]
> > >    my:Conjunction#?old[my:conjuncts->?conjuncts]
> > >    ?new = wrapped(?old)
> > >  ) then And (
> > >    ?parent[?prop->?new]
> > >    rif:And#?new[rif:formulas->?conjuncts]
> > >  )
> > >
> > > ... but now we can set a termination condition: if a RIF document in
> > > the desired dialect *can* be extracted, then you're done.
> > >
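
A toy Python model (not RIF) of the list encoding and its termination
condition, under the same kind of assumption: max_steps stands in for an
incomplete reasoner, and translate_list and extract_and are hypothetical
names introduced for the illustration.

```python
from typing import List, Optional, Tuple

ListTriple = Tuple[str, str, Tuple[str, ...]]

def translate_list(triples: List[ListTriple], max_steps: int) -> List[ListTriple]:
    """Apply the my:conjuncts -> rif:formulas rule, one inference per step."""
    derived: List[ListTriple] = []
    steps = 0
    for s, p, o in triples:
        if p != "my:conjuncts":
            continue
        if steps >= max_steps:
            break
        # A single inference carries the whole list of conjuncts.
        derived.append((f"wrapped({s})", "rif:formulas", o))
        steps += 1
    return derived

def extract_and(derived: List[ListTriple], node: str) -> Optional[Tuple[str, ...]]:
    """Return the conjunct list for node, or None if it was never derived."""
    for s, p, o in derived:
        if s == node and p == "rif:formulas":
            return o
    return None

source = [("c1", "my:conjuncts", ("a", "b", "c"))]
found = extract_and(translate_list(source, max_steps=1), "wrapped(c1)")
missing = extract_and(translate_list(source, max_steps=0), "wrapped(c1)")
```

In this model the conjunction is either extracted whole or visibly absent, so
a consumer can stop as soon as extract_and returns a value.
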
> > > A few notes:
> > >
> > >     * I've included the types (like rif:And) for now.  Whether to do
> > >       that is a separate issue (specifically ISSUE-101).
> > >
> > >     * It's okay to have the rules produce multiple valid RIF
> > >       documents; you can stop after generating one, but you can also
> > >       continue.  If there's some kind of weighting on the rules (cf
> > >       XTAN's "impact" mechanism) you can search for a solution that's
> > >       better than some others.  It may be possible to efficiently
> > >       direct this search towards the best solution; I'm not sure.
> > >
> > >     * I don't think the copy-the-whole-document approach to
> > >       translation helps at all.  There, instead of attaching the new
> > >       node to the same parent, we attach it to a new parent, and we
> > >       end up with a whole new tree.  But still, branches of the tree
> > >       are generated by separate rule applications, so an incomplete
> > >       reasoner may produce incomplete (wrong) output trees.
> > >
> > > I think that's it.  I trust y'all will point out any confusing or
> > > incorrect elements of this argument.
> > >
> > >       -- Sandro
> > >
> > > [1] http://lists.w3.org/Archives/Public/public-rif-wg/2010Jul/0015
> > > [2] http://lists.w3.org/Archives/Public/public-rif-wg/2010Jul/0017
> > > [3] http://lists.w3.org/Archives/Public/public-rif-wg/2010Jul/0018
> > > [4] http://www.w3.org/2005/rules/wiki/RIF_In_RDF#Requirements
> > >
Received on Sunday, 25 July 2010 22:53:20 UTC