RIF-in-RDF: Requirement 4

Dave [1], Harold [2], and Michael [3] have all expressed a desire to
have the RIF-in-RDF mapping more closely follow the XML syntax.  In
particular, they suggest it use repeated properties instead of
gathering all the values of the properties into a list.

I'm extremely sympathetic to this desire.  If you look back at the
history of the web page, you'll see this is what my first version did,
and then I stalled out for months as I realized it wouldn't work.
Eventually I decided I just had to go ahead with the list-based
approach that's currently in the document.  

The compelling problem for me is that, as far as I know, with
repeated properties it is not possible to reliably transform a RIF
document using an incomplete reasoner.  I've called this
"Requirement 4" in RIF-in-RDF [4].

Let me back up and explain what I'm trying to do and why I think it's
important.

In my talks and writing about RIF to Semantic Web audiences, I explain
that I think RIF is essential for data transformation.  With
RIF, we can allow interoperation between vocabularies.  My standard
example is that FOAF has a foaf:name property, and it also has
foaf:firstName and foaf:lastName.  When you're producing FOAF data,
which should you use?  When you're consuming FOAF data, which should
you look for?  In both cases, if you want interoperability, you have
to do both.  When there are only two options, and everyone knows about
them, that's okay.  But what happens when the third, fourth, and fifth
"standard" properties for representing names come along?  It's a
nightmare; the fact that the producer and consumer are both using RDF
ends up not buying you very much at all.

But RIF can solve this problem.  By having the ontology documents for
each of the terms include some RIF (via rif:importWithProfile), the folks
deploying new properties can express how they map data to alternative
properties.  (In this case, with some string operations.)  Now,
data-consuming systems which implement RIF can automatically get the
data in exactly the vocabulary they want.
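For concreteness, here is roughly what such a mapping computes, sketched in Python rather than RIF. It assumes triples are plain (subject, predicate, object) tuples and that a full name is just the first and last names joined by a space; both are simplifying assumptions for illustration, not anything FOAF specifies.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def add_full_names(triples):
    """Return the input triples plus a foaf:name for every subject that
    has both a foaf:firstName and a foaf:lastName."""
    # For simplicity, keep only one first/last name per subject.
    first = {s: o for (s, p, o) in triples if p == FOAF + "firstName"}
    last = {s: o for (s, p, o) in triples if p == FOAF + "lastName"}
    inferred = {
        (s, FOAF + "name", first[s] + " " + last[s])
        for s in first.keys() & last.keys()
    }
    # The original triples stay; the inferred ones are added alongside.
    return set(triples) | inferred


data = {
    ("_:alice", FOAF + "firstName", "Alice"),
    ("_:alice", FOAF + "lastName", "Smith"),
}
for triple in sorted(add_full_names(data)):
    if triple[1] == FOAF + "name":
        print(triple)
# ('_:alice', 'http://xmlns.com/foaf/0.1/name', 'Alice Smith')
```

A consumer that only understands foaf:name now finds the data, without the producer having to publish both forms.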

I think this is a very compelling use case.  In fact, without this
mechanism (or an equivalent one) I don't see how the Semantic Web can
work at all.  More recently, I've started using another example (which
I mentioned on a recent telecon), where Facebook's Open Graph Protocol
uses RDF with a different style of modeling than most of the Semantic
Web; here, again, RIF can provide interoperability via translation
rules.

Now, imagine we have this all in place.  Lots of RDF data out there,
using various vocabularies.  When you dereference the terms you find
some RIF that lets you translate between them, so it's all roughly
interoperable.  Of course, not every vocabulary can be mapped; some
aren't understood well enough to formalize, etc.  But many can be
translated.  This allows new vocabularies to be deployed, and the
overall system to grow and evolve in place.

Now, remember the RIF extensibility requirement?  In the current
design, we met it by providing may-ignore and must-understand
extensions via annotations and new XML elements.  This works, but only
in very broad strokes.  We have no "graceful" fallback.  Extensions
can't offer syntactic sugar, and they certainly can't offer features
which can be approximated.  This mechanism may not be good enough to
allow extensions to really be deployed on the open Web.  We talked
about all this years ago, but decided we didn't have time to work out
all the details, and that it could wait.

So, as you may have guessed by now, I want to provide RIF
extensibility the same way I want to provide FOAF name extensibility:
with RIF translation (fallback) rules.

I'll walk through this, below, but here's the punchline: I think it
works fine with the list-style of RIF-in-RDF, but I don't think it can
be done with the repeated-properties style.  This is why I need the
lists.

I have a few ideas of transformations I want right now...

  - automatically add universal quantification to free variables
  - extend frames to allow for context/named-graphs (cf Decker's TRIPLE)
  - convert some kinds of rules between PRD and BLD (trading off
    between new() and logic functions)
  - convert logic functions to builtin list operations (I think this
    can be done; not sure), getting more of BLD into Core
  - standard rewritings: get rid of conjunction in rule heads, disjunction
    in rule bodies, Skolemize
  - rewrite out named-argument uniterms

... but they're all too complex to use as first illustrations.  For
that, I'll use something ridiculous but pleasantly simple:

  - Allow people to use the term my:Conjunction instead of rif:And.   Also,
    use my:conjunct instead of rif:formula inside it.

Before actually writing the transformation rule, we have to decide
what the transformations are going to look like in RIF.   Some options:

   1.  in place, new and old, overlapping; the new data (the output)
       is distinguished by using different properties and/or classes.
   2.  copy the whole document, with changes
   3.  ...   maybe some other approaches?

Let's try (1) first, since it's more terse.  Our input looks like
this:

      ...
      <if>       <!-- or something else that can have an And in it -->
         <my:Conjunction>
             <my:conjunct>$1</my:conjunct>
             <my:conjunct>$2</my:conjunct>
             ...
         </my:Conjunction>
      </if>
      ...

and we'll just "replace" the element names.

However, since we don't have a way to "replace" things in this
"overlapping" style, we'll just add a second <if> property, and the
serializer or consumer will discard the original one, since it
contains an element not allowed by the dialect syntax.

So, the rule will add new triples, but leave the old ones intact.
The rule will leave us with this:


      ...
      <if>       <!-- or something else that can have an And in it -->
         <my:Conjunction>
             <my:conjunct>$1</my:conjunct>
             <my:conjunct>$2</my:conjunct>
             ...
         </my:Conjunction>
      </if>
      <if>      <!-- the same property, whatever it was -->
         <And>
             <formula>$1</formula>
             <formula>$2</formula>
             ...
         </And>
      </if>
      ...
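A minimal sketch of that consumer-side discard, assuming a toy graph of (subject, predicate, object) tuples, the <if> property written as rif:if, and a hypothetical ALLOWED_TYPES set standing in for the dialect syntax:

```python
# Hypothetical stand-in for "elements allowed by the dialect syntax".
ALLOWED_TYPES = {"rif:And", "rif:Or", "rif:Atom"}


def dialect_values(graph, subject, prop):
    """Return the values of `prop` on `subject` whose type the dialect allows."""
    types = {}
    for s, p, o in graph:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    # Keep a value only if at least one of its types is allowed.
    return sorted(
        o for (s, p, o) in graph
        if s == subject and p == prop and types.get(o, set()) & ALLOWED_TYPES
    )


graph = {
    ("_:rule", "rif:if", "_:old"),
    ("_:old", "rdf:type", "my:Conjunction"),
    ("_:rule", "rif:if", "_:new"),
    ("_:new", "rdf:type", "rif:And"),
}
print(dialect_values(graph, "_:rule", "rif:if"))  # ['_:new']
```

The check looks only at each value's type, so a rif:And node that is missing some of its rif:formula triples would be kept just the same.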

Here's the rule (it uses a logic function, my:wrapped, to create the
new node):

 Forall ?parent ?prop ?old ?conjunct ?new (
   And( ?parent[?prop->?new]
        ?new # rif:And
        ?new[rif:formula->?conjunct] )
   :-
   And( ?parent[?prop->?old]
        ?old # my:Conjunction
        ?old[my:conjunct->?conjunct]
        ?new = my:wrapped(?old) )
 )

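This works fine, as long as the reasoning is complete.  However, if
the reasoning is ever incomplete, we end up with undetectably
incorrect results.  Each rule firing adds just one rif:formula triple,
so a reasoner that stops early can leave the new And holding only some
of the conjuncts, with nothing in the output to show that any are
missing.  Rules that were "if and(a b c) then d" might get turned into
"if and(a b) then d"!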
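To make the failure concrete, here is a small Python simulation of the repeated-property rule. The budget parameter is a hypothetical stand-in for any incomplete reasoner (it stops after a fixed number of rule firings), and wrapped() simply builds a node name for the logic-function term.

```python
def wrapped(node):
    """Stand-in for the logic function wrapped(?old): a fresh node name."""
    return "wrapped(" + node + ")"


def repeated_property_rule(graph, budget):
    """Simulate the repeated-property rule, stopping after `budget` firings."""
    out = set(graph)
    # Each binding of the rule body carries exactly one ?conjunct.
    matches = [
        (parent, prop, old, conjunct)
        for (parent, prop, old) in sorted(graph)
        if (old, "rdf:type", "my:Conjunction") in graph
        for (s, p, conjunct) in sorted(graph)
        if s == old and p == "my:conjunct"
    ]
    for parent, prop, old, conjunct in matches[:budget]:
        new = wrapped(old)
        # One firing: the parent link, the type, and ONE rif:formula triple.
        out |= {
            (parent, prop, new),
            (new, "rdf:type", "rif:And"),
            (new, "rif:formula", conjunct),
        }
    return out


graph = {
    ("_:rule", "rif:if", "_:old"),
    ("_:old", "rdf:type", "my:Conjunction"),
    ("_:old", "my:conjunct", "a"),
    ("_:old", "my:conjunct", "b"),
    ("_:old", "my:conjunct", "c"),
}

partial = repeated_property_rule(graph, budget=2)
new = wrapped("_:old")
print(sorted(o for (s, p, o) in partial if s == new and p == "rif:formula"))
# ['a', 'b']
print((new, "rdf:type", "rif:And") in partial)  # True
```

The truncated node is still typed rif:And and still hangs off _:rule, so a consumer that checks only the type cannot tell that c was dropped.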

I don't think it's sensible to expect reasoners to be complete.  It's
great to have termination conditions arise from the rules; it's not
good to require the reasoner to run until it knows all possible
inferences have been made.  With the above approach, there's no
termination condition other than "make all the inferences possible".

Alternatively, if we use the list encoding, the rule is very similar:

 Forall ?parent ?prop ?old ?conjuncts ?new (
   And( ?parent[?prop->?new]
        ?new # rif:And
        ?new[rif:formulas->?conjuncts] )
   :-
   And( ?parent[?prop->?old]
        ?old # my:Conjunction
        ?old[my:conjuncts->?conjuncts]
        ?new = my:wrapped(?old) )
 )

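... but now we can set a termination condition: if a RIF document in
the desired dialect *can* be extracted, then you're done.  Each firing
of this rule attaches the complete list of conjuncts with a single
triple, so any And that can be extracted has all of its conjuncts.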
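The same kind of simulation for the list encoding, again with a hypothetical budget. A Python tuple stands in for the head node of the RDF list (in real RDF the rdf:first/rdf:rest chain is already in the input, and the rule only points at it), and extract_and() is a hypothetical version of the termination test.

```python
def wrapped(node):
    """Stand-in for the logic function wrapped(?old): a fresh node name."""
    return "wrapped(" + node + ")"


def list_rule(graph, budget):
    """Simulate the list-based rule, stopping after `budget` firings."""
    out = set(graph)
    # Each binding of the rule body carries the whole list of conjuncts.
    matches = [
        (parent, prop, old, conjuncts)
        for (parent, prop, old) in graph
        if (old, "rdf:type", "my:Conjunction") in graph
        for (s, p, conjuncts) in graph
        if s == old and p == "my:conjuncts"
    ]
    for parent, prop, old, conjuncts in matches[:budget]:
        new = wrapped(old)
        out |= {
            (parent, prop, new),
            (new, "rdf:type", "rif:And"),
            (new, "rif:formulas", conjuncts),
        }
    return out


def extract_and(graph, parent, prop):
    """Termination test: return the conjuncts of a rif:And value, else None."""
    for (s, p, o) in graph:
        if s == parent and p == prop and (o, "rdf:type", "rif:And") in graph:
            for (s2, p2, conjuncts) in graph:
                if s2 == o and p2 == "rif:formulas":
                    return conjuncts
    return None


graph = {
    ("_:rule", "rif:if", "_:old"),
    ("_:old", "rdf:type", "my:Conjunction"),
    ("_:old", "my:conjuncts", ("a", "b", "c")),
}
print(extract_and(list_rule(graph, budget=0), "_:rule", "rif:if"))  # None
print(extract_and(list_rule(graph, budget=1), "_:rule", "rif:if"))  # ('a', 'b', 'c')
```

Whatever the budget, the extraction either fails (keep reasoning) or returns all three conjuncts; it never returns a truncated conjunction.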

A few notes:

    * I've included the types (like rif:And) for now.  Whether to do
      that is a separate issue (specifically ISSUE-101).

    * It's okay to have the rules produce multiple valid RIF
      documents; you can stop after generating one, but you can also
      continue.  If there's some kind of weighting on the rules (cf
      XTAN's "impact" mechanism) you can search for a solution that's
      better than some others.  It may be possible to efficiently
      direct this search towards the best solution; I'm not sure.

    * I don't think the copy-the-whole-document approach to
      translation helps at all.  There, instead of attaching the new
      node to the same parent, we attach it to a new parent, and we
      end up with a whole new tree.  But still, branches of the tree
      are generated by separate rule applications, so an incomplete
      reasoner may produce incomplete (wrong) output trees.

I think that's it.  I trust y'all will point out any confusing or
incorrect elements of this argument.

      -- Sandro

[1] http://lists.w3.org/Archives/Public/public-rif-wg/2010Jul/0015
[2] http://lists.w3.org/Archives/Public/public-rif-wg/2010Jul/0017
[3] http://lists.w3.org/Archives/Public/public-rif-wg/2010Jul/0018
[4] http://www.w3.org/2005/rules/wiki/RIF_In_RDF#Requirements

Received on Sunday, 25 July 2010 20:50:03 UTC