Approaching an XML syntax for RIF

Continuing the discussion from November, I suggest that we approach
the XML syntax of RIF Core like this:

   1.  Specify the abstract syntax for RIF Core in asn06.  (This
       abstract syntax specification can also be thought of as an
       ontology of RIF Core rules and as an object model/API for RIF
       implementations.  I believe it's close enough to both of
       those; there may be some small semantic differences, but I'm
       hoping they won't manifest as a real problem.)

   2.  Design a mapping from asn06 to an XML schema language.

   3.  Use the mapping in (2) to turn the asn06 spec (1) into an XML
       schema/grammar for RIF.

I've been working on step 2.   (Mostly I've been writing programs to do
the mapping in step 3 and to read and write the resulting XML syntax.)

The basic approach I'm exploring is "Stripe-Skipping".  The name comes
from the observation that a common, brute-force way to serialize objects
in XML is to use alternating "stripes":

 <PurchaseOrder>                        <!-- a class name -->
    <shipTo>                            <!-- a property name -->
      <Address>                         <!-- a class name -->
         <name>Alice Smith</name>       <!-- a property name -->
         <street>123 Maple Street</street>
  ...

The idea of stripe-skipping is to say that we can omit certain
XML elements -- skipping directly to their child elements -- because
they carry only redundant information.   In this case, the "Address"
stripe is redundant if it's known that the range of the "shipTo"
property is "Address".
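A minimal Python sketch of the idea, using only the standard library's ElementTree. The object encoding (a class name paired with a dict of property values) and the RANGE table are assumptions made for illustration; they are not part of asn06. With skipping turned on, the Address stripe disappears because it matches the declared range of shipTo:

```python
import xml.etree.ElementTree as ET

# Hypothetical declared ranges for this example (an assumption of the sketch).
RANGE = {"shipTo": "Address", "name": "string", "street": "string"}

def to_xml(obj, skip=False):
    """Serialize obj = (class_name, {property: value}) as alternating stripes.
    Values are either strings or nested (class_name, {...}) pairs."""
    cls, props = obj
    root = ET.Element(cls)  # the root always keeps its class stripe
    _add_properties(root, props, skip)
    return root

def _add_properties(parent, props, skip):
    for prop, value in props.items():
        prop_elem = ET.SubElement(parent, prop)  # property stripe
        if isinstance(value, str):
            prop_elem.text = value
            continue
        value_cls, value_props = value
        if skip and RANGE.get(prop) == value_cls:
            # Class stripe skipped: a reader can recover it from the range.
            _add_properties(prop_elem, value_props, skip)
        else:
            class_elem = ET.SubElement(prop_elem, value_cls)  # class stripe
            _add_properties(class_elem, value_props, skip)

order = ("PurchaseOrder", {"shipTo": ("Address", {
    "name": "Alice Smith", "street": "123 Maple Street"})})
striped = ET.tostring(to_xml(order), encoding="unicode")
skipped = ET.tostring(to_xml(order, skip=True), encoding="unicode")
print(striped)
print(skipped)
```

A value whose class differs from the declared range (say, a subclass of Address) still gets its class stripe in this sketch, since a reader could not otherwise recover it.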

There are various rules one can use for skipping stripes.  I've done
some experiments over the years, but this week I tried to make a
concerted effort to work out something usable.  I'm currently converging
on these rules:
   
    - skip class stripes when the class is the range of the
      enveloping property, unless you need the class stripe to gather
      several properties into one XML element (such as for the root of
      the document, or items in a list).

    - skip a property stripe when there can only be one property for
      this class.  (A broader rule would involve selecting a "primary"
      property, or even a sequence of properties, but so far I'm trying
      to keep asn06 a subset of OWL, and I don't think OWL has anything
      like that.)  

    - Never skip a property when its range is the same as one of the
      classes consecutively skipped above it.  (That would introduce an
      ambiguity.)
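One possible reading of these three rules as predicates, sketched in Python. The schema model (a Prop record and a SCHEMA dict of flattened properties) is hypothetical, and the interpretation of "needed to gather several properties" is an assumption. The Box class shows why the third rule exists: with a single self-typed property, skipping both stripes at every level would leave nothing to tell the levels apart.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prop:
    """A declared property: its name, its range class, and whether it is a list."""
    name: str
    range: str
    is_list: bool = False

# Hypothetical example schema: class name -> its (flattened) properties.
SCHEMA = {
    "PurchaseOrder": [Prop("shipTo", "Address"), Prop("billTo", "Address")],
    "Address": [Prop("name", "string"), Prop("street", "string")],
    "Box": [Prop("content", "Box")],  # one property whose range is its own class
}

def skip_class_stripe(prop, value_class, property_stripe_kept):
    """Rule 1: omit the class stripe when the value's class is the range of
    the enclosing property, unless the stripe is needed to group things."""
    if prop.is_list or value_class != prop.range:
        return False  # list items (and unexpected classes) keep the stripe
    # If the property stripe was also skipped, a class with several
    # properties needs its own element to gather them.
    return property_stripe_kept or len(SCHEMA.get(value_class, [])) <= 1

def skip_property_stripe(owner, prop, skipped_above):
    """Rule 2: omit the property stripe when the owner class has only one
    property.  Rule 3: never do so when the property's range matches a class
    whose stripe was skipped consecutively above it."""
    return len(SCHEMA[owner]) == 1 and prop.range not in skipped_above

ship = SCHEMA["PurchaseOrder"][0]
content = SCHEMA["Box"][0]
print(skip_class_stripe(ship, "Address", True))
print(skip_property_stripe("Box", content, set()))
print(skip_property_stripe("Box", content, {"Box"}))
```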

The results are pretty encouraging, but still preliminary.  I was hoping
to have demo code releasable by last Friday, but it's still not really
working right, and we're at the point where I obviously need to send
this e-mail to allow for discussion on Tuesday.   My sincere apologies
for the delay.

Here's an asn06 specification of the Condition language, with input
from Harold, though he hasn't seen this version.

================================================================
    
    default namespace rif = "http://www.w3.org/2007/01/rif#"
    
    class Condition
    
        subclass And
            property formulas : list of Condition
    
        subclass Exists
            property declare : list of Var
            property formula : Condition
    
        subclass Atom
    
        subclass Equals
            property equated : list of Term
    
    class Composite
        property parts : list of Term
    
        subclass Atom
    
        subclass Term
    
    
    class Term
    
        subclass Expr
    
        subclass Var
            property name : xsd:string
    
        subclass Con
            property ref : xsd:anyURI
    
================================================================    
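As a rough check on what this declaration says, here is a small Python sketch that reads the indented class/subclass/property lines into a hierarchy. It assumes a line-oriented format with consistent indentation and leaves out the default namespace line; it is a hypothetical helper, not an asn06 parser from any released tool. It makes the multiple inheritance of Atom (declared under both Condition and Composite) explicit:

```python
# The asn06 declaration above, minus the namespace line (this sketch
# only understands class / subclass / property lines).
SPEC = """
class Condition
    subclass And
        property formulas : list of Condition
    subclass Exists
        property declare : list of Var
        property formula : Condition
    subclass Atom
    subclass Equals
        property equated : list of Term
class Composite
    property parts : list of Term
    subclass Atom
    subclass Term
class Term
    subclass Expr
    subclass Var
        property name : xsd:string
    subclass Con
        property ref : xsd:anyURI
"""

def parse_asn06(text):
    """Return (parents, props): the direct superclasses of each class, and
    each class's own declared properties as (name, range, is_list) tuples."""
    parents, props = {}, {}
    stack = []  # (indent, class name) of the enclosing declarations
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        indent = len(line) - len(line.lstrip())
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if words[0] in ("class", "subclass"):
            name = words[1]
            parents.setdefault(name, set())
            props.setdefault(name, [])
            if words[0] == "subclass":
                parents[name].add(stack[-1][1])
            stack.append((indent, name))
        elif words[0] == "property":
            # Form: property NAME : [list of] RANGE
            is_list = words[3:5] == ["list", "of"]
            props[stack[-1][1]].append((words[1], words[-1], is_list))
    return parents, props

def ancestors(parents, cls):
    """All superclasses of cls, following every parent (multiple inheritance)."""
    seen, todo = set(), list(parents.get(cls, ()))
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents.get(c, ()))
    return seen

parents, props = parse_asn06(SPEC)
print(sorted(parents["Atom"]))  # Atom is declared under two classes
print(props["Exists"])
```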


Some issues with this:
    - it calls out the similarity between Atoms and Terms (note the
      multiple inheritance).  I'm not sure that's a good idea.  In a
      sense Atoms and Terms are very different things.
    - it puts the predicate/function as the first element in a list,
      rather than as a separate property.   This is a coin-toss
      decision, to me.   Sometimes you want it to be just another
      member of the list; sometimes you want to treat it specially.
      It's easy enough to convert.   Keeping it in the list makes the
      XML look simpler.
    - it uses short names like "Con" instead of "Constant"
    - it uses lists instead of sets in several places; I believe we'll
      have set semantics, but in practice I think we do care about
      order for roundtripping, so we should keep it in the object
      model. 
    - we made Equals use a list instead of just two elements, to be
      more in keeping with And, etc.
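The predicate-as-first-element choice is reversible, as the second point says. A sketch of the conversion in Python, assuming a hypothetical dict form with "op" and "args" keys for the separate-property alternative and a hypothetical tuple encoding for terms:

```python
# Hypothetical term encoding for this sketch: ("Con", "purchase"),
# ("Var", "Buyer"), and so on.

def split_head(parts):
    """List form (predicate/function first) -> separate 'op' property."""
    if not parts:
        raise ValueError("expected the predicate or function as the first part")
    return {"op": parts[0], "args": list(parts[1:])}

def join_head(node):
    """Separate 'op' property -> list form with the op as the first element."""
    return [node["op"], *node["args"]]

atom_parts = [("Con", "purchase"), ("Var", "Buyer"), ("Var", "Seller"),
              ("Con", "$49")]
atom = split_head(atom_parts)
print(atom["op"], atom["args"])
print(join_head(atom) == atom_parts)
```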

Harold has updated the XML example on the Positive Conditions page [1]
to match this asn06 declaration.  (My software's not working right now,
so I haven't mechanically checked that it is correct.)  Here it is, as
of right now:

================================================================
 <And>
    <Exists>
      <declare><Var>Buyer</Var></declare>
      <formula>
        <Atom>
          <Con>purchase</Con>
          <Var>Buyer</Var>
          <Var>Seller</Var>
          <Expr>
            <Con>book</Con>
            <Var>Author</Var>
            <Con>LeRif</Con>
          </Expr>
          <Con>$49</Con>
        </Atom>
      </formula>
    </Exists>
    <Equals>
      <Var>Seller</Var>
      <Var>Author</Var>
    </Equals>
  </And>
================================================================

The "Var" tags inside "declare" are an example of a stripe that could be
dynamically skipped but not statically skipped.   That is, it depends on
the instance data -- if there's only one "Var", you could leave out the
"Var" tag.  That kind of dynamic skipping is probably more confusing and
complicated than it's worth -- probably we only want to use skipping
that can be determined by static analysis and encoded into the XML schema.
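To illustrate the difference, a Python sketch of both policies for the declare list, using ElementTree. The function names are hypothetical. With dynamic skipping, the reader has to inspect each instance to learn its shape, which is exactly the part an XML schema cannot express statically:

```python
import xml.etree.ElementTree as ET

def declare_static(names):
    """Static skipping only: each list item always keeps its Var stripe."""
    declare = ET.Element("declare")
    for name in names:
        ET.SubElement(declare, "Var").text = name
    return declare

def declare_dynamic(names):
    """Dynamic skipping: drop the Var stripe when there is exactly one item."""
    if len(names) != 1:
        return declare_static(names)
    declare = ET.Element("declare")
    declare.text = names[0]
    return declare

def read_static(declare):
    # The shape is fixed by the schema, so one rule reads every instance.
    return [var.text for var in declare]

def read_dynamic(declare):
    # The reader must look at the instance to learn which shape it has.
    if len(declare) == 0:
        return [declare.text] if declare.text else []
    return [var.text for var in declare]

print(ET.tostring(declare_static(["Buyer"]), encoding="unicode"))
print(ET.tostring(declare_dynamic(["Buyer"]), encoding="unicode"))
```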

So, anyway, that XML tree is kind of ugly, but as XML serializations of
objects go, it's pretty nice, I think.

If we adopt this approach, the Working Group still needs to settle on
the details of an asn06 declaration for the RIF Core -- settle the
issues I list above, and any others -- but the rest of the task of
getting to an XML syntax is essentially taken care of.   

Perhaps the biggest benefit, in my mind, is that various RIF extensions
just need to have their syntax expressed in asn06 and a consistent XML
syntax follows.  (And generalized parsers/serializers can be written to
convert between any XML in this pattern and RDF-triples or
property/value objects.  Mine's not working, but it's close enough that
I'm convinced it's doable.)
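As a hedged illustration of the XML-to-triples direction, here is a Python sketch that handles only the fully striped form (every class and property stripe present, no type wrappers), so it needs no schema. The blank-node labels and the use of the rif namespace from the declaration as a property prefix are assumptions; literal values are kept as plain strings without datatypes, and list order is not preserved (an RDF collection would be needed for that):

```python
import itertools
import xml.etree.ElementTree as ET

RIF = "http://www.w3.org/2007/01/rif#"  # namespace from the asn06 declaration
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def striped_to_triples(xml_text):
    """Convert fully striped XML (class and property elements strictly
    alternating) into (subject, predicate, object) triples."""
    labels = itertools.count()
    triples = []

    def visit_class(class_elem):
        subject = f"_:b{next(labels)}"  # fresh blank node per class stripe
        triples.append((subject, RDF_TYPE, RIF + class_elem.tag))
        for prop_elem in class_elem:
            if len(prop_elem) == 0:
                # A property stripe holding only text is a literal value.
                triples.append((subject, RIF + prop_elem.tag, prop_elem.text or ""))
            else:
                # Each child of a property stripe is another class stripe.
                for value_elem in prop_elem:
                    triples.append((subject, RIF + prop_elem.tag,
                                    visit_class(value_elem)))
        return subject

    visit_class(ET.fromstring(xml_text))
    return triples

example = ("<Exists><declare><Var><name>Buyer</name></Var></declare>"
           "<formula><Atom><parts><Con><ref>purchase</ref></Con></parts>"
           "</Atom></formula></Exists>")
triples = striped_to_triples(example)
for triple in triples:
    print(triple)
```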

For comparison, a fully-striped version would look something like this:

 <And>
    <formulas>
       <Exists>
          <declare><Var><name>Buyer</name></Var></declare>
          <formula>
            <Atom>
               <parts>
                   <Con><ref>purchase</ref></Con>
                   <Var><name>Buyer</name></Var>

Or, fully-striped with type data, so that it can be parsed without
knowledge of the schema:

 <And>
    <formulas><List>
       <Exists>
          <declare><Var><name><xsd:string>Buyer</xsd:string></name></Var></declare>
          <formula>
            <Atom>
               <parts><List>
                   <Con><ref><xsd:anyURI>purchase</xsd:anyURI></ref></Con>
                   <Var><name><xsd:string>Buyer</xsd:string></name></Var>

At which point, one is probably better off just using RDF/XML:

 <And>
    <formulas rdf:parseType="Collection">
       <Exists>
          <declare><Var><name>Buyer</name></Var></declare>
          <formula>
            <Atom>
               <parts rdf:parseType="Collection">
                   <Con><ref rdf:datatype="&anyURI;">purchase</ref></Con>
                   <Var><name>Buyer</name></Var>


I guess this is on the agenda for Tuesday.  E-mail comments welcome as
well.

    -- Sandro


[1] http://www.w3.org/2005/rules/wg/wiki/CORE/Conditions/Positive

Received on Monday, 29 January 2007 03:49:44 UTC