- From: Sandro Hawke <sandro@w3.org>
- Date: Sun, 28 Jan 2007 22:49:18 -0500
- To: public-rif-wg@w3.org
Continuing the discussion from November, I suggest that we approach the XML syntax of RIF Core like this:

   1. Specify the abstract syntax for RIF Core in asn06. (This abstract syntax specification can also be thought of as an ontology of RIF Core rules and as an object model/API for RIF implementations. It's my belief that it's close enough to both of those: there may be some little semantic differences, but I'm hoping they won't manifest as a real problem.)

   2. Design a mapping from asn06 to an XML schema language.

   3. Use the mapping in (2) to turn the asn06 spec (1) into an XML schema/grammar for RIF.

I've been working on step 2. (Mostly I've been writing programs to do the mapping in step 3, and to read and write the resulting XML syntax.)

The basic approach I'm exploring is "Stripe-Skipping". The name comes from the observation that a common, brute-force way to serialize objects in XML is to use alternating "stripes":

   <PurchaseOrder>                       <!-- a class name -->
      <shipTo>                           <!-- a property name -->
         <Address>                       <!-- a class name -->
            <name>Alice Smith</name>     <!-- a property name -->
            <street>123 Maple Street</street>
            ...

The idea of stripe-skipping is to say that we can omit certain XML elements -- skipping directly to their child elements -- because they carry only redundant information. In this case, the "Address" stripe is redundant if it's known that the range of the "shipTo" property is "Address".

There are various rules one can use for skipping stripes. I've done some experiments over the years, but this week I tried to make a concerted effort to work out something usable. I'm currently converging on these rules:

- skip class stripes when the class is the range of the enveloping property, unless you need the class stripe to gather several properties into one XML element (such as for the root of the document, or items in a list).

- skip a property stripe when there can only be one property for this class.
  (A broader rule would involve selecting a "primary" property, or even a sequence of properties, but so far I'm trying to keep asn06 a subset of OWL, and I don't think OWL has anything like that.)

- Never skip a property when its range is the same as one of the classes consecutively skipped above it. (That would introduce an ambiguity.)

The results are pretty encouraging, but still preliminary. I was hoping to have demo code releasable by last Friday, but it's still not really working right, and we're at the point where I obviously need to send this e-mail to allow for discussion on Tuesday. My sincere apologies for the delay.

Here's an asn06 specification of the Condition language, with input from Harold, but he hasn't seen this version.

================================================================
default namespace rif = "http://www.w3.org/2007/01/rif#"

class Condition

   subclass And
      property formulas : list of Condition

   subclass Exists
      property declare : list of Var
      property formula : Condition

   subclass Atom

   subclass Equals
      property equated : list of Term

class Composite
   property parts : list of Term

   subclass Atom
   subclass Term

class Term

   subclass Expr

   subclass Var
      property name : xsd:string

   subclass Con
      property ref : xsd:anyURI
================================================================

Some issues with this:

- it calls out the similarity between Atoms and Terms (note the multiple inheritance). I'm not sure that's a good idea. In a sense Atoms and Terms are very different things.

- it puts the predicate/function as the first element in a list, rather than as a separate property. This is a coin-toss decision, to me. Sometimes you want to just have it be another member of a list, sometimes you want to treat it specially. It's easy enough to convert. This makes the XML look simpler.
- it uses short names like "Con" instead of "Constant"

- it uses lists instead of sets in several places; I believe we'll have set semantics, but in practice I think we do care about order for roundtripping, so we should keep it in the object model.

- we made Equals use a list instead of just two elements, to be more in keeping with And, etc.

Harold has updated the XML example on the Positive Conditions page [1] to match this asn06 declaration. (My software's not working right now, so I haven't mechanically checked that it is correct.) Here it is, as of right now:

================================================================
<And>
  <Exists>
    <declare><Var>Buyer</Var></declare>
    <formula>
      <Atom>
        <Con>purchase</Con>
        <Var>Buyer</Var>
        <Var>Seller</Var>
        <Expr>
          <Con>book</Con>
          <Var>Author</Var>
          <Con>LeRif</Con>
        </Expr>
        <Con>$49</Con>
      </Atom>
    </formula>
  </Exists>
  <Equal>
    <Var>Seller</Var>
    <Var>Author</Var>
  </Equal>
</And>
================================================================

The "Var" tags inside "declare" are an example of a stripe that could be dynamically skipped but not statically skipped. That is, it depends on the instance data -- if there's only one "Var", you could leave out the "Var" tag. That kind of dynamic skipping is probably more confusing and complicated than it's worth -- probably we only want to use skipping that can be determined by static analysis and encoded into the XML schema.

So, anyway, that XML tree is kind of ugly, but as XML serializations of objects go, it's pretty nice, I think.

If we adopt this approach, the Working Group still needs to settle on the details of an asn06 declaration for the RIF Core -- settle the issues I list above, and any others -- but the rest of the task of getting to an XML syntax is essentially taken care of.

Perhaps the biggest benefit, in my mind, is that various RIF extensions just need to have their syntax expressed in asn06 and a consistent XML syntax follows.
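
To make the static skipping rules concrete, here is a minimal Python sketch. It is not the software mentioned in this message, which isn't shown; it is only an illustration under stated assumptions. It assumes a hypothetical in-memory object model of (class name, properties) tuples and a hand-written SCHEMA table standing in for a parsed asn06 spec. The names SCHEMA, emit and to_xml are made up, as is the billTo property. Only two rules are applied: a property stripe is skipped when its class declares exactly one property, and a class stripe is skipped when the object is a single (non-list) value whose class equals the property's declared range. The ambiguity rule (never skip a property whose range matches a consecutively skipped class) is not implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema table: class -> {property: (declared range, is_list)}.
# It stands in for a parsed asn06 specification; billTo is invented here
# so that PurchaseOrder has more than one property.
SCHEMA = {
    "And":    {"formulas": ("Condition", True)},
    "Exists": {"declare": ("Var", True), "formula": ("Condition", False)},
    "Atom":   {"parts": ("Term", True)},
    "Var":    {"name": ("xsd:string", False)},
    "Con":    {"ref": ("xsd:anyURI", False)},
    "PurchaseOrder": {"shipTo": ("Address", False),
                      "billTo": ("Address", False)},
    "Address": {"name": ("xsd:string", False),
                "street": ("xsd:string", False)},
}


def emit(parent, obj, keep_class_stripe):
    """Append obj, a (class name, {property: value}) tuple, under parent."""
    cls, props = obj
    # Class stripe: either a new element or, when skipped, the parent itself.
    node = ET.SubElement(parent, cls) if keep_class_stripe else parent
    declared = SCHEMA[cls]
    for prop, value in props.items():
        rng, is_list = declared[prop]
        # Property stripe: skipped when the class declares only one property.
        holder = node if len(declared) == 1 else ET.SubElement(node, prop)
        for item in (value if is_list else [value]):
            if isinstance(item, str):
                # Literal value (this sketch only handles single literals).
                holder.text = item
            else:
                # Keep the class stripe for list items, and whenever the
                # actual class differs from the declared range (a subclass).
                emit(holder, item,
                     keep_class_stripe=is_list or item[0] != rng)


def to_xml(obj):
    """Serialize a root object; the root always keeps its class stripe."""
    wrapper = ET.Element("wrapper")
    emit(wrapper, obj, keep_class_stripe=True)
    return ET.tostring(wrapper[0], encoding="unicode")


order = ("PurchaseOrder", {
    "shipTo": ("Address", {"name": "Alice Smith",
                           "street": "123 Maple Street"}),
})
rule = ("And", {"formulas": [
    ("Exists", {
        "declare": [("Var", {"name": "Buyer"})],
        "formula": ("Atom", {"parts": [("Con", {"ref": "purchase"}),
                                       ("Var", {"name": "Buyer"})]}),
    }),
]})

print(to_xml(order))
print(to_xml(rule))
```

In the purchase-order case the Address stripe disappears because Address is exactly the declared range of shipTo, while in the rule case the Atom stripe stays because the declared range of formula is Condition, not Atom. A real implementation would derive the table from the asn06 declaration rather than hard-coding it.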
(And generalized parsers/serializers can be written to convert between any XML in this pattern and RDF-triples or property/value objects. Mine's not working, but it's close enough that I'm convinced it's doable.)

For comparison, a fully-striped version would look something like this:

   <And>
     <formulas>
       <Exists>
         <declare><Var><name>Buyer</name></Var></declare>
         <formula>
           <Atom>
             <parts>
               <Con><ref>purchase</ref></Con>
               <Var><name>Buyer</name></Var>

Or, fully-striped with type data, so that it can be parsed without knowledge of the schema:

   <And>
     <formulas><List>
       <Exists>
         <declare><Var><name><xsd:string>Buyer</xsd:string></name></Var></declare>
         <formula>
           <Atom>
             <parts><List>
               <Con><ref><xsd:anyURI>purchase</xsd:anyURI></ref></Con>
               <Var><name><xsd:string>Buyer</xsd:string></name></Var>

At which point, one is probably better off just using RDF/XML:

   <And>
     <formulas rdf:parseType="Collection">
       <Exists>
         <declare><Var><name>Buyer</name></Var></declare>
         <formula>
           <Atom>
             <parts rdf:parseType="Collection">
               <Con><ref rdf:datatype="&anyURI;">purchase</ref></Con>
               <Var><name>Buyer</name></Var>

I guess this is on the agenda for Tuesday. E-mail comments welcome as well.

    -- Sandro

[1] http://www.w3.org/2005/rules/wg/wiki/CORE/Conditions/Positive
Received on Monday, 29 January 2007 03:49:44 UTC