Re: types of conformance

On Jan 4, 2007, at 1:43 AM, Ed Barkmeyer wrote:

> Bijan,
>
> I think we are largely on the same page.  I need to think more  
> about some parts of this, but I will try to clarify some things now.

Great.

> you wrote:
[snip]
>> Presumably, a document could "conform" to such simply by using  
>> those  features. You mean we should call out specific combinations  
>> as "key"?
>
> A document can conform by containing only those features, and  
> conforming to the representation requirements for them.

Well, I don't distinguish these. :)

> I did mean that, for tools, it is better to have "key combinations"  
> of features that have a high probability of multiple  
> implementations that include all of the features in a specified  
> combination.

The main reason for this, afaict, is to force tools to get up to
snuff. Conformance is a very useful, and powerful, stick. But
recognize that someone will get whacked :)

(I blogged about this:
<http://clarkparsia.com/weblog/2006/12/30/conformance-conundrums/>.)

>   If we allow arbitrary combinations over a sizeable set (10+) of  
> optional features, we may expect most RIF rulesets to be readable  
> only by the tool that wrote them.

I really don't see why that's the case, especially if the main goal  
is interchange.

If RIF is primarily an interchange format, then users will be
primarily working in *antecedently developed* rule languages.
Perhaps "independently developed" is better. So, they'll use
whatever that language offers. If they want to port their rules to
other rule engines, they have to do some work to ensure their rules
are portable. (This is as opposed to popping the rules into a third-
party editor or pretty-printing them...it's *much* easier to "fully
conform" to even wildly expressive languages for these sorts of tools.)

Now, if we presume, for a moment, that no rule vendor changes their
language or engine in the next 5 years, then it doesn't matter
whether we pick the mixnmatch or the By God Conform line...the
implementation picture is the same. However, *interchange* is better
supported by mixnmatch. Why? Because users can figure out for each
engine what it actually supports (somewhat more easily), and a
feature checker will tell them what features their RIF document uses.
Given honesty by the vendors in reporting what they handle (or a
diligent third party), it is straightforward to write a tool which
takes a rule document, checks the features, then spits out a list of
engines that can handle your document. It can even diagnose what
rules are incompatible with some engines. Heck, a more sophisticated
tool could help massage the document in a variety of ways.
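
To make that concrete, here's a quick Python sketch of such a tool.
Everything in it is invented for illustration (the feature names, the
engines, and the capability table standing in for vendor or third-
party reports); it's the shape of the idea, not a real implementation:

    # Capability table: feature sets as (hypothetically) reported by
    # vendors or a diligent third party. All names are made up.
    ENGINE_CAPABILITIES = {
        "EngineA": {"core", "recursion", "naf"},
        "EngineB": {"core", "builtins"},
        "EngineC": {"core", "recursion", "builtins", "equality"},
    }

    def engines_for(doc_features):
        """List the engines whose declared features cover the document's."""
        return [name for name, caps in ENGINE_CAPABILITIES.items()
                if doc_features <= caps]

    def diagnose(doc_features, engine):
        """Name the document features the given engine doesn't handle."""
        return sorted(doc_features - ENGINE_CAPABILITIES[engine])

    doc = {"core", "recursion", "equality"}  # as found by a feature checker
    print(engines_for(doc))          # ['EngineC']
    print(diagnose(doc, "EngineB"))  # ['equality', 'recursion']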

What's the problem with this? None that I can see. It's not even
particularly unpleasant or unwieldy.

OTOH, suppose you'd really like all rule engines to at least *try* to  
deal with recursive rules because, oh, you write a lot of them. Then  
it makes perfect sense to define a conformance point which includes  
recursive rules and to make it as "core" as possible since it lets  
you say to the vendor, "We demand RIF Core support!!!" It encourages  
you to do so, actually :)

This can be valuable too. But I don't see it as the sine qua non of  
effective interchange.

>>> The "bottom line" is what I think of as the "core" sublanguage  
>>> --  minimum compliance.  If we have just the "core" and 12   
>>> independently optional features, we will have in fact defined  
>>> 2^12  distinct sublanguages, and in all likelihood they will have  
>>> no  consistently definable semantics.
>> I don't understand this. In most cases, if core + all 12 options   
>> forms a coherent language, which doesn't seem hard to do, then  
>> all  the subsets are coherent as well. (Assuming some sane notion  
>> of  "feature" of course :))
>
> It did seem to me to be hard to do, based on the RuleML experience,

What happened in the RuleML experience?

I'm not saying it's not easy to screw it up, but I'd rather talk in
concretes than in ineffables. Can you give me some concrete examples
of conflicting features?

> but I bow to your superior knowledge in this area.  It may also be  
> that we can define a coherent language which defies implementation --

Sure, but so what? If you write crazy rules that won't run on
anything because, whoa, you went nuts with the expressivity, that's
your lookout. (I do hold the line against semantically incoherent
languages. If you can infer paradox from the empty document under
your semantics, go away! :)) Why would this cause a problem? I
suppose someone could complain "The UNION of all RIF FEATURES is
UNIMPLEMENTABLE!!!!!!" (practically speaking). Sure, but when was
this supposed to matter? If anything, it'd be a *feature*, since it
would emphasize that RIF is not a particular *language* intended to
be worked in and "fully supported" by vendors, but an exchange
framework. I guess the difference is that if you define a language,
you shouldn't pop in features that a user can't reliably use because
of implementation difficulty. In an exchange framework, letting the
user "exchange" rulesets that aren't (currently) practical to reason
over is no big deal. Presumably, no user would have such a ruleset
because, well, they couldn't run the rules in the first place. Or if
they did create such a ruleset (as an experiment), they'd quickly
learn that such rulesets were hopeless (but they still might be
useful as an abstract or succinct specification from which they
derived more practical rulesets).

> if you have all these features, it is extremely difficult to build  
> an engine that correctly processes all reasonable rulesets that use  
> certain combinations of them.

Sure. So? Is that a goal?

>> I see many reasons why one might reject this approach, but the
>> likelihood of a lack of "consistently definable semantics" eludes
>> me still. Clarify?
>
> I guess my real problem is ensuring that when I create a ruleset  
> and post it on the Web, a potential user can know whether his tool  
> will actually be able to process it as intended.

But that's easy enough. Fire up the tool and if during parsing it
notices it can't handle it, it goes, "Whoops, sorry Jack, it uses
these features and they suck." OWL tools do that *now* at both a
coarse grain (species) and a fine grain (DL expressivity).
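
For comparison, here's a toy sketch of that parse-time report at both
grains, analogous to the OWL species/expressivity checks. The named
sublanguages and the feature vocabulary are made up (the real RIF
dialects were still being designed):

    # Named conformance points, each defined by an allowed feature set.
    NAMED_SUBLANGUAGES = {
        "Core-ish": {"core"},
        "Core+LP":  {"core", "recursion", "naf"},
    }

    def conformance_report(doc_features):
        """Coarse grain: named sublanguages the document fits inside.
        Fine grain: the exact features the document uses."""
        coarse = [name for name, feats in NAMED_SUBLANGUAGES.items()
                  if doc_features <= feats]
        return coarse or ["(no named sublanguage)"], sorted(doc_features)

    print(conformance_report({"core", "recursion"}))
    # (['Core+LP'], ['core', 'recursion'])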

>   In many cases, there may be more than one way for me to construct  
> an effective ruleset for the purpose at hand, but each of these  
> involves different combinations of features.  (If the tool doesn't  
> have X, there is a work-around that uses Y.)  If some sufficient  
> set of features is a defined sublanguage, I will use that set in  
> creating the ruleset.

I fail to see why this will be hard to determine even without the WG  
defining it.

>   But if I have to guess whether more potential users will have  
> tools that support X and Z vs. Y and Q, I have a problem.

Perhaps. But it depends on the actual facts on the ground. I mean,
let's say you REALLY NEED a feature Q but it's not in the minimum
core. Big Popular Rule Vendor (BPRV) was *going* to implement Q, but
whiny RIFheads said, "You're not conforming! Add this other stuff",
and so they punted on Q. Now you have *no* engine, much less the BPRV
engine, that supports Q. Too bad for you :)

Do these sorts of stories have evidential value? I mean, I can spin  
them all day long.... ;)

> Put another way, I would like to know what features I should NOT  
> use if I don't absolutely need them, because using them reduces the  
> number of implementations that can process my ruleset.

Well, up to the point of successful representation, I think. Market
pressures will bring some degree of commonality (I don't *really*
think rule vendors will go nuts...why would they?). Conformance
points just provide an extra kick. So they are really only worth it
if you have some particular kick you want :) That, or you think you
can successfully brand some conformance point and find that useful
for building a market. (I *like* the relational subset!!! Exchanging
relational databases seems sexy to me, esp. if you can glom it onto
XQuery. I doubt any existing customer is demanding that (and from
that discussion, the current relational db vendors don't feel that
need), but I think a cool story could be spun by the W3C.)
[snip]

So, here's my experience as an OWL implementor. Species were useful
*both* for forcing me (and others) to implement features (e.g.,
nominals) *and* for avoiding features (e.g., qualified number
restrictions, which had high user demand but weren't in OWL, and OWL
Full, which had low user demand and which OWL DL let me get away with
not implementing). OWL Lite caused people to distort their KBs in the
expectation that there would be better or more scalable tools, but
that didn't happen (mostly because OWL Lite is not in fact really
expressively restrictive). In Swoop, we implemented fairly fine
grained expressivity analysis (i.e., mixnmatch) but never restricted
what could be said. This seemed useful, and I can imagine that in a
market with more vendors with more varying coverage, a switch that
let you set the editor to keep you from using certain features would
be useful.

For OWL 1.1, we defined some "tractable fragments" (i.e., more
rational OWL Lites), but there is still some discussion about whether
we should "name" these species to create nice branding or just leave
it as information for implementors and let the market come up with
ways to communicate support to the users.

I go various ways on this. A lot depends on the specific market you
are dealing with, I think.

But if you have document conformance, then reasoner conformance is as  
easy under mixnmatch as under named points (assuming a coherent  
overall mix). "A RIF rules engine *implements* a set of RIF features  
iff it implements a sound and complete reasoning strategy wrt the  
logic determined by the set of features in question." Even if some  
features are not compatible with each other, you can indicate that  
pretty easily :)
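
Here's one last little sketch of that "implements" relation under
mixnmatch, with an invented incompatible pair (not a real RIF
conflict, just a placeholder to show the flagging is cheap):

    # Known-incoherent feature combinations. The pair below is made up
    # purely for illustration.
    INCOMPATIBLE = [{"classical-negation", "naf"}]

    def coherent(features):
        """True iff no known-incompatible combination is fully present."""
        return not any(pair <= features for pair in INCOMPATIBLE)

    def implements(engine_features, doc_features):
        """An engine covers a document iff the mix is coherent and its
        declared features include everything the document uses."""
        return coherent(doc_features) and doc_features <= engine_features

    print(implements({"core", "naf"}, {"core", "naf"}))  # True
    print(coherent({"classical-negation", "naf"}))       # False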

I'm not at all arguing for the mixnmatch strategy. I just don't share  
your concerns about it. It could make sense to do *both*, i.e.,  
(expansive) named conformance points for document conformance, and  
mixnmatch for implements.

I would suggest going for the least contentious and minimal  
conformance clause possible. It's important to avoid, as Gary put it,  
more "XML formats for expressing my rules that don't interoperate  
with any other rule system".

	<http://lists.w3.org/Archives/Public/public-rif-wg/2006Dec/0076>

He already has a couple. What are they? What's needed to make this  
interoperation happen?

Cheers,
Bijan.

Received on Thursday, 4 January 2007 03:14:10 UTC