Re: "entailment regime"?

Sandro Hawke wrote:
> Dave Reynolds <der@hplb.hpl.hp.com> writes:

>> On the telecon we discussed that part of the proposal and I thought we 
>> agreed to change it, that the entailmentRegime should be an attribute of 
>> the ruleset rather than the dataset. At least that's what I was 
>> suggesting and I thought Jos agreed and no one else objected.
> 
> Yeah, I didn't object because I couldn't figure out how to articulate my
> concern.   I thought about it some more, and then sent the e-mail at the
> start of this thread.
> 
>> Does that address your issue?
> 
> I don't think so.
> 
>> As discussed in the parallel thread with Jos and Axel I would prefer to 
>> do that by providing an import mechanism rather than metadata. So that a 
>> rule set where one wanted to process RDF data and assume RDFS semantics 
>> could be something like:
>>
>>     <rif:RuleSet>
>>          <rif:import uri="http://www.w3.org/2007/rif#RDFSruleset.rif" />
>>          ... rules ...
>>     </rif:RuleSet>
>>
>> However, if that were done indirectly via ruleset metadata that would be 
>> OK too (I can make arguments both ways round if you like :-)).
>>
>> If neither ruleset import nor ruleset metadata are acceptable to you 
>> what's the alternative?
> 
> As I understand it, the semantics of an RDF file, like anything else
> with a MIME type, should stand on their own.   To add additional
> parameters affecting the interpretation is like saying "fetch
> http://example.com/bar and interpret it as JPEG, no matter what MIME
> type is received."   (That may actually be what you need to do
> sometimes, but clearly one "SHOULD NOT" do that.)

Nah, I think it is more like fetching the JPEG, respecting its 
image/jpeg MIME type but saying "but don't bother with incremental 
rendering". You haven't ignored all its semantics but have permitted 
some optional processing to be skipped.

> I've been meaning to poke at that other thread -- let me do it by
> suggesting the particular Web address at which we should publish rules
> that implement RDFS: "http://www.w3.org/2000/01/rdf-schema".   That is,
> we should provide, to the best of our ability, executable semantics for
> RDFS. 
> 
> I'm hearing in that thread that it won't work, but I'm having trouble
> understanding the difficulty.

Well, there's a difference between "won't work" and "not a good idea in 
practice". I'm only claiming the latter.

In practice, there are many cases where an application wants to 
control the number of entailments to be performed. Reasons include:
   - performance trade-offs
   - termination (kind of an extreme performance trade-off)
   - processing such as validation, where the full set of 
     entailments would get in the way

In the specific case of RDFS, the most clear-cut problem area is the 
infinite number of axiomatic triples of the form:

     rdf:_N rdf:type rdfs:ContainerMembershipProperty .
     rdf:_N rdfs:domain rdfs:Resource .
     rdf:_N rdfs:range rdfs:Resource .

A correct and complete ruleset for RDFS would include these. [Clearly it 
couldn't do so as ground facts but could provide rules which deliver 
these on demand.]
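
To make the "on demand" point concrete, here is a rough sketch in Python. 
It is only an illustration: the function and constant names are made up 
for this example, and the pattern for rdf:_N assumes N is a positive 
decimal integer with no leading zeros. The generator can never be run to 
completion, whereas a single ground goal can be decided directly.

```python
import re
from itertools import count, islice

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# The three axiomatic (predicate, object) pairs asserted for every rdf:_N.
MEMBERSHIP_AXIOMS = (
    (RDF + "type", RDFS + "ContainerMembershipProperty"),
    (RDFS + "domain", RDFS + "Resource"),
    (RDFS + "range", RDFS + "Resource"),
)

# Assumption: N is a positive decimal integer without leading zeros.
MEMBERSHIP_PROPERTY = re.compile(re.escape(RDF) + r"_[1-9][0-9]*")


def enumerate_axioms():
    """Forward-chaining view: an endless stream of axiomatic triples."""
    for n in count(1):
        for predicate, obj in MEMBERSHIP_AXIOMS:
            yield (f"{RDF}_{n}", predicate, obj)


def holds_on_demand(subject, predicate, obj):
    """Backward-chaining view: decide one ground goal without enumerating."""
    return (
        MEMBERSHIP_PROPERTY.fullmatch(subject) is not None
        and (predicate, obj) in MEMBERSHIP_AXIOMS
    )


# Only a bounded prefix of the stream can ever be materialised.
first_three = list(islice(enumerate_axioms(), 3))
```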

Now the problem is that any application which uses this RDFS 
specification must never ask queries such as "what are all the 
properties in this dataset" because the answer will be infinite. Yet 
that is a common and important query.

This also rules out ever using a forward-chaining system such as a 
production rule (PR) engine for interpreting the RDFS ruleset, since it 
would never finish materialising the axioms.

Yet those axiomatic facts are fairly useless for most applications.

In practice, systems solve this either by ignoring the 
ContainerMembershipProperties altogether or by arranging that only the 
rdf:_N which exist in the base data are reported, or (in the case of 
Jena) allowing either at the developer's discretion.
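
As a rough sketch of those two strategies (in Python, and emphatically 
not Jena's actual API; the function name and the "mode" values are 
invented for this example), the axioms can be restricted to the rdf:_N 
terms that actually occur in the base data, which keeps the result 
finite:

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

MEMBERSHIP_AXIOMS = (
    (RDF + "type", RDFS + "ContainerMembershipProperty"),
    (RDFS + "domain", RDFS + "Resource"),
    (RDFS + "range", RDFS + "Resource"),
)

# Assumption: N is a positive decimal integer without leading zeros.
MEMBERSHIP_PROPERTY = re.compile(re.escape(RDF) + r"_[1-9][0-9]*")


def membership_axioms(base_triples, mode="used_only"):
    """Return a finite set of axiomatic triples for the given base data.

    mode="ignore":    report no container membership axioms at all.
    mode="used_only": report axioms only for rdf:_N terms in the data.
    Triples are assumed to be 3-tuples of strings (hypothetical encoding).
    """
    if mode == "ignore":
        return set()
    if mode != "used_only":
        raise ValueError(f"unknown mode: {mode!r}")
    # Collect every rdf:_N that appears in any position of any triple.
    used = {
        term
        for triple in base_triples
        for term in triple
        if MEMBERSHIP_PROPERTY.fullmatch(term)
    }
    return {(prop, p, o) for prop in used for p, o in MEMBERSHIP_AXIOMS}


base = [
    ("http://example.org/bag", RDF + "_1", "http://example.org/a"),
    ("http://example.org/bag", RDF + "_2", "http://example.org/b"),
]
axioms = membership_axioms(base)
```

With the two-member bag above, "used_only" yields six triples (three per 
rdf:_N in use) and "ignore" yields none.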

The minimum range of entailments required is a property of the 
application, the assumptions that the rest of the code is going to make. 
It is not a property of the data. Which is why if we are going to 
specify the entailment regime at all it has to either be out-of-band as 
part of the application or associated with the rule set, not with the 
dataset.

So for RIF I see us having 3 options:

(1) Pick a single subset of RDFS semantics which we think is 
sufficiently complete to satisfy most applications of RIF and enforce 
that as the one true way. We don't have the realistic option of having a 
single *complete* RDFS ruleset given the infinite axiom problem so this 
is a blessed subset.

(2) Pick a few subsets of RDFS semantics to encode, capturing the most 
common useful trade-offs. Leave the machinery open for people to specify 
other rulesets. We might in fact just pick one such subset but the point 
is we leave it open to allow applications to pick others.

(3) Provide nothing. Say it is up to the RIF processor to decide how 
much RDFS entailment to apply; it is not a property of a RIF document.

You are arguing for #1, I'm arguing for #2.

Specifically I'm suggesting that we just have a generic "include RIF 
ruleset" feature and provide one or more RIF rulesets capturing RDFS 
semantics (these may or may not be normative). That way any rule set 
publisher can be clear on what RDFS semantics the rule set assumes but 
has the option to assume none or a different subset from any ones that 
the WG blesses.

My reasons for this are:

(a) because it allows applications to make the performance trade-offs 
when they need to;

(b) because I'm not sure we'll agree on what the one true subset should 
be. Specifically Harold and others have suggested rhoDF (which is indeed 
the most useful core of RDFS) whereas at least some applications need 
deduction of things like rdfs:member. Arguably it shouldn't be RIF's job 
to bless a single RDFS subset.

(c) because I have use cases for different subsets of entailment, e.g.

    (i) Validation.  One use of rules is publishing data validation 
constraints. This is particularly useful in the RDF world where 
validation is tricky and under-supported. Some validation is only 
possible in the absence of certain entailments. For example the rule 
"all things used as an rdfs:Class should be declared as rdf:type 
rdfs:Class" is in practice very useful for identifying errors in RDF 
documents but is meaningless under RDFS entailment, because the range 
of rdf:type is rdfs:Class and so every such declaration is entailed 
automatically.

   (ii) Publishing RDFS subsets. One legitimate use of RIF is to publish 
rule-based (i.e. proof theoretic) semantics for RDF specifications 
including various OWL/Tiny subsets that several groups have talked 
about. That might well require control over the minimal RDFS entailments 
to be assumed.
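
To illustrate the validation case in (i), here is a small Python sketch 
(the function names and the example.org data are invented for this 
example). It runs the "undeclared class" check on raw data and then again 
after applying just one RDFS entailment, namely that the range of 
rdf:type is rdfs:Class. The check finds the problem in the first case and 
can never find anything in the second.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"


def undeclared_classes(triples):
    """Validation rule: everything used as a class must be declared one."""
    used = {o for _, p, o in triples if p == RDF_TYPE}
    declared = {s for s, p, o in triples if p == RDF_TYPE and o == RDFS_CLASS}
    return used - declared


def with_type_range_entailment(triples):
    """Close the data under "rdf:type rdfs:range rdfs:Class" only."""
    closed = set(triples)
    while True:
        # Every object of rdf:type is entailed to be an rdfs:Class.
        new = {(o, RDF_TYPE, RDFS_CLASS) for _, p, o in closed if p == RDF_TYPE}
        new -= closed
        if not new:
            return closed
        closed |= new


# Hypothetical data: ex:Dog is used as a class but never declared.
data = {("http://example.org/fido", RDF_TYPE, "http://example.org/Dog")}

raw_errors = undeclared_classes(data)
entailed_errors = undeclared_classes(with_type_range_entailment(data))
```

Here raw_errors contains ex:Dog while entailed_errors is empty, which is 
why a rule set publishing this kind of constraint needs to be able to say 
that it does not assume the full RDFS entailments.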

Dave
-- 
Hewlett-Packard Limited
Registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England

Received on Monday, 9 July 2007 14:11:31 UTC