Re: "entailment regime"? from Sandro Hawke on 2007-07-08 (public-rif-wg@w3.org from July 2007)

From: Sandro Hawke <sandro@w3.org>
Date: Sun, 08 Jul 2007 18:55:34 -0400
To: Bijan Parsia <bparsia@cs.man.ac.uk>
Cc: public-rif-wg@w3.org
Message-ID: <31593.1183935334@ubuhebe>
[This is kind of far afield from mainstream RIF, but in my head it's
deeply linked to the extensibility design, so I'm keeping it on-list.]

Let me start with a comment on where we are with respect to standards on
this issue of what entailment regime one should use with an RDF
document.  It seems to me there are three options here:

   1.  The semantics (the entailment regime) can be determined 
       by looking at just the triples/message-content/document.

   2.  The semantics (the entailment regime) can be determined by
       looking at the triples/message-content/document *and* the
       envelope information (the metadata headers transmitted along
       with the content)

   3.  The semantics (the entailment regime) cannot be determined in
       any standard way.
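A minimal Python sketch of how the three options differ for a consumer. It assumes triples are plain (subject, predicate, object) tuples of full IRIs. The header name `X-Entailment-Regime` is invented for illustration, since no such header is defined. Option 1 reads the vocabulary in the content. Option 2 reads the envelope. A `None` from the envelope lookup is Option 3: there is no standard answer.

```python
from typing import Iterable, Mapping, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as full IRIs

RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def regime_from_content(triples: Iterable[Triple]) -> str:
    """Option 1: pick the regime from the vocabulary the triples use."""
    terms = {term for triple in triples for term in triple}
    if any(t.startswith(OWL_NS) for t in terms):
        return "OWL"
    if any(t.startswith(RDFS_NS) for t in terms):
        return "RDFS"
    return "RDF"


def regime_from_envelope(headers: Mapping[str, str]) -> Optional[str]:
    """Option 2: read the regime from envelope metadata.

    'X-Entailment-Regime' is a hypothetical header name; no standard header
    exists. None means Option 3: no standard way to decide. A plain dict is
    used here, so the lookup is case-sensitive, unlike real HTTP headers.
    """
    return headers.get("X-Entailment-Regime")
```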

Unfortunately, no W3C Recommendation gives us a standard for doing
either Option 1 or Option 2, so we're stuck in the land of Option 3.
I believe this is due, historically, to it being hard to establish a
consensus around either Option 1 or Option 2.

The lack of proper subset-layering between RDF, RDFS, OWL-DL, and
OWL-Full makes it impossible to just use Option 1.  The lack of any
defined header for indicating the semantics makes it impossible to
just use Option 2.  What I hear people doing, in practice, is either
agreeing privately on which semantics they use (which doesn't scale at
all), or staying away from the parts of the languages which are not
properly subset-layered.  I suggest the latter option, an ad hoc use
of Option 1.

Again, I favor Option 1 over Option 2 because of how it supports
extensibility.  I recognize that neither approach is standard, and I
lament that I don't know when either one might ever be, for RDF.  

That said, I'll try some point-by-point comments...

Bijan Parsia <bparsia@cs.man.ac.uk> writes:
> On 4 Jul 2007, at 15:33, Sandro Hawke wrote:
> 
> > (This message is likely to be controversial among the RDF folks, but I
> > think it's important.  It may also not be something we can agree on --
> > other Semantic Web working groups have been challenged by this -- but
> > let me at least try to make the case.)
> >
> > In http://www.w3.org/2005/rules/wg/wiki/Arch/RDF, Jos writes:
> >> Furthermore, this data set has a particular entailment regime
> >> associated with it (e.g. simple, RDF, RDFS).
> >>
> >> rif:rdfEntailmentRegime a rdf:Property ;
> >>     rdfs:domain rif:RuleSet ;
> >>     rdfs:range  rif:RDFEntailmentRegime .
> >
> > If I understand this approach correctly, I'm afraid I have to disagree
> > with it.  RDF entailment regimes should *not* be specified.  It's a key
> > element of the architecture of the Semantic Web that semantics are
> > implied by the vocabulary in use,
> 
> If it is, then it's violated by RDF and OWL and poor practice in a  
> number of cases (e.g., why shouldn't I be able to query an OWL  
> document under simple entailment only?)

I don't think I understand.   You seem to be saying two unrelated
things:

    1 -- I think you're saying that layering is violated by RDF and OWL
         -- yes, I agree it is.  It's a shame; but I don't think the
         mismatches in layering are features users want, and I think
         they can be fixed.

    2 -- Why shouldn't you be able to query an OWL document under simple
         entailment only?  Of course you *can* do that, but if you do
         it, you are taking on certain responsibilities.  It's like
         running a Language-Version-6 program on a Language-Version-5
         interpreter.  You may get different results (including subtle
         errors, or compilation errors), or you may get exactly the right
         results, depending on the program.  But if you get errors, you
         probably can't blame the person who wrote the program.  It's
         your responsibility because you interpreted the content in the
         wrong language.

> > The idea, though, is that you use all the entailment regimes which are
> > defined for the vocabulary in use.
> 
> Since this is not true in theory or in practice, I think it cannot be  
> a part of the semantic web *architecture*, esp. not a key part. You  
> could propose it as part of it, but that's a bit different.

Maybe it's best to avoid this kind of discussion.  Sorry for starting
it.  As far as I understand the Semantic Web, this is a key part of the
architecture.   But I know other people see it differently.   It's
probably better to argue from use cases.

> (Note: A consequence of this requirement is that I cannot correctly  
> use an RDF entailment based reasoner on an OWL document...which seems  
> to strongly contradict the "process what you can" model.

As I said, the fact that RDF entailment is not a subset of OWL
entailment is something I consider a bug.  Fortunately, it seems to be
a fairly harmless bug, and the "process what you can" model doesn't seem
to actually get people into trouble.

> I guess you could argue that the *semantics* are implied, but the  
> *processing* can vary, but I think that certainly isn't in most  
> people's head :))
> 
> >   So for an RDF/XML document that uses
> > no RDFS or OWL terms, RDF entailment applies.  If you use RDFS terms,
> > RDFS entailment applies.  If you use OWL terms, OWL entailment  
> > applies.
> >
> > At a high enough level, this is equivalent to just using all the
> > entailment regimes you can.
> 
> ? I don't see why "can" or "cannot" have to do with it. Once you make  
> it a requirement, then arguably I should *punt* if I "can't"  
> understand the RDFS terms.

This has to do with forward compatibility.  No, you should not punt.
You should do the minimal-impact fallback, which in the case of RDF is
to ignore the semantics of the triples you don't understand.   
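A minimal sketch of that minimal-impact fallback. It assumes a reasoner that knows only two RDFS rules, rdfs9 (type propagation along rdfs:subClassOf) and rdfs11 (subClassOf transitivity) from the RDF Semantics rule list. Terms are abbreviated to prefixed-name strings for brevity. A triple in vocabulary the reasoner doesn't understand, such as `owl:inverseOf`, is neither rejected nor reasoned about. It simply stays in the graph as an assertion.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # prefixed names such as "rdf:type", for brevity
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_subclass_closure(graph: Set[Triple]) -> Set[Triple]:
    """Forward-chain rdfs9 and rdfs11 to a fixpoint.

    Triples whose vocabulary this reasoner does not understand are kept
    as plain assertions: no error is raised and no inferences are drawn.
    """
    closed = set(graph)
    while True:
        new = set()
        for (a, p1, b) in closed:
            if p1 != SUBCLASS:
                continue
            for (c, p2, d) in closed:
                # rdfs11: a subClassOf b, b subClassOf d => a subClassOf d
                if p2 == SUBCLASS and c == b:
                    new.add((a, SUBCLASS, d))
                # rdfs9: a subClassOf b, c type a => c type b
                if p2 == RDF_TYPE and d == a:
                    new.add((c, RDF_TYPE, b))
        if new <= closed:
            return closed
        closed |= new
```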

> If I don't have to punt, one reason to  
> *not* use RDFS entailment even when I could is to interoperate
> with a system that doesn't understand the RDFS tools. (And what do we  
> mean by "can"?) Since I already would have to specify what sort of  
> entailment to use out of band, what's the issue?

But, as I say, you should not specify the entailment out of band.  That
would be Option 2, above, and that does not allow for independent
extensions.

> Plus this is similar to content sniffing which in some lights is a
> no-no and in other lights a good thing :) Isn't mime-type a way to
> override? Should I always interpret <html></html> as html even if I
> retrieve it with mimetype text/plain?

Content-sniffing is bad if it is not allowed by the spec for the given
mime type.  If it is allowed (or mandated), then it's fine.  (Arguably,
it's no longer content-sniffing, it's just another level of
indirection.)

> (Or consider validating a document with different schemas... they
> could, for example, add different default
> 
> > I know for people concerned with
> > theoretical properties of logics, this is painful,
> 
> It is?
> 
> > and it does present
> > some challenges.  But if there's a case where this approach is a real
> > problem to users, I'd like to hear about it.
> 
> It's not the worst default in a lot of cases, but I don't think it's  
> a panacea.
> 
> > (Among the flaws in the current specs, I realise, is that there are
> > OWL-Full entailments for an OWL-DL graph which are not OWL-DL
> > entailments.  That's a bug, though, and will hopefully be fixed some
> > day, by getting the semantics fully aligned.)
> 
> Even that wouldn't be *vocabulary* dispatch, but *syntax*  (i.e., use  
> of vocabulary) dispatch.

Yes -- I was oversimplifying.  In the general case, the guiding
principle is that multiple languages should overlap in syntax where they
overlap in semantics.  If a document parses as if written in some
language, then you can interpret it as being in that language, even if
its writer was thinking of a totally different language.  In general, I
think it's a good idea to use new vocabulary for new features, but it
may be that we have to dispatch on syntactic construct matching as well.
(And in OWL-DL vs. OWL-Full, that is the case.  I'm hoping we can avoid
it in RIF and in the future in RDF.  I don't know any way to implement
syntax dispatching efficiently.)
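The contrast between the two kinds of dispatch can be sketched in Python. Vocabulary dispatch is decided one triple at a time. The "syntax" check below is a hypothetical, heavily simplified stand-in for a single OWL-DL condition: no name may be used as both a class and an individual. It is not the real OWL-DL syntax check. Even this toy version needs the whole graph, and a single added triple can flip its answer.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # prefixed names such as "rdf:type", for brevity
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
META_CLASSES = {"owl:Class", "rdfs:Class"}


def uses_owl_vocabulary(triple: Triple) -> bool:
    """Vocabulary dispatch: decidable by looking at one triple."""
    return any(term.startswith("owl:") for term in triple)


def looks_like_owl_dl(graph: Set[Triple]) -> bool:
    """Toy syntax dispatch: no name may be both a class and an individual.

    Only one simplified condition, not the full OWL-DL syntax check; note
    that it has to scan the whole graph before it can answer.
    """
    classes = {o for (_, p, o) in graph if p == TYPE and o not in META_CLASSES}
    classes |= {s for (s, p, o) in graph if p == TYPE and o in META_CLASSES}
    classes |= {x for (s, p, o) in graph if p == SUBCLASS for x in (s, o)}
    individuals = {s for (s, p, o) in graph if p == TYPE and o not in META_CLASSES}
    return classes.isdisjoint(individuals)
```

For example, `{(":Eagle", "rdfs:subClassOf", ":Bird"), (":sam", "rdf:type", ":Eagle")}` passes the toy check, but adding the single triple `(":Eagle", "rdf:type", ":Species")` makes `:Eagle` both a class and an individual, so the graph falls outside it.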

> It's worse than that, consider:
> 
> 	:P rdfs:subPropertyOf :R.
> 	:R rdfs:range :C.
> 
> is it entailed that
> 	:P rdfs:range :C?
> 
> under RDFS semantics? (No! Yeek!) Yet it is entailed under owl semantics
> (including owl full).
> 
> In general, I favor giving people the ability to directly specify the  
> intended semantics and to being able to override that as they see  
> fit. So, for example, I think it'd be great if I could publish an  
> rdfs ontology at a sparql endpoint, have in the ontology that I  
> intend the owl semantics, and to let the user request that the query  
> be evaluated under the rdfs semantics. The endpoint might not be able  
> to respect that request, but I see no harm in having endpoints that do.

I'm not sure I understand this use case.  

I certainly do see that when users have some control over a reasoner
(such as the sparql endpoint you're talking about) they will sometimes
want to guide the kind of reasoning used and get feedback about what has
been done.  If the RDFS/OWL logics were layered properly, one could say
"please do reasoning up to OWL-DL" and the system could respond first
with the ground entailments (and let you know when it's done with
them), then the RDFS entailments (and let you know it's done with them),
and finally with the OWL-DL entailments.  As a user, I think I'd like
that.
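A minimal sketch of that staged reporting, assuming a generator that yields one (stage, triples) pair as each stage finishes. The RDFS stage implements only three rules from the RDF Semantics rule list: rdfs2 (domain), rdfs3 (range), and rdfs7 (subPropertyOf). The OWL-DL stage is left as a placeholder comment. This is an illustration, not a complete RDFS reasoner.

```python
from typing import Iterator, Set, Tuple

Triple = Tuple[str, str, str]  # prefixed names such as "rdf:type", for brevity
TYPE, SUBPROP = "rdf:type", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def rdfs_step(g: Set[Triple]) -> Set[Triple]:
    """One pass of rdfs2 (domain), rdfs3 (range) and rdfs7 (subPropertyOf)."""
    out = set()
    for (s, p, o) in g:
        for (p2, q, c) in g:
            if p2 != p:
                continue
            if q == DOMAIN:
                out.add((s, TYPE, c))  # rdfs2
            elif q == RANGE:
                out.add((o, TYPE, c))  # rdfs3
            elif q == SUBPROP:
                out.add((s, c, o))     # rdfs7
    return out


def staged_entailments(g: Set[Triple]) -> Iterator[Tuple[str, Set[Triple]]]:
    """Yield (stage, triples) pairs so a client knows when each stage is done."""
    yield "simple", set(g)  # stage 1: the asserted ground triples
    closed = set(g)
    while True:
        new = rdfs_step(closed) - closed
        if not new:
            break
        closed |= new
    yield "rdfs", closed - g  # stage 2: what this RDFS rule subset adds
    # Stage 3 (OWL-DL) would follow here; it is not sketched.
```

Run on Bijan's example plus one instance triple, `{(":P", "rdfs:subPropertyOf", ":R"), (":R", "rdfs:range", ":C"), (":x", ":P", ":y")}`, the RDFS stage derives `:x :R :y` and `:y rdf:type :C` but never `:P rdfs:range :C`. That matches his point that RDFS does not entail it, while OWL does.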

On the other hand, as a user, I'd hate to have to guess which entailment
regime is the right one.  There's a terrible failure mode, where I'm
trying to learn something -- maybe the room number for the room where
I'm supposed to lecture tonight -- and when I ask the room-reservation
sparql endpoint, I have to guess which entailment regime it should use.
And, what's worse, I try two different guesses, and I get two different
room numbers!  How can I know which is the right one?  (It's a bit like
a one-time pad in computer security.  I can't possibly tell.)  Maybe in
the room-number example this is unlikely, but I hope this illustrates
the problem.
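To make the failure mode concrete, here is a hypothetical sketch. The data and every name in it are invented for illustration. The same question asked under two guessed regimes gives two different answers, and nothing in either answer says which guess the publisher intended. Only a single pass of rdfs7 (subPropertyOf) is applied, which is enough for this data.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBPROP = "rdfs:subPropertyOf"

# Hypothetical room-reservation data; every name here is invented.
DATA: Set[Triple] = {
    (":tonightsLecture", ":room", ":R101"),
    (":tonightsLecture", ":relocatedTo", ":R204"),
    (":relocatedTo", SUBPROP, ":room"),
}


def rooms(graph: Set[Triple], regime: str) -> Set[str]:
    """Answer 'which room is tonight's lecture in?' under a guessed regime."""
    g = set(graph)
    if regime == "rdfs":
        # One pass of rdfs7: p subPropertyOf sup, s p o => s sup o
        g |= {(s, sup, o)
              for (s, p, o) in graph
              for (p2, q, sup) in graph
              if p2 == p and q == SUBPROP}
    return {o for (s, p, o) in g if s == ":tonightsLecture" and p == ":room"}
```

Guessing simple entailment yields only `:R101`. Guessing RDFS yields both `:R101` and `:R204`.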

> Oh, if I have a statement "I am to be understood under owl semantics"  
> isn't that vocabulary? In use? :)

Hmm.  That kind of thing might work in some cases but not others.  The
problem I see is that the graph with that triple RDF-entails the graph
without that triple, so in some cases (maybe not this example) I think
you'd have incoherent combined semantics.   The vocabulary-in-use
approach doesn't seem to have that problem.
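The subgraph point can be checked mechanically. This minimal sketch assumes the entailed graph has no blank nodes. In that case a graph simply entails exactly its subgraphs, and anything simply entailed is also RDF-entailed. The property `:intendedSemantics` is a made-up stand-in for an "I am to be understood under owl semantics" statement.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def simply_entails(g: Set[Triple], h: Set[Triple]) -> bool:
    """Simple entailment when h has no blank nodes: h must be a subgraph of g."""
    return h <= g


# Hypothetical self-describing triple; the property name is invented.
FLAG: Triple = (":thisGraph", ":intendedSemantics", ":OWL")
data: Set[Triple] = {(":P", "rdfs:subPropertyOf", ":R"), (":R", "rdfs:range", ":C")}
with_flag = data | {FLAG}
```

So `simply_entails(with_flag, data)` holds. A consumer that keeps only an entailed subgraph takes a perfectly valid step, yet it can end up with `data` and without the flag that was supposed to govern how `data` is read.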

    -- Sandro
Received on Sunday, 8 July 2007 22:56:43 UTC