Re: [TF-ENT] RDFS entailment regime & inconsistencies from Chimezie Ogbuji on 2009-09-30 (public-rdf-dawg@w3.org from July to September 2009)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Wed, 30 Sep 2009 13:53:43 -0400
To: "Birte Glimm" <birte.glimm@comlab.ox.ac.uk>, "SPARQL Working Group" <public-rdf-dawg@w3.org>
Message-ID: <C6E91367.CC50%ogbujic@ccf.org>
Birte.  Comments below

On 9/29/09 3:18 PM, "Birte Glimm" <birte.glimm@comlab.ox.ac.uk> wrote:
> Andy, others,
> I would like to resolve the issue about handling inconsistencies, so I
> come back to it...
> 
> From your explanations it is still not completely clear to me, how we
> can specify the behavior for inconsistent graphs properly and I would
> like to understand the issue better. If I don't understand it well, it
> is hard to see all the consequences that the changes have. I am ok
> with a text similar to what the OWL spec says for OWL RL, but it is
> much more explicit than just saying "A system MAY raise an error if
> the queried graph is inconsistent". Such a statement does not satisfy
> the conditions that are placed on the entailment regimes by the SPARQL
> 1.0 spec. If we have a MAY or a SHOULD (which I much prefer), then we
> have to explain how we guarantee finite answer sets because the above
> statement leaves it open for systems to just produce infinitely many
> answers.

Generally speaking, IMO the preferred behavior should be to ensure that
queries are safe (i.e., finite) and that they 'make sense' (an answer that
follows trivially because the source data is inconsistent is not a useful
answer).  I'm unsure whether both can be achieved without resorting either
to a mechanism for reporting errors or warnings or to setting levels of
compliance (both of which require that you find an inconsistency in the
graph).
 
> I would like to consider a simple example. You have a graph containing
> :a :b ">"^^rdf:XMLLiteral .
> :b rdfs:range rdf:XMLLiteral .
> which is the shortest way to state an inconsistency. Assume the query is
> ASK { :a rdf:type :c . }

Thanks for the example. I have a question (out of ignorance).  The RDF MT
document only mentions inconsistency in the context of malformed XML
literals, which seems to suggest that checking rdf:XMLLiterals for
well-formedness is the only thing needed for RDFS inconsistency checking.
My understanding of inconsistency (in a model-theoretic sense), however, is
that the sentence(s) cannot be satisfied by *any* model/interpretation -
which is a significantly harder thing to check.

If we are talking about the latter inconsistency then doesn't this apply to
*any* entailment regime that is defined in terms of interpretations, models,
etc. (i.e., not just RDFS)?

> The entailment holds by definition of RDFS entailment. If I go to the
> OWL spec [1] and look at OWL RL entailment checkers (I can
> equivalently just check whether the graph entails the graph containing
> just the triple :a rdf:type :c and the data is legal OWL RL), then I
> can find that an OWL RL entailment checker MUST NOT return false. It
> SHOULD return true, but unknown is not excluded and it is not necessarily
> required to terminate. Giving false as an answer is against the spec
> and would mean the system is unsound, which is not nice.

I agree.

> I can imagine a similar definition as for OWL RL adapted to RDFS and
> RDFS entailment, but I think you want to allow false as an answer,
> right?

Well, I'm not sure.  Do we want a system that is 1) unilaterally sound but
not safe (i.e., it violates SPARQL 1.0's requirement of finite answers, but
gives all the answers that follow from the entailment - without informing
the user that the cause of the problem is inconsistency), 2) sound and safe
(i.e., it either raises an inconsistency error/warning and returns nothing,
or it raises the error/warning and returns an arbitrary finite set of
answers), or 3) not sound but safe (i.e., it disregards the well-formedness
of rdf:XMLLiterals)?

I think a system that is both sound and safe is more useful for the use
cases I have in mind, but I don't see any alternative to finding
inconsistencies and informing the user.  It could return an arbitrary finite
set of answers (which would be sound at least for the answers given since
*everything* follows from an inconsistent graph), but this is misleading at
the very least. Or, it could fall back to BGP matching & simple entailment,
which would also be misleading because that suggests to the user that there
are *no* additional answers that follow from RDFS entailment.
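
To make the "sound and safe" option concrete, here is a minimal Python
sketch of an ASK that checks for an inconsistency first and then either
raises an error or falls back to plain matching with a warning.  Everything
in it is an assumption for illustration only: the names ask, entails,
find_inconsistency and on_inconsistency are hypothetical, the graph is just
a set of ground triples, and the entailment and inconsistency checks are
supplied by the caller rather than implemented here.

```python
class InconsistentGraphError(Exception):
    """Raised when the queried graph is found to be inconsistent."""


def ask(triple, graph, entails, find_inconsistency, on_inconsistency="error"):
    """Answer a ground ASK query, returning (answer, warning).

    graph              -- a set of ground (s, p, o) tuples (hypothetical model)
    entails            -- caller-supplied: entails(graph, triple) -> bool,
                          assumed correct for consistent graphs only
    find_inconsistency -- caller-supplied: returns a description of the
                          problem, or None if the graph looks consistent
    on_inconsistency   -- "error" (report and return nothing) or
                          "warn" (report and fall back to plain matching)
    """
    problem = find_inconsistency(graph)
    if problem is None:
        return entails(graph, triple), None
    if on_inconsistency == "error":
        raise InconsistentGraphError(problem)
    # Fallback: plain triple matching only, with a warning so the user is
    # not led to believe that no further answers follow.
    warning = f"inconsistent graph ({problem}); answer uses simple matching only"
    return triple in graph, warning
```

Either way the inconsistency has to be found before anything is returned,
which is exactly the cost being discussed.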

Axel suggested in the last teleconference (unless I misunderstood) that an
inconsistent RDFS graph could outright be considered not well-formed for the
entailment regime, but in order to determine this you *still* have to check
for inconsistency (even if this is done *before* the query is evaluated).
 
> [...] Now you start loading the triples for that graph and
> while you load the data, you see if you can find bindings, so if you
> parse :a rdf:type :b, you take x->:a, y->:b as a solution. Because you
> don't want to buffer all solutions and let the user wait for them, you
> return a solution as soon as you find it [...]

I'm not sure the pushback was motivated by wanting to support a scenario
where answers are streamed; I think it was rather due to the general cost of
doing checks for inconsistency *before* answering a query using an RDFS
entailment regime.
 
> Assuming I am right so far, I can imagine two cases: you find the
> inconsistency or you don't. Let's assume first, you find the
> inconsistency. What will happen? You have already sent some solutions
> to the client. Again, there are different ways to go. You can now
> issue a warning and say that there was an inconsistency, but you keep
> returning answers as if there was no inconsistency. That will
> obviously terminate (result in finitely many answers) and the user
> knows that there was an inconsistency that might need fixing (mostly
> inconsistencies are unintended).

Another question out of ignorance.  If the graph *does* have an
inconsistency, even if the process ignores it, isn't it the case that the
process will *not* terminate?

> Now let's assume you find something that would be an inconsistency but
> you don't recognise it, so you go through the graph, you apply some
> RDFS rule and derive _:1 rdf:type rdf:XMLLiteral and _:1 is assigned
> to some mal-formed XML literal, say ">"^^rdf:XMLLiteral. That is
> actually the only pattern for RDFS inconsistencies as I understand it.

Okay, this seemed to answer my earlier question.  So, is it the case then
that checking for inconsistency in RDFS is simply a matter of checking for
well-formedness for all rdf:XMLLiterals and the fact that everything is a
consequence of such an inconsistency is not necessarily reflected by naively
applying *all* the entailment rules to exhaustion?
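
To illustrate what I mean, here is a rough Python sketch, under heavy
assumptions, that applies only rule lg and rule rdfs3 (the range rule) to
exhaustion on Birte's two-triple graph.  It is not a full RDFS closure: the
axiomatic triples and all other rules are left out, literals are modelled as
(lexical form, datatype) tuples, and the names closure, xml_clashes and
is_well_formed are made up for the example.  The well-formedness test itself
is left to the caller.

```python
RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"
XML_LITERAL = "rdf:XMLLiteral"


def is_literal(term):
    # Literals are (lexical_form, datatype) tuples; IRIs and blank nodes
    # are plain strings.
    return isinstance(term, tuple)


def closure(graph):
    """Apply rule lg and rule rdfs3 until no new triple appears."""
    triples = set(graph)
    allocated = {}  # literal -> blank node allocated to it by rule lg
    changed = True
    while changed:
        changed = False
        new = set()
        # Rule lg: replace a literal object by a blank node allocated to it.
        for s, p, o in triples:
            if is_literal(o):
                bnode = allocated.setdefault(o, f"_:l{len(allocated)}")
                new.add((s, p, bnode))
        # Rule rdfs3: (p rdfs:range c), (s p o)  =>  (o rdf:type c).
        ranges = {(s, o) for s, p, o in triples if p == RDFS_RANGE}
        for prop, cls in ranges:
            for s, p, o in triples:
                if p == prop and not is_literal(o):
                    new.add((o, RDF_TYPE, cls))
        if not new <= triples:
            triples |= new
            changed = True
    return triples, allocated


def xml_clashes(triples, allocated, is_well_formed):
    """Literals typed rdf:XMLLiteral whose lexical form fails the check."""
    return [lit for lit, bnode in allocated.items()
            if lit[1] == XML_LITERAL
            and (bnode, RDF_TYPE, XML_LITERAL) in triples
            and not is_well_formed(lit[0])]
```

On the example graph the loop stops after a few passes and never produces
:a rdf:type :c; the problem only shows up if xml_clashes is called with a
check that rejects ">".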

> In this case, you don't recognise the inconsistency because you don't
> check whether ">" is a valid lexical form. Is that something that you
> think can happen and should be allowed under RDFS entailment? In that
> case, you would obviously not give infinitely many answers, but you
> are incomplete. You possibly return more answers than you would get
> from simple entailment, but you also didn't apply all RDFS rules, well
> or you ignored that under RDFS entailment rules you have to check
> lexical forms of XML literals. If that can happen, then you would most
> likely answer the ASK query above with "no", right?

Depending on the answers to the questions above, it seems that perhaps what
is desirable, in order to avoid a priori inconsistency checking, is to only
support an extension to SPARQL for a stricter subset of RDFS entailment that
does not include checks for well-formedness of rdf:XMLLiterals (which, as I
understand it so far, is the *only* cause of trivially entailed answers).

> Now what I am not sure about is, can it happen that you stop giving
> answers, but you have not even found a triple such as _:1 rdf:type
> rdf:XMLLiteral with _:1 assigned to a mal-formed XML literal? How can
> that happen? Do you not apply all (RDFS) rules because you know which
> ones do not matter for the query? Do you not apply all (RDFS) rules
> because you in general choose to support only a subset of them?

Well, if you are still guaranteed to terminate after the application of rule
lg, rule gl and the RDF and RDFS entailment rules (even in the face of an
XML clash), then you *will* stop, right?

-- Chimezie


Received on Wednesday, 30 September 2009 17:55:02 UTC