[TF-ENT] RDFS entailment regime & inconsistencies from Birte Glimm on 2009-09-29 (public-rdf-dawg@w3.org from July to September 2009)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Tue, 29 Sep 2009 20:18:38 +0100
To: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <492f2b0b0909291218n373de429m7edec7b8ed54b762@mail.gmail.com>
Andy, others,
I would like to resolve the issue about handling inconsistencies, so I
come back to it...

From your explanations it is still not completely clear to me how we
can specify the behavior for inconsistent graphs properly, and I would
like to understand the issue better. If I don't understand it well, it
is hard to see all the consequences of the changes. I am OK
with text similar to what the OWL spec says for OWL RL, but that text is
much more explicit than just saying "A system MAY raise an error if
the queried graph is inconsistent". Such a statement does not satisfy
the conditions that the SPARQL 1.0 spec places on entailment regimes.
If we have a MAY or a SHOULD (which I much prefer), then we
have to explain how we guarantee finite answer sets, because the above
statement leaves it open for systems to just produce infinitely many
answers.

I would like to consider a simple example. You have a graph containing

    :a :b ">"^^rdf:XMLLiteral .
    :b rdfs:range rdf:XMLLiteral .

which is the shortest way to state an inconsistency. Assume the query is

    ASK { :a rdf:type :c . }
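To make the example concrete, here is a minimal Python sketch of how the clash could be detected. It assumes a toy encoding (IRIs as prefixed-name strings, literals as (lexical form, datatype) pairs), and the names find_xml_clash and xml_lexical_form_ok are hypothetical. The lexical-form test is only an approximation: it wraps the string in a dummy element and uses Python's ET.canonicalize, which implements C14N 2.0 rather than the exclusive XML canonicalization that the RDF specification refers to.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set, Tuple, Union

Literal = Tuple[str, str]                    # (lexical form, datatype IRI)
Triple = Tuple[str, str, Union[str, Literal]]

RDFS_RANGE = "rdfs:range"
XML_LITERAL = "rdf:XMLLiteral"

def xml_lexical_form_ok(lex: str) -> bool:
    """Approximate membership test for the rdf:XMLLiteral lexical space.

    Assumption: the string must parse inside a dummy element and be left
    unchanged by canonicalization (C14N 2.0 here, an approximation).
    """
    wrapped = f"<w>{lex}</w>"
    try:
        return ET.canonicalize(wrapped) == wrapped
    except ET.ParseError:
        return False

def find_xml_clash(graph: Set[Triple]) -> Optional[Literal]:
    """Return the offending literal if the graph contains the clash, else None."""
    ranges = {(s, o) for (s, p, o) in graph if p == RDFS_RANGE}
    for s, p, o in graph:
        if not isinstance(o, tuple):
            continue  # only literal objects can take part in this clash
        lex, datatype = o
        # lg: allocate a blank node _:n to the literal o.
        # rdfs3: p rdfs:range rdf:XMLLiteral then gives _:n rdf:type rdf:XMLLiteral.
        typed_as_xml = (p, XML_LITERAL) in ranges
        # Clash pattern described in this message: _:n rdf:type rdf:XMLLiteral
        # with _:n allocated to a malformed XML literal.
        if typed_as_xml and datatype == XML_LITERAL and not xml_lexical_form_ok(lex):
            return o
    return None
```

For the two-triple graph above, ">" is rejected because canonical XML writes it as "&gt;" in text content, so find_xml_clash reports the literal.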

The entailment holds by definition of RDFS entailment. If I go to the
OWL spec [1] and look at OWL RL entailment checkers (equivalently, I can
just check whether the graph entails the graph containing only the
triple :a rdf:type :c, provided the data is legal OWL RL), then I
find that an OWL RL entailment checker MUST NOT return false. It
SHOULD return true, but unknown is not excluded, and it is not necessarily
required to terminate. Giving false as an answer is against the spec
and would mean the system is unsound, which is not nice.
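One way to read the OWL RL requirement for this case, as a small Python sketch: the three possible checker outputs, plus None standing for "did not terminate", judged for an entailment that actually holds. The encoding, the function name judge_when_entailed, and the labels are assumptions of this sketch, not wording from the OWL Conformance document.

```python
from enum import Enum
from typing import Optional

class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"

def judge_when_entailed(answer: Optional[Answer]) -> str:
    """Judge a checker's output for an entailment that really holds,
    e.g. the inconsistent graph above entailing :a rdf:type :c.

    None models a checker that never terminates (hypothetical encoding).
    """
    if answer is Answer.FALSE:
        return "violation"  # MUST NOT return false: the checker would be unsound
    if answer is Answer.TRUE:
        return "preferred"  # SHOULD return true
    return "allowed"        # unknown, or no answer at all, is not excluded
```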

I can imagine a similar definition as for OWL RL adapted to RDFS and
RDFS entailment, but I think you want to allow false as an answer,
right?

Now I try to understand how the system you have in mind would work;
please tell me if I am wrong. You are given a graph (via some URI)
that you have not loaded and have not seen before, so you don't
know what data it contains and you don't know that it contains an
inconsistency such as the one above. Let's also assume, for simplicity,
that your query contains just a single triple pattern, e.g., ?x
rdf:type ?y. Now you start loading the triples for that graph and,
while you load the data, you look for bindings: if you
parse :a rdf:type :b, you take x->:a, y->:b as a solution. Because you
don't want to buffer all solutions and make the user wait for them, you
return a solution as soon as you find it. While you are loading
the data (or once you have loaded it), you also apply some rules
(after all, we use the RDFS entailment regime): e.g., if you parse :b
rdfs:subClassOf :c, you know that x->:a, y->:c is a solution and you
return it. Am I right so far?

Assuming I am right so far, I can imagine two cases: you find the
inconsistency or you don't. Let's first assume you find the
inconsistency. What happens then? You have already sent some solutions
to the client. Again, there are different ways to go. You can
issue a warning that there was an inconsistency, but keep
returning answers as if there were no inconsistency. That will
obviously terminate (result in finitely many answers), and the user
knows that there was an inconsistency that might need fixing
(inconsistencies are mostly unintended). You could also not tell the user at
all, but I think that is not nice. You can also raise an error at that
point, saying that the graph contains an inconsistency and that everything
you returned before is entailed, but that you will stop giving answers
because everything is trivially entailed. Am I right so far? Any
strong preferences?
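The three reactions described above can be sketched as a policy switch over a stream of events. The event encoding, the Policy names, and InconsistentGraphError are all hypothetical; the point is only that each option terminates on finite input, and that under the error option the solutions sent before the error remain entailed.

```python
from enum import Enum
from typing import Iterable, Iterator, List, Tuple

class Policy(Enum):
    WARN_AND_CONTINUE = "warn"  # keep answering, but tell the user
    SILENT = "silent"           # keep answering without telling anyone
    STOP_WITH_ERROR = "error"   # report the inconsistency and stop

class InconsistentGraphError(Exception):
    """Raised under STOP_WITH_ERROR; solutions sent earlier stay entailed."""

# ("solution", binding) or ("inconsistency", description)
Event = Tuple[str, object]

def stream_answers(events: Iterable[Event], policy: Policy,
                   warnings: List[str]) -> Iterator[object]:
    for kind, payload in events:
        if kind == "solution":
            yield payload
        elif kind == "inconsistency":
            if policy is Policy.STOP_WITH_ERROR:
                raise InconsistentGraphError(str(payload))
            if policy is Policy.WARN_AND_CONTINUE:
                warnings.append(str(payload))
            # Policy.SILENT: nothing is reported and answering continues
```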

Now let's assume you find something that would be an inconsistency but
you don't recognise it: you go through the graph, apply some
RDFS rule, and derive _:1 rdf:type rdf:XMLLiteral, where _:1 is assigned
to some malformed XML literal, say ">"^^rdf:XMLLiteral. As I understand it,
that is actually the only pattern for RDFS inconsistencies.
In this case, you don't recognise the inconsistency because you don't
check whether ">" is a valid lexical form. Is that something that you
think can happen and should be allowed under RDFS entailment? In that
case, you would obviously not give infinitely many answers, but you
would be incomplete. You possibly return more answers than you would get
from simple entailment, but you did not apply all RDFS rules, or rather
you ignored the fact that the RDFS entailment rules require you to check
the lexical forms of XML literals. If that can happen, then you would most
likely answer the ASK query above with "no", right?
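The incomplete behaviour can be illustrated with a small Python sketch in which the lexical-form check is optional. ask_type is a hypothetical name and lexical_check is a hypothetical stand-in for a real rdf:XMLLiteral lexical-space test; passing None models a system that never checks lexical forms. Apart from the clash, the sketch only consults asserted triples, which is enough here because no RDFS rule can derive :a rdf:type :c from the example graph (:c does not occur in it).

```python
from typing import Callable, Optional, Set, Tuple, Union

Literal = Tuple[str, str]                    # (lexical form, datatype IRI)
Triple = Tuple[str, str, Union[str, Literal]]
XML_LITERAL = "rdf:XMLLiteral"

def ask_type(graph: Set[Triple], subject: str, cls: str,
             lexical_check: Optional[Callable[[str], bool]]) -> bool:
    """ASK { subject rdf:type cls } for a possibly incomplete RDFS reasoner."""
    ranges = {(s, o) for (s, p, o) in graph if p == "rdfs:range"}
    if lexical_check is not None:
        for s, p, o in graph:
            # Literal object, typed rdf:XMLLiteral via rdfs:range, with a bad lexical form
            if (isinstance(o, tuple) and o[1] == XML_LITERAL
                    and (p, XML_LITERAL) in ranges and not lexical_check(o[0])):
                return True  # inconsistent graph: every triple is entailed
    # Without the check, only asserted triples are consulted in this sketch.
    return (subject, "rdf:type", cls) in graph
```

For the example graph, ask_type(g, ":a", ":c", None) returns False (the "no" answer of an incomplete system), whereas passing any predicate that rejects ">" makes it return True.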

What I am not sure about is this: can it happen that you stop giving
answers but have not even found a triple such as _:1 rdf:type
rdf:XMLLiteral with _:1 assigned to a malformed XML literal? How can
that happen? Do you not apply all (RDFS) rules because you know which
ones do not matter for the query? Or do you not apply all (RDFS) rules
because you generally choose to support only a subset of them?

Cheers,
Birte

[1] http://www.w3.org/2007/OWL/wiki/Conformance#Entailment_Checker

-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Tuesday, 29 September 2009 19:19:14 UTC