Re: [TF-ENT] RDFS entailment regime & inconsistencies from Birte Glimm on 2009-09-30 (public-rdf-dawg@w3.org from July to September 2009)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Wed, 30 Sep 2009 20:03:54 +0100
To: Chimezie Ogbuji <ogbujic@ccf.org>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <492f2b0b0909301203v158290f8hbfab868d4f181c08@mail.gmail.com>
see below...

2009/9/30 Chimezie Ogbuji <ogbujic@ccf.org>:
> Birte.  Comments below
>
> On 9/29/09 3:18 PM, "Birte Glimm" <birte.glimm@comlab.ox.ac.uk> wrote:

[snip]

>> I would like to consider a simple example. You have a graph containing
>> :a :b ">"^^rdf:XMLLiteral .
>> :b rdfs:range rdf:XMLLiteral .
>> which is the shortest way to state an inconsistency. Assume the query is
>> ASK { :a rdf:type :c . }
>
> Thanks for the example. I have a question (out of ignorance).  The RDF MT
> document only mentioned inconsistency in the context of malformed XML
> literals, which seems to suggest that checking rdf:XMLLiterals for
> well-formedness is only thing needed for RDFS inconsistency checking.  My
> understanding of inconsistency (in a model theoretic-sense), however, is
> that the sentence(s) cannot be satisfied by *any* model/interpretation -
> which is a significantly harder thing to check.

Well, yes and no. The thing is that if you have an rdf:XMLLiteral
which has a mal-formed lexical form, then the interpretation has to
interpret that as an element that is not an rdfs:Literal (thus, not an
rdf:XMLLiteral either). This is a bit weird, but that's how it is. Now
your data is still consistent unless you also have a range statement
somewhere that forces the element that you found to be outside of
rdfs:Literal to actually be in rdfs:Literal (or in rdf:XMLLiteral). In
my example above, :b has range rdf:XMLLiteral and since
">"^^rdf:XMLLiteral is the :b successor of :a, we cannot get away with
interpreting ">" as an element outside of rdfs:Literal. If I hadn't
added the range restriction on :b, however, the data would be
consistent. Obviously, the range can also be stated for a sub-property
of :b and you do need to apply some RDFS rules before you can tell
whether there is an inconsistency or not. Thus, it is not merely a
syntactic check on XML literals, which would probably cause fewer
concerns about scalability.
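
To make the check concrete, here is a minimal Python sketch of it. It
is only an illustration under stated assumptions, not how any
particular system does it: triples are plain tuples of prefixed-name
strings, the names Lit, looks_like_xml_literal, super_properties and
has_xml_clash are made up for this example, and the lexical check is a
rough parse-and-reserialise approximation (it ignores namespaces and
attribute ordering). Ranges are followed along rdfs:subPropertyOf
(rules rdfs7 and rdfs3); subclasses of rdfs:Literal other than
rdf:XMLLiteral are not considered.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

XML_LITERAL = "rdf:XMLLiteral"
# Classes whose members must all be literal values.
LITERAL_CLASSES = {"rdfs:Literal", XML_LITERAL}


class Lit(NamedTuple):
    """Hypothetical typed-literal representation for this sketch."""
    lexical: str
    datatype: str


def looks_like_xml_literal(lexical: str) -> bool:
    """Rough approximation: the content must parse inside a dummy root
    element and re-serialise to exactly the same string."""
    wrapped = f"<r>{lexical}</r>"
    try:
        root = ET.fromstring(wrapped)
    except ET.ParseError:
        return False
    out = ET.tostring(root, encoding="unicode", short_empty_elements=False)
    return out == wrapped


def super_properties(prop, triples):
    """prop plus everything reachable from it via rdfs:subPropertyOf."""
    seen, todo = {prop}, [prop]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subPropertyOf" and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def has_xml_clash(triples) -> bool:
    """True if an ill-typed XML literal is forced into a literal class
    by a range statement on its property or on a super-property."""
    for s, p, o in triples:
        if not (isinstance(o, Lit) and o.datatype == XML_LITERAL):
            continue
        if looks_like_xml_literal(o.lexical):
            continue
        supers = super_properties(p, triples)
        for s2, p2, o2 in triples:
            if p2 == "rdfs:range" and s2 in supers and o2 in LITERAL_CLASSES:
                return True
    return False


graph = [
    (":a", ":b", Lit(">", XML_LITERAL)),
    (":b", "rdfs:range", XML_LITERAL),
]
print(has_xml_clash(graph))      # True
print(has_xml_clash(graph[:1]))  # False: no range statement, no clash
```

The second call shows the point made above: the mal-formed literal on
its own does not make the graph inconsistent.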

> If we are talking about the latter inconsistency then doesn't this apply to
> *any* entailment regime that is defined in terms of interpretations, models,
> etc. (i.e., not just RDFS)?

Well, for OWL (Lite, DL, EL, QL), for example, mal-formed lexical
forms of literals cause a syntax error according to the spec. In OWL
DL, however, you have many other ways of causing an inconsistency and
doing a consistency check first is what all systems I know of do (you
can collect much useful information during the check ;-)). In OWL QL,
on the other hand, and if I recall rightly, you cannot state
inconsistencies, but you can have syntax errors (for literals) or
axioms that are outside of what OWL QL is supposed to handle. This you
do have to check when you parse the data, but it is a syntax check.
After that you are safe. For OWL RL and RIF, I don't know off the top
of my head, but for D-entailment, inconsistencies can arise from
several different constraints on the datatypes (integers and strings
must have disjoint interpretations etc.), so there you have the same
problem.

[snip]

>> [...] Now you start loading the triples for that graph and
>> while you load the data, you see if you can find bindings, so if you
>> parse :a rdf:type :b, you take x->:a, y->:b as a solution. Because you
>> don't want to buffer all solutions and let the user wait for them, you
>> return a solution as soon as you find it [...]
>
> I'm not sure if the motivation for the pushback was in order to support a
> scenario where answers are streamed, but rather due to the general cost of
> doing checks for inconsistency *before* answering a query using an RDFS
> entailment regime.

As I understood it, both are concerns. An upfront check is considered
too expensive and streaming of answers is much preferred to buffering
them. As I understand it, you can, however, find an inconsistency
while you are trying to find the answers and at a point where you have
already sent some answers to the client.
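
A small Python sketch of that situation, for the query pattern
?x rdf:type ?y. This is only an illustration: stream_answers and the
is_ill_typed predicate are hypothetical names, literals are modelled
as (lexical form, datatype) tuples, and only direct range statements
with rdf:XMLLiteral are tracked.

```python
def stream_answers(triples, is_ill_typed):
    """Yield ("solution", (x, y)) for each ?x rdf:type ?y as soon as it
    is loaded, and a single ("warning", text) once a clash shows up."""
    xml_ranged = set()  # properties seen so far with range rdf:XMLLiteral
    suspects = []       # properties seen so far with an ill-typed object
    warned = False
    for s, p, o in triples:
        if p == "rdf:type":
            yield ("solution", (s, o))  # sent before loading has finished
        elif p == "rdfs:range" and o == "rdf:XMLLiteral":
            xml_ranged.add(s)
        elif is_ill_typed(o):
            suspects.append(p)
        if not warned and any(q in xml_ranged for q in suspects):
            warned = True
            yield ("warning", "inconsistency found after answers were sent")


data = [
    (":x", "rdf:type", ":C"),
    (":a", ":b", (">", "rdf:XMLLiteral")),
    (":b", "rdfs:range", "rdf:XMLLiteral"),
    (":y", "rdf:type", ":D"),
]
# Stand-in for a real lexical check: only the example literal is ill-typed.
ill_typed = lambda o: o == (">", "rdf:XMLLiteral")

for event in stream_answers(data, ill_typed):
    print(event)
```

Here the first solution has already gone out by the time the range
statement is loaded, so the client sees a solution, then the warning,
then the remaining solution.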

>> Assuming I am right so far, I can imagine two cases: you find the
>> inconsistency or you don't. Let's assume first, you find the
>> inconsistency. What will happen? You have already sent some solutions
>> to the client. Again, there are different ways to go. You can now
>> issue a warning and say that there was an inconsistency, but you keep
>> returning answers as if there was no inconsistency. That will
>> obviously terminate (result in finitely many answers) and the user
>> knows that there was an inconsistency that might need fixing (mostly
>> inconsistencies are unintended).
>
> Another question out of ignorance.  If the graph *does* have an
> inconsistency, even if the process ignores it, isn't it the case that the
> process will *not* terminate?

Not necessarily. Most likely it will terminate if it is a decision
procedure. You could just ignore the fact that the literal is
mal-formed, you kind of fix it on the go with something well-formed
(well, interpret it as an xml literal although it is none). In that
case you would do reasoning as if all mal-formed literals had been
repaired and you terminate (if you implemented a decision procedure).
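
As a rough Python sketch of why this terminates: a forward-chaining
loop over a few RDFS rules (rdfs2, rdfs3, rdfs7) that never looks at
lexical forms. The function name rdfs_closure is made up, only three
rules are covered, and literals are allowed in subject position as a
simplification (the real rules go through blank nodes via rule lg).
Since the rules only recombine terms that already occur in the graph,
the set of derivable triples is finite and the loop reaches a fixpoint.

```python
def rdfs_closure(triples):
    """Apply rdfs2, rdfs3 and rdfs7 until nothing new is derived.
    Literal well-formedness is deliberately never checked."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if s2 != p:
                    continue
                if p2 == "rdfs:subPropertyOf":
                    new.add((s, o2, o))            # rdfs7
                elif p2 == "rdfs:range":
                    new.add((o, "rdf:type", o2))   # rdfs3
                elif p2 == "rdfs:domain":
                    new.add((s, "rdf:type", o2))   # rdfs2
        if new <= closure:
            return closure  # fixpoint reached
        closure |= new


bad = (">", "rdf:XMLLiteral")  # literal as (lexical form, datatype)
graph = {
    (":a", ":b", bad),
    (":b", "rdfs:range", "rdf:XMLLiteral"),
}
result = rdfs_closure(graph)
print((bad, "rdf:type", "rdf:XMLLiteral") in result)  # True
```

The mal-formed literal is simply typed as an rdf:XMLLiteral, as if it
had been repaired, and the procedure stops with a finite result.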

>> Now let's assume you find something that would be an inconsistency but
>> you don't recognise it, so you go through the graph, you apply some
>> RDFS rule and derive _:1 rdf:type rdf:XMLLiteral and _:1 is assigned
>> to some mal-formed XML literal, say ">"^^rdf:XMLLiteral. That is
>> actually the only pattern for RDFS inconsistencies as I understand it.
>
> Okay, this seemed to answer my earlier question.  So, is it the case then
> that checking for inconsistency in RDFS is simply a matter of checking for
> well-formedness for all rdf:XMLLiterals and the fact that everything is a
> consequence of such an inconsistency is not necessarily reflected by naively
> applying *all* the entailment rules to exhaustion?

No, you would either stop because you find the inconsistency or you
treat the mal-formed literals as well-formed. A system would not be
wrong in generating all answers, but it would not be the "natural"
behavior.

>> In this case, you don't recognise the inconsistency because you don't
>> check whether ">" is a valid lexical form. Is that something that you
>> think can happen and should be allowed under RDFS entailment? In that
>> case, you would obviously not give infinitely many answers, but you
>> are incomplete. You possibly return more answers than you would get
>> from simple entailment, but you also didn't apply all RDFS rules, well
>> or you ignored that under RDFS entailment rules you have to check
>> lexical forms of XML literals. If that can happen, then you would most
>> likely answer the ASK query above with "no", right?
>
> Depending on the answers to the questions above, it seems that perhaps what
> is desirable to avoid a priori inconsistency checking is to only support an
> extension to SPARQL for a stricter subset of RDFS entailment that didn't
> include checks for well-formedness in rdf:XMLLiterals (which, as I
> understand it so far, is the *only* cause for trivially entailed answers)

There are several options (none absolutely ideal IMO). You could (I am
just mentioning some, I am not advocating any) declare mal-formed
literals as syntax errors (against the RDFS spec). You could work on
fragments that do not support range statements over rdfs:Literal and
rdf:XMLLiteral, you can support only some of the RDFS entailment rules
(OWL RL does that for example), you can just live with the strange
behavior that unchecked inconsistencies can give, ...

>> Now what I am not sure about is, can it happen that you stop giving
>> answers, but you have not even found a triple such as _:1 rdf:type
>> rdf:XMLLiteral with _:1 assigned to a mal-formed XML literal? How can
>> that happen? Do you not apply all (RDFS) rules because you know which
>> ones do not matter for the query? Do you not apply all (RDFS) rules
>> because you in general choose to support only a subset of them?
>
> Well, if you are still guaranteed to terminate after the application of rule
> lg, rule gl and the RDF and RDFS entailment rules (even in the face of an
> XML clash), then you *will* stop, right?

Yes. So how does it happen that you stopped, but didn't find the clash?
You might not apply all the RDFS rules, you might know that some part
of the graph is not relevant for the query so you don't look at that
part, but unfortunately that part contained an inconsistency, or you
could just not validate the literals for well-formedness. That would
be my explanation.

Birte

> -- Chimezie



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Wednesday, 30 September 2009 19:04:27 UTC