- From: Chimezie Ogbuji <ogbujic@ccf.org>
- Date: Wed, 30 Sep 2009 13:53:43 -0400
- To: "Birte Glimm" <birte.glimm@comlab.ox.ac.uk>, "SPARQL Working Group" <public-rdf-dawg@w3.org>
Birte. Comments below.

On 9/29/09 3:18 PM, "Birte Glimm" <birte.glimm@comlab.ox.ac.uk> wrote:

> Andy, others,
> I would like to resolve the issue about handling inconsistencies, so I
> come back to it...
>
> From your explanations it is still not completely clear to me, how we
> can specify the behavior for inconsistent graphs properly and I would
> like to understand the issue better. If I don't understand it well, it
> is hard to see all the consequences that the changes have. I am ok
> with a text similar to what the OWL spec says for OWL RL, but it is
> much more explicit than just saying "A system MAY raise an error if
> the queried graph is inconsistent". Such a statement does not satisfy
> the conditions that are placed on the entailment regimes by the SPARQL
> 1.0 spec. If we have a MAY or a SHOULD (which I much prefer), then we
> have to explain how we guarantee finite answer sets because the above
> statement leaves it open for systems to just produce infinitely many
> answers.

Generally speaking, IMO the preferred behavior should be to ensure that the queries are safe (i.e., finite), but also that they 'make sense' (an answer that follows trivially because the source data is inconsistent is not a useful answer). I'm unsure if both can be achieved without resorting to a mechanism for reporting errors or warnings, or to setting levels of compliance (both of which require that you find an inconsistency in the graph).

> I would like to consider a simple example. You have a graph containing
> :a :b ">"^^rdf:XMLLiteral .
> :b rdfs:range rdf:XMLLiteral .
> which is the shortest way to state an inconsistency. Assume the query is
> ASK { :a rdf:type :c . }

Thanks for the example. I have a question (out of ignorance). The RDF MT document only mentioned inconsistency in the context of malformed XML literals, which seems to suggest that checking rdf:XMLLiterals for well-formedness is the only thing needed for RDFS inconsistency checking.
My understanding of inconsistency (in a model-theoretic sense), however, is that the sentence(s) cannot be satisfied by *any* model/interpretation, which is a significantly harder thing to check. If we are talking about the latter kind of inconsistency, then doesn't this apply to *any* entailment regime that is defined in terms of interpretations, models, etc. (i.e., not just RDFS)?

> The entailment holds by definition of RDFS entailment. If I go to the
> OWL spec [1] and look at OWL RL entailment checkers (I can
> equivalently just check whether the graph entails the graph containing
> just the triple :a rdf:type :c and the data is legal OWL RL), then I
> can find that an OWL RL entailment checker MUST NOT return false. It
> SHOULD return true, but unknown is not excluded and it is not necessarily
> required to terminate. Giving false as an answer is against the spec
> and would mean the system is unsound, which is not nice.

I agree.

> I can imagine a similar definition as for OWL RL adapted to RDFS and
> RDFS entailment, but I think you want to allow false as an answer,
> right?

Well, I'm not sure. Do we want a system that is:

1) unilaterally sound but not safe (i.e., it violates SPARQL 1.0's requirement of finite answers, but gives all the answers that follow from the entailment, without informing the user that the cause of the problem is inconsistency),
2) sound and safe (i.e., it either raises an inconsistency error/warning and returns nothing, or it raises the error/warning and returns an arbitrary finite set of answers), or
3) not sound but safe (i.e., it disregards the well-formedness)?

I think a system that is both sound and safe is more useful for the use cases I have in mind, but I don't see any alternative to finding inconsistencies and informing the user. It could return an arbitrary finite set of answers (which would be sound at least for the answers given, since *everything* follows from an inconsistent graph), but this is misleading at the very least.
Or, it could fall back to BGP matching & simple entailment, which would also be misleading because that suggests to the user that there are *no* additional answers that follow from RDFS entailment.

Axel suggested in the last teleconference (unless I misunderstood) that an inconsistent RDFS graph could outright be considered not well-formed for the entailment regime, but in order to determine this you *still* have to check for inconsistency (even if this is done *before* the query is evaluated).

> [...] Now you start loading the triples for that graph and
> while you load the data, you see if you can find bindings, so if you
> parse :a rdf:type :b, you take x->:a, y->:b as a solution. Because you
> don't want to buffer all solutions and let the user wait for them, you
> return a solution as soon as you find it [...]

I'm not sure the motivation for the pushback was to support a scenario where answers are streamed; I think it was rather the general cost of doing checks for inconsistency *before* answering a query using an RDFS entailment regime.

> Assuming I am right so far, I can imagine two cases: you find the
> inconsistency or you don't. Let's assume first, you find the
> inconsistency. What will happen? You have already sent some solutions
> to the client. Again, there are different ways to go. You can now
> issue a warning and say that there was an inconsistency, but you keep
> returning answers as if there was no inconsistency. That will
> obviously terminate (result in finitely many answers) and the user
> knows that there was an inconsistency that might need fixing (mostly
> inconsistencies are unintended).

Another question out of ignorance. If the graph *does* have an inconsistency, even if the process ignores it, isn't it the case that the process will *not* terminate?
> Now let's assume you find something that would be an inconsistency but
> you don't recognise it, so you go through the graph, you apply some
> RDFS rule and derive _:1 rdf:type rdf:XMLLiteral and _:1 is assigned
> to some mal-formed XML literal, say ">"^^rdf:XMLLiteral. That is
> actually the only pattern for RDFS inconsistencies as I understand it.

Okay, this seems to answer my earlier question. So, is it the case then that checking for inconsistency in RDFS is simply a matter of checking for well-formedness of all rdf:XMLLiterals, and that the fact that everything is a consequence of such an inconsistency is not necessarily reflected by naively applying *all* the entailment rules to exhaustion?

> In this case, you don't recognise the inconsistency because you don't
> check whether ">" is a valid lexical form. Is that something that you
> think can happen and should be allowed under RDFS entailment? In that
> case, you would obviously not give infinitely many answers, but you
> are incomplete. You possibly return more answers than you would get
> from simple entailment, but you also didn't apply all RDFS rules, well
> or you ignored that under RDFS entailment rules you have to check
> lexical forms of XML literals. If that can happen, then you would most
> likely answer the ASK query above with "no", right?

Depending on the answers to the questions above, it seems that perhaps the way to avoid a priori inconsistency checking is to only support an extension to SPARQL for a stricter subset of RDFS entailment that does not include checks for well-formedness of rdf:XMLLiterals (which, as I understand it so far, is the *only* cause of trivially entailed answers).

> Now what I am not sure about is, can it happen that you stop giving
> answers, but you have not even found a triple such as _:1 rdf:type
> rdf:XMLLiteral with _:1 assigned to a mal-formed XML literal? How can
> that happen?
> Do you not apply all (RDFS) rules because you know which
> ones do not matter for the query? Do you not apply all (RDFS) rules
> because you in general choose to support only a subset of them?

Well, if you are still guaranteed to terminate after the application of rule lg, rule gl and the RDF and RDFS entailment rules (even in the face of an XML clash), then you *will* stop, right?

-- Chimezie
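
For illustration only, the clash pattern discussed in this thread (an ill-formed rdf:XMLLiteral used as the object of a property whose rdfs:range is rdf:XMLLiteral) could be detected along the following lines. This is a minimal sketch under several assumptions: the tuple-based triple representation and the function names are hypothetical, only the direct rdfs:range case is covered (not clashes that arise through subclass or subproperty reasoning), and the well-formedness test is an approximation that wraps the lexical form in a dummy element and parses it. It does not enforce exclusive canonical XML, which the rdf:XMLLiteral lexical space also requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical representation: IRIs are prefixed-name strings,
# typed literals are (lexical_form, datatype) tuples.
RDF_XMLLITERAL = "rdf:XMLLiteral"
RDFS_RANGE = "rdfs:range"


def is_well_formed_xml_content(lexical):
    """Approximate check: does the lexical form parse as XML content?

    The content is wrapped in a dummy root element so that plain text
    and multiple sibling elements are accepted. Canonical form is NOT
    checked, so this is weaker than the real lexical-space test.
    """
    try:
        ET.fromstring("<wrapper>" + lexical + "</wrapper>")
    except ET.ParseError:
        return False
    return True


def find_direct_xml_clashes(triples):
    """Return triples whose object is an ill-formed XMLLiteral and whose
    property is directly declared to have rdfs:range rdf:XMLLiteral."""
    xml_ranged = {
        s for (s, p, o) in triples
        if p == RDFS_RANGE and o == RDF_XMLLITERAL
    }
    clashes = []
    for (s, p, o) in triples:
        if p in xml_ranged and isinstance(o, tuple):
            lexical, datatype = o
            if datatype == RDF_XMLLITERAL and not is_well_formed_xml_content(lexical):
                clashes.append((s, p, o))
    return clashes


# A graph shaped like the example in the thread, with a literal that is
# unambiguously not well-formed (an unclosed element).
graph = [
    (":a", ":b", ("<b>unclosed", RDF_XMLLITERAL)),
    (":b", RDFS_RANGE, RDF_XMLLITERAL),
]

if find_direct_xml_clashes(graph):
    print("warning: graph contains an XML clash; RDFS answers may be trivial")
```

A system taking the "sound and safe" option could run such a check once per graph, before or alongside query answering, and raise the error or warning instead of producing trivially entailed answers.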
Received on Wednesday, 30 September 2009 17:55:02 UTC