Re: ISSUE: Inconsistent graphs (and illformed literals) from Bijan Parsia on 2006-08-18 (public-rdf-dawg@w3.org from July to September 2006)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Fri, 18 Aug 2006 05:29:49 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <57A0E064-402E-4C55-84B3-34C3CFB6A9EB@cs.man.ac.uk>
On Aug 17, 2006, at 7:24 AM, Pat Hayes wrote:
	[At some point, I wrote, but Pat snipped the attribution line :)  
Stop doing that, Pat!]
>> There are two additional choices:
>>
>> 	1) An inconsistent graph returns no answers
>> 	2) A query on an inconsistent graph returns an error, oh,  
>> "inconsistent graph".
>>
>> I personally prefer the latter.
[snip]
>> Note, just as an aside, I think this shows that the original idea  
>> of just doing graph matching and then doing graph matching against  
>> "virtual graphs" is not, by itself, a sufficient way of  
>> specification, even aside from the other problems it has with  
>> scaling to OWL. There are other side conditions that one must check.
>>
>> Oh, there is a third choice, though I tend not to take it seriously:
>> 	3) An inconsistent graph returns the set of answers that graph  
>> matching against a graph generated by forward chaining application  
>> of the entailment rules + some hacking to avoid BNode proliferation.
>
> Well, 'hacking' is tendentious.

It is?

> We have to do some 'hacking' to do this in *any* scheme:

Yes. So? That seems true to me.

> all the elaborate machinery of scoping sets and so on in the
> E-entailment definition is exactly this kind of 'hacking',

Do you think "hacking" is a derogatory term? It generally suggests  
some inelegance, but "elegant hack" is not an oxymoron.

> phrased in mathematical terminology. I think this way of phrasing  
> the conditions is actually quite useful and effective when it can  
> be used, and should be taken more seriously.

Hacks are often useful, effective and taken seriously. Ok, this is a  
sideline.

>> 3 can be used as a way to detect contradictions, since there is a  
>> (large, disjunctive) query that should test for datatype clashes.  
>> But clearly this particular incompleteness isn't sanctioned by the  
>> semantics, and we have to be *very* careful about specifying the  
>> rules and how they are applied to assure interoperability.
>>
>> It wouldn't be that hard to come up with a paraconsistent reading  
>> of all this so as to get some version of 3 (i.e., "useful answers"  
>> out of the graph). For example, we could sanction all answers  
>> following from every maximal consistent subgraph of the  
>> inconsistent graph.
>
> Ah, indeed we could :-).
>
>>  It would still be good to distinguish between some answers that  
>> don't follow because the triples *aren't there* and some answers  
>> not following because they depend on *inconsistent* triples.
>
> Well, we could return a binding to a sparql:badLiteral URI to  
> signal the presence of the error. This does not cover all possible  
> cases, but it will cover most of the actual cases. These  
> inconsistencies are all of a very special kind and their source can  
> usually be traced to a particular typed literal.

I am now entirely against 3, even with the badLiteral "hack", whoops,  
I mean, "useful and effective, if incomplete and perhaps not to  
everyone's taste, friendly amendment".

Somewhere else, Pat, you suggested that it would be a bit wasteful to  
refuse to give answers just because there's a tiny flaw *somewhere*  
in the kb. But I believe the following:
	1) Contradictory graphs are rare
	2) In a curated situation, that such a rare error crept in needs  
investigation. For example, it might be that my scraping-to-RDF  
script is broken and really *all* my results are bad, but just not  
RDFS-inconsistent.
	For example, suppose I'm building XMLLiterals by chopping up some  
XML data and I do so by taking the first five characters of each  
line. And suppose I have the following data:

	<ab/>
	<de/>
	<hi/>
	<bye/>

And my script generates (reasonably):
	"<ab/>"^^rdf:XMLLiteral.
	"<de/>"^^rdf:XMLLiteral.
	"<hi/>"^^rdf:XMLLiteral.
	"<bye/"^^rdf:XMLLiteral.

And the property they are the object of has range rdf:XMLLiteral  
(which I definitely should do :)).

So, now I have a nice contradiction to alert me to my script problem.  
But wait, you say, the other data is good! Well, if I have this 5  
char limit, maybe I'm also using it elsewhere, to build uris. Or  
string literals. Or what have you.
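
To make the example concrete, here is a minimal sketch in Python of the  
broken script plus a sanity check. The function names are made up for  
illustration, and "parses as a single XML element" is only a rough  
stand-in for the real lexical space of rdf:XMLLiteral, so treat it as a  
sketch rather than a conformance check.

```python
import xml.etree.ElementTree as ET


def first_five(line: str) -> str:
    """The (buggy) extraction rule: keep only the first five characters."""
    return line[:5]


def is_well_formed_xml(lexical: str) -> bool:
    """Rough check: does the lexical form parse as one XML element?

    Assumption: this approximates, but does not fully implement, the
    lexical space of rdf:XMLLiteral.
    """
    try:
        ET.fromstring(lexical)
    except ET.ParseError:
        return False
    return True


source_lines = ["<ab/>", "<de/>", "<hi/>", "<bye/>"]

# What the script would emit as rdf:XMLLiteral lexical forms.
literals = [first_five(line) for line in source_lines]

# The lexical forms that cannot be well-typed XMLLiterals.
ill_formed = [lit for lit in literals if not is_well_formed_xml(lit)]

for lit in ill_formed:
    print(f'ill-formed literal: "{lit}"^^rdf:XMLLiteral')
```

Only the truncated last line is flagged, which is exactly the point: the  
other three are well formed by luck, not because the script is right.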

In other words, there is reason to distrust the data and actually  
figure out what's going on. In fact, I wish illformed literals  
directly produced contradictions. That's way easier to understand, I  
think, than the current behavior. I suspect most people expect that  
if I put ^^xsd:integer in there that I get an object that is an  
integer! Not some other object altogether if I've typoed. If there  
*is* no such integer, then it's an error/contradiction, because I've  
said that a non-integer is "of type" integer.

This is in the spirit of XML, where the well-formed constraints are  
taken very seriously. No tag soup parsing. So, perhaps no semantic  
soup query answering :)

(Of course, paraconsistent consequence relations are quite  
respectable. I'm doing my thesis on some. But that is definitely  
introducing something new and there are issues. I think the  
report-an-error solution is clean, useful, and easy to understand. It  
may not be maximally useful in all contexts, but is useful.)

In fact, under D entailment, I would argue that if the processor  
doesn't recognize the datatype, it should report an error. I don't  
see why we should have fallback behavior inside D entailment.  
If you have to fall back, fall back.
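
A minimal sketch, in Python, of the strict behavior I have in mind for  
typed literals: an unrecognized datatype is an error, and so is a  
lexical form outside the datatype's lexical space. The table of  
recognized datatypes, the exception classes, and check_literal are all  
hypothetical names for illustration, not anything from a spec or an  
existing processor, and only two XSD datatypes are covered.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical datatype map: the datatypes this processor recognizes,
# each with a pattern for its lexical space (only two shown here).
LEXICAL_SPACE = {
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
}


class LiteralError(Exception):
    """Base class for the errors a strict processor would report."""


class UnrecognizedDatatype(LiteralError):
    """The datatype is not in the processor's datatype map."""


class IllFormedLiteral(LiteralError):
    """The lexical form is outside the datatype's lexical space."""


def check_literal(lexical: str, datatype: str) -> None:
    """Raise instead of silently treating the literal as some other object."""
    pattern = LEXICAL_SPACE.get(datatype)
    if pattern is None:
        raise UnrecognizedDatatype(datatype)
    if pattern.fullmatch(lexical) is None:
        raise IllFormedLiteral(f'"{lexical}"^^<{datatype}>')


check_literal("42", XSD + "integer")  # fine, returns None

try:
    check_literal("4z", XSD + "integer")  # a typo, not an integer
except IllFormedLiteral as err:
    print("error: ill-formed literal", err)
```

The design choice is the one argued above: no defaulting inside the  
check. A caller who wants to fall back to a weaker entailment regime can  
catch the error and do so explicitly.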

It shouldn't be that hard to come up with a SPARQL query under simple  
entailment that has answers iff a graph is inconsistent. That query  
could be used to check, or to diagnose, inconsistency.

Hmm. Let's see if I can make progress on such. It'd be a nice WG note.

Oh, I don't see any RDF Test cases for inconsistent graphs. Are there  
any? Does anyone know off hand?

Cheers,
Bijan.
Received on Friday, 18 August 2006 04:30:04 UTC