Re: ISSUE: Inconsistent graphs (and illformed literals) from Pat Hayes on 2006-08-19 (public-rdf-dawg@w3.org from July to September 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 19 Aug 2006 00:18:18 -0700
To: Bijan Parsia <bparsia@cs.man.ac.uk>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <p06230959c10c659e5fa7@[192.168.1.6]>
>On Aug 17, 2006, at 7:24 AM, Pat Hayes wrote:
>	[At some point, I wrote, but Pat snipped the attribution line 
>:) Stop doing that, Pat!]

Sorry. Trying to save space.

>>>There are two additional choices:
>>>
>>>	1) An inconsistent graph returns no answers
>>>	2) A query on an inconsistent graph returns an error, oh, 
>>>"inconsistent graph".
>>>
>>>I personally prefer the latter.
>[snip]
>>>Note, just as an aside, I think this shows that the original idea 
>>>of just doing graph matching and then doing graph matching against 
>>>"virtual graphs" is not, by itself, a sufficient way of 
>>>specification, even aside from the other problems it has with 
>>>scaling to OWL. There are other side conditions that one must 
>>>check.
>>>
>>>Oh, there is a third choice, though I tend not to take it seriously:
>>>	3) An inconsistent graph returns the set of answers that 
>>>graph matching against a graph generated by forward chaining 
>>>application of the entailment rules + some hacking to avoid BNode 
>>>proliferation.
>>
>>Well, 'hacking' is tendentious.
>....
>Do you think "hacking" is a derogatory term?

I thought it was intended in that way here, but obviously I was wrong, sorry.
...

>>>3 can be used as a way to detect contradictions, since there is a 
>>>(large, disjunctive) query that should test for datatype clashes. 
>>>But clearly this particular incompleteness isn't sanctioned by the 
>>>semantics, and we have to be *very* careful about specifying the 
>>>rules and how they are applied to assure interoperability.
>>>
>>>It wouldn't be that hard to come up with a paraconsistent reading 
>>>of all this so as to get some version of 3 (i.e., "useful answers" 
>>>out of the graph). For example, we could sanction all answers 
>>>following from every maximal consistent subgraph of the 
>>>inconsistent graph.
>>
>>Ah, indeed we could :-).
>>
>>>  It would still be good to distinguish between some answers that 
>>>don't follow because the triples *aren't there* and some answers 
>>>not following because they depend on *inconsistent* triples.
>>
>>Well, we could return a binding to a sparql:badLiteral URI to 
>>signal the presence of the error. This does not cover all possible 
>>cases, but it will cover most of the actual cases. These 
>>inconsistencies are all of a very special kind and their source can 
>>usually be traced to a particular typed literal.
>
>I am now entirely against 3, even with the badLiteral "hack", 
>whoops, I mean, "useful and effective, if incomplete and perhaps not 
>to everyone's taste, friendly amendment".

No, hack.
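
[Editorial sketch] The paraconsistent reading quoted above, which sanctions only answers that follow from every maximal consistent subgraph, can be illustrated with a small brute-force example. Everything here is an assumption made for illustration: the function name, the toy triples, and the `is_consistent` predicate, which stands in for a real RDFS/D-entailment consistency check. Taking the intersection of the subgraphs only approximates "follows from every subgraph" for ground triples matched directly, with no entailment rules applied.

```python
from itertools import combinations

def maximal_consistent_subsets(triples, is_consistent):
    """Brute-force enumeration; exponential, so only suitable for toy graphs."""
    triples = list(triples)
    found = []
    # Visit larger subsets first, so a consistent subset that is not
    # maximal is always a proper subset of something already found.
    for size in range(len(triples), -1, -1):
        for combo in combinations(triples, size):
            candidate = frozenset(combo)
            if any(candidate < bigger for bigger in found):
                continue
            if is_consistent(candidate):
                found.append(candidate)
    return found

# Hypothetical toy graph: t1 carries an ill-formed XML literal and t2
# asserts the range that turns it into a clash. t3 is unaffected.
t1 = ("ex:s1", "ex:p", '"<bye/"^^rdf:XMLLiteral')
t2 = ("ex:p", "rdfs:range", "rdf:XMLLiteral")
t3 = ("ex:s2", "ex:p", '"<ab/>"^^rdf:XMLLiteral')

def is_consistent(graph):
    # Stand-in rule: the graph is inconsistent only if t1 and t2 co-occur.
    return not ({t1, t2} <= graph)

subsets = maximal_consistent_subsets([t1, t2, t3], is_consistent)

# Ground triples present in every maximal consistent subgraph.
common = frozenset.intersection(*subsets)
```

In this toy case only t3 survives in every maximal consistent subgraph, which shows how such a reading would separate answers that depend on the clashing triples from answers that do not.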

>Somewhere else, Pat, you suggested that it would be a bit wasteful 
>to refuse to give answers just because there's a tiny flaw 
>*somewhere* in the kb.

Yes, and I still do.

>But I believe the following:
>	1) Contradictory graphs are rare
>	2) In a curated situation, that such a rare error crept in 
>needs investigation.

Suppose I agree: why is this relevant to *querying*?

>For example, it might be that my scraping to RDF script is broken 
>and really *all* my results are bad, but just not RDFS-inconsistent.
>	For example, suppose I'm building XMLLiterals by chopping up 
>some XML data and I do so by taking the first five characters of 
>each line. And suppose I have the following data:
>
>	<ab/>
>	<de/>
>	<hi/>
>	<bye/>
>
>And my script generates (reasonably):
>	"<ab/>"^^rdf:XMLLiteral.
>	"<de/>"^^rdf:XMLLiteral.
>	"<hi/>"^^rdf:XMLLiteral.
>	"<bye/"^^rdf:XMLLiteral.
>
>And the property they are the object of has range rdf:XMLLiteral 
>(which I definitely should do :)).
>
>So, now I have a nice contradiction to alert me to my script 
>problem. But wait, you say, the other data is good! Well, if I have 
>this 5 char limit, maybe I'm also using it elsewhere, to build uris. 
>Or string literals. Or what have you.
>
>In other words, there is reason to distrust the data and actually 
>figure out what's going on.

But suppose your data has been entered by hand, and an occasional 
typo gives a datatype error. We can go on trading examples like this 
for ever. I don't see any *general* reason why all KBs with 
inconsistencies in them must be treated as faulty.
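
[Editorial sketch] The truncation scenario quoted above can be reproduced in a few lines. This is a minimal sketch, not the script being described: the function names and the five-character limit parameter are assumptions. The well-formedness test wraps each fragment in a dummy element and parses it, which only approximates the lexical space of rdf:XMLLiteral.

```python
import xml.etree.ElementTree as ET

def truncate(line, limit=5):
    """Mimic the hypothetical scraping script: keep the first `limit` characters."""
    return line[:limit]

def is_well_formed_xml(fragment):
    """Approximate check: parse the fragment inside a dummy wrapper element."""
    try:
        ET.fromstring("<wrapper>" + fragment + "</wrapper>")
        return True
    except ET.ParseError:
        return False

source_lines = ["<ab/>", "<de/>", "<hi/>", "<bye/>"]
literals = [truncate(line) for line in source_lines]

# Lexical forms that would clash with a range of rdf:XMLLiteral.
ill_formed = [lit for lit in literals if not is_well_formed_xml(lit)]
```

Only the last line is damaged by the five-character cut, so a single ill-formed literal is flagged. The check by itself cannot say whether the cause is a systematic script bug or a one-off typo.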

>In fact, I wish illformed literals directly produced contradictions. 
>That's way easier to understand, I think, than the current behavior.

The problem with that design, though, is that you cannot do RDF/RDFS 
reasoning about the wellformedness of literals, which was considered 
potentially useful. Also, it means that you lose monotonicity (in a 
sense) because you can discover that something is ill-formed only 
when you know the datatype. These bad-datatype errors are pretty easy 
to discover, so you could generate an error message when you find the 
contradiction; the RDF specs allow this.
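
[Editorial sketch] The point that ill-formedness can be discovered only once the datatype is known can be illustrated as follows. The registry `KNOWN_DATATYPES` and the function `check_literal` are hypothetical names, and the lexical checks are simplified (whitespace normalization is ignored). A datatype the processor does not recognize yields "unknown" rather than an error, so only recognized datatypes can produce a reportable clash.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical registry: datatype URI -> predicate on the lexical form.
KNOWN_DATATYPES = {
    XSD + "integer": lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    XSD + "boolean": lambda s: s in {"true", "false", "1", "0"},
}

def check_literal(lexical, datatype):
    """Return 'ok', 'ill-formed', or 'unknown' for a typed literal."""
    checker = KNOWN_DATATYPES.get(datatype)
    if checker is None:
        # Missing knowledge is not fatal; nothing can be concluded.
        return "unknown"
    return "ok" if checker(lexical) else "ill-formed"

results = [
    check_literal("42", XSD + "integer"),
    check_literal("abc", XSD + "integer"),
    check_literal("abc", "http://example.org/someDatatype"),
]
```

A processor could report an error for the second literal while still treating the third as an opaque typed literal.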

(Another option would be to have 'bottom' and 'top' elements in all 
the datatype domains, but this doesn't conform with XSD nor with the 
RDFS basic logic where classes are more like sets than semilattices.)

>I suspect most people expect that if I put ^^xsd:integer in there 
>that I get an object that is an integer!

If you think of datatypes as a kind of syntactic constraint, yes. 
That is one way to think about it, and a very popular one, I agree, 
but it doesn't really mesh with a monotonic descriptive logic 
framework.

>Not some other object altogether if I've typoed. If there *is* no 
>such integer, then it's an error/contradiction, because I've said 
>that a non-integer is "of type" integer.

It is an error if you know what integers are supposed to look like. 
And yes, it *is* a contradiction, that's the point. Look, we agree 
that, say, "abc" isn't a numeral. So we could say that 
"abc"^^xsd:number is ill-formed, an error. But this means that even 
*parsing* RDF is dependent on knowing all the datatypes. If we let 
this past the parser, it has to denote *something* unless we are in a 
free logic. Whatever it denotes, that had better not be a number, 
right?

>This is in the spirit of XML, where the well-formed constraints are 
>taken very seriously. No tag soup parsing. So, perhaps no semantic 
>soup query answering :)
>
>(Of course, paraconsistent consequence relations are quite 
>respectable. I'm doing my thesis on some. But that is definitely 
>introducing something new

Right. We made a decision to avoid paraconsistency, and also 
non-denoting names, early on in the RDF design.

>and there are issues. I think the report an error solution is clean, 
>useful, and easy to understand. It may not be maximally useful in 
>all contexts, but is useful.)
>
>In fact, under D entailment, I would argue that if the processor 
>doesn't recognize the datatype, it should report an error.

That design would be absolutely against the RDF design philosophy. We 
wanted it to be the case that missing information was not fatal, only 
made you unable to draw some conclusions. You might object to this, 
but it was a strong design constraint throughout the RDF process.

Anyway, these decisions are all made now.

Pat
-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 19 August 2006 07:18:36 UTC