Re: Ill-typed vs. inconsistent? from Pat Hayes on 2012-11-12 (public-rdf-wg@w3.org from November 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 12 Nov 2012 11:21:23 -0600
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <955498B3-3C83-435E-9383-B8CA087E15BF@ihmc.us>
On Nov 12, 2012, at 9:43 AM, Richard Cyganiak wrote:

> On 12 Nov 2012, at 07:58, Pat Hayes wrote:
>>> What's the relevance of the distinction between “graphs containing ill-typed literals” and “inconsistent graphs” in the Semantics?
>> 
>> The relevance is that it is quite possible to say sensible (and therefore consistent) things about ill-typed literals, such as that they are ill-typed. 
> 
> Do you mean, say those things in RDF? How do you say in RDF that a literal is ill-typed?

Well, you need RDFS and OWL to get expressive enough and to work around the silly RDF syntax rules, but this would do it:

_:x owl:sameAs "sillyliteral"^^xsd:number .
_:x rdf:type :IllFormedLiteral .
:IllFormedLiteral owl:disjointWith rdfs:Literal .

> Can you give a less self-referential example for something sensible that could be said about an ill-typed literal?

That would depend upon the literal. But for example, suppose you wanted to use xsd:number but your data has the occasional entry of "zero" to mean "0". Right now, you could just let this slide past and the parsers would not actually break. And you could even assert that the value was a number without anything breaking. 

> 
>>> The text stresses that the presence of an ill-typed literal does not constitute an inconsistency.
>>> 
>>> But why does the distinction matter?
>> 
>> I am not sure what you mean by "the distinction" here. Why would you expect that an ill-typed literal would produce an inconsistency?
> 
> I don't understand why a separate class of errors is introduced *just* for ill-typed literals.

Inconsistency isn't an error. If you want to suggest that we make ill-typed literals into a version of a syntactic error, I would go along with that, but it has consequences for parsers that the 2004 WG thought were too onerous. Since RDF's treatment of datatypes is open-ended, it means that when any new datatype is added, you have to rewrite the parser code to catch a new class of errors. And it also means that RDF is kind of nonmonotonic (in a sense): adding RDF information can make previously legal RDF into illegal RDF. 
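The nonmonotonicity point can be made concrete with a minimal Python sketch. It assumes a hypothetical processor that treats ill-typed literals as syntax errors and can only check datatypes listed in its own registry; the names `registry`, `integer_lexical` and `is_syntax_error` are illustrative and do not come from any RDF library. The same literal is legal before xsd:integer is registered and illegal afterwards.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

def integer_lexical(lex: str) -> bool:
    # xsd:integer lexical space: optional sign followed by one or more digits
    return re.fullmatch(r"[+-]?[0-9]+", lex) is not None

def is_syntax_error(lexical: str, datatype: str, registry: dict) -> bool:
    """Hypothetical rule: an ill-typed literal is a syntax error.

    A literal can only be rejected if its datatype is in the registry;
    literals with unknown datatypes are passed through unchecked."""
    check = registry.get(datatype)
    return check is not None and not check(lexical)

registry = {}                                  # processor knows no datatypes yet
literal = ("xxx", XSD + "integer")

before = is_syntax_error(*literal, registry)   # datatype unknown, so accepted
registry[XSD + "integer"] = integer_lexical    # a new datatype is "added"
after = is_syntax_error(*literal, registry)    # same data, now rejected

print(before, after)  # False True
```

The design choice being illustrated is that legality depends on the registry, not on the data alone, which is exactly why every new datatype would require new parser code.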

(Both of these issues would be solved or greatly eased if RDF simply adopted the XSD datatypes as the only, and fixed, set of RDF datatypes and ceased trying to be so future-general. If we were to do this, then I would absolutely vote for making ill-typed XSD literals into a *syntactic* error, and simply not mentioning ill-typing in the semantics at all.) 

But, if ill-typed literals are syntactically legal, then as Semantics editor I insist that they must have some kind of meaning. This can be done in several ways. One is to simply say that all triples containing an ill-typed literal are false. This is simple but has several disadvantages. It means that there is no way to say anything about an ill-typed literal, and it means that a whole raft of "obvious" logical properties start to have exceptions. Basically, all tautologies involving literals can now be false, so all the axioms need to have ill-formed-literal exceptions written into them (and we have to check that they can't sneak in the back door by interactions between inference rules). This is a well-known can of worms for logicians, which is one reason typed logics were invented: to push ill-typing into the syntax in order to keep the semantics from getting hopelessly muddy. The other way (and it is a general pattern for expressing typing in an untyped logic) is the one we used: allow these "bad" expressions to have a value, but insist that it is a value in a dustbin category that has a name, so that well-formedness can be expressed in the logic itself. 

> I don't understand what benefit is gained by classifying ill-typed literals and inconsistencies differently in the specs. I don't understand how the distinction is actionable. How does anyone benefit from knowing that a given graph is not inconsistent but contains an ill-typed literal?
> 
>> Why would the presence of an ill-typed literal make a triple false?
> 
> Because it asserts that some entity has a relationship to some non-existing thing?

But why would that make it *false*? Surely its truth value would depend on *what it was saying* about that nonexistent entity? 

> 
>>> Is there any reason anybody needs to know about this distinction who isn't interested in the arcana of the model theory?
>> 
>> I'm not sure what you consider to be "arcana".
> 
> "Arcana: Highly specialized knowledge that is mysterious to the average person."
> 
> That's quite an appropriate description of the distinction between “ill-typed” and “inconsistent”, I think.
> 
>> Someone who cannot follow the model theory probably shouldn't be using RDF.
> 
> That's crazy talk.

Do you really think that the idea of inconsistency is mysterious to the average RDF user?

> 
>>> From the perspective of someone who authors RDF data, or works with RDF data, they both seem to belong to the same class of problem, and I'm a bit at a loss as to how to explain the difference.
>> 
>> To me they seem quite obviously different, so apparently I am not following your intuition here.
> 
> So let's look at these three RDF graphs:
> 
> Graph A:
> 
>   :a :b "1"^^xsd:integer.
> 
> Graph B:
> 
>   :a :b "xxx"^^xsd:integer.
> 
> Graph C:
> 
>   :a :b "xxx"^^xsd:integer. :b rdfs:range rdfs:Literal.
> 
> Those graphs fall into three classes, let's call them class A, class B, and class C.
> 
> To me it's pretty clear why we would distinguish class A from class B+C. That's because it's clear what graph A is supposed to mean (assuming we know what :a and :b are), and it's rather unclear what statement graphs B and C are trying to make.

Well, we do distinguish them. B+C both contain ill-typed literals, while A does not. I agree, deliberately putting ill-typed literals into RDF data is a weird thing to do, and when you find them it is probably a signal of a mistake. But again, what has that got to do with inconsistency? 

(Suppose that instead of "xxx" you saw "inf" or "null" or "zero", what would you conclude? I wouldn't describe those as nonsensical, and I would hazard a likely guess as to what the author was intending, and how I could explain the error. Perhaps more to the point, suppose you saw

Graph D:

:a :b "xxx"^^ex:unknownDatatype

What would you conclude?)
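One way to see how Graphs A through D differ is a toy interpretation following the dustbin approach described earlier: an ill-typed literal still denotes something, just not a literal value. The Python sketch below is a hypothetical illustration under stated assumptions, not an RDF library or reasoner. It recognizes only xsd:integer, uses a made-up `IllTyped` class as the dustbin, expands `ex:unknownDatatype` to an assumed IRI `http://example.org/unknownDatatype`, and checks only one inconsistency pattern (an `rdfs:range rdfs:Literal` assertion meeting an ill-typed object), roughly as in the 2004 semantics under discussion.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

class IllTyped:
    """Hypothetical dustbin value denoted by an ill-typed literal: it exists,
    so triples can mention it, but it is deliberately NOT a literal value."""
    def __init__(self, lexical, datatype):
        self.lexical, self.datatype = lexical, datatype

# Toy datatype map: only xsd:integer is recognized (lexical check, value map).
RECOGNIZED = {
    XSD + "integer": (lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None, int),
}

def denote(term):
    """Toy denotation: non-literals denote themselves; literals are (lexical, datatype)."""
    if not isinstance(term, tuple):
        return term
    lexical, datatype = term
    if datatype not in RECOGNIZED:
        return None  # unrecognized datatype: value unknown, but not ill-typed
    valid, to_value = RECOGNIZED[datatype]
    return to_value(lexical) if valid(lexical) else IllTyped(lexical, datatype)

def classify(graph):
    """Return (has_ill_typed_literal, inconsistent) for a list of triples.

    Only one inconsistency pattern is checked: p rdfs:range rdfs:Literal
    together with some triple whose p-object denotes a dustbin value."""
    ill = any(isinstance(denote(o), IllTyped) for _, _, o in graph)
    literal_ranged = {s for s, p, o in graph
                      if p == RDFS + "range" and o == RDFS + "Literal"}
    clash = any(p in literal_ranged and isinstance(denote(o), IllTyped)
                for _, p, o in graph)
    return ill, clash

# Prefixed names such as ":a" are kept as plain strings for brevity.
B_TRIPLE = (":a", ":b", ("xxx", XSD + "integer"))
graphs = {
    "A": [(":a", ":b", ("1", XSD + "integer"))],
    "B": [B_TRIPLE],
    "C": [B_TRIPLE, (":b", RDFS + "range", RDFS + "Literal")],
    "D": [(":a", ":b", ("xxx", "http://example.org/unknownDatatype"))],
}
for name, g in graphs.items():
    print(name, classify(g))
# A (False, False)
# B (True, False)
# C (True, True)
# D (False, False)
```

Under these assumptions B contains an ill-typed literal yet is consistent; only C, where the range assertion requires the dustbin value to be a literal, comes out inconsistent; and D is neither, because a processor that does not recognize the datatype has nothing to check.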

> 
> The distinction between B and C, if seen as an attempt to communicate some knowledge about the domain of interest, is not clear to me. They're both nonsense, and clearly the result of an error on the author's part.

If I were to read C my guess would be that the author was trying (and failing :-) to do something clever with an edge case of some kind, e.g. wanting to have leading zeros be allowed in a context where the datatype says they aren't, or some such. 

> It's rather impossible to interpret the author's intent. So I don't know why we distinguish two different classes of nonsense here.

Again, inconsistent doesn't mean nonsensical. And nonsense doesn't always arise from authors' intentions. It can happen from all kinds of reasons, such as putting together data from various sources. That is one way that something like C could arise, for example. 

> 
>> FWIW, one should *not* think of inconsistency as a kind of error condition. (Maybe the semantics text should spend a little time explaining this point.)
> 
> Perhaps. I think of inconsistency as a data quality problem.

I think that is the basic problem here, that you are overloading "inconsistent" with other meanings it does not have. 

> There are other kinds of data quality problems that have nothing to do with consistency, of course.

Like containing malformed literals? 

But there are also many other uses of inconsistency. Reasoners often treat inconsistency as a termination condition, for example. In general, it's a mistake to think of inconsistency as anything other than a logical condition on sets of triples. It does not come pre-branded for a certain kind of use. 

Pat


> 
> Best,
> Richard
> 
> 
> 
>>> (I know how both terms are defined and what conditions exactly cause them; the question is about why the spec insists that ill-typed literals do not cause a graph to be inconsistent.)
>> 
>> My question, in reply, would be to ask why anyone would think it would.
>> 
>> Pat
>> 
>>> 
>>> Best,
>>> Richard
>>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Monday, 12 November 2012 17:21:53 UTC