Re: Ill-typed vs. inconsistent? from Richard Cyganiak on 2012-11-14 (public-rdf-wg@w3.org from November 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 14 Nov 2012 09:46:27 +0000
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <A7AED412-7CBF-4139-9D5E-1B524522B1FC@cyganiak.de>
On 14 Nov 2012, at 07:16, Pat Hayes wrote:
>> I agree that ill-typed literals should not be syntactic errors, for the reasons you mention.
> 
> I actually think they should be. Illformedness is, after all, a purely syntactic matter.

It's not a purely syntactic matter. Literals are ill-typed if the datatype's L2V mapping -- the thing that specifies the datatype's semantics -- doesn't assign a value -- a meaning -- to the literal.
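
As a rough illustration of that point (a Python sketch only; the function names are made up for this example and are not taken from any RDF spec or library), an L2V mapping can be modelled as a partial function from lexical forms to values. A literal is ill-typed exactly when the function assigns it no value:

```python
import re
from typing import Optional


def l2v_integer(lexical: str) -> Optional[int]:
    """Partial L2V mapping for xsd:integer (illustrative name).

    Returns the denoted value, or None when the lexical form is outside
    the datatype's lexical space, i.e. no value is assigned.
    """
    # xsd:integer lexical forms: an optional sign followed by decimal digits.
    if re.fullmatch(r"[+-]?[0-9]+", lexical):
        return int(lexical)
    return None


def is_ill_typed(lexical: str) -> bool:
    """A literal is ill-typed when the L2V mapping gives it no value."""
    return l2v_integer(lexical) is None
```

Note that the check is phrased entirely in terms of whether a value exists, which is why it sits more naturally with the semantics than with the grammar.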

> And if we make them into inconsistencies, engines will still need to do the same work in checking wellformedness, so the pragmatic arguments seem to apply in either case. 

But that would make the definition of syntactic validity dependent on a datatype map. Since every implementation may choose its own datatype map (with certain constraints), this would make the syntax ill-defined, with different implementations disagreeing about what's syntactically valid.
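
A small sketch of the problem, assuming a hypothetical rule that equates "syntactically valid" with "not ill-typed" (Python; the custom datatype IRI and all names here are invented for illustration). Two implementations that differ only in their datatype maps then disagree about the same literal:

```python
import re
from typing import Callable, Dict, Optional

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX_EVEN = "http://example.org/datatypes#even"  # hypothetical custom datatype


def l2v_integer(lexical: str) -> Optional[int]:
    # Optional sign followed by decimal digits; anything else has no value.
    return int(lexical) if re.fullmatch(r"[+-]?[0-9]+", lexical) else None


def l2v_even(lexical: str) -> Optional[int]:
    # Invented datatype: integers whose value is even.
    value = l2v_integer(lexical)
    return value if value is not None and value % 2 == 0 else None


DatatypeMap = Dict[str, Callable[[str], Optional[int]]]

# Implementation A only knows xsd:integer; B also knows the custom datatype.
impl_a: DatatypeMap = {XSD_INTEGER: l2v_integer}
impl_b: DatatypeMap = {XSD_INTEGER: l2v_integer, EX_EVEN: l2v_even}


def syntactically_valid(dmap: DatatypeMap, lexical: str, datatype: str) -> bool:
    """Hypothetical rule: an ill-typed literal counts as a syntax error."""
    l2v = dmap.get(datatype)
    if l2v is None:
        return True  # datatype not in this map, so nothing can be checked
    return l2v(lexical) is not None
```

Under this rule, `syntactically_valid(impl_a, "3", EX_EVEN)` accepts the literal while `syntactically_valid(impl_b, "3", EX_EVEN)` rejects it, so the two implementations no longer agree on what the syntax is.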

>> I also don't think that limiting the allowed datatypes to only a fixed set of built-in types is feasible at this point; in 2012, removing custom datatypes would just not fly.
> 
> I tend to agree, but then we are left with a host of issues about how to treat "unknown" datatyped literals.

That's difficult if you try to push extensible datatyping into the syntax, but not really an issue if the extensible typing is handled in the semantics. Unknown datatypes simply don't impose any additional semantic conditions on interpretations, so no interesting inconsistencies, entailments or equivalencies arise from them.
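
One way to picture the semantics-side treatment is a three-way classification applied after parsing (a Python sketch under my own naming; `Status`, `classify` and the example IRIs are assumptions for illustration, not spec terminology). Every literal parses; only literals whose datatype is in the datatype map can be well-typed or ill-typed, and the rest simply carry no extra conditions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str  # datatype IRI


class Status(Enum):
    WELL_TYPED = "well-typed"
    ILL_TYPED = "ill-typed"
    UNRECOGNIZED = "unrecognized datatype, no extra conditions"


def l2v_boolean(lexical: str) -> Optional[bool]:
    # xsd:boolean lexical space is exactly: true, false, 1, 0.
    return {"true": True, "1": True, "false": False, "0": False}.get(lexical)


DATATYPE_MAP: Dict[str, Callable[[str], Optional[object]]] = {
    XSD_BOOLEAN: l2v_boolean,
}


def classify(literal: Literal) -> Status:
    l2v = DATATYPE_MAP.get(literal.datatype)
    if l2v is None:
        # Unknown datatype: nothing is asserted about its value.
        return Status.UNRECOGNIZED
    if l2v(literal.lexical) is None:
        return Status.ILL_TYPED
    return Status.WELL_TYPED
```

The parser never needs to consult `DATATYPE_MAP`; extending the map changes which literals can be classified, not which documents are syntactically acceptable.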

Anyway, neither treating extensible datatyping as a syntax issue, nor abolishing it, is going to get consensus in this WG.

> But, let me emphasise, however we decide to handle this, there is a clear *conceptual* distinction between inconsistency and ill-formedness. Consistency has to do with truth: inconsistent means, necessarily false. Being ill-formed or otherwise syntactically peculiar is a different topic altogether. So I didn't, and still don't, see any particular reason why these two notions should be conflated.

I don't dispute that there is a conceptual distinction. I'm not suggesting that the distinction shouldn't be made in the Semantics.

I'm suggesting that the distinction is not a helpful one to make in Concepts.

> The simple trick I suggested for making ill-formedness into inconsistency is exactly that: a trick.

If it's a trick that keeps the interface between Semantics and Concepts simpler, then I'm all for it.

>> Thus, a distinction that isn't actually useful in Concepts land gets pushed out of Semantics and into Concepts territory.
>> 
>> I can live with that. Concepts can say informatively that the distinction is there for technical reasons
> 
> It is "there" for the same reason that there is a distinction between chalk and cheese. They are different.

Nonsense. We are talking here about the treatment of this distinction *in the RDF specifications*. The RDF specifications make no distinction between chalk and cheese—they are both resources that can be denoted by IRIs and have statements made about them using RDF. Just because two concepts are different is, in itself, no reason to treat them differently in a particular model.

>> related to the formal model theory, and applications may treat ill-typed literals the same way they treat inconsistencies
> 
> Same way?? In that they both might trigger some kind of error, I guess, but not in any closer way. An inconsistency can arise, for example, from putting together data from two sources which simply disagree about the facts. This isn't a plausible account of how an ill-formed literal can get there.

That's not as true as you may think.

Almost all RDF data in existence is the result of some existing data being translated from some other format or data model into RDF. A common scenario is that Alice develops a conversion process for Bob's data to RDF, but isn't fully aware of the syntactic constraints that apply to the data, and therefore puts together the lexical forms obtained from Bob with an inappropriate datatype IRI.
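
The scenario might look like the following sketch (Python; Bob's rows, the field names and the simplified date check are all invented for illustration, and the check ignores time zones and other parts of the full xsd:date lexical space). Alice attaches xsd:date to every value of a column without knowing that some rows use a different date format:

```python
import re
from datetime import date
from typing import List, Optional, Tuple

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def l2v_date_simplified(lexical: str) -> Optional[date]:
    """Simplified L2V for xsd:date: plain YYYY-MM-DD only (an assumption)."""
    m = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", lexical)
    if not m:
        return None
    try:
        return date(int(m[1]), int(m[2]), int(m[3]))
    except ValueError:
        return None  # right shape, but not a real calendar date


# Hypothetical export of Bob's data; the second row uses DD/MM/YYYY.
bobs_rows = [
    {"id": "r1", "created": "2012-11-14"},
    {"id": "r2", "created": "14/11/2012"},
]


def convert(rows: List[dict]) -> List[Tuple[str, str, str]]:
    """Alice's conversion: every 'created' value is typed as xsd:date."""
    return [(row["id"], row["created"], XSD_DATE) for row in rows]


literals = convert(bobs_rows)
ill_typed = [rid for rid, lex, _ in literals if l2v_date_simplified(lex) is None]
```

Here `ill_typed` ends up holding the id of the second row: the conversion ran without complaint, and the ill-typed literal is a by-product of a source whose conventions Alice did not fully know.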

From the point of view of data collection, data transformation and data fusion, the process by which we end up with ill-typed literals isn't fundamentally different from the process by which we end up with logical inconsistencies. Those two things might be fundamentally different to a logician, but they may not be to the DBAs and analysts who process the data.

> What I still don't follow is, why anyone who understands what an inconsistency is, would even form the idea that an ill-typed literal would be an inconsistency. It's not the distinction that needs explaining, it's why anyone would treat them as similar in the first place. Illformedness is not even in the same category as an inconsistency. Literals aren't true or false by themselves.

Data isn't true or false by itself either. All data is an imperfect record of the domain of interest. All data has quality issues that limit its applicability. That's nothing new, and the people working with any particular dataset get to know its limitations. Imperfect data is still useful. The list of quality issues that data analysts have to deal with on a daily basis is long, as many things can go wrong: access/network, character encodings, syntax issues, outdatedness, inconsistent representations of the same concept, logical inconsistencies, and so on and so forth. People work around these issues, and most of them care very little about the classification of these issues! In fact, whether some data is logically consistent or not is, in many cases, of very little concern. Without having any hard data, I would claim that most logical inconsistencies are caused by the presence of ill-typed literals, as in practice most properties have a declared range these days; and this kind of issue can usually be worked around quite easily with a bit of SPARQL. (And most of the other inconsistencies can be worked around simply by not using OWL DL.)

Does this explain why anyone would treat ill-typed literals and inconsistencies as similar?

Is this sufficient motivation for asking why the RDF stack treats ill-typed literals neither as syntax errors nor as inconsistencies?

Best,
Richard
Received on Wednesday, 14 November 2012 09:47:10 UTC