Re: RDF-ISSUE-79 (undefined-datatype): What is the value of a literal whose datatype IRI is not a datatype? [RDF Concepts] from Richard Cyganiak on 2011-11-21 (public-rdf-wg@w3.org from November 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 21 Nov 2011 20:21:17 +0000
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>, RDF Working Group Issue Tracker <sysbot+tracker@w3.org>
Message-Id: <C43CACB1-88F9-4E92-9941-A4469220ABBB@cyganiak.de>
On 20 Nov 2011, at 21:55, Pat Hayes wrote:
>> it's not unreasonable for a reader to start with the working theory that literals with a datatype IRI that isn't known to denote a datatype are also considered ill-typed.
> 
> Not unreasonable, but still wrong. The problem is that when we don't know what the datatype is, the value of the literal really could be anything. In particular, it could be a perfectly good literal value. So we can't in this case impose the 'not in LV' condition on the value: we just don't know enough to tell. When we do have a datatype and a string which the datatype mapping knows is not allowed, then we can impose the not-in-LV condition, but this is a much more informed state to be in than not even knowing the datatype.

That makes sense to me.

> What I meant was, if unknown means 'not known', then you might come to know it later. Its being not known is labile, because new information might come along. But if "unknown" is a classification or a kind of value, then once something is "unknown" it has to stay "unknown". Coming to know it later is then actually a kind of contradiction. (This issue comes up acutely in considering three-valued logics, BTW, which I once studied in more depth than I care to remember, a long time ago.)

When I read “unknown” in RDF Semantics I tend to translate it to “unconstrained – could be anything”. Appealing to three-value logic is also a nice way of explaining it. Makes it clear that it's not “unknown to some conscious interpreter of the RDF graph” or such nonsense; instead, it's a well-defined state that something is in until some possible additional information comes along later on to fill it in.

>> Now I'm tempted to write something like this in RDF Concepts:
>> 
>> [[
>> If the literal's datatype IRI is not in the datatype map, then the literal value is undefined.
>> ]]
>> 
>> “Undefined” seems to be the right term to use here: The spec does not say anything about what the value is, but neither does it stop anyone from defining the value (e.g., in a semantic extension).
> 
> As long as nobody thinks that "undefined" means "does not have a value at all". I think "unknown" might actually be better, maybe. Right now Semantics says this: "Typed literals whose type is not in the datatype map of the interpretation are treated as before, i.e. as denoting some unknown thing. "  

Yup, that phrasing works well in the context of RDF Semantics where a lot of space is expended on explaining the entire notion of “building up knowledge” by effectively “restricting possible worlds”. In RDF Concepts, in the absence of this exposition, I'd prefer to avoid the term “unknown” if possible. Without the exposition, it raises the question: Unknown to whom?

A slight refinement:

[[
If the literal's datatype IRI is not in the datatype map, then the literal value is not defined by this specification.
]]

This goes a bit further in a) suggesting that there is a value, and b) leaving open the possibility that another specification (like OWL) might actually say exactly what it is.
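
To make the three cases in this thread concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not anything defined by RDF Concepts or RDF Semantics: the names `literal_value`, `IllTyped`, `NOT_DEFINED` and the IRI `http://example.org/dt` are all hypothetical, and the xsd:integer check is deliberately simplified. The datatype map is modelled as a plain dictionary from datatype IRIs to lexical-to-value functions.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


class IllTyped(Exception):
    """Known datatype, but the lexical form is not in its lexical space."""


class _NotDefined:
    """Marker: the value exists, but this 'specification' does not say what it is."""

    def __repr__(self):
        return "NOT_DEFINED"


NOT_DEFINED = _NotDefined()


def parse_integer(lexical):
    # Simplified lexical-to-value mapping for xsd:integer (assumption:
    # optional sign followed by ASCII digits, nothing else).
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise IllTyped(lexical)
    return int(lexical)


def literal_value(lexical, datatype_iri, datatype_map):
    """Return the literal's value according to the given datatype map."""
    lexical_to_value = datatype_map.get(datatype_iri)
    if lexical_to_value is None:
        # Datatype IRI is not in the map: we make no claim about the value.
        # In particular we do NOT treat the literal as ill-typed.
        return NOT_DEFINED
    # Datatype is known: either a proper value or an ill-typed literal.
    return lexical_to_value(lexical)


base_map = {XSD_INTEGER: parse_integer}

# A hypothetical "semantic extension" that additionally defines example.org/dt.
extended_map = {**base_map, "http://example.org/dt": str.upper}
```

With `base_map`, `literal_value("abc", "http://example.org/dt", base_map)` yields `NOT_DEFINED`, whereas `literal_value("abc", XSD_INTEGER, base_map)` raises `IllTyped`. Passing `extended_map` instead gives the first literal a definite value without contradicting anything the base map said, which is the point of "not defined by this specification".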

Another situation where RDF Concepts has to deal with the “unknown” is when (non-normatively) explaining the referent of an IRI in the Introduction. I changed this to use the same phrasing (“not defined by this specification”) and I find it works quite well:

[[
What exactly is denoted by any given IRI is not defined by this specification. The question is treated in other documents like Architecture of the World Wide Web, Volume One [WEBARCH] and Cool URIs for the Semantic Web [COOLURIS].
]]

So I'm happy enough with this phrasing and will mark the associated ISSUE-79 as Pending Review:
http://www.w3.org/2011/rdf-wg/track/issues/79

But I am of course still happy to discuss this wording further if you or anyone else thinks it can be improved.

> It's not easy to say this stuff in a way which is compact, easy to read and also reasonably proof against misunderstandings.

Amen!

Richard



> 
> Pat
> 
>> 
>> Best,
>> Richard
>> 
>> 
>> 
>> 
>>> In the semantics document, validity refers to truth in interpretations: being invalid means that a graph is false in every interpretation, ie it cannot be satisfied. It does not mean syntactically illegal. Validity in this sense requires an inference engine to check, not a parser. I know that "valid" has many meanings, but just wanted to make sure we don't start talking past one another, or at least be aware of it when we do, cf. this thread.)
>>> 
>>> Pat
>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> ------------------------------------------------------------
>>> IHMC                                     (850)434 8903 or (650)494 3973   
>>> 40 South Alcaniz St.           (850)202 4416   office
>>> Pensacola                            (850)202 4440   fax
>>> FL 32502                              (850)291 0667   mobile
>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
Received on Monday, 21 November 2011 20:21:50 UTC