
Re: RDF-ISSUE-79 (undefined-datatype): What is the value of a literal whose datatype IRI is not a datatype? [RDF Concepts]

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sun, 20 Nov 2011 20:16:28 +0000
Cc: RDF Working Group WG <public-rdf-wg@w3.org>, RDF Working Group Issue Tracker <sysbot+tracker@w3.org>
Message-Id: <229D4FDA-47E1-40B3-9EE8-C86E2EF2731F@cyganiak.de>
To: Pat Hayes <phayes@ihmc.us>

Hi Pat,

On 18 Nov 2011, at 16:55, Pat Hayes wrote:
>> The RDF Concepts spec (in both 2004 and 1.1 versions) does not say what the value of a literal is when its datatype IRI doesn't actually denote a datatype, as in <"foo",http://example.com/not-a-datatype>. This is surprising, as there is a section that normatively defines the value of *all other* literals.
> 
> I don't find it surprising, and I think you have slightly mischaracterized it.

I'm not criticizing the design. I'm criticizing the fact that RDF Concepts doesn't say anything about what happens in this case.

> A typed literal only has a fixed meaning relative to an actual datatype. So, to fix the meaning, you have to invoke a datatype denoted by the datatype URI. If this is not available, then the literal's value is not determined, and it becomes in effect something like an unknown URI.

Right – and that's exactly what I expected RDF Concepts should say. At the moment it says *nothing*, so users and implementers have to guess (or read RDF Semantics).

>> There are many possibilities:
>> 
>> (i) the spec leaves it undefined
>> (ii) that's not a valid RDF graph
>> (iii) it's a valid RDF graph, but the value, if any, is unknown
>> (iv) it's a valid RDF graph, and the literal is ill-typed
>> 
>> This should be made explicit.
>> 
>> The status quo is (i). I believe that the model theory says it's (iii).
> 
> Yes, it is (iii) at the moment, if by "valid" you mean syntactically correct. (See below.) However, the semantics does (rather vaguely) talk about the possibility of having datatypes "declared" in a graph (see end of section 5.1, http://www.w3.org/TR/rdf-mt/#DTYPEINTERP ):
> 
> "If every recognized URI reference in a graph is the name of a known datatype, then there is a natural datatype map DG which pairs each recognized URI reference to that known datatype (and 'rdf:XMLLiteral' to rdf:XMLLiteral). Any rdfs-interpretation I of that graph then has a corresponding 'natural' DG-interpretation which is like I except that I(aaa) is the appropriate datatype and the class extension of rdfs:Datatype is modified appropriately. Applications MAY require that RDF graphs be interpreted by D-interpretations where D contains a natural datatype map of the graph. This amounts to treating datatyping triples as 'declarations' of datatypes by the graph, and making the fourth semantic condition into an 'iff' condition. Note however that a datatyping triple does not in itself provide the information necessary to check that a graph satisfies the other datatype semantic conditions, and it does not formally rule out other interpretations, so that adopting this requirement as a formal entailment principle would violate the general monotonicity lemma described in section 6, below."

I'm sorry but I can't make sense of this paragraph. And I've been trying, honestly. What's a “recognized URI”? What's a “known datatype”?

(I'd like to flag this paragraph for editorial attention – I can make sense of most of the Semantics document if I try hard enough, but this part beats me.)

> You can never know that a literal is ill-typed unless you have the datatype to check that it is, so (iv) can't ever be right.

Well, I know that this is the intended design, but given that “ill-typed” is never precisely defined anywhere, it's not unreasonable for a reader to start with the working theory that literals with a datatype IRI that isn't known to denote a datatype are also considered ill-typed.

In fact, I believe that's what the OWL2 RDF-based semantics seems to assume. They treat both cases – lexical form not in the L2V map, and datatype IRI not in the datatype map – in the same way: the literal denotes something outside of rdfs:Literal. See text quoted here:
http://lists.w3.org/Archives/Public/public-rdf-wg/2011Nov/0126.html

This may be a bug in OWL2 and may be worth raising as an erratum with the OWL WG. (But someone else do this please – model theory is not my territory.)

> (BTW, this phrase "valid RDF graph" seems to be blurring its meaning.

You're right – I meant “valid” in the sense of “conforming to the definition of RDF graph”. (cf. “valid HTML”, “RDF Validator”)

On 18 Nov 2011, at 17:07, Pat Hayes wrote:
> BTW, it's very odd to say that something "is unknown". If this means "we don't know what its value is (yet)", then of course any missing information makes something unknown. But it is tempting to treat "unknown" as a classification, like being human, so that once something is in the unknown category then it is **known** to be 'unknown'. And if we do that, then the logic becomes nonmonotonic and many other semantic assumptions break. I think you guys might be using the word in different ways (?). I'm assuming Richard is using it in the second way.

Well, I'll admit that I'm struggling with this notion of something being “unknown”. Surely, in a deterministic system, you either know something or you don't know it – and if it's the latter, then you *know* that it's unknown.

Now I'm tempted to write something like this in RDF Concepts:

[[
If the literal's datatype IRI is not in the datatype map, then the literal value is undefined.
]]

“Undefined” seems to be the right term to use here: The spec does not say anything about what the value is, but neither does it stop anyone from defining the value (e.g., in a semantic extension).
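
To illustrate how that wording would behave, here is a rough Python sketch. None of the names in it (literal_value, BASE_MAP, UNDEFINED, IllTyped) come from RDF Concepts or RDF Semantics; they are made up for this example, and the two XSD parsers are simplified stand-ins for real lexical-to-value mappings.

```python
import re
from typing import Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"


class _Undefined:
    """Sentinel (hypothetical): the datatype IRI is not in the datatype map."""

    def __repr__(self) -> str:
        return "UNDEFINED"


UNDEFINED = _Undefined()


class IllTyped(ValueError):
    """Hypothetical error: the datatype is in the map, but the lexical
    form is outside its lexical space."""


def parse_integer(lexical: str) -> int:
    # Simplified lexical-to-value mapping for xsd:integer.
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise IllTyped(lexical)
    return int(lexical)


def parse_boolean(lexical: str) -> bool:
    # Simplified lexical-to-value mapping for xsd:boolean.
    table = {"true": True, "1": True, "false": False, "0": False}
    if lexical not in table:
        raise IllTyped(lexical)
    return table[lexical]


# A datatype map: datatype IRI -> lexical-to-value function.
DatatypeMap = Dict[str, Callable[[str], object]]

BASE_MAP: DatatypeMap = {
    XSD + "integer": parse_integer,
    XSD + "boolean": parse_boolean,
}


def literal_value(lexical: str, datatype_iri: str, datatype_map: DatatypeMap):
    """Return the literal's value, or UNDEFINED if the IRI is not in the map."""
    l2v = datatype_map.get(datatype_iri)
    if l2v is None:
        # The sketch says nothing about the value; it does not call it ill-typed.
        return UNDEFINED
    return l2v(lexical)  # may raise IllTyped


# Assumed example IRI, taken from the literal discussed in this thread.
NOT_A_DATATYPE = "http://example.com/not-a-datatype"

# A "semantic extension" (hypothetical) that adds a mapping for that IRI.
EXTENDED_MAP: DatatypeMap = {**BASE_MAP, NOT_A_DATATYPE: str.upper}
```

With BASE_MAP, literal_value("foo", NOT_A_DATATYPE, BASE_MAP) returns the UNDEFINED sentinel, while literal_value("foo", XSD + "integer", BASE_MAP) raises IllTyped, so the two cases stay distinguishable. With EXTENDED_MAP the first literal gets a value, and the values of literals that were already defined do not change.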

Best,
Richard




> In the semantics document, validity refers to truth in interpretations: being invalid means that a graph is false in every interpretation, ie it cannot be satisfied. It does not mean syntactically illegal. Validity in this sense requires an inference engine to check, not a parser. I know that "valid" has many meanings, but just wanted to make sure we don't start talking past one another, or at least be aware of it when we do, cf. this thread.)
> 
> Pat
Received on Sunday, 20 November 2011 20:17:12 GMT
