Re: Denotation of datatype values from Pat Hayes on 2002-04-18 (w3c-rdfcore-wg@w3.org from April 2002)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Thu, 18 Apr 2002 18:40:04 -0500
To: Patrick Stickler <patrick.stickler@nokia.com>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101534b8e4fada393a@[65.217.30.94]>
>On 2002-04-18 0:38, "ext Pat Hayes" <phayes@ai.uwf.edu> wrote:
>
>
>>>  If RDF Datatyping cannot provide a consistent and unambiguous
>>>  interpretation resulting in a specific datatype value, then
>>>  we're just wasting our time.
>>
>>  No no no. This is a misunderstanding. Some idioms provide only
>>  lexical form checking, other idioms provide unambiguous denotations
>>  of datatype values. Both are needed (by different user communities,
>>  maybe, but needed nevertheless.) Some people want both, some people
>>  want one without the other, some people want to be able to remain
>>  agnostic. The various idioms provide for all possibilities. Insisting
>>  that any rational person MUST be involved with finding datatype
>>  values, and that (for example) the Dublin Core style of using RDF is
>>  just wasting time, is both wrong and also confusing to the reader.
>
>I see the only point in constraining literals to the lexical forms
>of a given datatype to be so that there is a consistent interpretation
>of those literals as denoting specific datatype values.
>
>While some idioms do provide denotations of those datatype values
>and the inline idiom does not, that does not mean that the purpose
>of the inline idiom is not to identify a datatype value.
>
>The purpose of *all* datatyping idioms is to identify datatype values.

No, really, it isn't. Don't believe me? Check with Graham about DC 
uses of datatype checking. Not everyone is interested in values.

>
>If an idiom doesn't have that purpose, then it is not a datatyping
>idiom. If the inline idiom doesn't have that purpose, then it
>should not be called a datatyping idiom.

That is a narrow view which not everyone shares. I started off with 
that position, but being on this WG kind of knocked it out of me.

>And BTW, I consider an idiom that places a literal in the lexical
>space of a datatype to have identified a datatype value. How could
>it not?! So I consider the inline idiom to be a fully valid datatyping
>idiom.
>
>That *doesn't* mean that the literal node denotes the datatype value,
>but that ultimately, some application should understand at some level
>that the RDF graph is talking about the actual datatype value in relation
>to that particular property

But look what you just said. The MT says A, and the MT is the RDF 
meaning spec, but we should say that an application 'at some level' 
should understand this to mean (not A). That's not a coherent position 
to take, even if you think A is wrong.

>, even if that datatype value is not denoted
>in the graph itself.

No, I insist, we should NOT say that, because it isn't true. An 
application MIGHT do that, but it is not OBLIGED to do that. RDF 
allows an application to make up its own mind on that point. And, I 
might add, if it does do that *and then tries to draw RDF inferences 
on that basis*, it will likely get wrong results. In other words, 
once it has done that, it's on its own, it's out of our sphere, it's 
obeying its own rules, we wash our RDF hands of it. Not our problem.

>I.e.
>
>    literal node all by itself          = literal
>    literal node combined with datatype = datatype value
>
>the latter does *not* change the meaning of the literal
>itself, no more than 2 + 1 = 3 changes the meaning of 2
>just because when combined with 1 we get 3.
>
>The datatyping idioms have a meaning that is the sum of their
>parts -- the literal and the datatype -- and that total meaning
>does not change the meaning of their component parts.
>
>I tried to ask a question a week ago: does a datatype value have to
>have an explicit denotation in the graph, by a single node, in order
>for it to be expressed/identified/communicated by the graph?
>
>I was told, by a combination of silence and limited comments, that
>no, it does not.
>
>You seem to be arguing that it does.

I am saying that what an RDF graph expresses/identifies/communicates, in 
some grander scheme of things, is none of our business. Our business 
is to be very clear what exactly it is that the various parts of an 
RDF graph actually *mean*, and to pin that down as accurately and 
unambiguously as we can. Then users can choose to conform to our spec 
or not; that's up to them. If they don't, it's not our fault if they 
get into a pickle.

>That if an RDF graph is going
>to express/identify/communicate a particular datatype value, that
>value must have denotation in the graph, or at least explicit
>definition in the MT.

Sure. And if the graph is NOT going to express/whatever that value, 
then it need not have any such denotation. We allow BOTH 
options, and we should say so clearly.

>
>I considered it to be an important and pivotal question then,
>and I still do. And the answer to that question seems to be
>at the root of all of the misunderstanding and disagreement
>about what meaning the idioms capture and how to define the
>MT to that end.
>
>My answer to this question is no, it does not. In that a literal
>node and a datatype associated with that literal node *together*
>can represent/express/identify/communicate a specific datatype
>value even if that value has no explicit denotation in the graph
>by a single node. And I consider that to be the ultimate goal of RDF
>Datatyping, to communicate datatype values.

Then I'm sorry, but you ought to be writing a different document, with 
a different title: something like "Guide to using RDF in the Stickler 
way", or something. But it shouldn't be the spec., because the spec 
does not mandate only that kind of usage or restrict users to that 
particular goal.

>
>Here's an analogy:
>
>Datatype Idiom:       value = foo("x");
>Lexical Form Idiom:   function = foo; value = *function("x");
>Inline Idiom:         function = foo; *function("x");
>
>Now, in the first two cases, the result of executing the
>function 'foo' with the argument "x" is stored in the variable
>'value'. In the last case, the result is not stored, but it
>is still expressed.
>
>In all cases, the function is executed and the result is
>obtained.
>
>It is exactly the case with the datatyping idioms. It is
>analogous to
>
>Datatype Idiom:       bnode = xsd:integer("10");
>Lexical Form Idiom:   datatype = xsd:integer; bnode = *datatype("10");
>Inline Idiom:         datatype = xsd:integer; *datatype("10");
>
>In all cases, the lexical to value mapping is defined and the
>datatype value is identified. That's the whole point of
>the datatyping idioms.

It's one point, but not the only point possible. I might only care 
that something is a numeral, and not be interested in its numerical 
value. Perhaps I want to check that it is a valid part of a date in a 
certain format. I'm not interested (right now) in the actual date, 
only that this piece of text could be a date. (Then I'll send it 
on to the date-checker and let him worry about the value. We will 
both use datatyping, but probably use different properties, or maybe 
he will use more information about my properties than I am using.)

>
>You seem to say that only the bnode idioms actually identify
>the datatype value, because the value has explicit denotation;
>and that the inline idiom does not identify a datatype value
>but only constrains the literal to the lexical space of a datatype.

Right.

>But in order to test if the literal is a member of the datatype,
>you must -- and this is crucial -- attempt to map it to a datatype
>value, and see if you get an actual value!

Not at all. How the testing is done depends entirely on code on the 
other side of the API for the datatype. Some might offer a lexical 
check that just does a string-parse, without ever computing a value. 
(For example, you don't need arithmetic to check that a numeral is a 
numeral.) But in any case, that is all external to RDF and irrelevant 
to the issue.

>And thus, to say a
>literal is a lexical form of a datatype is to identify the value
>that it represents, presuming it is a valid lexical form.

That is just obviously wrong. The BNF

<digit> ::= 0|1|2|3|4|5|6|7|8|9
<numeral> ::= <digit>|<numeral><digit>

identifies lexical forms without providing any way to even refer to 
their values.
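To make the BNF point concrete, here is a minimal Python sketch of a recognizer for exactly that grammar. The function name `is_numeral` is illustrative only, not part of any RDF or datatype API. It settles lexical-space membership by string inspection alone and never computes a number.

```python
# Recognizer for the <numeral> grammar above, by string inspection only.
# No integer value is ever computed or referred to.

DIGITS = frozenset("0123456789")  # <digit> ::= 0|1|2|3|4|5|6|7|8|9


def is_numeral(text: str) -> bool:
    """True if text matches <numeral> ::= <digit> | <numeral><digit>."""
    # The recursive rule amounts to "one or more digits". An explicit digit
    # set is used because str.isdigit() also accepts non-ASCII digits.
    return len(text) > 0 and all(ch in DIGITS for ch in text)


print(is_numeral("10"))   # True
print(is_numeral("007"))  # True
print(is_numeral("ten"))  # False
print(is_numeral(""))     # False
```

Nothing here needs arithmetic, which is the point: a lexical check can be complete without ever naming a value.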

>
>Thus, I find your interpretation of the inline idiom as *only*
>constraining literals to valid lexical forms to be both
>useless insofar as the practical needs of datatyping are concerned
>(which is to communicate datatype values

You keep saying that, and it sounds more and more like someone saying 
that Islam is the only true religion. Sorry, but many people disagree.

>), and actually
>fail to understand how you can say that identifying a literal
>as a lexical form of a datatype does not indirectly yet
>unambiguously and unquestionably identify the datatype value
>which is represented by that lexical form.

It's like saying that one can draw an apple without eating it: it's 
kind of obvious.

>
>The expression L2V(I(ddd))("LLL") identifies a specific
>datatype value, no?

Yes, so what? The condition on the inline idiom is only that this 
expression is *defined*; it does not refer to its value. If you 
prefer, we could rephrase this differently: the condition is that 
"LLL" is in the lexical space of I(ddd). I avoided phrasing it that 
way only for the sake of mathematical elegance, since the MT assumes 
only the existence of the L2V map; but we could equally say that "LLL" 
is in the domain of L2V(I(ddd)).
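The difference between the two conditions can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the MT itself: the `Datatype` class, its `lexical_check` and `l2v` fields, and the simplified xsd:integer rules are all hypothetical stand-ins for whatever sits on the other side of a datatype API. The inline condition consults only the domain of L2V; the bnode condition requires the value.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Datatype:
    """Hypothetical model of a datatype as seen through an API.

    lexical_check decides membership in the lexical space (the domain of L2V);
    l2v is the lexical-to-value map, only meaningful on that domain.
    """
    name: str
    lexical_check: Callable[[str], bool]
    l2v: Callable[[str], object]


# Simplified stand-in for xsd:integer (ignores the full XML Schema rules).
XSD_INTEGER = Datatype(
    name="xsd:integer",
    lexical_check=lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    l2v=int,
)


def inline_idiom_holds(dt: Datatype, literal: str) -> bool:
    # Condition: "LLL" is in the domain of L2V(I(ddd)). The value is never requested.
    return dt.lexical_check(literal)


def bnode_idiom_holds(dt: Datatype, literal: str, bnode_value: object) -> bool:
    # Condition: I(ccc) = L2V(I(ddd))("LLL"). Here the value itself matters.
    return dt.lexical_check(literal) and dt.l2v(literal) == bnode_value


print(inline_idiom_holds(XSD_INTEGER, "10"))     # True
print(bnode_idiom_holds(XSD_INTEGER, "10", 10))  # True
print(bnode_idiom_holds(XSD_INTEGER, "10", 11))  # False
```

Only the bnode check ever calls `l2v`, which mirrors the claim that the inline idiom says something strictly weaker than the bnode idioms.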

>
>And all three idioms define L2V(I(ddd))("LLL"), no?
>
>And the bnode idioms further make the assignment
>I(ccc) = L2V(I(ddd))("LLL"), i.e. they fix the datatype
>value to the bnode, no?

The bnode idioms do, yes.

>
>Thus when I say that the datatyping
>MT provides a datatype value interpretation for all
>three idioms, I just don't get why you say that's not
>correct. If that's not what the MT is saying, then
>that's what the MT *should* be saying, IMHO, and
>that is certainly what I thought the MT was saying
>when I voted in favor of the "stake in the ground".

Well then evidently there was some misunderstanding. Look, if your 
interpretation were correct, what use would the bnode idioms have? 
They would only say the same as the inline idiom, but using more 
nodes. The whole point of all these idioms is that each says a 
different thing, conveys a different piece of information, but they 
all refer in one way or another to datatypes. They all make use of 
datatyping information in various ways. They all work together 
smoothly and are mutually consistent, but they provide a variety of 
ways of using the information and a variety of degrees of commitment 
to what is being said about ranges and properties.

>
>>>  If a given approach to reaching that goal results in contradictions,
>>
>>  Ignoring datatype values while being concerned with lexical forms is
>>  not being involved in a contradiction. It might be bad style, or
>>  bone-headed, but some of our customers are like that.
>
>I think you have misunderstood the motivations for the inline
>idiom, or then I have. I understand the issue such that
>it is the presence of the blank node that is considered
>cumbersome

No, we tried that idea and it was rejected by the WG. That was the MT 
in the version BEFORE the stake got stuck in the ground, the one that 
Jeremy liked so much. In that version, a literal in the in-line idiom 
is really just like a bnode in the dlex idiom, but written more 
compactly. But that idea is history, at this point. (And while I was 
kind of pleased with it at the time, it was fragile. It avoided 
untidy literal nodes by a subtle dodge that would have broken if we 
made even a small change to RDF syntax, and would have required 
re-writing the basic RDF MT, so maybe it was best forgotten. It's 
possible to be too damn clever.)

>-- and more from the point of view of the serialization,
>not the graph representation, and it's not just the desire to
>only restrict the literal to valid lexical forms of a datatype.

Some members of the WG are quite firmly insistent on the principle 
that a literal means a literal means a literal, and that this is cast 
in stone, and that a literal most definitely and emphatically does 
not mean something that depends on some other datatyping information. 
And that means that it does not represent, express, convey, indicate, 
signify, transmit, refer to, mean, designate or whatever that other 
thing, as far as the RDF spec is concerned. We really should tell our 
users this clearly and unequivocally, and not shilly-shally. They 
need time to get used to it, and if they want their RDF to 
communicate their meaning according to the spec, then they need to 
get the alternatives clear in their minds.

>
>(and as I've said before, the only point to restricting literals
>to the lexical forms of a datatype is to assert an interpretation
>in terms of that datatype)
>
>I expect that the same datatype clash occurs for both of the
>following cases:
>
>Case 1:
>
>    ex:age rdfd:datatype xsd:integer .
>    ex:age rdfd:datatype xsd:string .
>    Jane ex:age "10" .

That isn't a clash. A savvy RDF engine might post a user warning, but 
strictly speaking this is datatype consistent. It really is not clear 
what the 'intended' value of Jane's age is in some larger sphere, but 
what this RDF actually *says* is that Jane's <ex:age> is a string 
conforming to the lexical spaces of both datatypes; and it does.
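For Case 1 the check amounts to something like the following minimal Python sketch. It assumes, as a simplification, that every string is a valid xsd:string lexical form and that xsd:integer forms are an optional sign followed by ASCII digits; the names `LEXICAL_CHECKS` and `inline_idiom_consistent` are hypothetical. Each rdfd:datatype assertion adds one lexical-space test, and no values are compared.

```python
import re


# Hypothetical, simplified lexical-space checks (string inspection only).
def integer_lexical(s: str) -> bool:
    return re.fullmatch(r"[+-]?[0-9]+", s) is not None


def string_lexical(s: str) -> bool:
    return True  # simplification: every string is a valid xsd:string form


LEXICAL_CHECKS = {"xsd:integer": integer_lexical, "xsd:string": string_lexical}


def inline_idiom_consistent(datatypes: list, literal: str) -> bool:
    # Each rdfd:datatype assertion on the property requires only that the
    # literal be in that datatype's lexical space; no values are computed.
    return all(LEXICAL_CHECKS[dt](literal) for dt in datatypes)


print(inline_idiom_consistent(["xsd:integer", "xsd:string"], "10"))   # True
print(inline_idiom_consistent(["xsd:integer", "xsd:string"], "ten"))  # False
```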

>Case 2:
>
>    ex:age rdfd:datatype xsd:integer .
>    ex:age rdfd:datatype xsd:string .
>    Jane ex:age _:x .
>    _:x rdfd:lex "10" .
>
>Note that datatype clashes happen only above the RDF level.

Well, in a sense, but also in a sense not. The notion of a datatyping 
interpretation imposes semantic conditions that can only be *checked* 
by invoking some external machinery, but it does make them into RDF 
semantic conditions. So it makes sense to say that a datatype clash 
is really inconsistent *in (datatyped) RDF*.

>RDF cannot know that the two datatypes map the lexical
>form to different values, and thus cannot know that in
>the latter case, the blank node is assigned two values.

Right, that can only be checked by the relevant datatyping machinery. 
But it is an RDF semantic condition.
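Case 2 differs because the blank node must denote the L2V value under every asserted datatype. A minimal Python sketch, assuming simplified L2V maps and modelling the (disjoint) value spaces of xsd:integer and xsd:string as Python `int` and `str`; the `L2V` registry stands in for the external datatype machinery and is hypothetical. The clash shows up only when the values are actually requested and compared.

```python
import re
from typing import Callable, Dict, List, Optional


def integer_l2v(s: str) -> Optional[int]:
    # Simplified xsd:integer: optional sign then ASCII digits; None if undefined.
    return int(s) if re.fullmatch(r"[+-]?[0-9]+", s) else None


def string_l2v(s: str) -> Optional[str]:
    # Simplified xsd:string: every string denotes itself.
    return s


# Hypothetical registry standing in for the external datatype machinery.
L2V: Dict[str, Callable[[str], object]] = {
    "xsd:integer": integer_l2v,
    "xsd:string": string_l2v,
}


def bnode_idiom_consistent(datatypes: List[str], lexical: str) -> bool:
    # Each rdfd:datatype assertion on ex:age requires I(_:x) = L2V(dt)(lexical).
    values = [L2V[dt](lexical) for dt in datatypes]
    if any(v is None for v in values):
        return False  # some mapping is undefined, so no value can satisfy it
    # A single blank node denotes one thing, so every required value must agree.
    return all(v == values[0] for v in values)


print(bnode_idiom_consistent(["xsd:integer"], "10"))                # True
print(bnode_idiom_consistent(["xsd:integer", "xsd:string"], "10"))  # False: 10 vs "10"
```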

>And since the interpretation of the inline idiom and
>rdfd:datatype assertion *together* as identifying a
>datatype value

What value they identify is irrelevant, since the conditions on the 
inline idiom do not refer to it. So there would be no need to request 
it from the datatype API, for example.

>  happens above the RDF level, the clash
>also occurs.
>
>(and of course, it's important to note that datatype
>clashes do not mean software crashes ;-)

I agree, though we might expect that a conforming RDF engine would 
complain in some way.

>but rather that
>there are conflicting interpretations being asserted by
>the datatypes globally associated with the lexical
>forms, and an application needs to deal with it in
>some fashion)
>
>If people do not wish to constrain the datatyping
>interpretation of ex:age values to a given datatype
>or datatypes, then they shouldn't make any rdfd:datatype
>assertions, for *any* of the idioms, since they apply
>to *all* of the idioms.

But they apply differently to different idioms. The image in my 
original document makes this fairly clear, I think.

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Thursday, 18 April 2002 19:40:18 UTC