RE: datatypes and MT

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Wed, 14 Nov 2001 16:48:45 -0600
Message-Id: <p05101054b8187b5db241@[]>
To: Patrick.Stickler@nokia.com
Cc: w3c-rdfcore-wg@w3.org
>>  >We need to keep in focus the fact that "10" is a lexical
>>  >representation of the value, not the value.
>>  Well, that is one of the issues being discussed. In the S proposal,
>>  the value of  the literal "10" would be the string "10".
>Then that would be incorrect, as the value is 'ten'.

Well, that was my own intuition, but others disagree. Which is why we 
are having this discussion.

>But my point is that we should not be trying to address the
>mapping from "10" to 'ten' in RDF beyond the association of
>the literal "10" with a class which denotes a data type
>that defines a mapping from "10" to some value in the value
>space identified by that data type.

I agree, but the question of how to relate that mapping to the RDF 
syntax is still open and still under discussion, and different 
proposals provide different answers to these questions.

>>  >Yes, folks are not saying that the shoe size is a string. They
>>  >are expecting that lexical form to be mapped to a value in a particular
>>  >value space.
>>  Right, which is what the P(++) datatyping proposals try to do.
>As is the X proposal.
>>  >The same is true of the lexical representation of literals in a
>>  >programming language.
>>  >
>>  >    protected Integer shoeSize = 10;
>>  >
>>  >is not saying the shoeSize is the character sequence '10' (even
>>  >though there are no quotes), but the value ten.
>>  Right, but that lack of quotes is significant.
>It's an issue entirely within the lexical space of the data type.
>RDF has its own lexical space for its "primitives" and literals
>are a primitive, and we enclose literals in quotes so that they
>are known to be RDF Literals,

Again, that was my understanding also, but apparently it is not 
universal. Dan C. in particular wants to take these quote marks to be 
genuine quotations. That proposal, I have to admit, has a 
refreshingly coherent ring to it once you get used to the necessary 
discipline in using RDF appropriately.

>  but that doesn't mean that the
>use or meaning of quotes in any other lexical space to delimit
>lexical forms is relevant. It's not.
>>  In LISP for example,
>>  supplying the character sequence '10' as an argument indicates the
>>  value ten, while supplying the character sequence ''10' indicates
>>  that the value is the character sequence '10'.
>It's irrelevant how LISP delimits its lexical forms.

I was only reacting to your 'programming language' example which was 
apparently making a point about quotation in general, and observing 
that in at least one programming language, the point was invalid. I 
could have chosen a number of languages in which quotation is used to 
indicate strings, of course.

>....
>>  >The mistake here is to somehow think that RDF will interpret
>>  >them in any way.
>>  Right. But I believe that nobody is making that particular mistake.
>>  The discussion is about whether a literal label should be taken to
>>  *denote* the string or the value it has under a datatyping scheme.
>The literal denotes a value in a value space, just as any lexical
>form in a lexical space of a data type denotes a value in the
>value space of the data type.
>I thought that was crystal clear.

No, it is not clear. In fact, this is a way of stating the issue 
under discussion: HOW should denotation in RDF be related to the 
lexical-to-value space mapping of a datatype? We have several 
possible answers on the table now. In some of the proposals, literals 
denote strings.

>Whether the actual mapping from lexical form to value is defined
>in terms of RDF constructs, within RDF Space, is a separate issue,
>and one that I don't think RDF should address.

I agree, and I wasn't talking about that.

>IMO, all that RDF should address is the association of RDF Literal
>to the RDF Class denoting the data type. And in fact, this is
>identical to the means of associating type with any resource
>whatsoever, Literal or otherwise.
>>  ....
>>  >  > I just meant to avoid the implication that they were to be
>>  >>  interpreted as strings, since that interpretation begs the question.
>>  >>  If we can agree that XML syntax in general should not be interpreted
>>  >>  using logical canons of notational rigor, then we can leave the quote
>>  >>  marks there and not call them quotes.
>>  >
>>  >Exactly. No interpretation is going to happen in RDF.
>>  We are at cross purposes. Interpretation, in the sense I was using
>>  it, is not something that HAPPENS.
>Defining in RDF in any way that "10" maps to 'ten' within
>the scope of the data type xsd:integer is interpretation of
>the literal, and should not be done by RDF; at least not as
>part of the core model.

What do you mean by 'done by RDF'? Such a condition can certainly 
form part of a semantics for RDF, but of course that is not to say 
that it directly corresponds to any kind of process that would be set 
in motion by any kind of RDF processor or engine.

>All that RDF should do is allow one to say that "10" is a lexical
>form corresponding to some value in the value space of the
>data type, not how that mapping occurs or what the mapping is.

But consider an RDF extension like DAML+OIL, which is able to assert 
that two values are equal. An inference engine for DAML+OIL might 
well need to be able to 'know' that some piece of RDFS has as a 
semantic consequence that a literal equals - has the same value as - 
another expression. This might for example have important 
consequences for a cardinality reasoner.
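
To make the cardinality point concrete, here is a minimal sketch in Python. It is only an illustration, not part of any of the proposals: the function names are made up, and xsd_integer is a stand-in for the lexical-to-value mapping of xsd:integer. It counts the same two literals once as strings and once as datatype values.

```python
# Illustration only: these names are assumptions, not RDF or DAML+OIL vocabulary.

def xsd_integer(lexical: str) -> int:
    """Stand-in for the lexical-to-value mapping of xsd:integer."""
    return int(lexical)

def count_as_strings(literals):
    """Cardinality if each literal denotes its own character string."""
    return len(set(literals))

def count_as_values(literals, datatype):
    """Cardinality if each literal denotes its value under the datatype."""
    return len({datatype(lit) for lit in literals})

shoe_sizes = ["10", "0010"]
print(count_as_strings(shoe_sizes))              # 2
print(count_as_values(shoe_sizes, xsd_integer))  # 1
```

A reasoner checking a 'at most one value' restriction would reach opposite conclusions in the two cases, which is why the denotation question matters to it.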

>  > >They *are* strings.
>>  >Leave the quote marks to indicate they are strings.
>>  They don't need quote marks to indicate that they are strings.
>They might, if there is to be a lexical distinction in the
>notation between literals and other terms.
>E.g. in NTriples, we can (I think) write a local ID just
>as the value, with no quotes, so if we have an ID(foo)
>and a Literal(foo) then we use quotes to differentiate them
>insofar as the notation grammar is concerned:
>    _:X foo "foo" .
>The lexical form for the literal "foo" is only the three
>characters 'f', 'o', and 'o'. The quotes are a mechanism
>of the notation.
>>  The
>>  quote marks, if interpreted as genuine quotations, would indicate
>>  that those strings denoted other strings, eg the string of four
>>  characters on the next line:
>>  "10"
>>  is often understood as denoting the string of two characters
>>  on the next line:
>>  10
>>  which in turn is usually taken to denote the number ten.
>Why are we going in circles about stuff that computer science
>has solved eons ago?

Actually, this particular matter was solved by logicians about a 
century before computers were invented. But come, let us not split 
hairs.

>If you need to include a significant character of the notation
>as a literal character, you escape it, and the application which
>knows how to parse that notation unescapes it during parsing.
>    "foo"     ->  ( f o o )
>    "\"foo\"" ->  ( " f o o " )
>I don't see that this is even an issue...

I agree, but I wasn't talking about character escaping, but about how 
to interpret quoted strings as referring terms, and what they refer to.
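
The chain of reference I described above (four characters, then two characters, then the number) can be sketched in Python as follows. This is only an illustration; unquote is a made-up helper, and int stands in for the usual decimal reading of a numeral.

```python
def unquote(term: str) -> str:
    """Read a quoted term as a genuine quotation: it denotes its content."""
    if len(term) < 2 or term[0] != '"' or term[-1] != '"':
        raise ValueError("not a quoted term")
    return term[1:-1]

quoted = '"10"'          # the string of four characters
inner = unquote(quoted)  # the string of two characters it denotes
number = int(inner)      # the number that string is usually taken to denote

print(len(quoted), len(inner), number)  # 4 2 10
```

Each step is a different denotation relation, and the question is which of them an RDF literal is supposed to carry.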

>This is just about the notation used, not about RDF Literals.
>>  It is this 'quotation' interpretation that is under discussion, and
>>  that is accepted by the S and DC conventions but not by the P(++)
>>  ones. The question doesn't arise in the X proposal, since literals in
>>  this sense are not used.

Well, it does if some of the proposals involve interpreting all 
literals as quoted strings. Which they (S and DC) indeed do.

>Are we defining how resources are typed or designing a new notation?!
>This is why I gave that verbose graph abstraction in my proposal,
>to illustrate the data model *not* some convenient notation which has
>to define a lexical grammar to be interpreted in terms of that
>Deciding whether to write
>    X ---foo---> "bar" ---type---> "bas"
>or
>    X ---foo------> _:1:bar
>    _:1 ---type---> "bas"
>and arguing whether "bar" and the suffix 'bar' in _:1:bar
>are the same literal is of course important, but is
>not defining the actual model, it's just playing around with
>OK, to be fair, maybe I'm missing all that is being expressed
>in the mathematics going back and forth, but I'd like to see
>us get past the notation issues, choose one notation, and
>define the models in terms of it.

Right. I have been using Ntriples as the notation as far as possible 
for just this reason. I have to go to Ntriples++ when we have 
literals in subject position, just because Ntriples cannot handle 
that case. The graphs are the same, it's just that we need to be able to 
use nodeIds with literal nodes as well in order to keep track of 
which node is which.

>The fact that folks are trying to define new notations
>with complex terms such as _:1:bar and so forth suggests that
>we *all* are talking about a layer underneath the current
>resource-centric graph model,

The graphs are common to us all. But that three-node graph

    X ---foo---> "bar" ---type---> "bas"

can't be described using Ntriples, is all.
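
As a rough sketch of why, here is a small Python check. It assumes a simplified reading of Ntriples in which a subject may be a name or a bNode but never a literal; the helper names and the way terms are classified are mine, for illustration only.

```python
def kind(term: str) -> str:
    """Classify a term by its surface form (a deliberate simplification)."""
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        return "literal"
    if term.startswith("_:"):
        return "bnode"
    return "name"

def expressible_in_ntriples(triple) -> bool:
    """Assumed rule: no literal in subject position, and a name as predicate."""
    subject, predicate, _obj = triple
    return kind(subject) != "literal" and kind(predicate) == "name"

# The three-node graph needs these two triples.
graph = [
    ("X", "foo", '"bar"'),
    ('"bar"', "type", '"bas"'),  # literal as subject
]
print([expressible_in_ntriples(t) for t in graph])  # [True, False]
```

The second triple is the one that forces the move to Ntriples++.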

>and that we should just define
>the meta-structures of that lower level in terms of NTriples,
>such as I'm doing with my X proposal.
>>  >
>>  >>  Ah. So this would be OK, would it?
>>  >>
>>  >>  aa eg:prop _:x .
>>  >>  _:x xsd:integer "10"
>>  >>  _:x xsd:integer "0010"
>>  >>
>>  >>  That does make sense, I agree.
>>  >
>>  >But, just to clarify here, RDF is not determining that
>>  >these two lexical forms map to the same value in the
>>  >xsd:integer value space.
>>  It certainly is making this claim! 
>NO! NO! NO!
>It only is *asserting* that both literals "10"
>and "1010" map to the same value in the value space
>of xsd:integer

Right, that is what I said.

>  (though how could they! One denotes the
>value 'ten' and the other denotes the value 'one thousand
>and ten'!)

Well, in the above example, that was ten with two leading zeros.

>The key word above is "determine". RDF does not
>determine the equality of lexical forms, even if
>an RDF statement or construct might assert it.
>Just as rdfs:range does not determine the type of a
>value, it just asserts that the value must be of
>a certain type.

But that is what 'determining the value' means, is it not? (What else 
could it possibly mean?) I am completely unable to understand the 
distinction you are making here. Can you explain it in semantic terms?

>>  The use of the common bNode _:x
>>  asserts that there is one thing that is related to "10" and to "0010"
>>  by the same xsd:integer property.
>*What* shared bNode?!

The one called '_:x' .

>The node _:x denotes the property value,
>the object
>of the statement. That node itself has two properties
>which associate two lexical forms (literals) with
>that property value, but the literal nodes are
>not the same.

I didn't say they were the same literal; I said that this graph 
asserts that those two literals have the same value. Which is pretty 
much what you just said, as well.
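
Here is a minimal Python sketch of that reading, in which a datatype property relates a node to a lexical form of its value. It is an illustration under that assumption only: satisfiable and the datatypes table are made-up names, and int stands in for the xsd:integer mapping.

```python
def satisfiable(triples, datatypes) -> bool:
    """True if every node can have one value agreeing with all its lexical forms."""
    values = {}
    for subject, predicate, obj in triples:
        if predicate in datatypes:
            value = datatypes[predicate](obj)
            # setdefault records the first value seen for this node and
            # returns the recorded value on every later triple.
            if values.setdefault(subject, value) != value:
                return False
    return True

datatypes = {"xsd:integer": int}  # stand-in for the lexical-to-value mapping

graph = [
    ("aa", "eg:prop", "_:x"),
    ("_:x", "xsd:integer", "10"),
    ("_:x", "xsd:integer", "0010"),
]
misread = [
    ("aa", "eg:prop", "_:x"),
    ("_:x", "xsd:integer", "10"),
    ("_:x", "xsd:integer", "1010"),
]
print(satisfiable(graph, datatypes))    # True
print(satisfiable(misread, datatypes))  # False
```

The graph itself only makes the assertion; whether the assertion can be true depends on the datatype mapping, which sits outside the graph.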

>I guess I find the above model (where data types
>are properties) just a bit too weird to follow.
>One cannot achieve a merge of variant lexical forms
>which map to the same value (as suggested by the first
>representation) without knowledge about that data
>type, therefore such an approach is unacceptable as
>it does not permit RDF to remain neutral with regards
>to data type schemes.
>Let's re-express it as follows:
>   aa eg:prop _:1:"10" .
>   aa eg:prop _:2:"1010" .
>  _:1 rdf:type xsd:integer .
>  _:2 rdf:type xsd:integer .
>Now, we have in fact two property values for the eg:prop
>property associated with aa, and each value has its
>own type and lexical form.
>And in this case, the values denoted by the two
>literals may or may not be the same value in the
>xsd:integer value space.

In that form, no; but then that is a different RDF graph, so it is 
allowed to say something different.

