RE: datatypes and MT from Patrick.Stickler@nokia.com on 2001-11-14 (w3c-rdfcore-wg@w3.org from November 2001)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 14 Nov 2001 13:49:01 +0200
To: phayes@ai.uwf.edu, w3c-rdfcore-wg@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C090@trebe003.NOE.Nokia.com>
> -----Original Message-----
> From: ext Pat Hayes [mailto:phayes@ai.uwf.edu]
> Sent: 14 November, 2001 03:20
> To: Stickler Patrick (NRC/Tampere)
> Cc: w3c-rdfcore-wg@w3.org
> Subject: RE: datatypes and MT
> 
> 
> ...
> >We need to keep in focus the fact that "10" is a lexical
> >representation of the value, not the value.
> 
> Well, that is one of the issues being discussed. In the S proposal,
> the value of the literal "10" would be the string "10".

Then that would be incorrect, as the value is 'ten'.

But my point is that we should not be trying to address the
mapping from "10" to 'ten' in RDF beyond the association of
the literal "10" with a class which denotes a data type
that defines a mapping from "10" to some value in the value
space identified by that data type.
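To make that division of labour concrete, here is a minimal Java sketch of a literal as RDF itself would hold it under this view. The class and its members are hypothetical, not any real RDF API. It keeps the lexical form exactly as written, plus the URI of the class denoting the data type. Nothing in it maps "10" to ten; that mapping belongs to whoever defines the data type.

```java
import java.util.Objects;

/**
 * Hypothetical sketch (not a real RDF API): a literal as RDF sees it.
 * It pairs an uninterpreted lexical form with the URI of the class that
 * denotes its data type. It never computes the value the form denotes.
 */
final class TypedLiteral {
    private final String lexicalForm; // e.g. "10", kept exactly as written
    private final String datatypeUri; // e.g. the URI of xsd:integer

    TypedLiteral(String lexicalForm, String datatypeUri) {
        this.lexicalForm = Objects.requireNonNull(lexicalForm);
        this.datatypeUri = Objects.requireNonNull(datatypeUri);
    }

    String lexicalForm() { return lexicalForm; }

    String datatypeUri() { return datatypeUri; }

    // Equality is purely syntactic: same characters, same data type URI.
    // Whether "10" and "0010" denote the same value is not decided here.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TypedLiteral)) return false;
        TypedLiteral other = (TypedLiteral) o;
        return lexicalForm.equals(other.lexicalForm) && datatypeUri.equals(other.datatypeUri);
    }

    @Override
    public int hashCode() { return Objects.hash(lexicalForm, datatypeUri); }
}
```

For example, `new TypedLiteral("10", "http://www.w3.org/2001/XMLSchema#integer")` records only the two characters and the data type; no value is computed.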

> >Yes, folks are not saying that the shoe size is a string. They
> >are expecting that lexical form to be mapped to a value in a particular
> >value space.
> 
> Right, which is what the P(++) datatyping proposals try to do.

As is the X proposal.
 
> >The same is true of the lexical representation of literals in a
> >programming language.
> >
> >    protected Integer shoeSize = 10;
> >
> >is not saying the shoeSize is the character sequence '10' (even
> >though there are no quotes), but the value ten.
> 
> Right, but that lack of quotes is significant. 

It's an issue entirely within the lexical space of the data type.

RDF has its own lexical space for its "primitives" and literals
are a primitive, and we enclose literals in quotes so that they
are known to be RDF Literals, but that doesn't mean that the
use or meaning of quotes in any other lexical space to delimit
lexical forms is relevant. It's not.
 
> In LISP for example, 
> supplying the character sequence '10' as an argument indicates the 
> value ten, while supplying the character sequence ''10' indicates 
> that the value is the character sequence '10'.

It's irrelevant how LISP delimits its lexical forms.

Lexical forms (for any data type) are delimited in RDF by quotes.

That same lexical form in some other encoding (LISP code, Perl
code, C code, etc.) may have other delimiters.


> >The difference between e.g. Java and RDF is that Java actually
> >interprets the lexical forms before it uses them, but RDF just
> >holds on to them as-is.
> >
> >The mistake here is to somehow think that RDF will interpret
> >them in any way.
> 
> Right. But I believe that nobody is making that particular mistake. 
> The discussion is about whether a literal label should be taken to 
> *denote* the string or the value it has under a datatyping scheme.

The literal denotes a value in a value space, just as any lexical
form in a lexical space of a data type denotes a value in the
value space of the data type.

I thought that was crystal clear.

Whether the actual mapping from lexical form to value is defined
in terms of RDF constructs, within RDF Space, is a separate issue,
and one that I don't think RDF should address.

IMO, all that RDF should address is the association of an RDF Literal
with the RDF Class denoting the data type. And in fact, this is
identical to the means of associating type with any resource
whatsoever, Literal or otherwise.
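The following is a minimal Java sketch of that claim. It is hypothetical: node identifiers are plain strings and the class name is invented. Typing the node that carries a literal's value and typing any other resource go through the same `rdf:type` statement, and the store never inspects the subject. The only real identifiers are the RDF namespace URI for `rdf:type` and the example class URIs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: typing uses one and the same rdf:type statement
 * whether the subject is a node standing for a literal's value or any
 * other resource. Node identifiers are plain strings for illustration.
 */
final class TypeAssertions {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Each triple is {subject, predicate, object}.
    private final List<String[]> triples = new ArrayList<String[]>();

    /** Records "subject rdf:type classUri" without inspecting the subject. */
    void assertType(String subject, String classUri) {
        triples.add(new String[] {subject, RDF_TYPE, classUri});
    }

    /** Returns every class asserted for the subject, in insertion order. */
    List<String> typesOf(String subject) {
        List<String> result = new ArrayList<String>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(RDF_TYPE)) {
                result.add(t[2]);
            }
        }
        return result;
    }
}
```

Calling `assertType("_:1", ".../XMLSchema#integer")` for a value node and `assertType("eg:bob", "http://example.org/Person")` for an ordinary resource are indistinguishable operations. That is the point being made.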

> ....
> >>  I just meant to avoid the implication that they were to be
> >>  interpreted as strings, since that interpretation begs the question.
> >>  If we can agree that XML syntax in general should not be interpreted
> >>  using logical canons of notational rigor, then we can leave the quote
> >>  marks there and not call them quotes.
> >
> >Exactly. No interpretation is going to happen in RDF.
> 
> We are at cross purposes. Interpretation, in the sense I was using 
> it, is not something that HAPPENS.

Defining in RDF in any way that "10" maps to 'ten' within
the scope of the data type xsd:integer is interpretation of
the literal, and should not be done by RDF; at least not as
part of the core model.

All that RDF should do is allow one to say that "10" is a lexical
form corresponding to some value in the value space of the
data type, not how that mapping occurs or what the mapping is.
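One way to picture "RDF says which data type, not what the mapping is" is a registry that sits outside the RDF core. An application that knows a data type plugs in its lexical-to-value mapping. The Java below is a hypothetical sketch, not a proposal text or a library API. xsd:integer is only approximated by `BigInteger`'s string constructor, which is an assumption: it does not implement XML Schema's full lexical rules.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: lexical-to-value mappings live in a registry that
 * sits outside the RDF core. RDF only records which data type a lexical
 * form belongs to; an application that knows that data type supplies
 * the mapping.
 */
final class DatatypeRegistry {
    static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    private final Map<String, Function<String, Object>> mappings =
            new HashMap<String, Function<String, Object>>();

    /** Called by an application, not by RDF, to plug in a data type. */
    void register(String datatypeUri, Function<String, Object> lexicalToValue) {
        mappings.put(datatypeUri, lexicalToValue);
    }

    /** The value for a lexical form, or null if no mapping is registered. */
    Object valueOf(String lexicalForm, String datatypeUri) {
        Function<String, Object> mapping = mappings.get(datatypeUri);
        return mapping == null ? null : mapping.apply(lexicalForm);
    }

    /** Example setup: xsd:integer approximated by BigInteger's string constructor. */
    static DatatypeRegistry withXsdInteger() {
        DatatypeRegistry registry = new DatatypeRegistry();
        registry.register(XSD_INTEGER, s -> new BigInteger(s));
        return registry;
    }
}
```

An unregistered data type yields no value at all. RDF itself stays neutral; it simply has no opinion about lexical forms it was never told how to read.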
 
> >They *are* strings.
> >Leave the quote marks to indicate they are strings.
> 
> They don't need quote marks to indicate that they are strings. 

They might, if there is to be a lexical distinction in the
notation between literals and other terms.

E.g. in NTriples, we can (I think) write a local ID just
as the value, with no quotes, so if we have an ID(foo)
and a Literal(foo) then we use quotes to differentiate them
insofar as the notation grammar is concerned:

   _:X foo "foo" .

The lexical form for the literal "foo" is only the three
characters 'f', 'o', and 'o'. The quotes are a mechanism
of the notation.
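A small Java sketch of what the parser does with those quotes in the example line. It is hypothetical and simplified. It follows the notation used above (`_:X foo "foo" .`), not the real N-Triples grammar, which requires predicates to be URIs in angle brackets. The quotes only tell the parser "this token is a literal"; they are then dropped, leaving the three-character lexical form.

```java
/**
 * Hypothetical, simplified token reader for the notation in the example
 * (_:X foo "foo" .). It is NOT the real N-Triples grammar; it only shows
 * that the quotes tell the parser "this is a literal" and are then dropped.
 */
final class TokenReader {
    /** What the parser learned from one token. */
    static final class Term {
        final boolean isLiteral;
        final String text; // for literals: the lexical form, without the quotes

        Term(boolean isLiteral, String text) {
            this.isLiteral = isLiteral;
            this.text = text;
        }
    }

    static Term read(String token) {
        if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
            // The delimiting quotes belong to the notation, not to the lexical form.
            return new Term(true, token.substring(1, token.length() - 1));
        }
        return new Term(false, token); // an unquoted token is treated as an identifier
    }
}
```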

> The 
> quote marks, if interpreted as genuine quotations, would indicate 
> that those strings denoted other strings, eg the string of four 
> characters on the next line:
> "10"
> is often understood as denoting the string of two characters 
> on the next line:
> 10
> which in turn is usually taken to denote the number ten.

Why are we going in circles about stuff that computer science
has solved eons ago?

If you need to include a significant character of the notation
as a literal character, you escape it, and the application which
knows how to parse that notation unescapes it during parsing.

So

   "foo"     ->  ( f o o )
   "\"foo\"" ->  ( " f o o " )

etc.

I don't see that this is even an issue...

This is just about the notation used, not about RDF Literals.
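The escape/unescape step above can be sketched in a few lines of Java. This is a minimal sketch under an assumption: only `\"` and `\\` are treated as escapes here, for brevity, whereas real notations such as N-Triples define more. The class name is hypothetical. The lexical form survives a round trip through the notation unchanged.

```java
/**
 * Minimal sketch of escaping in a quoted notation. Only \" and \\ are
 * handled (an assumption for brevity; real notations define more escapes).
 */
final class QuotedNotation {
    /** Turns a lexical form into a quoted token, escaping notation-significant characters. */
    static String quote(String lexicalForm) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : lexicalForm.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\');
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    /** Parses a quoted token back into the lexical form it encodes. */
    static String unquote(String token) {
        int last = token.length() - 1;
        if (token.length() < 2 || token.charAt(0) != '"' || token.charAt(last) != '"') {
            throw new IllegalArgumentException("not a quoted token: " + token);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < last; i++) {
            char c = token.charAt(i);
            if (c == '"') {
                throw new IllegalArgumentException("unescaped quote inside token");
            }
            if (c == '\\') {
                i++; // skip the backslash; the next character is taken as-is
                if (i >= last) {
                    throw new IllegalArgumentException("dangling escape");
                }
                c = token.charAt(i);
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

So `unquote("\"foo\"")` gives the three characters f, o, o, and `unquote("\"\\\"foo\\\"\"")` gives the five characters ", f, o, o, ". This matches the two mappings listed above.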

> It is this 'quotation' interpretation that is under discussion, and 
> that is accepted by the S and DC conventions but not by the P(++) 
> ones. The question doesn't arise in the X proposal, since literals in 
> this sense are not used.

BUT THIS HAS NOTHING TO DO WITH SOLVING THE PROBLEM OF DATA TYPING!

Are we defining how resources are typed or designing a new notation?!

This is why I gave that verbose graph abstraction in my proposal,
to illustrate the data model, *not* some convenient notation that has
to define a lexical grammar to be interpreted in terms of that
abstraction.

Deciding whether to write

   X ---foo---> "bar" ---type---> "bas"

or 

   X ---foo------> _:1:bar
   _:1 ---type---> "bas"

and arguing whether "bar" and the suffix 'bar' in _:1:bar
are the same literal is of course important, but it does
not define the actual model; it's just playing around with
notations.

OK, to be fair, maybe I'm missing all that is being expressed
in the mathematics going back and forth, but I'd like to see
us get past the notation issues, choose one notation, and
define the models in terms of it.

The fact that folks are trying to define new notations
with complex terms such as _:1:bar and so forth suggests that
we *all* are talking about a layer underneath the current
resource-centric graph model, and that we should just define
the meta-structures of that lower level in terms of NTriples,
such as I'm doing with my X proposal.

> >
> >>  Ah. So this would be OK, would it?
> >>
> >>  aa eg:prop _:x .
> >>  _:x xsd:integer "10"
> >>  _:x xsd:integer "0010"
> >>
> >>  That does make sense, I agree.
> >
> >But, just to clarify here, RDF is not determining that
> >these two lexical forms map to the same value in the
> >xsd:integer value space.
> 
> It certainly is making this claim!  


NO! NO! NO!

It is only *asserting* that both literals "10"
and "1010" map to the same value in the value space
of xsd:integer (though how could they! One denotes the
value 'ten' and the other denotes the value 'one thousand
and ten'!)

The key word above is "determine". RDF does not
determine the equality of lexical forms, even if
an RDF statement or construct might assert it.

Just as rdfs:range does not determine the type of a
value, it just asserts that the value must be of
a certain type.
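The difference between asserting and determining can be sketched in Java. This is a hypothetical sketch; class and method names are invented. The store accepts both lexical forms for `_:x` under xsd:integer without looking at them. Only a separate check that knows about integers can discover that "10" and "1010" cannot both be true of the same value. Integers are again approximated with `BigInteger` as an assumption.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of "asserting" versus "determining". The store
 * accepts any lexical forms asserted for a node under xsd:integer without
 * looking at them; only a separate, data-type-aware check can tell whether
 * those assertions can all be true.
 */
final class IntegerAssertions {
    // node id -> lexical forms asserted for it; never interpreted here
    private final Map<String, List<String>> forms = new HashMap<String, List<String>>();

    void assertForm(String node, String lexicalForm) {
        forms.computeIfAbsent(node, k -> new ArrayList<String>()).add(lexicalForm);
    }

    List<String> formsOf(String node) {
        return Collections.unmodifiableList(
                forms.getOrDefault(node, Collections.<String>emptyList()));
    }

    /**
     * Outside the RDF core: uses knowledge of integers (approximated with
     * BigInteger) to decide whether all forms denote one and the same value.
     */
    static boolean denoteSameInteger(List<String> lexicalForms) {
        BigInteger first = null;
        for (String form : lexicalForms) {
            BigInteger value = new BigInteger(form);
            if (first == null) {
                first = value;
            } else if (!first.equals(value)) {
                return false;
            }
        }
        return true;
    }
}
```

The store behaves like rdfs:range here. It records what is claimed, and checking the claim is someone else's job.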

> The use of the common bNode _:x 
> asserts that there is one thing that is related to "10" and to "0010" 
> by the same xsd:integer property.

*What* shared bNode?!

The node _:x denotes the property value, the object
of the statement. That node itself has two properties
which associate two lexical forms (literals) with
that property value, but the literal nodes are
not the same. 

I guess I find the above model (where data types
are properties) just a bit too weird to follow
consistently.

One cannot merge variant lexical forms which map
to the same value (as suggested by the first
representation) without knowledge about that data
type; such an approach is therefore unacceptable, as
it does not permit RDF to remain neutral with regard
to data type schemes.
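A short Java sketch of why the merge needs data type knowledge. It is hypothetical: the class name is invented, and integers are approximated with `BigInteger`. A data-type-neutral merge can only collapse identical lexical forms, so "10" and "0010" stay apart. Grouping them as one value works only once a lexical-to-value mapping is supplied from outside RDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch contrasting a data-type-neutral merge with a data-type-aware one. */
final class LiteralMerge {
    /** Data-type-neutral: only identical lexical forms collapse. */
    static Set<String> mergeNeutral(String... lexicalForms) {
        return new LinkedHashSet<String>(Arrays.asList(lexicalForms));
    }

    /** Data-type-aware: needs the lexical-to-value mapping supplied from outside RDF. */
    static Map<Object, List<String>> mergeByValue(Function<String, ?> mapping, String... lexicalForms) {
        Map<Object, List<String>> byValue = new LinkedHashMap<Object, List<String>>();
        for (String form : lexicalForms) {
            byValue.computeIfAbsent(mapping.apply(form), k -> new ArrayList<String>()).add(form);
        }
        return byValue;
    }
}
```

For instance, `mergeNeutral("10", "0010", "10")` keeps two entries. `mergeByValue(s -> new java.math.BigInteger(s), "10", "0010", "1010")` groups "10" with "0010" and leaves "1010" on its own.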

Let's re-express it as follows:

  aa eg:prop _:1:"10" .
  aa eg:prop _:2:"1010" .
  _:1 rdf:type xsd:integer .
  _:2 rdf:type xsd:integer .

Now, we have in fact two property values for the eg:prop
property associated with aa, and each value has its
own type and lexical form.

And in this case, the values denoted by the two
literals may or may not be the same value in the 
xsd:integer value space.

Patrick
Received on Wednesday, 14 November 2001 06:49:37 UTC