RE: datatypes and MT from Patrick.Stickler@nokia.com on 2001-11-15 (w3c-rdfcore-wg@w3.org from November 2001)

From: <Patrick.Stickler@nokia.com>
Date: Thu, 15 Nov 2001 12:31:59 +0200
To: phayes@ai.uwf.edu
Cc: w3c-rdfcore-wg@w3.org
Message-ID: <2BF0AD29BC31FE46B7887732114404316217BD@trebe003.NOE.Nokia.com>
> >The literal denotes a value in a value space, just as any lexical
> >form in a lexical space of a data type denotes a value in the
> >value space of the data type.
> >
> >I thought that was crystal clear.
> 
> No, it is not clear. In fact, this is a way of stating the issue 
> under discussion: HOW should denotation in RDF be related to the 
> lexical-to-value space mapping of a datatype? We have several 
> possible answers on the table now. In some of the proposals, literals 
> denote strings.

Saying that literals denote strings is like saying that
strings denote strings. Seems like an unnecessary level
of indirection.

As I see it, a literal in an RDF serialization denotes
a lexical form of a value. Since RDF does not itself have
any built in data types, the lexical form is retained,
for interpretation later (mapping to a value space) by
some external (to RDF) application that has built in
data types.

To say that a literal "10" represents the string '10'
which represents a lexical form is about like saying
"The name of myself is Patrick"

Maybe in some contexts you'd like to say such things,
but that's not the norm.

If you want to say a literal is a string, say it, i.e.

   xxx --someValue--> "10" --rdf:type--> xsd:string .

I.e., the literal represents a lexical form for
a data type, and the interpretation of that lexical
form is up to the application.

Eh?

> >Defining in RDF in any way that "10" maps to 'ten' within
> >the scope of the data type xsd:integer is interpretation of
> >the literal, and should not be done by RDF; at least not as
> >part of the core model.
> 
> What do you mean by 'done by RDF'? Such a condition can certainly 
> form part of a semantics for RDF, but of course that is not to say 
> that it directly corresponds to any kind of process that would be set 
> in motion by any kind of RDF processor or engine.

By 'done by RDF' I meant that in order for RDF to actually
define a mapping from lexical form to value, it has to have
some built-in canonical representation of that value, which
I don't think should be done.

Unless RDF has built-in data types, any mapping expressed in
RDF is just a mapping from one lexical space to another,
such as from a non-canonical lexical space to a canonical
lexical space, which does not map to any *values*, per se,
since RDF can only deal with lexical forms.

If the lexical space and the value space happen to be
a perfect intersection, then that is a coincidental 
property of the data type, not the result of RDF
mapping lexical forms to values.

> >All that RDF should do is allow one to say that "10" is a lexical
> >form corresponding to some value in the value space of the
> >data type, not how that mapping occurs or what the mapping is.
> 
> But consider an RDF extension like DAML+OIL, which is able to assert 
> that two values are equal. An inference engine for DAML+OIL might 
> well need to be able to 'know' that some piece of RDFS has as a 
> semantic consequence that a literal equals - has the same value as - 
> another expression. This might for example have important 
> consequences for a cardinality reasoner.

Sure, but there's a difference between (a) being able to make
the statement that two lexical forms map to the same value and
thus can be considered to denote an equal value, and (b) how
one determines that such an equality relation exists.

The former does not require knowledge about the lexical space
or value space of the data type, the latter does.

Thus, if a DAML+OIL reasoner wishes to determine if some 
value denoted by a lexical form is 'greater than' some
other value denoted by another lexical form, it must 
base its determination on relations defined in RDF about
in terms of those lexical forms and their associated
data type(s) -- and those statements may be transient
and asserted by employing a (possibly external) system 
component that 'knows' about those data types and is able 
to assert the necessary relations.

Thus, the reasoning system is providing interpretation
of those literals (mapping lexical forms to values) and
based on that interpretation inferring relations (expressable
in RDF) between values denoted by those literals, but the 
actual mapping from lexical form to the value is not
expressable in RDF (because RDF has no internal canonical
representation of values, only a representation of
lexical forms).


> >Why are we going in circles about stuff that computer science
> >has solved eons ago?
> 
> Actually, this particular matter was solved by logicians about a 
> century before computers were invented. But come, let us not split 
> hairs.

I stand corrected. Though you obviously get my point. ;-)
 
> >BUT THIS HAS NOTHING TO DO WITH SOLVING THE PROBLEM OF DATA TYPING!
> 
> Well, it does if some of the proposals involve interpreting all 
> literals as quoted strings. Which they (S and DC) indeed do.

Then I would not favor their adoption. I think it adds
an unnecessary level of indirection into the interpretation
(mapping) of lexical form to value.

> Right. I have been using Ntriples as the notation as far as possible 
> for just this reason. I have to go to Ntriples++ when we have 
> literals in subject position, just because Ntriples cannot handle 
> that case. The graphs are the same, its just that we need to able to 
> use nodeIds with literal nodes as well in order to keep track of 
> which node is which.

Understood.

> >The fact that folks are trying to define new notations
> >with complex terms such as _:1:bar and so forth suggests that
> >we *all* are talking about a layer underneath the current
> >resource-centric graph model,
> 
> The graphs are common to us all. But that three-node graph
> 
>     X ---foo---> "bar" ---type---> "bas"
> 
> can't be described using Ntriples, is all.

Which is why I was getting (even more) confused when people
were trying to express graphs such as the above in NTriples
but loosing the context of the literal because node identity
was not indicated.

If we're going to use notations to express ideas, let's be
sure that the ideas are actually expressable by the notation.

> >It only is *asserting* that both literals "10"
> >and "1010" map to the same value in the value space
> >of xsd:integer
> 
> Right, that is what I said.

My apologies for misunderstanding you. Glad to see
we are in agreement here.

> >  (though how could they! One denotes the
> >value 'ten' and the other denotes the value 'one thousand
> >and ten'!)
> 
> Well, in the above example, that was ten with two leading zeros.

Ahh, right, sorry. My bad. (nasty things, typos ;-)
 
> >The key word above is "determine". RDF does not
> >determine the equality of lexical forms, even if
> >an RDF statement or construct might assert it.
> >
> >Just as rdfs:range does not determine the type of a
> >value, it just asserts that the value must be of
> >a certain type.
> 
> But that is what 'determining the value' means, is it not? (What else 
> could it possibly mean?) I am completely unable to understand the 
> distinction you are making here. Can you explain it in semantic terms?

Probably not. Though I think I may have answered this question
above. To recap, in order for a mapping from lexical form to value
to be defined in RDF, RDF must be able to provide a canonical
representation for each value, for each data type, and I don't
see that as practical. And even RDF did provide a canonical
representation for each value, that representation would still
be a lexical form to most computer systems, so the "mapping"
would simply be between a non-canonical lexical space to a
canonical lexical space, not really from a lexical space to
a value space.
 
Is that any clearer? Sorry that I can't offer you a mathematical
definition...


Patrick
Received on Thursday, 15 November 2001 05:32:19 UTC