RE: Literals: lexical spaces and value spaces from Patrick.Stickler@nokia.com on 2001-11-05 (w3c-rdfcore-wg@w3.org from November 2001)

From: <Patrick.Stickler@nokia.com>
Date: Mon, 5 Nov 2001 15:29:33 +0200
To: Graham.Klyne@MIMEsweeper.com
Cc: w3c-rdfcore-wg@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C061@trebe003.NOE.Nokia.com>
> But my view is that the heart of RDF that really matters is the
> underlying abstraction.

The heart yes, but it takes more than a heart to live,
move, and get anything done ;-)

And if you can't communicate the results of that wondrous
layer of abstraction to other systems or humans, via consistent
lexical forms, then what good is it?

This touches on the whole big issue of round-trip integrity
of lexical forms through RDF systems. QNames already get
trashed. Hey, why not lexical forms for primitive data types
too...  I put in "10" and get out "1010". I expect "10"
and get "0xA".
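
A minimal sketch of that round-trip problem, assuming a system that
reads a decimal lexical form and writes the value back in whatever
notation it prefers. The function names and notation labels are
hypothetical, and only non-negative values are handled:

```python
def serialize(value: int, notation: str) -> str:
    """Write a non-negative integer value in one of several notations."""
    if notation == "decimal":
        return str(value)
    if notation == "binary":
        return format(value, "b")
    if notation == "hex":
        return "0x" + format(value, "X")
    raise ValueError(f"unknown notation: {notation}")


def round_trips(lexical: str, notation: str) -> bool:
    """True if reading a decimal form and writing it back yields the same string."""
    value = int(lexical)
    return serialize(value, notation) == lexical


print(round_trips("10", "decimal"))    # True
print(serialize(int("10"), "binary"))  # 1010
print(serialize(int("10"), "hex"))     # 0xA
```

The value is the same in all three cases; only the decimal writer gives
back the string that went in.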
 
Just because "magic" happens inside an RDF graph does not
mean that the graph is in a vacuum, or that the identity
of resources or values embodied in a graph should not have a 
persistent representation beyond the "RDF" layer.

Oftentimes it feels as if RDF stands for "Reality Distortion
Field": when one is inside RDF space, it is very difficult
to see the rest of the universe clearly, and constants which
permeate the rest of the XML world cease to exist or take on
a different meaning ;-)

The point of saying that some literal is an xsd:integer is
so that a system can *parse* the literal accordingly, or 
can write the lexical form of an internal value accordingly.
Otherwise, how can one make comparisons of values or perform
any other useful operation on those values?
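
As a rough illustration, assuming the literals are already known to be
valid decimal integer forms, comparing them as strings and comparing
them as parsed values give different answers. The helper name is
hypothetical:

```python
def same_integer_value(a: str, b: str) -> bool:
    """Compare two literals by value, assuming both are valid decimal integers."""
    return int(a) == int(b)


# As plain strings these differ; as integers they denote the same value.
print("010" == "10")                    # False
print(same_integer_value("010", "10"))  # True

# String ordering and numeric ordering also disagree.
print(sorted(["9", "10"]))              # ['10', '9']
print(sorted(["9", "10"], key=int))     # ['9', '10']

# Writing the internal value back out gives one agreed lexical form.
print(str(int("+010")))                 # 10
```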

Yes, rdfs:range and other logical operations also make use
of data type information, but that's only one side of the
coin. The other side is lexical representation, and the
two are inseparable in a real-world system.

> I tend to think that there will be some things that RDF
> expresses (using whatever syntax) without reference to
> literal values, and that the data typing of such things
> should not be constrained by lexical (character string)
> representation.

Logical inferences are not limited to lexical representation,
but certainly interchange of knowledge is. RDF cannot work
in a vacuum totally disconnected from serializations, even
if the "neat" stuff happens well above the lexical layer.
 
> I would be more comfortable with a scheme that defined value spaces
> independently of lexical representation, and then provided
> mappings for lexical representations.

Fair enough. And anyone is free to define an upper ontology of
purely abstract data types and to ground data type schemes such as
XML Schema in it. Fine. But you're not going to get much
work done or achieve portability of knowledge across the semantic
web unless you also have a reliable (brutally reliable) layer
to handle the lexical representation of values and the association
of those values with explicit data types for later interpretation.

> My particular beef with XML schema datatypes concerns non-integral 
> numbers. ...  we found rational numbers to be 
> useful in CONNEG 
> work, and they have been defined for CC/PP.
> 
> [(**) OK, you could devise exotic schemes where this isn't 
> the case, but 
> for practical purposes I still claim that rational numbers 
> underpin just 
> about all use of numbers in computers.]

Fine. Define that "ideal" upper-level data type scheme and
promote systems to use it internally. And maybe even define your own
set of lexical forms for values (possibly allowing a broader
range of notations than XML Schema) and relate the XML Schema 
data scheme to this more "correct" scheme.

But at the end of the day, if I have a system that is expecting
an xsd:integer, and someone gives me "0x18", I should be able
to complain. 
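
A sketch of such a complaint, assuming the xsd:integer lexical space is
an optional sign followed by one or more decimal digits. XML Schema
whitespace normalization is not applied here, and the function name is
hypothetical:

```python
import re

# Optional sign, then decimal digits only; [0-9] avoids non-ASCII digits.
XSD_INTEGER = re.compile(r"[+-]?[0-9]+")


def check_xsd_integer(lexical: str) -> int:
    """Return the integer value, or raise if the form is not a decimal integer."""
    if XSD_INTEGER.fullmatch(lexical) is None:
        raise ValueError(f"{lexical!r} is not in the lexical space of xsd:integer")
    return int(lexical)


print(check_xsd_integer("24"))  # 24

try:
    check_xsd_integer("0x18")
except ValueError as err:
    print("complaint:", err)
```

A lenient parser such as int("0x18", 16) would quietly accept the
hexadecimal form; the explicit check is what lets the receiver object.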

There are many programming languages and other formal systems
that have extremely rich and complex data type schemes, but
*every* one of them also defines a lexical space for those data
types. It is unavoidable so long as we must serialize our values.

So, in reality, it doesn't matter whether folks use XML Schema,
or your "ideal" and "correct" data type scheme, they'll still have
to enter lexical forms for values, and so the same mechanisms for
associating data types to literals should work for all schemes,
and *every* scheme which might be used to classify literals is
going to have to have a lexical space.
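
One way to picture this, as a sketch only: every scheme contributes a
lexical space plus a mapping into its value space, and a single
mechanism interprets typed literals for all of them. The "ex:rational"
type, its "n/d" notation, and the names DATATYPES and interpret are
assumptions made for illustration, not part of XML Schema or RDF:

```python
import re
from fractions import Fraction
from typing import Callable, NamedTuple


class Datatype(NamedTuple):
    lexical_pattern: str              # regular expression for the lexical space
    to_value: Callable[[str], object] # mapping from lexical form to value


DATATYPES = {
    "xsd:integer": Datatype(r"[+-]?[0-9]+", int),
    # Hypothetical rational type: numerator "/" non-zero denominator.
    "ex:rational": Datatype(r"[+-]?[0-9]+/[0-9]*[1-9][0-9]*", Fraction),
}


def interpret(lexical: str, datatype_name: str) -> object:
    """Map a typed literal to its value, rejecting forms outside the lexical space."""
    datatype = DATATYPES[datatype_name]
    if re.fullmatch(datatype.lexical_pattern, lexical) is None:
        raise ValueError(f"{lexical!r} is not a valid {datatype_name}")
    return datatype.to_value(lexical)


print(interpret("10", "xsd:integer"))   # 10
print(interpret("6/8", "ex:rational"))  # 3/4
```

The same interpret call serves both schemes; what differs is only the
entry each scheme registers.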

That seems to mean, at least to me, that the value space and
lexical space are inseparable insofar as rdf:type is concerned.

> >Insofar as associating a data type with a literal, both the value
> >space and lexical space are relevant.
> 
> Yes, of course.
>
> >Insofar as making logical inferences about the values themselves
> >and their relation to other values, compliance with range constraints,
> >etc. etc. then the lexical space is not relevant.
> 
> Quite.

In general, it seems, we are in agreement on the essentials, though
perhaps not so much on their priority  ;-)

Cheers,

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Monday, 5 November 2001 08:29:42 UTC