RE: Literals: lexical spaces and value spaces from Pat Hayes on 2001-11-06 (w3c-rdfcore-wg@w3.org from November 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Mon, 5 Nov 2001 21:10:12 -0600
To: Patrick.Stickler@nokia.com
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101035b80cffdafdee@[65.212.118.166]>
>>  But my view is that the heart of RDF that really matters is
>>  the underlying
>>  abstraction. 
>
>The heart yes, but it takes more than a heart to live,
>move, and get anything done ;-)
>
>And if you can't communicate the results of that wondrous
>layer of abstraction to other systems or humans, via consistent
>lexical forms, then what good is it?

You could ask them to read the specs, or is that too much to expect?

>This touches on the whole big issue of round-trip integrity
>of lexical forms through RDF systems. QNames already get
>trashed. Hey, why not lexical forms for primitive data types
>too...  I put in "10" and get out "1010". I expect "10"
>and get "0xA".
>

?? But nobody is suggesting this, are they?

>Just because "magic" happens inside an RDF graph does not
>mean that the graph is in a vacuum, or that the identity
>of resources or values embodied in a graph should not have a
>persistent representation beyond the "RDF" layer.
>
>Oftentimes it feels that RDF stands for "Reality Distortion
>Field" as when one is inside RDF space, it is very difficult
>to see the rest of the universe clearly and constants which
>permeate the rest of the XML world cease to exist or have
>different meaning ;-)

I sometimes wonder at the mindset of people for whom XML is the rest 
of the universe.

>The point of saying that some literal is an xsd:integer is
>so that a system can *parse* the literal accordingly, or
>can write the lexical form of an internal value accordingly.

We agree on that, to be sure. But it is more than parsing; once one 
knows that a literal is an xsd:integer, one can compute its actual 
numerical value, not just parse it.

>Otherwise, how can one make comparisons of values or perform
>any other useful operation on those values?
>
>Yes, rdfs:range and other logical operations also make use
>of data type information, but that's only one side of the
>coin. The other side is lexical representation, and those
>two are inseparable in a real-world system.

Beautifully put. That inseparability is exactly captured, I would 
claim, by the semantic conditions on a datatype interpretation in the 
MT extension, which are: (1) that all urirefs denoting datatypes 
shall be properly interpreted; in particular, when used as class 
names, their extensions shall correspond to the value space of the 
datatype; (2) that if any literal node can be inferred to be in a 
datatype class, then the literal at that node shall be interpreted to 
denote the value corresponding to it under the datatype mapping; in 
other words, the lexical meaning of the literal must conform to the 
datatype mapping associated with the literal node. How that 
association gets done is up to an RDFS class reasoner, which is what 
it ought to be good at. How the value gets computed is up to some 
external datatype engine which knows more about the relevant 
datatyping rules than RDFS wants to know.
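
As a rough illustration of condition (2), and not part of the MT 
extension itself, the "external datatype engine" can be pictured as a 
lexical space plus a mapping into a value space. The names Datatype, 
value_of and XSD_INTEGER below are hypothetical, and the regular 
expression is only an approximation of the xsd:integer lexical space:

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Datatype:
    """Hypothetical model of a datatype: a uriref, a lexical space, a mapping."""
    uri: str
    lexical_space: re.Pattern        # which character strings are legal
    to_value: Callable[[str], object]  # lexical form -> member of value space

    def value_of(self, literal: str) -> object:
        # The literal must lie in the lexical space before it can denote a value.
        if not self.lexical_space.fullmatch(literal):
            raise ValueError(
                f"{literal!r} is not in the lexical space of {self.uri}"
            )
        return self.to_value(literal)


# Assumption: optional sign followed by ASCII decimal digits.
XSD_INTEGER = Datatype(
    uri="http://www.w3.org/2001/XMLSchema#integer",
    lexical_space=re.compile(r"[+-]?[0-9]+"),
    to_value=int,
)

# Distinct lexical forms may denote the same value.
print(XSD_INTEGER.value_of("10"))   # 10
print(XSD_INTEGER.value_of("010"))  # 10
```

In this sketch the RDFS reasoner would only decide *which* Datatype 
applies to a literal node; computing the value, or rejecting "0x18", 
is left entirely to value_of.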

>
>>  I tend to think that there will be some things that RDF
>>  expresses (using whatever syntax) without reference to
>>  literal values, and
>>  that the data typing of such things should not be constrained
>>  by lexical
>>  (character string) representation.
>
>Logical inferences are not limited to lexical representation,
>but certainly interchange of knowledge is. RDF cannot work
>in a vacuum totally disconnected from serializations, even
>if the "neat" stuff happens well above the lexical layer.
>
>>  I would be more comfortable with a scheme that defined value spaces
>>  independently of lexical representation, and then provided
>>  mappings for
>>  lexical representations.
>
>Fair enough. And anyone is free to define an upper-ontology of
>purely abstract data types and ground data schemes such as XML
>Schema and such in that. Fine. But you're not going to get much
>work done or achieve portability of knowledge across the semantic
>web unless you also have a reliable (brutally reliable) layer
>to handle the lexical representation of values and the association
>of those values with explicit data types for later interpretation.
>
>>  My particular beef with XML schema datatypes concerns non-integral
>>  numbers. ...  we found rational numbers to be
>>  useful in CONNEG
>>  work, and they have been defined for CC/PP.
>>
>>  [(**) OK, you could devise exotic schemes where this isn't
>>  the case, but
>>  for practical purposes I still claim that rational numbers
>>  underpin just
>>  about all use of numbers in computers.]
>
>Fine. Define that "ideal" upper-level data type scheme and
>promote systems to use it internally. And maybe even define your own
>set of lexical forms for values (possibly allowing a broader
>range of notations than XML Schema) and relate the XML Schema
>data scheme to this more "correct" scheme.
>
>But at the end of the day, if I have a system that is expecting
>an xsd:integer, and someone gives me "0x18", I should be able
>to complain.

I agree. In fact, I would expect something external to RDF to 
complain. The API between it and the RDF engine would have told it to 
expect something in the lexical space of xsd:integer, and then handed 
it "0x18".
>
>There are many programming languages and other formal systems
>that have extremely rich and complex data type schemes, but
>*every* one of them also defines a lexical space for those data
>types. It is unavoidable so long as we must serialize our values.
>
>So, in reality, it doesn't matter whether folks use XML Schema,
>or your "ideal" and "correct" data type scheme, they'll still have
>to enter lexical forms for values, and so the same mechanisms for
>associating data types to literals should work for all schemes,
>and *every* scheme which might be used to classify literals is
>going to have to have a lexical space.
>
>That seems to mean, at least to me, that the value space and
>lexical space are inseparable insofar as rdf:type is concerned.

It suggests to me that RDF ought to be able to associate literal 
labels - which are character strings, in the last analysis, like all 
labels - with datatype names (urirefs, I presume), and perhaps to 
request that any such string be suitably checked for conformity. 
Possible responses would include: unknown datatype name; illegal 
literal; legal but non-normal, and here is the normal form; OK.
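
A minimal sketch of such a conformity check, under stated assumptions: 
the names Check, REGISTRY and check_literal are hypothetical, only 
xsd:integer is registered, and the "normal form" is taken to be the 
decimal form with no leading zeros and no "+" sign.

```python
import re
from enum import Enum


class Check(Enum):
    """The four possible responses to a conformity request."""
    UNKNOWN_DATATYPE = "unknown datatype name"
    ILLEGAL_LITERAL = "illegal literal"
    NON_NORMAL = "legal, but non-normal"
    OK = "OK"


def _normal_integer(literal: str) -> str:
    # Assumed normal form: no leading zeros, no "+" sign.
    return str(int(literal))


# Hypothetical registry: datatype uriref -> (lexical space, normalizer).
REGISTRY = {
    "http://www.w3.org/2001/XMLSchema#integer": (
        re.compile(r"[+-]?[0-9]+"),
        _normal_integer,
    ),
}


def check_literal(datatype_uri: str, literal: str):
    """Return (response, normal_form); normal_form is set only for NON_NORMAL."""
    entry = REGISTRY.get(datatype_uri)
    if entry is None:
        return Check.UNKNOWN_DATATYPE, None
    lexical_space, normalize = entry
    if not lexical_space.fullmatch(literal):
        return Check.ILLEGAL_LITERAL, None
    normal = normalize(literal)
    if normal != literal:
        return Check.NON_NORMAL, normal
    return Check.OK, None


XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
print(check_literal(XSD_INT, "10"))    # (<Check.OK: 'OK'>, None)
print(check_literal(XSD_INT, "0x18"))  # (<Check.ILLEGAL_LITERAL: 'illegal literal'>, None)
```

Here "010" would come back as legal but non-normal, together with the 
normal form "10", while a uriref missing from the registry yields the 
unknown-datatype response.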

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Monday, 5 November 2001 22:10:16 UTC