RE: Reject change to rdf:value from Patrick.Stickler@nokia.com on 2001-11-07 (w3c-rdfcore-wg@w3.org from November 2001)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 7 Nov 2001 10:50:18 +0200
To: phayes@ai.uwf.edu
Cc: w3c-rdfcore-wg@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C071@trebe003.NOE.Nokia.com>
> >or that once the
> >interpretation takes place, the lexical form becomes irrelevant.
> 
> ??? I'm not sure what you mean by 'takes place'. By 'interpretation' 
> I didn't mean to refer to a process. The lexical form is not 
> irrelevant, since its the only thing that determines the actual 
> literal value.

Meaning that once some knowledge leaves RDF space, such as
via an API into a canonical internal representation (that is, once
the literal is interpreted/parsed according to its lexical form),
the lexical form is no longer relevant.

> >
> >The problem is that RDF does not itself provide those canonical
> >internal representations for value spaces
> 
> What canonical internal representations? Nobody has mentioned these 
> until now, as far as I know.

I.e., in Scheme (to take one example) '10', '#xA' and '#b1010' are
all different lexical variants of the same value that are all mapped
to the same internal canonical representation within a given system.
Once the source code is parsed, those values do not maintain their
lexical representations.
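
The same behaviour can be sketched in Python rather than Scheme (the
choice of language is only for illustration): three notations for the
integer ten collapse into one internal value once parsed, and the
notation itself is not retained.

```python
# Three lexical variants of the integer ten, written as source literals.
# After parsing, only the value remains; the notation is gone.
values = [10, 0xA, 0b1010]

# The same mapping applied to strings at run time: int() with an explicit
# base plays the role of the parser for each lexical form.
parsed = [int("10", 10), int("A", 16), int("1010", 2)]

# All variants end up as the same internal value.
all_equal = len(set(values + parsed)) == 1
print(all_equal)
```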

RDF does not define any such internal canonical representations (nor
should it); therefore it must maintain the form and type relation
until that knowledge leaves RDF space, at which time it may be
interpreted/parsed into some internal canonical representation in
the system performing that interpretation of the lexical form.
 
> >, but preserves the lexical
> >forms of such values -- hence in such cases, the data type of a
> >literal is inseparable from the lexical form embodied in the literal.
> 
> Well, it's not *inseparable*. What is true is that you need both to 
> fully disambiguate the literal label.

Well, then, of course no part of the graph is "inseparable" from any
other part, but one could see the separation of a literal from its local
type as reducing the integrity and value of the information itself.
For practical purposes, then, one would hope that the fundamental
representation embodied in the graph would be reasonably impervious
to such separation insofar as common operations are concerned (such
as my example of inferred binding of values to properties of a
superordinate type).

> >The mappings from literals to value spaces do not happen in RDF,
> >and therefore all information needed for such mappings to take
> >place must be preserved across all processes prior to actual
> >interpretation.
> >
> >>  and the debate is about various proposals for how to use
> >>  some form of RDF syntax to establish that association.
> >
> >Fair enough. I've proposed the encoding of typed data literals as
> >URVs, a special class of URI explicitly intended for such purposes.
> >So I guess that's one more proposal on the table... ;-)
> 
> Right. BTW, do you have a pointer to that URV idea?

http://www-nrc.nokia.com/sw/X_Values_URI.pdf
 
I hope to have a revised, more polished version as an I-D in the
coming weeks as well, but the basic ideas can be found at the
link above.

> >>  (In my
> >>  proposal, these mappings are treated much like the denotation mapping
> >>  in the model theory. Other proposals make these mappings explicit as
> >>  rdf properties in one way or another. ) Do you agree with this
> >>  summary so far?
> >
> >Sure, but we have to ensure that those mappings remain fixed until
> >interpretation, including processes which by inference or other
> >means bind values to properties belonging to data types other than
> >that originally defined for the value.
> 
> I'm puzzled. You seem to be assuming that interpretation is something 
> that happens at some stage in processing (?) I was using 
> 'interpretation' in the sense of model theory. Maybe we are at cross 
> purposes.

Likely. I can't help but think in terms of applications, being a
software engineer who has to build systems to use this information.

By 'interpretation' I mean parsing the lexical form into an internal
representation for some system such that one can do things like
compare two values. So, I have a query that attempts to find all
persons with shoe size greater than 0x12 (note that the query
uses a hexadecimal lexical form ;-) and I have a huge knowledge
base where shoe sizes are encoded using various ontologies, with
values associated with various data types and encoded as literals
representing various lexical forms of values, etc., and I have schemata
which relate all those ontologies and data type schemes.

Now, the RDF layer (e.g. triples store) and RDFS capable inference
layer shouldn't need to look at the literals at all, but should
be able to provide my query API with enough information so that it
is able to interpret all of the values it gets bound by inference
to the query property denoting shoe size, by parsing them all into
a canonical form that allows comparison.

That's what I meant by "fixed until interpretation". That query
API can't just get the lexical forms of the values. It has to know
the original data types to which those lexical forms correspond.
Whether it is able to recognize those data types or parse those
lexical forms is its own problem, but at least it has all the
information it needs.
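
A minimal sketch of such a query API, in Python, might look like the
following. Everything named here is an assumption made for illustration:
the TypedLiteral class, the "ex:" type identifiers, and the parser
registry are hypothetical and not part of RDF or of any proposal in this
thread. The point is only that each lexical form travels with its
original type until the application parses it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class TypedLiteral:
    """A lexical form kept together with its local type."""
    lexical_form: str
    datatype: str  # hypothetical identifier, e.g. "ex:hexInteger"


# Hypothetical registry owned by the query API, not by RDF: one parser per
# local type that this particular application happens to recognize.
PARSERS: Dict[str, Callable[[str], int]] = {
    "ex:decimalInteger": lambda s: int(s, 10),
    "ex:hexInteger": lambda s: int(s, 16),
    "ex:octalInteger": lambda s: int(s, 8),
}


def interpret(literal: TypedLiteral) -> Optional[int]:
    """Parse a literal into this application's internal representation.

    Returns None when the type is not recognized; the pairing of form
    and type is still intact, so that is the application's own problem.
    """
    parser = PARSERS.get(literal.datatype)
    return parser(literal.lexical_form) if parser is not None else None


def greater_than(bindings: List[TypedLiteral],
                 threshold: TypedLiteral) -> List[TypedLiteral]:
    """Keep the bindings whose interpreted value exceeds the threshold's."""
    limit = interpret(threshold)
    if limit is None:
        raise ValueError(f"unrecognized type: {threshold.datatype}")
    kept = []
    for literal in bindings:
        value = interpret(literal)
        if value is not None and value > limit:
            kept.append(literal)
    return kept


# Values bound (by inference or otherwise) to the shoe-size query property.
shoe_sizes = [
    TypedLiteral("19", "ex:decimalInteger"),  # nineteen
    TypedLiteral("13", "ex:hexInteger"),      # nineteen
    TypedLiteral("22", "ex:octalInteger"),    # eighteen
    TypedLiteral("12", "ex:decimalInteger"),  # twelve
    TypedLiteral("7", "ex:unknownType"),      # not interpretable here
]

# The query threshold is itself a typed literal: 0x12, i.e. eighteen.
matches = greater_than(shoe_sizes, TypedLiteral("12", "ex:hexInteger"))
print([m.lexical_form for m in matches])
```

Note that the lexical form "12" appears twice above with different
meanings; without the accompanying type the comparison could not be
carried out reliably.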

And the acceptability of any given value bound to that query
property denoting shoe size can be determined according to
the range constraints defined for the property and the class
relationships between the known data types, with no concern about
lexical form.

> >
> >>  One of our communication problems has been that the bare term
> >>  'datatype' is used in a variety of senses (sometimes for the value
> >>  domain, sometimes for the mapping, etc.), so perhaps I had better try
> >>  to avoid it. I have used examples like octal, decimal and so on as
> >>  illustrative examples only to emphasize that two different datatype
> >>  mappings may share the same value space.
> >
> >I agree that there is a problem with the terminology. The term datatype
> >(as I use it, and also as I understand XML Schema to use it) defines
> >a given value space. It may also, for a given system/context, define
> >one or more lexical forms by which values in that value space may
> >be expressed.
> 
> Ah. My understanding was that a datatype corresponded to a mapping 
> from a lexical space to a value space. I see why we have been having 
> some trouble communicating.

And it may be that I am focusing solely on the graph and not the
MT interpretation of the graph.

And this is where my comments about "canonical internal representations"
come from. You can't operate within a value space *alone* unless you
have a canonical internal representation. So when you talk about
mapping from a lexical space to a value space, you seem to imply the
existence of such a canonical representation for such data types,
and that's where I get lost.

For XML Schema, insofar as I understand it, a (simple) datatype is the 
value space and it has an explicit lexical space defined and that lexical
space serves as the canonical representation for all serializations of
values in that value space. RDF inherits that definition of datatype,
apparently. And that's how I've been thinking.

Though I'll admit that I sometimes get lost in the MT space and
"interpretations" which are just "possible" but not absolute. I
guess I'm too much of a nuts-n-bolts kinda guy ;-)

> >Thus, decimal, hexadecimal, octal, binary, etc. are all possible
> >lexical forms (notations) of integers (and other possible data types),
> >and are not themselves data types. It is IMO incorrect to equate a
> >given lexical form or notation with a data type. A data type defines
> >primarily a value space.
> 
> Well, never mind who is in the right, but let us try to agree on some 
> terminology we can all understand. If a datatype is a value space, 
> what do you call the mapping from lexical to value spaces?

Parsing and compilation ;-)

There is far more intersection of value spaces than of lexical forms. An
"integer" to XML Schema is the same value space as a C int, a Scheme
integer, a Smalltalk Integer, etc. etc. but not all of these systems
define the same lexical space for representing values in serializations.

Perhaps in the MT you can talk of value spaces irrespective of
lexical forms and without positing any particular internal
canonical representation. But that makes it (for me at least)
very hard to see how that relates to using RDF-encoded knowledge
in actual applications.

The problem is that, unlike program code, RDF typed literals are
not parsed, and therefore presumptions such as that the range
definition is sufficient for interpretation of a non-locally
typed value are wrong. For a programming language, that would
be fine, but not for RDF, because no parsing into a canonical
representation compatible with all superordinate types
has occurred.
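
To make the ambiguity concrete, here is a small Python sketch. It
assumes, purely for illustration, that the only thing known about a
value is a range of "integer" and that the candidate notations are
octal, decimal and hexadecimal. The bare lexical form "12" then has
three defensible readings, and the range alone cannot choose among them.

```python
# A lexical form that has been separated from its local type.
lexical_form = "12"

# Each candidate notation yields a different integer for the same string.
readings = {base: int(lexical_form, base) for base in (8, 10, 16)}

# More than one distinct value means that a range of "integer" by itself
# does not determine which value the literal denotes.
ambiguous = len(set(readings.values())) > 1
print(readings, ambiguous)
```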

> >Lexical forms are only a means to an end.
> >Data types are, in general, portable across systems and platforms
> >even if their lexical forms are not.
> 
> Ah, I profoundly disagree for RDF. There is no 'end' in this sense; 
> nothing gets compiled; its not a programming language. All there is 
> is the syntax, and all you ever get back from any kind of inference 
> is more syntax. There aren't any inner canonical forms, and no code 
> gets interpreted.

EXACTLY! This is my point, as just expressed above. Sorry I was (again)
unclear.

In most other systems, lexical forms *are* merely a means to an end,
mapped to some internal representation of a value. But RDF does not
provide any such transformation, so presumptions about how range
assertions work as descriptive mechanisms may be invalid, because
they seem to presume that lexical forms don't matter, yet they do.

> >I think we're mostly in agreement,

I agree.
 
> Yes, sorry I got testy. 

I'm sure I am to blame, being the &$@%*#$! that I am ;-)

> I will blame the pneumonia, its got to be 
> useful for something.

Well, in the interest of getting you back on your feet, I won't
respond to any of your posts for at least a week, OK?

> >though I still am concerned
> >about maintaining the inseparability of lexical form and data
> >type for data typed literals.
> 
> I don't mind letting them get separate as long as there is always a 
> way to get them back together. Inside a given graph there is, but 
> your point about queries has me worried that this isn't good enough.

Functionally speaking, if one can always get them back together
again reliably, then were they ever separate? ;-)

Cheers,

Patrick

(and stop reading these lists and get some rest! ;-)
Received on Wednesday, 7 November 2001 03:50:31 UTC