- From: <Patrick.Stickler@nokia.com>
- Date: Wed, 7 Nov 2001 10:50:18 +0200
- To: phayes@ai.uwf.edu
- Cc: w3c-rdfcore-wg@w3.org
> >or that once the
> >interpretation takes place, the lexical form becomes irrelevant.
>
> ??? I'm not sure what you mean by 'takes place'. By 'interpretation'
> I didn't mean to refer to a process. The lexical form is not
> irrelevant, since it's the only thing that determines the actual
> literal value.

Meaning that, once some knowledge leaves RDF space, such as via an API
into a canonical internal representation (once the literal is
interpreted/parsed according to the lexical form), that lexical form is
no longer relevant.

> >The problem is that RDF does not itself provide those canonical
> >internal representations for value spaces
>
> What canonical internal representations? Nobody has mentioned these
> until now, as far as I know.

I.e., in Scheme (to take one example) '10', '#xa', and '#b1010' are all
different lexical variants of the same value, all mapped to the same
internal canonical representation within a given system. Once the source
code is parsed, those values do not maintain their lexical
representations.

RDF does not define any such internal canonical representations (nor
should it); therefore it must maintain the form and type relation until
that knowledge leaves RDF space, at which time it may be
interpreted/parsed into some internal canonical representation in the
system performing that interpretation of the lexical form.

> >, but preserves the lexical
> >forms of such values -- hence in such cases, the data type of a
> >literal is inseparable from the lexical form embodied in the literal.
>
> Well, it's not *inseparable*. What is true is that you need both to
> fully disambiguate the literal label.
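To make the Scheme point concrete, here is a minimal Python sketch; the reader logic is a simplified, hypothetical stand-in for a real Scheme reader, not its actual implementation:

```python
# Three lexical variants of the integer ten, as a Scheme-style reader
# might see them. After parsing, only the canonical internal value
# survives; the original notation is gone.
def parse_scheme_integer(lexical: str) -> int:
    """Map a lexical form to its canonical internal integer value."""
    if lexical.startswith("#x"):
        return int(lexical[2:], 16)   # hexadecimal notation
    if lexical.startswith("#b"):
        return int(lexical[2:], 2)    # binary notation
    return int(lexical)               # decimal notation

values = [parse_scheme_integer(s) for s in ["10", "#xa", "#b1010"]]
print(values)  # all three lexical variants yield the same value: [10, 10, 10]
```

Once `parse_scheme_integer` has run, nothing in the resulting `int` records whether it was written in decimal, hex, or binary, which is exactly the transformation RDF does not perform.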
Well, then, of course no part of the graph is "inseparable" from any
other part, but one could see the separation of literal from local type
as reducing the integrity and value of the information itself. Thus, for
practical purposes, one would hope that the fundamental representation
embodied in the graph would be reasonably impervious to such separation
insofar as common operations are concerned (such as my example of
inferred binding of values to properties of a superordinate type).

> >The mappings from literals to value spaces do not happen in RDF,
> >and therefore all information needed for such mappings to take
> >place must be preserved across all processes prior to actual
> >interpretation.
> >
> >> and the debate is about various proposals for how to use
> >> some form of RDF syntax to establish that association.
> >
> >Fair enough. I've proposed the encoding of typed data literals as
> >URVs, a special class of URI explicitly intended for such purposes.
> >So I guess that's one more proposal on the table... ;-)
>
> Right. BTW, do you have a pointer to that URV idea?

http://www-nrc.nokia.com/sw/X_Values_URI.pdf

I hope to have a revised, more polished version as an I-D in the coming
weeks as well, but the basic ideas can be found at the link above.

> >> (In my
> >> proposal, these mappings are treated much like the denotation mapping
> >> in the model theory. Other proposals make these mappings explicit as
> >> rdf properties in one way or another.) Do you agree with this
> >> summary so far?
> >
> >Sure, but we have to ensure that those mappings remain fixed until
> >interpretation, including processes which by inference or other
> >means bind values to properties belonging to data types other than
> >that originally defined for the value.
>
> I'm puzzled. You seem to be assuming that interpretation is something
> that happens at some stage in processing (?) I was using
> 'interpretation' in the sense of model theory. Maybe we are at cross
> purposes.
Likely. I can't help but think in terms of applications, being a
software engineer who has to build systems to use this information. By
'interpretation' I mean parsing the lexical form into an internal
representation for some system, such that one can do things like compare
two values.

So, I have a query that attempts to find all persons with shoe size
greater than 0x12 (note that the query uses a hexadecimal lexical form
;-) and I have a huge knowledge base where shoe sizes are encoded using
various ontologies, with values associated with various data types and
encoded as literals representing various lexical forms of values, etc.,
and I have schemata which relate all those ontologies and data type
schemes.

Now, the RDF layer (e.g. triples store) and RDFS-capable inference layer
shouldn't need to look at the literals at all, but should be able to
provide my query API with enough information that it is able to
interpret all of the values that get bound by inference to the query
property denoting shoe size, by parsing them all into a canonical form
that allows comparison. That's what I meant by "fixed until
interpretation". The query API can't just get the lexical forms of the
values; it has to know the original data types to which those lexical
forms correspond. Whether it is able to recognize those data types or
parse those lexical forms is its own problem, but at least it has all
the information it needs. And the acceptability of any given value bound
to that query property denoting shoe size can be determined according to
the range constraints defined for the property and the class
relationships between the known data types, with no concern about
lexical form.

> >> One of our communication problems has been that the bare term
> >> 'datatype' is used in a variety of senses (sometimes for the value
> >> domain, sometimes for the mapping, etc.), so perhaps I had better try
> >> to avoid it.
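The shoe-size scenario above can be sketched roughly as follows from the query-API side; the `ex:` datatype names, the `PARSERS` table, and the bindings are hypothetical illustrations, not any real ontology or API:

```python
# The RDF/inference layers hand back (lexical form, datatype) pairs
# without ever parsing the literals; interpretation into a comparable
# canonical form happens here, in the query API.
PARSERS = {
    "ex:decimalInteger": lambda s: int(s, 10),
    "ex:hexInteger":     lambda s: int(s, 16),
    "ex:binaryInteger":  lambda s: int(s, 2),
}

def interpret(lexical: str, datatype: str) -> int:
    """Parse a typed literal into a canonical value, or fail if the
    datatype is unknown -- the lexical form alone is not enough."""
    return PARSERS[datatype](lexical)

# Hypothetical inferred bindings of shoe-size values to persons.
bindings = [
    ("ex:alice", ("19", "ex:decimalInteger")),
    ("ex:bob",   ("12", "ex:hexInteger")),       # 0x12 == 18
    ("ex:carol", ("1010", "ex:binaryInteger")),  # 0b1010 == 10
]

threshold = interpret("12", "ex:hexInteger")  # the query literal 0x12
big_feet = [who for who, (lex, dt) in bindings
            if interpret(lex, dt) > threshold]
print(big_feet)  # → ['ex:alice']
```

If the datatype were stripped off and only the lexical forms "19", "12", and "1010" survived, the comparison above would be meaningless, which is the sense in which the form and type relation must stay fixed until interpretation.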
> >> I have used examples like octal, decimal and so on as
> >> illustrative examples only to emphasize that two different datatype
> >> mappings may share the same value space.
> >
> >I agree that there is a problem with the terminology. The term datatype
> >(as I use it, and also as I understand XML Schema to use it) defines
> >a given value space. It may also, for a given system/context, define
> >one or more lexical forms by which values in that value space may
> >be expressed.
>
> Ah. My understanding was that a datatype corresponded to a mapping
> from a lexical space to a value space. I see why we have been having
> some trouble communicating.

And it may be that I am focusing solely on the graph and not the MT
interpretation of the graph. And this is where my comments about
"canonical internal representations" come from. You can't operate within
a value space *alone* unless you have a canonical internal
representation. So when you talk about mapping from a lexical space to a
value space, you seem to imply the existence of such a canonical
representation for such data types, and that's where I get lost.

For XML Schema, insofar as I understand it, a (simple) datatype is the
value space, and it has an explicit lexical space defined, and that
lexical space serves as the canonical representation for all
serializations of values in that value space. RDF inherits that
definition of datatype, apparently. And that's how I've been thinking.
Though I'll admit that I sometimes get lost in the MT space and
"interpretations" which are just "possible" but not absolute. I guess
I'm too much of a nuts-n-bolts kinda guy ;-)

> >Thus, decimal, hexadecimal, octal, binary, etc. are all possible
> >lexical forms (notations) of integers (and other possible data types),
> >and are not themselves data types. It is IMO incorrect to equate a
> >given lexical form or notation with a data type. A data type defines
> >primarily a value space.
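The "two datatype mappings sharing one value space" point can be sketched as follows; both function names and the regex-defined lexical spaces are invented for illustration:

```python
import re

# Two hypothetical datatype mappings that share one value space (the
# integers) while defining disjoint lexical spaces. Equality holds in
# the value space even though the lexical forms never coincide.

def decimal_integer(lexical: str) -> int:
    """Lexical space: strings of decimal digits."""
    if not re.fullmatch(r"[0-9]+", lexical):
        raise ValueError(f"not in the decimal lexical space: {lexical!r}")
    return int(lexical, 10)

def hex_integer(lexical: str) -> int:
    """Lexical space: '0x' followed by hexadecimal digits."""
    if not re.fullmatch(r"0x[0-9a-fA-F]+", lexical):
        raise ValueError(f"not in the hex lexical space: {lexical!r}")
    return int(lexical, 16)

# Different lexical forms, same member of the shared value space:
print(decimal_integer("18") == hex_integer("0x12"))  # → True
```

On this reading, each mapping (lexical space plus parsing rule) is a distinct "datatype" in one terminology, while in the other terminology the datatype is just the shared value space the two mappings land in.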
>
> Well, never mind who is in the right, but let us try to agree on some
> terminology we can all understand. If a datatype is a value space,
> what do you call the mapping from lexical to value spaces?

Parsing and compilation ;-)

There is far more intersection of value spaces than of lexical forms. An
"integer" to XML Schema is the same value space as a C int, a Scheme
integer, a Smalltalk Integer, etc., but not all of these systems define
the same lexical space for representing values in serializations.

Perhaps in the MT you can talk of value spaces irrespective of lexical
forms and without positing any particular internal canonical
representation. But that makes it (for me at least) very hard to see how
that relates to using RDF-encoded knowledge in actual applications. The
problem is that, unlike program code, RDF typed literals are not parsed,
and therefore presumptions such as that the range definition is
sufficient for interpretation of a non-locally typed value are wrong.
For a programming language, that would be fine, but not for RDF, because
no parsing into a canonical representation compatible with all
superordinate types has occurred.

> >Lexical forms are only a means to an end.
> >Data types are, in general, portable across systems and platforms
> >even if their lexical forms are not.
>
> Ah, I profoundly disagree for RDF. There is no 'end' in this sense;
> nothing gets compiled; it's not a programming language. All there is
> is the syntax, and all you ever get back from any kind of inference
> is more syntax. There aren't any inner canonical forms, and no code
> gets interpreted.

EXACTLY! This is my point, as just expressed above. Sorry I was (again)
unclear. In most other systems, lexical forms *are* merely a means to an
end, mapped to some internal representation of a value.
But RDF does not provide any such transformation, so presumptions about
how range assertions work as descriptive mechanisms may be invalid,
because they seem to presume that lexical forms don't matter, yet they
do.

> >I think we're mostly in agreement,

I agree.

> Yes, sorry I got testy.

I'm sure I am to blame, being the &$@%*#$! that I am ;-)

> I will blame the pneumonia, it's got to be
> useful for something.

Well, in the interest of getting you back on your feet, I won't respond
to any of your posts for at least a week, OK?

> >though I still am concerned
> >about maintaining the inseparability of lexical form and data
> >type for data typed literals.
>
> I don't mind letting them get separate as long as there is always a
> way to get them back together. Inside a given graph there is, but
> your point about queries has me worried that this isn't good enough.

Functionally speaking, if one can always get them back together again
reliably, then were they ever separate? ;-)

Cheers,

Patrick

(and stop reading these lists and get some rest! ;-)
Received on Wednesday, 7 November 2001 03:50:31 UTC