- From: <Patrick.Stickler@nokia.com>
- Date: Wed, 14 Nov 2001 12:06:01 +0200
- To: phayes@ai.uwf.edu
- Cc: w3c-rdfcore-wg@w3.org
> -----Original Message-----
> From: ext Pat Hayes [mailto:phayes@ai.uwf.edu]
> Sent: 14 November, 2001 01:44
> To: Stickler Patrick (NRC/Tampere)
> Cc: w3c-rdfcore-wg@w3.org
> Subject: Re: The X Datatype Proposal
>
>
> > Definition of X Proposal, with examples
>
> ....
>
> >GLOSSARY OF TERMS
> >
> >representation space
> >
> >   A set of concrete representations mapping to values in a
> >   value space which facilitate automated operations
> >   in terms of those values -- e.g. the reification of
> >   a value space within a computer system
>
> If I follow you, this is what I was calling a datatype mapping, i.e. a
> mapping from a domain of lexical literal forms into a set of literal
> values; an example might be the standard mapping from decimal
> numerals to natural numbers, right?

Right. But I don't see how it is possible (or even useful) to try
to define such a mapping in or for RDF, because to do so requires
defining a canonical representation for all values in a given value
space, which means RDF having its own native, internal data type
scheme.

Since RDF itself is not an application, and applications interpret
RDF encoded data, all that can be accomplished is a mapping from
lexical space to canonical lexical space, which will still require
a mapping from that RDF defined canonical lexical space into the
internal representation space of an application.

I think that by trying to define that latter mapping, we are stepping
outside the reasonable bounds of "RDF Space".

> >canonical lexical space
> >
> >   A lexical space where each value in the value space
> >   has only one possible representation in the lexical space
>
> I fail to follow the distinction between 'representation' and
> 'lexical' in your usage.

A representation need not (necessarily) be a lexical representation.
It could be e.g. a binary value within a computer system. I.e. the
values are not represented by lexical forms encoded as strings which
must be parsed and interpreted to obtain the value.
It *might* be a lexical representation, but not necessarily.

All internal representations in a computer are canonical, in that
any given value has but one realization in the system for a given
value space, and in fact that realization may serve as the
intersection of several value spaces.

A lexical representation need not be canonical. E.g. "05" and "5"
are both lexical representations that map to the integer value 'five'
but the internal representation in my computer may be the sequence
of binary digits '101'.

A given application could e.g. use 'lit:' defined URVs for its
representation space, since such URVs are required to define
canonical lexical spaces.

> >data type
> >
> >   An explicit lexical space whose members map to
> >   values in an explicit value space
> >
> >(RDF) literal
> >
> >   A string
> >
> >typed (RDF) literal
> >
> >   A lexical form
> >
> >local type
> >
> >   A data type associated directly with an occurrence of a
> >   value serving as the object of a statement
>
> 1. I do not know what 'associated directly' means.

   {some literal} rdf:type {some data type} .

> 2. Why is the datatype - a *lexical* space - associated with the
> occurrence of a *value* ??

Sorry, bad choice of words. "value" here means "property value",
not "member of a data type value space". Apologies.

> >prescriptive range
> >
> >   A range constraint for a particular predicate defining a global
> >   type which all local types for all values must be equivalent to
> >   (either identical to, or a subclass of, the defined range class)
>
> I see no difference here between prescriptive and descriptive. The
> former seems to be the same as the latter with the proviso added
> that everything must be consistent; but that is a vacuous condition
> in an assertional language.

There is a *huge* difference. It's as significant as the difference
between XML well formedness and XML validity. Just because an
instance is well formed does not mean it is valid.
Just because some literal is assigned a type does not mean that
the type is acceptable.

The ambiguity arises here because rdfs:range is used for *both*
purposes, depending on context: to assign a type to a literal or
to constrain the type of a literal. I.e.

   Context                        Application
   ----------------------------------------------------------
   Local type + property range    Prescriptive (type ~ range)
   Local type only                n/a
   Property range only            Descriptive (range -> type)

I don't know how to explain it any more clearly than that. The
difference is significant. Perhaps someone else who groks this can
offer a better explanation (maybe in mathematical terms).

> >node facet
> >
> >   A primitive property of a graph node serving as the
> >   label of an arc
>
> ?? What about two different arcs coming out of a single node? I don't
> see any utility to this idea of a 'facet'.

The reason for calling node properties "facets" is to distinguish them
from RDF properties in general. Facets are primitives of the graph
model, not RDF properties that are defined by RDF constructs or
governed by RDFS property relations. You can't e.g. relate facets via
rdfs:subPropertyOf. Facets are not members of the class rdf:Property.

There is no problem with a given node having multiple facets, but the
specific facets for each type of graph node are fixed in the node
model. You cannot define arbitrary facets from arbitrary ontologies.
It is a bounded set defined by the model. The example implementations
for Java and Relational Tables should make this quite clear.

> >LNode
> >
> >   A node representing a resource labeled by an RDF Literal
> >
> >UNode
> >
> >   A node representing a resource labeled by a URI Reference
> >
> >SNode
> >
> >   A node representing an RDF Statement
>
> Interesting, I was not aware there were any such nodes.

There are, in my proposed model. This model extends the concept of
bNodes to a taxonomy of graph nodes which provide the basis for
interpretation.
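
As a rough illustration of a node taxonomy with a bounded set of
facets, here is a minimal Python sketch. The class names follow the
glossary, but the facet names (id, label, subject, predicate, object_)
and the resource_view helper are assumptions made for illustration
only; bNodes are left out for brevity, and this is not the Java
implementation mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class UNode:
    """Node labeled by a URI reference (assumed facets: id, label)."""
    id: int
    label: str


@dataclass(frozen=True)
class LNode:
    """Node labeled by an RDF literal (assumed facets: id, label)."""
    id: int
    label: str


@dataclass(frozen=True)
class SNode:
    """Node modeling one RDF statement; its facets are fixed by the class."""
    id: int
    subject: Union[UNode, SNode]   # an SNode subject allows qualifying a statement
    predicate: UNode
    object_: Union[UNode, LNode, SNode]


def resource_view(s: SNode) -> tuple:
    """Derive a resource-centric (subject, predicate, object) triple."""
    def show(n):
        return f"statement#{n.id}" if isinstance(n, SNode) else n.label
    return (show(s.subject), s.predicate.label, show(s.object_))


stmt = SNode(1, UNode(2, "urn:foo"), UNode(3, "urn:someProperty"), LNode(4, "bar"))
print(resource_view(stmt))  # ('urn:foo', 'urn:someProperty', 'bar')
```

Because the dataclasses are frozen, attempting to attach an arbitrary
extra attribute to a node raises an error, which loosely mirrors the
point that facets are a closed set rather than open-ended properties.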
SNodes facilitate the reification and qualification of statements,
as well as provide a basis for constraining the behavior of query
and inference processes in the interest of preserving critical
relations between literals (LNodes) and local type definitions or
original statement properties necessary for their reliable
interpretation.

UNodes facilitate the concise definition of compression operations
which are critical for efficient storage and interaction of RDF
encoded knowledge.

> >literal match
> >
> >   The binding of a statement to a query where the statement and
> >   query are expressed in the same vocabulary and in terms of the
> >   same data typing scheme
>
> We don't really have any notion of 'query' yet, other than in terms
> of entailment.

But we need one, IMO, at least insofar as constraints on the binding
of property values to superordinate properties by inference -- so that
critical context needed for interpretation of property values is not
lost. This proposal offers a minimal but sufficient definition of
such constraints.

> >Typed literals constitute lexical forms within a given lexical
> >space and which map to values in a given value space.
> >
> >The proper interpretation of a typed literal requires both the
> >lexical form and the identity of the lexical and value space for
> >which the lexical form is expressed.
>
> It also requires the mapping between them; what you called the
> representation space and I earlier called the datatype mapping.

No. RDF must avoid defining such a mapping itself. See my arguments
in my recent posting.

> >Separation of a lexical form from either the lexical space or
> >value space for which it was originally expressed renders it
> >uninterpretable in a reliable manner.
>
> That isn't obvious.

OK, let me try (again) to make it obvious.

If we have

   _:X _:someSubProperty "12" .
   _:someSubProperty rdfs:range foo:hexInt .
   foo:hexInt rdfs:subClassOf xsd:integer .
   _:someSubProperty rdfs:subPropertyOf _:someSuperProperty .
   _:someSuperProperty rdfs:range xsd:integer .

and we have a query

   _:X _:someSuperProperty ?V .

which binds ?V to "12", implying the statement

   _:X _:someSuperProperty "12" .

and then an application attempts to interpret the literal "12"
in terms of the type defined for someSuperProperty by rdfs:range,
namely xsd:integer, it will get the value 'twelve' but in fact,
the value is actually 'eighteen' !!!

Let's take a similar example, but with more focus on lexical
space compatibility:

If we have

   _:X _:someSubProperty "#x12" .
   _:someSubProperty rdfs:range scm:integer .
   scm:integer rdfs:subClassOf xsd:integer .
   _:someSubProperty rdfs:subPropertyOf _:someSuperProperty .
   _:someSuperProperty rdfs:range xsd:integer .

(note that Scheme integers support lexical representations in
various base notations, not just decimal)

and we have a query

   _:X _:someSuperProperty ?V .

which binds ?V to "#x12", implying the statement

   _:X _:someSuperProperty "#x12" .

and then an application attempts to interpret the literal "#x12"
in terms of the type defined for someSuperProperty by rdfs:range,
namely xsd:integer, it will get a parse error, as "#x12" is not
a member of the lexical space for xsd:integer.

Does that help make it a bit more obvious?

> >The rdfs:range property may function as either prescriptive
> >or descriptive, depending on the presence or absence of a local
> >type for the object of a statement.
>
> Again, I fail to see the meaning of this distinction.

See discussion above, and please, anyone else feel free to jump in
here to explain this distinction better than I am, as it's a
significant distinction and if we don't all understand it, we will
not arrive at a reasonable solution.

> >In order for rdfs:range to function prescriptively, there must
> >be both:
> >a. a range value defined for the property of a statement
> >b. a local type defined for the object of the statement
> >
> >In the absence of a local type, and in the presence of a range
> >definition for a given property, the type of the object of a statement
> >is taken to be that defined as the range of the property.
>
> And in the presence of a local type, it is taken to be the local
> type, provided that is consistent with the range statement, right?

It is taken as the local type, regardless of the range statement.

A statement is a statement is a statement, and whether that statement
is acceptable in a given context does not affect the knowledge
embodied in that statement. If I say that "green" is of type xsd:lang,
it may be wrong, but the statement must be preserved, and the type
that I give to the literal must be taken into account in all
processing.

The rdfs:range *constraint*, in the context of the presence of a local
data type, allows for one to determine the suitability of such local
typing, not whether the typing is defined at all.

See my table above showing the descriptive vs. prescriptive
application of rdfs:range based on the presence or absence of
a locally defined type.

> The inferences involved are the same in both cases: all the
> information that can be obtained about the datatype of the literal,
> by any means, local or global, is combined, provided it is
> consistent. (If it isn't consistent, something is wrong. )

You are simply missing the critical distinction between declaration
and constraint. These are not the same.

> >Query processes, while not explicitly defined by the RDF specification,
> >should be taken into account with regards to the representation and
> >interpretation of RDF encoded knowledge.
> >
> >Query processes which employ inference based on rdfs:subPropertyOf
> >relations may bind objects to predicates which are superordinate to
> >the predicate of the original statement.
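
The hazard in the hexInt and Scheme examples above can be sketched in
a few lines of Python. The parser functions below are hypothetical
stand-ins for the lexical-to-value mappings of xsd:integer, foo:hexInt
and scm:integer; the regular expressions are simplifying assumptions,
not the normative lexical spaces of any of those types.

```python
import re


def parse_xsd_integer(lexical: str) -> int:
    """Decimal digits only (simplified assumption for xsd:integer)."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"{lexical!r} is not in the xsd:integer lexical space")
    return int(lexical, 10)


def parse_foo_hexint(lexical: str) -> int:
    """Bare hexadecimal digits (assumption for the hypothetical foo:hexInt)."""
    if not re.fullmatch(r"[0-9A-Fa-f]+", lexical):
        raise ValueError(f"{lexical!r} is not in the foo:hexInt lexical space")
    return int(lexical, 16)


def parse_scm_integer(lexical: str) -> int:
    """Scheme-style: '#x' prefix means hex, otherwise decimal (simplified)."""
    if lexical.startswith("#x"):
        return parse_foo_hexint(lexical[2:])
    return parse_xsd_integer(lexical)


# Hypothetical registry mapping a datatype name to its lexical-to-value mapping.
PARSERS = {
    "xsd:integer": parse_xsd_integer,
    "foo:hexInt": parse_foo_hexint,
    "scm:integer": parse_scm_integer,
}


def interpret(datatype: str, lexical: str) -> int:
    """Interpret a lexical form in terms of one specific datatype."""
    return PARSERS[datatype](lexical)


# Same lexical form, different values depending on which range is consulted.
print(interpret("foo:hexInt", "12"))   # 18, range of the original property
print(interpret("xsd:integer", "12"))  # 12, range of the superordinate property

# A lexical form valid for the subtype may not parse at all for the supertype.
try:
    interpret("xsd:integer", "#x12")
except ValueError as err:
    print("parse error:", err)
```

The sketch only shows why the literal has to travel together with the
type it was originally expressed for; how that association is kept is
exactly what the rest of this thread argues about.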
> >
> >Query processes which employ inference based on rdfs:subClassOf
> >relations may bind literals to types which are superordinate to
> >the type originally defined for the literals.
> >
> >Query processes which bind a non-locally typed literal to a superordinate
> >predicate different from that of the original statement and which
> >may have a range defined which differs from the range defined
> >for the original predicate effectively separate the lexical form
> >embodied in that literal from the lexical space for which it was
> >originally expressed, rendering it uninterpretable in a reliable
> >manner.
>
> Again, that begs some important questions.

Yes, some *very* important questions. Namely, how do we preserve
the relations between literal and locally defined type, or untyped
literal and the range defined for the property of the original
statement of which the literal is the object?

This statement-centric model provides the basis for this, and the
above constraints ensure that this critical information is never lost.

And the same representation and mechanisms that provide for
"type safety" also provide for qualification of statements. A pretty
good bargain if you ask me (but of course I'm biased ;-)

> >The basis for the graph representation, and all operations and
> >interpretations, should be the explicit reification of the
> >statement.
>
> NO!! I refuse to have anything to do with a proposal that requires
> global reification just to handle literals. It is unworkable,
> impossibly baroque, incompatible with all known uses of RDF
> (including DAML ) and with XML, and semantically confused.

Eh? I think you're having a "knee jerk" reaction here...

Are you telling me that one cannot derive the present resource-centric
graph representation from this model?

Is not the foundation of the RDF conceptual model based on the
statement?
How is this model more baroque than the present graph model which
requires *two* representations for each statement just to reify
the statement, one that is resource centric and one that is
statement centric?

And it doesn't require global reification in the sense of reification
per the current graph model, which I agree would result in a grossly
baroque and obese graph.

And it's not just for handling typing of literals. The same model
addresses that, but also the (IMO critical) issues of statement
qualification (scope, source, authority, etc.) which I'm sure are
of great interest to the community at large.

And it is *NOT* incompatible with any existing RDF applications, as
it is trivial to provide a logical resource-centric interpretation
of this model per the current graph model. I.e.

           application
   ----------------------------
       resource-centric API
   ----------------------------
     statement-centric model

Thus, it is not getting in between the current RDF model and current
applications, but providing a foundation below the current
resource-centric graph "view" that provides a better (IMMHO) basis
for addressing the issues of data typing and statement qualification.

Finally, from the perspective of a software engineer who has to
make all this stuff work, it is *MUCH MORE* workable than the present
model and provides the explicit mechanisms by which disparate
applications can have a standardized and portable solution for
interchange, query behavior, type integrity, and even shared,
distributed knowledge bases.

The resource-centric view of the present RDF model is useful for
humans, surely, and we can continue to think in terms of that view,
but a statement-centric model is IMO a much better foundation for
RDF to address the many important issues that it is presently faced
with. I hope that my examples for statement qualification and graph
compression bear that out.

> >An RDF graph should represent the statements which
> >constitute knowledge,
>
> Quite.
> Not statements that *describe* the statements that
> represent knowledge.

The proposed graph model does not make statements, it represents
statements. An SNode is not a statement about an RDF Statement,
it is the model of an RDF Statement.

> There is a well-known dodge referred to in Krep circles as 'escaping
> to the metalevel'. When things get awkward, just *describe the
> syntax* rather than trying to get the meaning straight.

That's *not* what this proposal does. Sorry. Nope. Read it again.

It simply inverts the explicit/implicit relation of the
resource-centric view and statement-centric view. I.e. it does not
add an additional meta-level not already defined by the RDF conceptual
model for statement reification, it just adopts reified statements as
the key representation of knowledge.

> Syntax is
> usually better-behaved than meanings, so it will be easier. However,
> this doesn't solve the problems, it just takes out a kind of
> intellectual loan. In order to be of actual inferential use,
> something is going to have to figure out what to actually DO with the
> expressions that you are now describing.

I believe I've addressed that with regards to qualification of
statements and constraints on query behavior. The proposed model
is precisely intended to allow us to more easily figure out what
to DO with the knowledge.

> >and the present RDF graph model should be
> >seen as a higher level resource-centric view or interpretation
> >of that underlying statement-centric graph.
> >
> >Thus, rather than the present graph representation:
> >
> >   [urn:foo] --- urn:someProperty ---> "bar"
> >
> >we should have instead, for every statement, a canonical
> >underlying representation as follows:
> >
> >   [ ]
> >    |
>
> ...
>
> I rest my case.

I don't see that you have a case. Not in terms of your comments here.

That first example was an abstraction, and in fact is what is
embodied in the resource-centric representation.
And in fact is very similar to the knowledge embodied in your
P++ proposal! E.g.

   <urn:foo> urn:someProperty "bar" .

implies

   [ nodeID "1"; label <urn:foo> ]
   [ nodeID "2"; label urn:someProperty ]
   [ nodeID "3"; label "bar" ] .

which expands into essentially the same abstraction (apart from
node types):

   [ ]
    |
    ---- ID ----------> 1
    |
    ---- subject -----> [ ]
    |                    |
    |                    ------ ID ------> 2
    |                    |
    |                    ------ label ---> <urn:foo>
    |
    ---- predicate ---> [ ]
    |                    |
    |                    ------ ID ------> 3
    |                    |
    |                    ------ label ---> <urn:someProperty>
    |
    -----object ------> [ ]
                         |
                         ------ ID ------> 4
                         |
                         ------ label ---> "bar"

Thus, we're really talking about comparable models, the key
difference being that in my view, the explicit statement should
be the basis for the model, rather than leaving it implicit
in some resource-centric view. The whole problem with the
resource-centric representation is that statements *are* implicit
and therefore one cannot qualify them.

I don't think you are being quite fair here in your dismissal of
this proposal. I don't think you have considered the full implications
of the resource-centric model with regards to qualification of
statements (after all, how can you qualify something that either
doesn't exist explicitly or requires a secondary, additional
representation that is redundant to the resource-centric
representation?!)

Perhaps you can explain, in terms of the current graph model,
how to address the many issues that I have identified. I've not
yet seen any real solutions based on the current graph model.
At least this proposal *provides* solutions.

Regards,

Patrick
Received on Wednesday, 14 November 2001 05:06:20 UTC