- From: <Patrick.Stickler@nokia.com>
- Date: Wed, 7 Nov 2001 08:52:15 +0200
- To: phayes@ai.uwf.edu, w3c-rdfcore-wg@w3c.org
> I'm not sure what a URV is, but how would one say (the equivalent of)
>
> aaa eg:shoeSize "10" .

   aaa eg:shoeSize <xsd:integer:10> .

> using URVs? (BTW, I gave 'URV' to Google, and it found this:
> http://www.scudworks.com/pablo/possum/urvtips.html )

This is outlined in my X-Values proposal. Cf.
http://www-nrc.nokia.com/sw/X_Values_URI.pdf

I'm working on a slightly more polished I-D for the extended URI
taxonomy itself, and a separate I-D for some URI schemes grounded in
that taxonomy. Hopefully those will be done and submitted this month.

> >Here's a radical (?) suggestion: data typing and type validation
> >only apply to resources, not literals! Literals are not interpreted.
> >They are only sequences of bytes.
>
> If I follow you, that is close to the 'bnode' suggestion already on
> the table, ie to translate the above into something like
>
> aaa eg:shoeSize _:1 .
> _:1 rdf:type actas:ShoeSizes
> _:1 actas:shoesizeString "10" .

Yes and no.

It seems that the problem of misinterpreting lexical forms, which may
differ between subtype and supertype, really only comes into play when
literals are not locally typed and their type is instead inferred from
the range constraints of the properties.

Perhaps this is where the "bug" is. Perhaps range constraints should
*only* be constraints, and should apply only when the value itself has
an explicit, locally defined type. If no local type is defined for a
value, then any range definitions are meaningless, and applications
should not rely on range constraints to determine the type of
non-locally typed literals.

The idea that range definitions could be descriptive, per a strongly
typed model, seems to miss the fact that lexical forms remain
uninterpreted until needed, and may be invalid for a given binding as
defined by the range of a property. Values are not held in any
canonical internal representation, where it would be enough that the
value spaces of the original type and the property (variable) type
properly intersect.
The persistent lexical form precludes such a treatment.

In my example to Brian recently, the misinterpretation of "0x12" as an
integer in decimal notation occurred because no type (which also
defines the lexical form) was specified for the literal itself; hence
the problem.

ASSERTION: Range definitions are *only* prescriptive, and *only* when
a local type is defined for values. Any assignment of type for a value
must be made locally for each occurrence of the value.

Hmmmm... that might do the trick...

Thus, with the bNode approach, any locally defined type for a literal
is inseparable from the lexical form, and logical binding of values
based on inference and on subClassOf and subPropertyOf relations must
bind the bNode, not the literal, thus keeping the original type and
lexical form together.

The key here is that original type and lexical form *always* remain
together in RDF/S space.

Just as a Banana object in an OOP model remains a Banana even when it
is held in a Fruit variable, so must a typed literal value retain its
original data type, even when bound to a property with a range and
interpretation of a superordinate type. And just as the interaction
with a Banana as a Fruit is dependent on the API, so is the
interpretation of e.g. a hexInteger as an integer dependent on "some"
API. But the underlying representation in the graph remains explicit,
and type is inseparable from value.

bNodes may very well provide that "persistent typing", but I'd like to
see the above constraints on inseparability and binding, as well as
the assertion that range is only prescriptive, stated explicitly and
officially somewhere.

The URV encoding could be seen as a compression of a bNode typing,
where the type and lexical form are encapsulated in a URI.
The additional benefit of this encoding (apart from ensuring that type
and form are inseparable) is that each value in a given value space
(presuming semantically-vacuous lexical variation is disallowed)
shares a common global identity, which can be useful in various
contexts (see my X-Values proposal).

Thus

   <eg:shoeSize rdf:resource="xsd:integer:10"/>

defines a typed resource, encoded in a URV, for which we can perform
type validation, and for which logical bindings inferred from
subClassOf and subPropertyOf relations cannot lead to misinterpretation
of the lexical form, since the primary data type is inseparable from
the lexical form.

> But it seems to me that one *must* interpret literals, since they
> provide essential semantic content. Clearly, it matters whether the
> shoe size is 10 or 12.

But such interpretation of literals belongs at or above the API layer
(and it's a great pity that RDF does not yet have a standardized
API...)

The logical operations that inference engines may apply to the graph
itself should not IMO be mucking about with interpreting literals, but
should work only with the defined types of those values.

> >If one wants to express a value
> >of a given data type via a particular lexical form, one can encode
> >it as a URI (URV).
>
> So there is a URI for every value of every literal in every
> datatyping scheme??

Yes, just as there can be a URI for every other 'thing' we may wish to
reify in our systems.

> Instead of writing 10, I have to know that
> http://www.w3.org/xmldt/integers/ten is the web address of the number
> I need?? That seems like an intolerable burden on content writers.

No. It is not so burdensome. You are perhaps confusing qnames based on
URL namespaces with URVs, which are much more concise.

In fact, which of the following do you consider to be more verbose and
burdensome:

   aaa eg:someProperty _:1 .
   _:1 rdf:value "10" .
   _:1 rdf:type <http://www.w3.org/2001/XMLSchema#int> .

or

   aaa eg:someProperty <xsd:integer:10> .
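
To make the "type and lexical form travel together" point concrete,
here is a minimal Python sketch. It is an illustration only, not the
syntax defined in the X-Values proposal: it assumes a URV has the shape
scheme:type:lexical-form, and the URV class, parse_urv function, and
PARSERS table (including the hexInteger entry) are hypothetical names
made up for this example.

```python
from dataclasses import dataclass

# Hypothetical lexical-form parsers keyed by type name. A real system
# would derive these from the datatype definitions, not a fixed table.
PARSERS = {
    "integer": lambda s: int(s, 10),     # decimal notation only
    "hexInteger": lambda s: int(s, 16),  # accepts "12" or "0x12"
}


@dataclass(frozen=True)
class URV:
    """A value whose type and lexical form cannot be separated."""
    scheme: str
    type_name: str
    lexical: str

    def value(self):
        # Interpretation happens here, at the "API layer", and always
        # uses the value's own type, never the range of a property.
        return PARSERS[self.type_name](self.lexical)


def parse_urv(uri: str) -> URV:
    # Assumed shape: scheme:type:lexical (the lexical part may itself
    # contain colons, hence the split limit of 2).
    scheme, type_name, lexical = uri.split(":", 2)
    return URV(scheme, type_name, lexical)


ten = parse_urv("xsd:integer:10")
hex_value = parse_urv("xsd:hexInteger:0x12")
```

Under these assumptions, hex_value keeps its hexInteger type wherever
it is bound, so "0x12" is never read as decimal notation merely because
some property's range is a superordinate integer type.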

Note that 'xsd:integer:10' is *not* any kind of qname form. It is a
URI (URV). Type can then be assigned via aboutEachPrefix (yes, a pity
it is deprecated...)

   <rdf:Description aboutEachPrefix="xsd:integer:">
      <rdf:type rdf:resource="http://www.w3.org/2001/XMLSchema#int"/>
   </rdf:Description>

or can be implicitly understood by the API.

And note that the binding of type to lexical form as encapsulated in a
URV is portable across any URI-aware application, and thereby offers
greater utility than an RDF-only representation.

> >Applications are free to parse and interpret
> >URVs according to their data type, and bindings to queries based
> >on logical inference of subClass relations do not affect the
> >data type identity of the particular value.
> >
> >Thus, one avoids the "basket of fruit" problem which plagued
> >early OOP languages (including C++) where the identity of
> >a value was dependent on the type of the variable, and if one
> >had a collection for a type 'Fruit' and one put all kinds of
> >fruit into the container (basket of fruit), an iterator over
> >that container, when assigned each member of the basket, only
> >provided a 'Fruit', not a Banana, or an Apple, etc. Ada95
> >was the first OOP language to properly solve this problem.
> >Sather hinted at it, but failed. Java fortunately manages it
> >quite well. And most C++ compilers now seem to handle it.
> >
> >But RDF suffers now from the same problem, since type identity is
> >not inseparable from the value. It may be that a given value
> >corresponding to a nonNegativeInteger is also an integer, even
> >a number, but binding that value to a query property of type
> >number should not cause the value to lose its type identity
> >as a nonNegativeInteger. That's "OOP 101" folks.
>
> No doubt we missed this because, in part, to consider querying
> seriously would be beyond our charter.

Perhaps technically, but without considering operations such as
inference and queries at least generally, one cannot ensure that the
foundational representation of knowledge in the graph will effectively
support such essential applications.

I.e., it may not be in the charter to define inference or querying
functionality, but surely we would want to ensure solid support of
them. Right?

(sorry for making so much noise right off the bat, being the new guy
and all... ;-)

Cheers,

Patrick
Received on Wednesday, 7 November 2001 01:52:27 UTC