- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Tue, 17 Sep 2002 15:26:17 +0300
- To: "w3c-rdfcore-wg" <w3c-rdfcore-wg@w3.org>
The chair has set the context for deciding between string-based (tidy) and value-based (untidy) semantics of inline literals to be based primarily on practical considerations. Therefore, rather than addressing any technical issues, I will confine myself to comments on the practical implications of choosing one option over the other.

I hope the following will be found to be concise and clear. If any statement appears to be expressed too tersely or to require additional clarification or support, I will be happy to expound further. Despite appearances, I did endeavor to be brief. Supporting links/references are provided at the end.

--

1. Apparent "support" of string-based (tidy) semantics by generic RDF triple stores or query engines is highly suspect

There exist generic RDF applications which provide access to the objects of statements, and in the case of inline literals, this typically corresponds to the literal string. M&S offers no generic and portable mechanism for such tools to provide anything but the literal string. Any datatype value which might be denoted by that literal string cannot be reliably known by current generic RDF tools. Therefore, all that generic tools are able to provide is string-based comparison. This is inevitable, given the silence of M&S on the subject, and should not be read as an interpretation of M&S in favor of string-based semantics.

There exist generic RDF triple stores which provide access to their native internal representation. Native internal representation does not equate to abstract representation. Equality of nodes in the native internal representation does not equate to equality of denotation in the abstract representation.
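
As an illustration of the last point, here is a minimal Python sketch of a store that merges string-equal literals purely as a storage optimization. The class `ToyStore`, its methods, and the property names are hypothetical and are not taken from any actual RDF tool; the sketch only shows that a shared internal node says nothing about what each literal denotes.

```python
class ToyStore:
    """Hypothetical triple store that interns literal strings to save space."""

    def __init__(self):
        self._literal_nodes = {}  # literal string -> internal node id
        self.triples = []         # (subject, predicate, internal node id)

    def _intern(self, lexical_form):
        # Reuse one internal node per distinct string; allocate a new id otherwise.
        return self._literal_nodes.setdefault(lexical_form, len(self._literal_nodes))

    def add(self, subject, predicate, lexical_form):
        self.triples.append((subject, predicate, self._intern(lexical_form)))


store = ToyStore()
store.add("#Jenny", "age", "10")   # presumably meant as a number
store.add("#Book", "title", "10")  # presumably meant as a string

age_node = store.triples[0][2]
title_node = store.triples[1][2]

# Both literals share one internal node, but only because the strings match.
same_internal_node = age_node == title_node
```

Here `same_internal_node` holds because of how the store saves space, not because the store has decided that Jenny's age and the book's title denote the same thing.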

Failure of generic RDF tools to make a clear distinction between abstract representation and internal native representation may result in such tools appearing to presume or support string-based (tidy) semantics, by merging string-equal inline literals to the same internal node structure. But this may be nothing more than an artifact of their internal storage optimization, and should not be read as an interpretation of M&S in favor of string-based semantics, nor even as a preference of the application itself for string-based semantics.

Now that the RDF WG is providing clarification of the structure and meaning of the abstract syntax, it is expected that such applications will be revised to reflect these clarifications and more explicitly differentiate between the syntax and semantics of the standardized abstract representation and their own proprietary internal representations. This would include making the nature of access and comparison functions clear, as to whether they operate on or reflect the abstract graph or their own proprietary structures.

Thus, the nature of present day generic RDF applications cannot serve as a valid argument for or against either string-based or value-based semantics. They are merely echoing the ambiguity of this issue inherent in the original M&S spec, and are limited by interests of genericity from providing anything but string-based operations, as any and all value-based semantics remain fully in the domain of the problem-specific application.

--

2. Impact on existing information models with deployed content is substantially greater with adoption of string-based than value-based semantics

There exist RDF information models which presume string-based semantics for inline literals. There exist RDF information models which presume value-based semantics for inline literals. Neither presumption is clearly supported by M&S and both may be seen as equally reasonable, insofar as M&S is concerned.

In the case of those models which presume string-based semantics, these can be divided into two types:

(a) those which implicitly assume a datatype of xsd:string or similar for all properties taking literal objects, and

(b) those which act as a closed system where all literals are local names with fixed local meaning and those local meanings are imposed on all external knowledge syndicated into that system regardless of original intended meaning.

In the case of those models which presume value-based semantics, the datatype of the literal is typically fixed for particular properties and left implicit in the RDF.

If value-based (untidy) semantics is adopted, there is negligible impact:

* For models which presume an implicit string datatype for all properties taking literal objects, one need only express the implicit datatyping assumption in a schema. No existing content need be changed. The impact is negligible, and positive in that it promotes increased clarity of intended meaning.

* For models which presume string-based semantics, such that all literals have fixed meaning, one may continue to operate based on those closed system assumptions, and may continue to disregard any meaning external to or conflicting with that closed system. No existing content need be changed. The impact is purely a social one, making clear the closed nature of such models.

* For models which presume value-based semantics, one need only express the implicit datatyping assumptions in a schema. No existing content need be changed. The impact is negligible, and positive in that it promotes increased clarity of intended meaning.

If string-based (tidy) semantics is adopted, there is SUBSTANTIAL impact:

* For models which presume string-based semantics of either type, nothing need change. The implicit assumptions are made explicit by the RDF spec. No existing content need be changed. No impact.

* For models which presume value-based semantics, one is still left with no standard mechanism for making those datatyping assumptions explicit. However, it is possible, even likely, that generic RDF reasoners will draw different entailments based on string-based semantics than model-specific applications will based on value-based semantics. Therefore, if such inconsistencies are to be remedied, *ALL* existing content for such models will have to be changed to explicitly and locally specify the intended datatype -- regardless of the gross redundancy and complete irrelevancy (to the model) of such local datatyping assertions. The impact here is *HUGE*, and given the deployed base of content for such models (e.g. Adobe PDF, DC, CC/PP, RSS, etc.) modification of deployed content is unlikely to happen. Thus there will occur a schism between generic RDF tools and inference engines presuming string-based semantics and these particular information models which presume value-based semantics for inline idioms. The results will be catastrophic for RDF as a standard.

*** It is far easier and cheaper to modify a few software applications
*** to reflect value-based semantics than it is to correct and re-deploy
*** large volumes of existing content to add explicit local datatyping.
*** And there already is a substantial amount of DC, CC/PP, and PDF5
*** content.

--

3. Value-based semantics reflects the world most accurately

RDF is a tool for making statements about the world. If I say

   <rdf:Description rdf:about="#Jenny">
      <age>10</age>
   </rdf:Description>

then it is most reasonable to think that my intent is to say something about Jenny that reflects the world, rather than say something that reflects the form of the expression. It is rather odd to interpret the above statement as asserting that Jenny's age is some lexical representation, some string, rather than some actual value, as this would be reflecting the RDF syntax and not the world.

And if I am employing generic RDF inference engines to operate on RDF expressed statements about the world, I'm interested in the meaning of those statements as they reflect the world, not the meaning of those statements as they reflect the form or syntax in which they were expressed.

RDF is a tool for *knowledge* representation, not for structured markup. The names and terms used in RDF are supposed to denote things in the world and their characteristics and relations, not characteristics of the form of expression of statements about the world.

Interpretations such as 'the object of the property in the (above) statement is the literal string "10" which may or may not mean something special to some extra-RDF application' do not reflect the world. They reflect the form of expression.

On the other hand, interpretations such as 'Jenny's age is ten' do reflect the world and are far more useful for semantic web applications concerned with knowledge, rather than the details of the form in which that knowledge was expressed.

If RDF is intended to be used to express statements about the world, then all RDF names (including literals) denote things in the world. Yes, sometimes the things denoted by literals are strings (which give the illusion that the literals denote themselves) but that is not always the case.

The title of a book is (usually) a string. Fair enough. But the owner of a book is (seldom) a string, and if a string has been used to denote the owner, then it is the value that counts and not the representation of the value in the RDF syntax. Such usage *already* exists and is widely deployed, and insofar as M&S is concerned, is valid.

Thus, even when inline literals can be deemed to denote strings, the literals themselves are still names of things (the strings) and thus always exhibit value-based semantics.
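
The difference between comparing the form of expression and comparing the denoted value can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PROPERTY_DATATYPE` table, the helper functions, and the property names `age` and `title` are hypothetical stand-ins for the per-property datatyping that a model-specific application keeps implicit. They are not a mechanism defined by M&S or proposed by the WG.

```python
# Hypothetical per-property datatype assumptions held by a model-specific application.
PROPERTY_DATATYPE = {"age": int, "title": str}


def denoted_value(prop, lexical_form):
    """Map a literal string to the value it denotes under the property's datatype."""
    return PROPERTY_DATATYPE[prop](lexical_form)


def string_equal(a, b):
    """String-based (tidy) comparison: looks only at the lexical form."""
    return a[1] == b[1]


def value_equal(a, b):
    """Value-based (untidy) comparison: looks at what each literal denotes."""
    return denoted_value(*a) == denoted_value(*b)


jenny_age = ("age", "10")
other_age = ("age", "010")    # different string, same number
book_title = ("title", "10")  # same string, but a string value, not a number

same_form_ages = string_equal(jenny_age, other_age)     # compares "10" with "010"
same_value_ages = value_equal(jenny_age, other_age)     # compares 10 with 10
same_form_mixed = string_equal(jenny_age, book_title)   # compares "10" with "10"
same_value_mixed = value_equal(jenny_age, book_title)   # compares 10 with "10"
```

In this sketch the two ages differ as strings yet agree as values, while Jenny's age and the book title agree as strings yet differ as values, which is the gap between the two interpretations contrasted above.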

String-based semantics distorts the nature of RDF by blurring the boundaries between the form of expression and the meaning of expression, and lessens the utility of RDF as a tool for making statements about the world.

--

4. The RDF community prefers value-based (untidy) semantics

The results of the inquiry to the RDF community regarding this issue reflect a clear preference for value-based semantics by a ratio of 4 to 1. The WG should respect this preference in its decision on this matter.

--

Conclusion:

* Current software applications are not a valid metric for this decision
* Adoption of value-based semantics has negligible impact
* Adoption of string-based semantics has substantial impact
* Value-based semantics more accurately reflects the nature of RDF as a tool for knowledge representation, reflecting the world
* The RDF community prefers value-based semantics

The RDF Core WG should adopt value-based (untidy) semantics for inline literals.

--

Supporting References:

Results of inquiry to RDF Community
   http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Aug/0163.html

Adobe XMP
   http://partners.adobe.com/asn/developer/xmp/main.html
   http://xml.coverpages.org/XMP-Samples20011016.zip

CC/PP
   http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Aug/0150.html

RSS (Syndication Module)
   http://web.resource.org/rss/1.0/modules/syndication/

iCal/RDF
   http://ilrt.org/discovery/2001/06/schemas/ical-full/hybrid.rdf#
   http://ilrt.org/discovery/2001/06/content/swws2001-07-30.rdf

[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]
Received on Tuesday, 17 September 2002 08:26:20 UTC