Arguments for Value-Based Inline Literal Semantics from Patrick Stickler on 2002-09-17 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Tue, 17 Sep 2002 15:26:17 +0300
To: "w3c-rdfcore-wg" <w3c-rdfcore-wg@w3.org>
Message-ID: <002101c25e45$6bce6d00$864416ac@NOE.Nokia.com>
The chair has set the context for deciding between string-based
(tidy) and value-based (untidy) semantics of inline literals to
be based primarily on practical considerations. Therefore, rather
than addressing any technical issues, I will confine myself to
comments relating to the practical implications of choosing one
option over the other.

I hope the following will be found to be concise and clear. If
any statement appears too terse or requires additional
clarification or support, I will be happy to expound
further. Despite appearances, I did endeavor to be brief.

Supporting links/references are provided at the end.

--

1. Apparent "support" of string-based (tidy) semantics by generic
   RDF triple stores or query engines is highly suspect

There exist generic RDF applications which provide access to the objects
of statements, and in the case of inline literals, this typically corresponds
to the literal string. M&S offers no generic and portable mechanism for
such tools to provide anything but the literal string. Any datatype value
which might be denoted by that literal string cannot be reliably known by
current generic RDF tools. Therefore, all that generic tools are able to provide
is string-based comparison. This is inevitable, given the silence of M&S on
the subject, and should not be taken as a reading of M&S
in favor of string-based semantics.

There exist generic RDF triple stores which provide access to their
native internal representation. Native internal representation does not 
equate to abstract representation. Equality of nodes in the native internal
representation does not equate to equality of denotation in the abstract
representation. Failure of generic RDF tools to make a clear distinction 
between abstract representation and internal native representation may
result in such tools appearing to presume or support string-based (tidy) 
semantics, by merging string-equal inline literals to the same internal
node structure, but this may be nothing more than an artifact of their
internal storage optimization, and should not be taken as a
reading of M&S in favor of string-based semantics, nor even as
a preference of the application itself for string-based semantics.
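
To illustrate the distinction, here is a minimal sketch. TinyStore and
its methods are hypothetical and invented purely for this example; they
are not the API of any existing RDF toolkit. The store interns literal
strings as a storage optimization, so string-equal literals share one
internal node.

```python
# Sketch only: TinyStore is a hypothetical store, not any real RDF toolkit.
class TinyStore:
    def __init__(self):
        self._nodes = {}   # literal string -> internal node id (interning table)
        self.triples = []  # (subject, predicate, internal node id)

    def _intern(self, lexical):
        # Storage optimization: string-equal literals share one internal node.
        return self._nodes.setdefault(lexical, len(self._nodes))

    def add(self, subject, predicate, lexical):
        self.triples.append((subject, predicate, self._intern(lexical)))

    def node_of(self, subject, predicate):
        # Return the internal node id of the first matching statement.
        for s, p, node in self.triples:
            if (s, p) == (subject, predicate):
                return node
        raise KeyError((subject, predicate))


store = TinyStore()
store.add("#Jenny", "age", "10")
store.add("#Film", "title", "10")
store.add("#Bob", "age", "010")

# Equality of internal nodes tracks string equality and nothing more.
same_node_jenny_film = store.node_of("#Jenny", "age") == store.node_of("#Film", "title")
same_node_jenny_bob = store.node_of("#Jenny", "age") == store.node_of("#Bob", "age")
```

In this sketch an age and a title share an internal node only because
their lexical forms happen to match, while two ages written "10" and
"010" do not. Neither outcome says anything about what the literals
denote in the abstract graph.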

Now that the RDF WG is providing clarification of the structure and meaning
of the abstract syntax, it is expected that such applications will be revised
to reflect these clarifications and more explicitly differentiate between
the syntax and semantics of the standardized abstract representation and
their own proprietary internal representations. This would include making the
nature of access and comparison functions clear, as to whether they
operate on or reflect the abstract graph or their own proprietary structures.

Thus, the nature of present-day generic RDF applications cannot serve as a
valid argument for or against either string-based or value-based semantics.
They merely echo the ambiguity on this issue inherent in the original
M&S spec, and in the interest of genericity they can provide nothing but
string-based operations, since any and all value-based semantics remain
fully in the domain of the problem-specific application.

--

2. Impact on existing information models with deployed content is 
   substantially greater with adoption of string-based than 
   value-based semantics

There exist RDF information models which presume string-based semantics
for inline literals.

There exist RDF information models which presume value-based semantics
for inline literals.

Neither presumption is clearly supported by M&S and both may be seen
as equally reasonable, insofar as M&S is concerned.

In the case of those models which presume string-based semantics,
these can be divided into two types: (a) those which implicitly assume
a datatype of xsd:string or similar for all properties taking literal
objects, and (b) those which act as a closed system where all
literals are local names with fixed local meaning and those local
meanings are imposed on all external knowledge syndicated into that 
system regardless of original intended meaning.

In the case of those models which presume value-based semantics,
the datatype of the literal is typically fixed for particular
properties and left implicit in the RDF.

If value-based (untidy) semantics is adopted, there is negligible impact:

   * For models which presume an implicit string datatype for all
     properties taking literal objects, one need only express the implicit
     datatyping assumption in a schema. No existing content need be
     changed. The impact is negligible, and positive in that
     it promotes increased clarity of intended meaning.

   * For models which presume string-based semantics, such that all
     literals have fixed meaning, one may continue to operate based on
     those closed system assumptions, and may continue to disregard
     any meaning external to or conflicting with that closed system.
     No existing content need be changed. The impact is purely
     a social one, making clear the closed nature of such models.

   * For models which presume value-based semantics, one need only express 
     the implicit datatyping assumptions in a schema. No existing content 
     need be changed. The impact is negligible, and positive in that
     it promotes increased clarity of intended meaning.
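
As a rough sketch of why no content need change under value-based
semantics: the datatyping assumptions can live entirely outside the
deployed statements. The SCHEMA mapping, the property names, and the
value_of helper below are all hypothetical, chosen only for
illustration, and plain Python types stand in for real datatypes.

```python
# Hypothetical schema: property name -> function mapping a lexical form
# to the value it denotes. Python types stand in for real datatypes.
SCHEMA = {
    "age": int,    # assumed integer-valued
    "title": str,  # assumed string-valued
}

# Existing content, left exactly as deployed: plain inline literals.
CONTENT = [
    ("#Jenny", "age", "10"),
    ("#Bob", "age", "010"),
    ("#Film", "title", "10"),
]


def value_of(predicate, lexical, schema=SCHEMA):
    """Return the value the literal denotes under the schema.

    Properties with no declared datatype fall back to the literal string.
    """
    return schema.get(predicate, str)(lexical)


# Interpret the unchanged content through the schema.
values = {(s, p): value_of(p, o) for s, p, o in CONTENT}
```

Only the schema was added here. The three statements are untouched, yet
both ages now resolve to the same integer and the title stays a string.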

If string-based (tidy) semantics is adopted, there is SUBSTANTIAL impact:

   * For models which presume string-based semantics of either type,
     nothing need change. The implicit assumptions are made explicit by 
     the RDF spec. No existing content need be changed. No impact.

   * For models which presume value-based semantics, one is still left with no
     standard mechanism for making those datatyping assumptions explicit.
     Moreover, it is possible, even likely, that generic RDF reasoners
     will draw different entailments based on string-based semantics than
     model-specific applications will based on value-based semantics. Therefore,
     if such inconsistencies are to be remedied, *ALL* existing content for
     such models will have to be changed to explicitly and locally
     specify the intended datatype, regardless of the gross redundancy
     and complete irrelevance (to the model) of such local datatyping
     assertions.

     The impact here is *HUGE*. Given the deployed base of
     content for such models (e.g. Adobe PDF, DC, CC/PP, RSS),
     modification of deployed content is unlikely to happen. A schism
     will thus arise between generic RDF tools and inference engines
     presuming string-based semantics and these particular information models,
     which presume value-based semantics for inline idioms.

     The results will be catastrophic for RDF as a standard.

     *** It is far easier and cheaper to modify a few software applications
     *** to reflect value-based semantics than it is to correct and re-deploy
     *** large volumes of existing content to add explicit local datatyping.
     *** And there already is a substantial amount of DC, CC/PP, and PDF5
     *** content.
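
The divergence in entailments described above can be sketched as
follows. All data, names, and datatype assignments here are
hypothetical, and the "reasoner" is reduced to a single question: which
pairs of statements are judged to have the same object?

```python
# Deployed content with implicit datatypes (hypothetical data).
CONTENT = [
    ("#Jenny", "age", "10"),
    ("#Bob", "age", "010"),
    ("#Film", "title", "10"),
]

# Assumed model-specific datatypes, known to the application but not
# stated anywhere in the RDF itself.
MODEL_DATATYPES = {"age": int, "title": str}


def shared_objects(content, denote):
    """Return pairs of statements whose objects are identical under `denote`."""
    pairs = set()
    for i, (s1, p1, o1) in enumerate(content):
        for s2, p2, o2 in content[i + 1:]:
            if denote(p1, o1) == denote(p2, o2):
                pairs.add(((s1, p1), (s2, p2)))
    return pairs


def string_based(predicate, lexical):
    # Generic tool: the literal string is all there is.
    return lexical


def value_based(predicate, lexical):
    # Model-specific application: map the literal to its intended value.
    return MODEL_DATATYPES[predicate](lexical)


generic = shared_objects(CONTENT, string_based)
model = shared_objects(CONTENT, value_based)
```

Under these assumptions the generic reading pairs Jenny's age with the
film's title, while the model-specific reading pairs Jenny's age with
Bob's. The two readings agree on nothing, and reconciling them on the
content side would mean annotating every literal.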

--

3. Value-based semantics reflects the world most accurately

RDF is a tool for making statements about the world. If I say

   <rdf:Description rdf:about="#Jenny">
      <age>10</age>
   </rdf:Description>

then it is most reasonable to think that my intent is to say something about
Jenny that reflects the world, rather than say something that reflects
the form of the expression. It is rather odd to interpret the above statement 
as asserting that Jenny's age is some lexical representation, some string, rather 
than some actual value, as this would be reflecting the RDF syntax and not the 
world.

And if I am employing generic RDF inference engines to operate on
RDF expressed statements about the world, I'm interested in the meaning of
those statements as they reflect the world, not the meaning of those
statements as they reflect the form or syntax in which they were expressed.

RDF is a tool for *knowledge* representation, not for structured markup.

The names and terms used in RDF are supposed to denote things in the world
and their characteristics and relations, not characteristics of the form
of expression of statements about the world.

Interpretations such as 'the object of the property in the (above) statement 
is the literal string "10" which may or may not mean something special
to some extra-RDF application' do not reflect the world. They reflect the
form of expression.

On the other hand, interpretations such as 'Jenny's age is ten' do reflect 
the world and are far more useful for semantic web applications concerned
with knowledge, rather than the details of the form in which that knowledge 
was expressed.

If RDF is intended to be used to express statements about the 
world, then all RDF names (including literals) denote things in 
the world. Yes, sometimes the things denoted by literals are strings 
(which give the illusion that the literals denote themselves) but that
is not always the case. The title of a book is (usually) a string.
Fair enough. But the owner of a book is (seldom) a string, and
if a string has been used to denote the owner, then it is the
value that counts and not the representation of the value in 
the RDF syntax. Such usage *already* exists and is widely deployed,
and insofar as M&S is concerned, is valid.

Thus, even when inline literals can be deemed to denote strings,
the literals themselves are still names of things (the strings)
and thus always exhibit value-based semantics.

String-based semantics distorts the nature of RDF by blurring
the boundaries between the form of expression and the meaning
of expression, and lessens the utility of RDF as a tool for making 
statements about the world.

--

4. The RDF community prefers value-based (untidy) semantics

The results of the inquiry to the RDF community regarding this
issue reflect a clear preference for value-based semantics by
a ratio of 4 to 1. The WG should respect this preference in its
decision on this matter.

--

Conclusion:

* Current software applications are not a valid metric for this decision
* Adoption of value-based semantics has negligible impact
* Adoption of string-based semantics has substantial impact
* Value-based semantics more accurately reflects the nature of RDF as
  a tool for knowledge representation, reflecting the world
* The RDF community prefers value-based semantics

The RDF Core WG should adopt value-based (untidy) semantics for
inline literals.

--

Supporting References:

Results of inquiry to RDF Community
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Aug/0163.html

Adobe XMP
http://partners.adobe.com/asn/developer/xmp/main.html
http://xml.coverpages.org/XMP-Samples20011016.zip

CC/PP
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Aug/0150.html

RSS (Syndication Module)
http://web.resource.org/rss/1.0/modules/syndication/

iCal/RDF
http://ilrt.org/discovery/2001/06/schemas/ical-full/hybrid.rdf#
http://ilrt.org/discovery/2001/06/content/swws2001-07-30.rdf



[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]
Received on Tuesday, 17 September 2002 08:26:20 UTC