- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Tue, 17 Sep 2002 15:26:17 +0300
- To: "w3c-rdfcore-wg" <w3c-rdfcore-wg@w3.org>
The chair has set the context for deciding between string-based
(tidy) and value-based (untidy) semantics of inline literals to
be based primarily on practical considerations. Rather than
addressing any technical issues, I will therefore confine myself to
comments on the practical implications of choosing one
option over the other.
I hope the following will be found to be concise and clear. If
any statement appears to be expressed too tersely or to require
additional clarification or support, I will be happy to expound
further. Despite appearances, I did endeavor to be brief.
Supporting links/references are provided at the end.
--
1. Apparent "support" of string-based (tidy) semantics by generic
RDF triple stores or query engines is highly suspect
There exist generic RDF applications which provide access to the objects
of statements, and in the case of inline literals, this typically correlates
to the literal string. M&S offers no generic and portable mechanism for
such tools to provide anything but the literal string. Any datatype value
which might be denoted by that literal string cannot be reliably known by
current generic RDF tools. Therefore, all that generic tools are able to provide
is string-based comparison. This is inevitable, given the silence of M&S on
the subject, and should not be read as an interpretation of M&S
in favor of string-based semantics.
There exist generic RDF triple stores which provide access to their
native internal representation. Native internal representation does not
equate to abstract representation. Equality of nodes in the native internal
representation does not equate to equality of denotation in the abstract
representation. Failure of generic RDF tools to make a clear distinction
between abstract representation and internal native representation may
result in such tools appearing to presume or support string-based (tidy)
semantics, by merging string-equal inline literals to the same internal
node structure, but this may be nothing more than an artifact of their
internal storage optimization. It should not be read as an
interpretation of M&S in favor of string-based semantics, nor even
as a preference of the application itself for string-based semantics.
Now that the RDF WG is providing clarification of the structure and meaning
of the abstract syntax, it is expected that such applications will be revised
to reflect these clarifications and more explicitly differentiate between
the syntax and semantics of the standardized abstract representation and
their own proprietary internal representations. This would include making the
nature of access and comparison functions clear, as to whether they
operate on or reflect the abstract graph or their own proprietary structures.
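To make the distinction concrete, here is a minimal sketch of a toy store
that shares one internal node among string-equal literals. Everything in it
is hypothetical (the ToyStore class, the "#Film" resource, the "title"
property, and the per-property datatype table); it only illustrates that
sharing an internal node is a storage decision and says nothing about
whether the two literals denote the same thing.

```python
class ToyStore:
    """Hypothetical triple store that interns literal strings to save space."""

    def __init__(self):
        self._node_ids = {}   # lexical form -> shared internal node id
        self.triples = []     # (subject, predicate, internal node id)

    def add(self, subject, predicate, lexical_form):
        # String-equal literals are mapped to the same internal node.
        node_id = self._node_ids.setdefault(lexical_form, len(self._node_ids))
        self.triples.append((subject, predicate, node_id))
        return node_id


# Assumption: datatypes known only to the problem-specific application.
PROPERTY_DATATYPES = {"age": int, "title": str}


def denoted_value(predicate, lexical_form):
    """Value the literal denotes under the application's datatype for the property."""
    return PROPERTY_DATATYPES[predicate](lexical_form)


store = ToyStore()
age_node = store.add("#Jenny", "age", "10")
title_node = store.add("#Film", "title", "10")

# Internal-representation comparison: the two literals share a node.
shares_internal_node = age_node == title_node
# Abstract comparison: an integer and a string are different values.
same_denotation = denoted_value("age", "10") == denoted_value("title", "10")

print(shares_internal_node, same_denotation)
```

A store that exposed only the first comparison would look as though it
supported tidy semantics, when it is merely reporting how it stores strings.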
Thus, the nature of present-day generic RDF applications cannot serve as a
valid argument for or against either string-based or value-based semantics.
They merely echo the ambiguity on this issue inherent in the original
M&S spec and, in the interest of genericity, can offer nothing but
string-based operations, since any and all value-based semantics remain
fully in the domain of the problem-specific application.
--
2. Impact on existing information models with deployed content is
substantially greater with adoption of string-based than
value-based semantics
There exist RDF information models which presume string-based semantics
for inline literals.
There exist RDF information models which presume value-based semantics
for inline literals.
Neither presumption is clearly supported by M&S and both may be seen
as equally reasonable, insofar as M&S is concerned.
In the case of those models which presume string-based semantics,
these can be divided into two types: (a) those which implicitly assume
a datatype of xsd:string or similar for all properties taking literal
objects, and (b) those which act as a closed system where all
literals are local names with fixed local meaning and those local
meanings are imposed on all external knowledge syndicated into that
system regardless of original intended meaning.
In the case of those models which presume value-based semantics,
the datatype of the literal is typically fixed for particular
properties and left implicit in the RDF.
If value-based (untidy) semantics is adopted, there is negligible impact:
* For models which presume an implicit string datatype for all
properties taking literal objects, one need only express the implicit
datatyping assumption in a schema. No existing content need be
changed. The impact is negligible, and positive in that
it promotes increased clarity of intended meaning.
* For models which presume string-based semantics, such that all
literals have fixed meaning, one may continue to operate based on
those closed system assumptions, and may continue to disregard
any meaning external to or conflicting with that closed system.
No existing content need be changed. The impact is purely
a social one, making clear the closed nature of such models.
* For models which presume value-based semantics, one need only express
the implicit datatyping assumptions in a schema. No existing content
need be changed. The impact is negligible, and positive in that
it promotes increased clarity of intended meaning.
If string-based (tidy) semantics is adopted, there is SUBSTANTIAL impact:
* For models which presume string-based semantics of either type,
nothing need change. The implicit assumptions are made explicit by
the RDF spec. No existing content need be changed. No impact.
* For models which presume value-based semantics, one is still left with no
standard mechanism for making those datatyping assumptions explicit.
However, it is possible, even likely, that generic RDF reasoners
will draw different entailments under string-based semantics than
model-specific applications will under value-based semantics. Therefore,
if such inconsistencies are to be remedied, *ALL* existing content for
such models will have to be changed to explicitly and locally
specify the intended datatype, regardless of the gross redundancy
and complete irrelevance (to the model) of such local datatyping
assertions.
The impact here is *HUGE*. Given the deployed base of
content for such models (e.g. Adobe PDF, DC, CC/PP, RSS, etc.),
modification of deployed content is unlikely to happen, and thus a
schism will arise between generic RDF tools and inference engines
presuming string-based semantics and these particular information models
which presume value-based semantics for inline idioms.
The results will be catastrophic for RDF as a standard.
*** It is far easier and cheaper to modify a few software applications
*** to reflect value-based semantics than it is to correct and re-deploy
*** large volumes of existing content to add explicit local datatyping.
*** And there already is a substantial amount of DC, CC/PP, and PDF5
*** content.
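The difference in cost can be sketched as a simple count. This is only an
illustration under stated assumptions: the "pageCount" property, the three
statements, and the two helper functions are hypothetical, and the code does
not model any actual RDF datatyping mechanism. It shows one schema-level
declaration per property on the one hand, and one rewritten statement per
deployed statement on the other.

```python
# Hypothetical deployed content: (subject, property, inline literal).
deployed = [
    ("#doc1", "pageCount", "12"),
    ("#doc2", "pageCount", "012"),
    ("#doc3", "pageCount", "7"),
]

# Value-based reading: the datatype is declared once per property,
# outside the content, and the content itself stays untouched.
schema = {"pageCount": int}


def interpret(statements, schema):
    """Map each inline literal to the value it denotes under the schema."""
    return [(s, p, schema[p](o)) for s, p, o in statements]


# String-based reading: to recover the same values, every statement
# has to be rewritten to carry its datatype locally.
def add_local_datatypes(statements, datatype_names):
    """Return rewritten statements, one per original statement."""
    return [(s, p, (o, datatype_names[p])) for s, p, o in statements]


values = interpret(deployed, schema)
rewritten = add_local_datatypes(deployed, {"pageCount": "xsd:integer"})

schema_edits = len(schema)      # declarations added, independent of content size
content_edits = len(rewritten)  # statements rewritten, grows with content size

print(schema_edits, content_edits)
```

With real deployed volumes the second number grows with every document
published, while the first stays fixed at the number of properties.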
--
3. Value-based semantics reflects the world most accurately
RDF is a tool for making statements about the world. If I say
<rdf:Description rdf:about="#Jenny">
<age>10</age>
</rdf:Description>
then it is most reasonable to think that my intent is to say something about
Jenny that reflects the world, rather than say something that reflects
the form of the expression. It is rather odd to interpret the above statement
as asserting that Jenny's age is some lexical representation, some string, rather
than some actual value, as this would be reflecting the RDF syntax and not the
world.
And if I am employing generic RDF inference engines to operate on
RDF expressed statements about the world, I'm interested in the meaning of
those statements as they reflect the world, not the meaning of those
statements as they reflect the form or syntax in which they were expressed.
RDF is a tool for *knowledge* representation, not for structured markup.
The names and terms used in RDF are supposed to denote things in the world
and their characteristics and relations, not characteristics of the form
of expression of statements about the world.
Interpretations such as 'the object of the property in the (above) statement
is the literal string "10" which may or may not mean something special
to some extra-RDF application' do not reflect the world. They reflect the
form of expression.
On the other hand, interpretations such as 'Jenny's age is ten' do reflect
the world and are far more useful for semantic web applications concerned
with knowledge, rather than the details of the form in which that knowledge
was expressed.
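A small sketch of the two readings, applied to a question about the world
("who is older than nine?"). The integer datatype for age is an assumption,
since the statement above leaves it implicit, and "#Tom" is a hypothetical
second resource added for contrast.

```python
# Inline literals exactly as they would appear in the RDF.
ages = {"#Jenny": "10", "#Tom": "9"}


def older_than_as_strings(threshold):
    """Treat each literal as the string itself (form of expression)."""
    return sorted(s for s, lit in ages.items() if lit > threshold)


def older_than_as_values(threshold):
    """Treat each literal as the integer it denotes (assumed datatype)."""
    return sorted(s for s, lit in ages.items() if int(lit) > int(threshold))


string_answer = older_than_as_strings("9")
value_answer = older_than_as_values("9")

print(string_answer, value_answer)
```

The string reading compares "10" and "9" character by character and finds
nobody older than nine; the value reading finds Jenny.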
If RDF is intended to be used to express statements about the
world, then all RDF names (including literals) denote things in
the world. Yes, sometimes the things denoted by literals are strings
(which give the illusion that the literals denote themselves) but that
is not always the case. The title of a book is (usually) a string.
Fair enough. But the owner of a book is (seldom) a string, and
if a string has been used to denote the owner, then it is the
value that counts and not the representation of the value in
the RDF syntax. Such usage *already* exists and is widely deployed,
and insofar as M&S is concerned, is valid.
Thus, even when inline literals can be deemed to denote strings,
the literals themselves are still names of things (the strings)
and thus always exhibit value-based semantics.
String-based semantics distorts the nature of RDF by blurring
the boundaries between the form of expression and the meaning
of expression, and lessens the utility of RDF as a tool for making
statements about the world.
--
4. The RDF community prefers value-based (untidy) semantics
The results of the inquiry to the RDF community regarding this
issue reflect a clear preference for value-based semantics by
a ratio of 4 to 1. The WG should respect this preference in its
decision on this matter.
--
Conclusion:
* Current software applications are not a valid metric for this decision
* Adoption of value-based semantics has negligible impact
* Adoption of string-based semantics has substantial impact
* Value-based semantics more accurately reflects the nature of RDF as
a tool for knowledge representation, reflecting the world
* The RDF community prefers value-based semantics
The RDF Core WG should adopt value-based (untidy) semantics for
inline literals.
--
Supporting References:
Results of inquiry to RDF Community
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Aug/0163.html
Adobe XMP
http://partners.adobe.com/asn/developer/xmp/main.html
http://xml.coverpages.org/XMP-Samples20011016.zip
CC/PP
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Aug/0150.html
RSS (Syndication Module)
http://web.resource.org/rss/1.0/modules/syndication/
iCal/RDF
http://ilrt.org/discovery/2001/06/schemas/ical-full/hybrid.rdf#
http://ilrt.org/discovery/2001/06/content/swws2001-07-30.rdf
[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]
Received on Tuesday, 17 September 2002 08:26:20 UTC