Re: The case for untidy literal semantics

[Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, patrick.stickler@nokia.com]


> >RSS presumes value-based semantics.
> >
> >C.f. http://web.resource.org/rss/1.0/modules/syndication/, e.g.
> 
> Interesting.  Do you think RSS uses value based semantics throughout?  Hmm, 
> what test do we apply to tell the difference?

I don't see how you could mix tidy and untidy semantics in
the same information model.

It is certainly the case that most of the RSS properties
are expected to take string values (i.e. be of type xsd:string
or similar), but since numeric and date properties clearly
rely on value based semantics, it is reasonable to assume
that value based semantics is presumed throughout, and that
some properties simply have string values which are equivalent
to their lexical representations embodied in the literals.

> Untidy semantics requires the content producer to be explicit about the 
> datatype of a property value.  This will have the benefit of detecting 
> errors that would otherwise not be noticed, e.g. when one system processes 
> something as a string and the other as an integer.
> 
> I think I'm missing something here and explaining it right.  Comparing the 
> tidy and untidy semantics in this case.
> 
> Tidy semantics:  Type of object of a property is Literal. System A asserts 
> range of property is xsd:string.  That's a type clash.  System B asserts 
> range of property is xsd:integer.  That's a type clash too.

Correct. rdfs:range assertions cannot be used with tidy inline literals.

But that does not mean that applications will not impose value-based
interpretations of those inline literals at the application level. And
if two systems impose different interpretations, which are always
unknown at the RDF level, such incompatibilities can never be detected by
RDF applications operating on the basis of RDF expressed knowledge.

> Untidy semantics:  Type of object of a property is a Literal.  System A 
> asserts range of property is xsd:string.  System A continues just 
> fine.  System B asserts range of property is xsd:integer.  System B 
> continues just fine.

If each system makes its assertions in a closed manner, sure, but
if those assertions are expressed as RDF Schemas that are merged into
the same knowledge base, the conflict becomes apparent when a property
ends up with range assertions for both xsd:string and xsd:integer.

And this knowledge about the global property-bound assumptions of 
the different systems is expressed explicitly in RDF and thus visible
to applications operating on the basis of RDF expressed knowledge.
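
As a rough illustration only (a sketch in Python, not anything taken
from the schemas under discussion): the property name ex:size, the
tuple representation of triples, and the function conflicting_ranges
are all assumptions made up for this example. It treats any property
with more than one datatype range as a conflict, which is a
simplification that holds for disjoint datatypes such as xsd:string
and xsd:integer.

```python
# Hypothetical, minimal model: a triple is a (subject, predicate, object) tuple.
RDFS_RANGE = "rdfs:range"

# Range assertion published by System A (assumed for illustration).
schema_a = {("ex:size", RDFS_RANGE, "xsd:string")}
# Range assertion published by System B (assumed for illustration).
schema_b = {("ex:size", RDFS_RANGE, "xsd:integer")}


def conflicting_ranges(triples):
    """Map each property with more than one asserted range to its sorted ranges."""
    ranges = {}
    for subject, predicate, obj in triples:
        if predicate == RDFS_RANGE:
            ranges.setdefault(subject, set()).add(obj)
    return {prop: sorted(types) for prop, types in ranges.items() if len(types) > 1}


# Each schema on its own looks fine; only the merge exposes the disagreement.
merged = schema_a | schema_b
conflicts = conflicting_ranges(merged)
```

Taken separately, neither schema reports anything; only the merged
set of triples shows ex:size carrying both ranges.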

> I guess I've got the scenario wrong.  Can you explain the one you had in mind?

The part you were missing is that knowledge about global datatyping
assertions is, with the untidy approach, captured in RDF and thus
exposed to RDF applications, rather than hidden in the upper extra-RDF
application layers.

> 
> >You appear to have missed one very important issue, namely,
> >
> >Probable Schism of the RDF Community
> >--------------------------------------
> >
> >If tidy semantics is adopted *and* those applications which are
> >already deployed which employ inline literals with value-based
> >semantics (Adobe XMP, CC/PP, DC, RSS, etc.) refuse to change their
> >serializations for reasons of practicality (and that is likely to
> >be the case), then these applications will have conflicting and
> >non-monotonic interpretations of the RDF compared to the RDF MT.
> 
> I did try to capture what I think is the technical issue here, that in 
> cases like bitPerPixel, with tidy semantics a generic RDF processor will 
> not be aware of the integer "nearby" to borrow DanC's phrase.  Checking the 
> summary:
> 
>    http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Sep/0251.html
> 
> Yes, section 2, principle of least change is intended to capture that.  Not 
> satisfactorily I guess.
> 
> I think the point here is that with untidy semantics it is easier to 
> retrofit datatype information to existing content.  If we don't do that, 
> then there is information we are not capturing in the RDF, i.e. that a 
> generic RDF processor, rather than one with specific application knowledge, 
> will be aware of.

It's *much* more than that. It's that the interpretation assigned by a
tidy RDF MT will be *different* from the interpretation assigned by the 
application. I.e. there will be non-monotonicity between the layers.

If the global property-based datatyping remains implicit at the RDF level,
because no one creates the RDF Schemas to assert the datatype ranges
of those properties' objects, the untidy RDF MT will not assign any meaning
to those literals, other than that they denote some value of some undefined
datatype. Thus, with the untidy RDF MT, there is no interpretation to
conflict with the application's interpretation.

Ideally, the range assertions *would* be expressed, so that the RDF MT
could assign the *same* interpretation that the application will assign,
and make that knowledge visible at the RDF level, but even if they are absent,
that does not constitute any semantic conflict between the RDF and application
layers, which *would* exist with a tidy RDF MT.
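
A toy sketch of the difference, in Python. The names UNKNOWN, PARSERS,
denotation_tidy and denotation_untidy are hypothetical and exist only
for this example; this is not a rendering of the actual model theory,
just the shape of the argument above.

```python
# Placeholder for "some value of some undefined datatype".
UNKNOWN = object()

# Assumed lexical-to-value mappings for two datatypes (illustration only).
PARSERS = {"xsd:string": str, "xsd:integer": int}


def denotation_tidy(literal):
    """Tidy reading: an inline literal always denotes itself, the string."""
    return literal


def denotation_untidy(literal, declared_range=None):
    """Untidy reading: the value depends on the property's range, if one is known."""
    if declared_range is None:
        # No range assertion: nothing is claimed, so nothing can conflict.
        return UNKNOWN
    return PARSERS[declared_range](literal)


# What a CC/PP-style application means by the literal "8".
application_value = 8
```

With no range assertion the untidy reading commits to nothing; with
xsd:integer asserted it agrees with the application; the tidy reading
yields the string "8" either way, which is not the integer the
application means.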


> If that captures the idea, I will try to rephrase to make it more clear.
> 
> Again, I'm not sure that I've got your full point though.  You use the term 
> "schism" which to me is about the community dividing into warring 
> camps.  Is that what you intended to suggest?  If so could you amplify a 
> little.

If the applications Adobe XMP, CC/PP, DC, RSS, etc. refuse to change
their serialization to use local explicit datatyping, given the
volume of already existing serialized instances and tools operating
on those instances (some of which do so as XML rather than RDF,
despite the associated drawbacks), then a tidy RDF MT will assign a
meaning to an inline literal which conflicts with the meaning 
assigned by the application.

Inference, query, and other generic RDF tools operating based on 
the tidy RDF MT will not be suitable for use with RDF knowledge
presuming value based interpretations of inline literals, and 
thus, different generic tools based on an untidy MT will likely
emerge to provide for these value based models.

That's a schism. And a schism need not be "warring camps", just
diverging ones.

> 
> >The RDF MT will say that a given inline literal denotes itself,
> >the string, yet the application may say that it denotes something
> >else -- thus, entailments that hold for the RDF MT may not hold
> >for the application MT and vice versa.
> 
> What application model theory?  I can point to the model theory for 
> RDF(S).  But I'm not aware of any application model theories.

The MT (possibly implicit and/or informal) for the information
models: Adobe XMP, CC/PP, RSS, DC, etc.

E.g. the implicit CC/PP MT that says that the object of the property
bitsPerPixel denotes an integer value, not some lexical representation
of an integer value.

> >Furthermore, higher level
> >or client applications which wish to interact with RDF knowledge
> >expressed according to these value-based models will not be able
> >to utilize generic RDF tools and inference engines because they
> >will not behave correctly according to the value-based semantics.
> 
> Right.  To strengthen this point, are you aware of any concrete, pragmatic 
> examples where this really matters?

Query based retrieval that compares property values, which
is a fundamental operation at the heart of many significant applications
such as content optimization, service discovery, client negotiation,
evaluation of trust, etc.
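
A small sketch in Python of why the reading matters for such queries.
The device names and bitsPerPixel values are invented for the example,
and the two helper functions are hypothetical.

```python
def at_least_lexical(literal, threshold):
    """Tidy reading: compare the literals as strings."""
    return literal >= threshold


def at_least_value(literal, threshold):
    """Value based reading: compare the integers the literals denote."""
    return int(literal) >= int(threshold)


# Hypothetical bitsPerPixel literals for two devices.
devices = {"ex:phoneA": "8", "ex:phoneB": "16"}

# Query: which devices offer at least 16 bits per pixel?
lexical_hits = sorted(d for d, v in devices.items() if at_least_lexical(v, "16"))
value_hits = sorted(d for d, v in devices.items() if at_least_value(v, "16"))
```

The string comparison also returns ex:phoneA, because "8" sorts after
"16" lexically; the value based comparison returns only ex:phoneB.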

These applications care about what is ultimately *meant* by the object
of the property, not its form of expression (literal, typed literal,
URIref, bnode, etc.). For them, RDF is a language for knowledge
representation, not simply structured markup. It's the meaning that
counts.

The bottom line is that there are significant applications of RDF which
already employ value based interpretations of inline literals and semantic
web agents are going to rely on the ultimate intended meaning of 
those inline literals to make decisions, i.e. the values that they
denote. To that end, that meaning should be as explicit as possible in
the RDF MT and compatible with application layers above RDF.

Yes, we *could* require those already deployed applications to change
every single serialized instance to use explicit local datatyping so
that SW agents understand what they mean, but:

1. I don't expect that these applications will change, given the
   enormous effort that would be required to change deployed content,
   much less the (re-)standardization process that would be associated 
   with any changes to most of them.

2. Most of these applications are the primary "success stories" for
   RDF so it's pretty damn rude (to choose a very mild expression
   that doesn't begin to reflect the full degree of my intended
   meaning) to attempt to force them to change, rather than just
   fixing some legacy software to reflect value based semantics
   (especially since any applications presuming tidy-based semantics
   simply need to define an RDF Schema to assert xsd:string for
   their properties and then all will work as before).

Patrick

Received on Wednesday, 25 September 2002 06:45:08 UTC