The case for untidy literal semantics from Brian McBride on 2002-09-24 (w3c-rdfcore-wg@w3.org from September 2002)

From: Brian McBride <bwm@hplb.hpl.hp.com>
Date: Tue, 24 Sep 2002 19:53:00 +0100
To: RDF Core <w3c-rdfcore-wg@w3.org>
Message-Id: <5.1.0.14.0.20020924164148.0223feb8@0-mail-1.hpl.hp.com>

As I said earlier, with the aim of building stronger consensus around the
tidy/untidy decision, I've been through last week's submissions in
support of untidy literal semantics and I've tried to summarize my
understanding of them here.  There are some editorial comments
intertwined.  Feel free to send suggestions/corrections.



1.  Ease of Use for the Content Producer
----------------------------------------

Untidy semantics permit long-range datatyping whilst maintaining
monotonicity.  Tidy semantics do not.

With untidy semantics, a writer of RDF/XML can specify a range constraint 
on a property to imply the datatype of the value of the property.

With tidy semantics, to maintain monotonicity, the datatype must be 
specified syntactically and the only way we currently have to do that is to 
put an rdf:datatype attribute on each property element that represents a 
datatyped value.
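
As an illustration of the tidy case, here is a sketch in which every
property element that carries a datatyped value repeats the rdf:datatype
attribute.  The ex: namespace, the property names and the resource URI are
made up for this example; only rdf:datatype and the XML Schema datatype
URIs are real vocabulary.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">

  <!-- Hypothetical resource and properties, for illustration only. -->
  <rdf:Description rdf:about="http://example.org/devices/screen1">
    <!-- Each datatyped value needs its own rdf:datatype attribute. -->
    <ex:width rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">640</ex:width>
    <ex:height rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">480</ex:height>
    <ex:weight rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">1.5</ex:weight>
  </rdf:Description>

</rdf:RDF>
```

The attribute has to be repeated wherever such a value appears, which is
the burden on the writer that this argument refers to.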

[Ed note:  This argument would be strengthened if we had specific examples
where the burden of adding the datatype attributes was a problem.  OWL, for
example?]


2.  Principle of Least Change
-----------------------------

For the RDF Content Creator:

Some RDF assumes string-based semantics for literals.  [There ought to be a
reference here, perhaps something from RSS or FOAF?]  Some RDF is written
as if literals had value-based semantics.  [Reference CC/PP bitsPerPixel?]

They both look the same in RDF/XML, e.g.

   <rdf:Description>
     <foo:bar>10</foo:bar>
   </rdf:Description>

There is no way to tell from this alone whether the string "10" is meant or 
the integer 10.

If a generic RDF processor is to know which, then it must be told 
somehow.  With untidy semantics, this can be done with a range constraint 
in a schema to define the type of the literal.  This is considerably more 
convenient for the content producer.  With tidy semantics the content 
itself must be modified to indicate the actual datatype.

For the content producer therefore, the least change is to have untidy 
literals and range constraints.
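
To make this concrete for the foo:bar fragment above, a schema statement
along the following lines would, under the untidy proposal, let a processor
read the existing "10" as an integer without touching the content.  This is
only a sketch: the namespace URI bound to foo: is an assumption (the
fragment above does not give one), and the step from range constraint to
datatype is the behaviour being argued for here, not something every
processor does.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

  <!-- Stated once, in the schema for the (assumed) foo: namespace. -->
  <rdf:Property rdf:about="http://example.org/foo#bar">
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
  </rdf:Property>

</rdf:RDF>
```

With tidy semantics the same information would instead have to be added to
each foo:bar property element in the content as an rdf:datatype attribute.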

[Ed note:

1) This part of the argument is based on the desire to add datatype
information to existing content so that the datatype is understood by an
RDF processor.  The least change is actually to do nothing, in which case
the world is no worse off than it is now.  But if it is thought important
to capture this datatype information in legacy content, then untidy
semantics are the easiest path for the content producer.

2) As expressed, this argument does not address the tradeoff that
exists.  Whilst adopting untidy semantics makes it easier to specify that a
literal really denotes an integer, it adds the burden of requiring the
content producer to specify that strings were intended, where that would
not be necessary with tidy semantics.  It could be, depending on the
distribution of data, that it's more work to have to add the range
constraints on all the literals that are really strings than it is to add
the range constraints for other datatypes.  We did ask the community which
they preferred and they said untidy - see below.

3) This argument would carry greater force if:

   o we can establish there are a lot of examples where adding this 
datatype information is desirable

   o there are important examples where adding this datatype information is 
necessary (DAML+OIL?)
]

For the generic tool developer, the principle of least change suggests
doing what is currently done.  The tools we currently know about (cwm,
Euler, Redland and Jena) implement tidy semantics.

Overall, it is easier to change a few tools than a lot of content and some 
other specifications.


3.  The user community asked for untidy
---------------------------------------

We asked the user community, as clearly as we could, about the tradeoff 
between tidy and untidy semantics.  There was a clear signal from those who 
responded, that indicated a preference for untidy semantics.


4. Abbreviated Syntax
---------------------
[Ed note:  I've added this one, as I was reminded of it whilst writing the 
above]

With tidy semantics, it is not possible to represent datatyped values using
property attributes in the abbreviated syntax.  Thus it is not possible to
embed RDF that represents datatyped values in HTML without the browser
rendering it.
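
A sketch of the difference, using a made-up ex: namespace, property and
resource URI: in the abbreviated form the value lives in an attribute, so
there is no element on which to put rdf:datatype; the datatyped form needs
a property element whose text content a browser would typically display.

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">

  <!-- Abbreviated syntax: the value is a property attribute.  No text
       content to render, but nowhere to state a datatype either. -->
  <rdf:Description rdf:about="http://example.org/page" ex:hits="10"/>

  <!-- Datatyped value: requires a property element, and the "10" is now
       element content. -->
  <rdf:Description rdf:about="http://example.org/page">
    <ex:hits rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">10</ex:hits>
  </rdf:Description>

</rdf:RDF>
```
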
Received on Tuesday, 24 September 2002 14:58:09 UTC