The emperor's new datatypes

I was going to hold my silence in support of WG consensus.  I still support 
the WG consensus, but now that the matter of datatyping has come up on the 
list [1][2], I feel compelled to state my opinion lest silence be regarded as 
agreement with some other view.

[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0629.html
[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0634.html

I do not have new information, and I am not asking for any issue to be 
reopened.  But I feel I should now put my opinion on the record.  I don't 
plan to get into an argument about the rightness of my views, at least not 
here, unless the group as a whole decides to reopen the datatyping design.

...

THE EMPEROR'S NEW DATATYPES
---------------------------

I believe we took a wrong turn when we did the datatyping design.  I don't 
think the decision is technically fatal, but having had some time to stand 
back and consider the larger picture, I believe we have introduced a lot of 
unnecessary complexity.

As an alternative, DanC proposed a system [3] of datatype "interpretation 
properties" and constraints on string literals, which has the advantage of 
being a much smaller change to RDF than the current proposal.

[3] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html

The key decision we made was that literals are tidy.  Having made that 
decision, I think the choice between ways to handle literals corresponding 
to datatype values, such as integers, etc., was somewhat rushed.  In a 
private message drafted about the time we took that decision I included an 
analysis of the impact of the different tidy-literal options on CC/PP, 
which I reproduce below, in which I conclude that Dan's proposal works just 
as well for CC/PP as the current proposal.

Later, I argue that even Dan's proposal might be slightly simplified (in 
the sense of adding less mechanism to RDF).

...

TIDY LITERALS IN CC/PP
----------------------

This analysis of changes to CC/PP is predicated on tidy literals being a 
done deal.

Dan's point, if I represent it correctly, is that we don't need typed
literals because if we assume datatype properties relating values to
lexical forms, we can use them as "interpretation properties".

That is, the intent of the expression:

    jenny age xsd:integer"10" .

can be equivalently expressed as:

    jenny age _:x .
    _:x xsd:integer "10" .

The values of IEXT(I(age)) that make the first form true are exactly those
that make the second true.

The only substantive objection to this that I can think of is the concern
about triple-bloat; e.g. integer-valued properties now need two triples
instead of one.  (I think that optimization is quite possible, but it
would be some extra work for implementers.)
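
The rewriting from the one-triple form to the two-triple form is mechanical.
The following is only a minimal Python sketch, assuming triples are plain
tuples and a typed literal is modelled as a hypothetical (datatype, lexical
form) pair; the function name and the encoding are illustrative and do not
come from any RDF toolkit:

```python
from itertools import count

def expand_typed_literals(triples):
    """Rewrite (s, p, (datatype, lexical)) as two triples via a fresh blank node."""
    fresh = count()
    out = []
    for s, p, o in triples:
        if isinstance(o, tuple):              # hypothetical typed-literal encoding
            datatype, lexical = o
            bnode = f"_:b{next(fresh)}"       # fresh blank node standing for the value
            out.append((s, p, bnode))
            out.append((bnode, datatype, lexical))
        else:
            out.append((s, p, o))             # plain string literals are untouched
    return out

graph = [("jenny", "age", ("xsd:integer", "10")),
         ("jenny", "name", "Jenny")]
expanded = expand_typed_literals(graph)
```

Only the integer-valued statement grows from one triple to two; the
string-valued statement is unchanged.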

So if I now refer to my thoughts about redesigning CC/PP for use with tidy
literals [4], where I wrote:

[[
   <prf:displayWidth
     rdf:datatype="http://www.w3.org/2001/XMLSchema-datatypes#integer">604</prf:displayWidth>
   <prf:displayHeight
     rdf:datatype="http://www.w3.org/2001/XMLSchema-datatypes#integer">200</prf:displayHeight>
]]

I would instead write something like:
[[
           <prf:displayWidth>
              <rdf:Description>
                 <xsd:integer>604</xsd:integer>
              </rdf:Description>
           </prf:displayWidth>
           <prf:displayHeight>
              <rdf:Description>
                 <xsd:integer>200</xsd:integer>
              </rdf:Description>
           </prf:displayHeight>
]]

or just:
[[
           <prf:displayWidth xsd:integer="604" />
           <prf:displayHeight xsd:integer="200" />
]]

Syntactically, the first form is quite a big wrench for CC/PP and the
second is no worse, indeed more attractive, than my previous redesign [4],
but (on the assumption of literals always denoting string values) would
only affect properties with non-string values.  Properties with string
values would not be affected, and I *think* that's reasonably true of many
of the UAProf attributes.

Technically, as I indicated in [4], I think this kind of approach is the
best way forward for CC/PP in RDF, and would, I believe, lay the
appropriate groundwork for a full content negotiation framework based on
CC/PP, using CONNEG-like ideas for capability matching [5], probably based
on OWL vocabulary.

[4] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Sep/0353.html
[5] http://www.ietf.org/rfc/rfc2533

...

SIMPLIFYING DANC'S PROPOSAL
---------------------------

In [3], DanC proposed a new construct, rdfs:format for indicating the 
intended form of literals used as objects of a property.

[3] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html

I think a very similar effect can be achieved using RDF constructs that are 
already defined.  The statement:

   my:age rdfs:format xsdt:integer .

could, I think, equivalently be expressed as:

   my:age rdfs:range _:x .
   xsdt:integer rdfs:range _:x .

which we can now express in RDF/XML thanks to the rdf:nodeID construct we 
introduced.

[I think someone else suggested this on the list, but I don't know where, so 
I'm unable to cite an acknowledgement -- sorry.]

This is presented as a semantic constraint, but it seems quite plausible to 
me that under the same conditions that we now call datatype entailment, a 
generic RDF processor could recognize that:

    ex:Jenny my:age "foo" .

is not satisfiable under the known properties of xsdt:integer.  So while 
this is strictly a semantic constraint, it could reasonably be detected and 
flagged by an RDF parser, in a fashion similar to the way that C compilers 
may perform flow analysis to detect, and warn at compile time about, the use 
of an uninitialized variable.
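
As a rough illustration of such a check, here is a minimal Python sketch.
It assumes triples are tuples of strings, that literals are written with
surrounding double quotes, and that the processor has built-in knowledge of
the xsdt:integer lexical space (an optional sign followed by decimal digits).
The table LEXICAL_CHECKS and the function name are hypothetical, and the
reading of the shared rdfs:range node as "objects must be valid lexical
forms" is the interpretation suggested above, not an established rule:

```python
import re

# Assumed built-in knowledge: lexical space of each known datatype property.
LEXICAL_CHECKS = {"xsdt:integer": re.compile(r"[+-]?[0-9]+").fullmatch}

def unsatisfiable_literals(triples):
    """Return triples whose literal object is not a lexical form of a datatype
    that shares an rdfs:range class with the triple's property."""
    by_range = {}
    for s, p, o in triples:
        if p == "rdfs:range":
            by_range.setdefault(o, set()).add(s)
    constraints = {}                          # property -> datatypes constraining it
    for members in by_range.values():
        datatypes = members & set(LEXICAL_CHECKS)
        for prop in members - datatypes:
            constraints.setdefault(prop, set()).update(datatypes)
    flagged = []
    for s, p, o in triples:
        if not (o.startswith('"') and o.endswith('"')):
            continue                          # only literal objects are checked
        text = o[1:-1]
        if any(not LEXICAL_CHECKS[dt](text) for dt in constraints.get(p, ())):
            flagged.append((s, p, o))
    return flagged

graph = [("my:age", "rdfs:range", "_:x"),
         ("xsdt:integer", "rdfs:range", "_:x"),
         ("ex:Jenny", "my:age", '"foo"'),
         ("ex:Bob", "my:age", '"10"')]
warnings = unsatisfiable_literals(graph)
```

Like a compiler warning, the result is advisory: the graph still parses, but
the flagged statement cannot be satisfied given what is known about
xsdt:integer.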

...

SUMMARY
-------

In summary, I think that the same effect that we currently achieve can also 
be achieved with a considerable simplification of RDF.  I think the main 
technical arguments against are efficiency-based, and I think that (in 
time) smart implementations could overcome those by recognizing and 
optimizing some common cases (cf. relational databases).
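
One such common case is the two-triple pattern itself.  The following is a
minimal Python sketch of how a store might fold it back into a single
internal record.  It assumes triples are tuples of strings, that blank nodes
start with "_:", and that the set of datatype properties is known in advance;
the function name and the internal (datatype, lexical form) record are
hypothetical:

```python
from collections import Counter

def collapse_interpretation_nodes(triples, datatypes=frozenset({"xsd:integer"})):
    """Fold `s p _:b .  _:b dt "lex" .` into one (s, p, (dt, lex)) record when
    the blank node occurs exactly once as a subject and once as an object."""
    as_subject = Counter(s for s, _, _ in triples)
    as_object = Counter(o for _, _, o in triples)
    typed = {s: (p, o) for s, p, o in triples
             if s.startswith("_:") and p in datatypes
             and as_subject[s] == 1 and as_object[s] == 1}
    out = []
    for s, p, o in triples:
        if s in typed:
            continue                          # datatype triple folded into its user
        out.append((s, p, typed.get(o, o)))
    return out

graph = [("jenny", "age", "_:x"),
         ("_:x", "xsd:integer", "10"),
         ("jenny", "name", "Jenny")]
compact = collapse_interpretation_nodes(graph)
```

Blank nodes that are shared or carry other properties are left alone, so the
folding only applies where it cannot change what the graph says.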

I repeat: in this message I'm not asking the WG to change anything, but 
stating a viewpoint for the record, because the topic has been mentioned.

#g
--

[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0629.html
[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0634.html
[3] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html
[4] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Sep/0353.html
[5] http://www.ietf.org/rfc/rfc2533



-------------------
Graham Klyne
<GK@NineByNine.org>

Received on Wednesday, 27 November 2002 08:39:45 UTC