Re: Any use cases for untidy literals except long range datatyping? from pat hayes on 2002-08-21 (w3c-rdfcore-wg@w3.org from August 2002)

From: pat hayes <phayes@ai.uwf.edu>
Date: Tue, 20 Aug 2002 21:00:48 -0700
To: Sergey Melnik <melnik@db.stanford.edu>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05111b0cb9830dc0ab74@[65.212.118.249]>
>I'd like to restate the questions, which Jan raised recently, more explicitly.
>
>Much of the ongoing discussion about tidy/untidy literals amounts to 
>arguing about different readings of a given piece of RDF/XML or 
>NTriples syntax. From what I can tell, both tidy and untidy literals 
>are implementable, so we have to pick one and wrap up.
>
>To my knowledge, untidy literals have been first suggested in the 
>context of long range datatyping (aka implicit/global idiom). 
>Specifically, untidy literals provide a shortcut for using a bNode 
>with a property (two triples are essentially merged into one).

I think it is rather more fundamental than this. First, bear in mind 
that this entire issue only arises in the context of RDF graph 
syntax; all normal lexicalized notations, including XML, are 
inherently untidy. I think that databases are usually untidy as well 
(e.g., consider an RDB table where the second column is all integers and 
the third column is all strings: we could have ten in one and "10" 
in the other, and that would not be considered an implementation 
issue, right?) Second, the basic point is that people will, whether 
we like it or not, tend to use things like numerals as names of 
numbers. They are so used almost universally throughout human 
discourse and all programming languages. However, we are committed to 
RDF incorporating XML datatypes, which means that we cannot build 
this assumption into the language, since the meaning of a numeral 
string depends on the datatype applied to it (it *could* be a string). 
So there is a 'natural' default that we are prevented from 
assuming. Our options are to provide a different default (literals 
are strings unless stated otherwise) which will make some 
implementors happy but is unlikely to be found congenial by the rest 
of the world; or, to provide a mechanism which treats 'bare' literals 
as having an incomplete meaning and allows the datatyping information 
to be supplied from elsewhere (range datatyping or some external 
assumption). But the second requires that we allow one occurrence of 
a bare literal to be associated with a different datatype than 
another occurrence of the same literal.
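
A minimal sketch of the second option, for illustration only. The property names (ex:age, ex:title), the RANGES table and the denotation function are assumptions made up for this example; they are not part of any RDF spec or library. It shows two occurrences of the same bare literal picking up different datatypes from the properties they are used with:

```python
# Illustrative sketch only: all names below are hypothetical.
# Map datatype names to functions from lexical form to value.
DATATYPES = {"xsd:integer": int, "xsd:string": str}

# Range datatyping: the datatype hangs off the property, not the literal.
RANGES = {"ex:age": "xsd:integer", "ex:title": "xsd:string"}

def denotation(prop, lexical_form):
    """Value of one bare-literal occurrence, or None if no datatype is known."""
    datatype = RANGES.get(prop)
    if datatype is None:
        return None  # meaning left incomplete until supplied from elsewhere
    return DATATYPES[datatype](lexical_form)

# The same bare literal "10" occurs twice, under different properties.
triples = [
    ("ex:jenny", "ex:age", "10"),
    ("ex:film", "ex:title", "10"),
]
values = [denotation(p, lit) for (_s, p, lit) in triples]
```

Here the lookup is done per occurrence. On a tidy reading the two occurrences of "10" would be one node with one denotation, so this per-occurrence interpretation would not be available.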

>Is this shortcut so fundamental that there is value of making it 
>part of the spec?

I think so, cf. above. In other words, it's not just a shortcut. But 
even if it were, we have Mike Dean and Patrick S. insisting that 
saving one triple per entry is critical for their applications (on 
palm pilots and cell phones respectively, I note :-)

BTW, isn't range datatyping exactly like having a datatype associated 
with a table column in a database, rather than having to rewrite it 
in every separate entry? And isn't that normal DB practice?
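
The database analogy can be tried directly. This is only a sketch using Python's sqlite3 module and an in-memory table; the table and column names are made up for illustration, and SQLite's column type affinity stands in for "a datatype associated with a table column":

```python
import sqlite3

# Hypothetical table: the type is declared once per column, not per entry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (n INTEGER, s TEXT)")

# The same lexical form "10" is supplied for both columns.
conn.execute("INSERT INTO entry VALUES (?, ?)", ("10", "10"))

# Ask SQLite what it actually stored in each column.
row = conn.execute(
    "SELECT n, typeof(n), s, typeof(s) FROM entry"
).fetchone()
conn.close()
```

The column declaration, not the entry itself, decides whether "10" is stored as an integer or as text, which is the same division of labour as range datatyping.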

>Is there an appealing use case for untidy literals that is not long 
>range datatyping (aka implicit/global idiom)?
>
>Are we closing off any important extensibility paths if we go for 
>tidy literals?

With regard to this last point, yes. DAML and OIL and probably OWL 
will need the flexibility of allowing (semantically) untidy literals, 
and if we forbid them then the DAML spec will need to be rewritten 
and OWL will probably no longer base itself on RDF (or, an 
alternative scenario, the Webont WG will split apart into two rival 
groups which will produce incompatible standards. It is perilously 
close to this already.)

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Wednesday, 21 August 2002 05:04:51 UTC