
Oh my GOD, another datatype document.

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Mon, 4 Feb 2002 17:32:28 -0600
Message-Id: <p05101407b884bf8fcef2@[65.212.118.208]>
To: w3c-rdfcore-wg@w3.org
Guys, I've been trying to write a followup to the 'bermuda triangle' 
message, and it's still in an incomplete state, but I thought it might 
be worth giving y'all the URI in any case as time is so short:

http://www.coginst.uwf.edu/users/phayes/DatatypesUnifiedMT-draft1.html

I will be updating this when I get a chance, maybe tonight. (Maybe. 
The html is a crock, so ignore the colors, font sizes etc.; and it's a 
bit repetitive because I was writing it in streaming mode.)

In sum: one can have the TDL syntactic idioms without having pairs in 
the MT. The local and 'range' typing work together smoothly, and many 
of the proposed alternative idioms also work together and are in fact 
semantically equivalent. Literal nodes are tidy, and literals denote 
strings.

That's the simple version.  What this does not handle, however, is 
the case where a literal is used in-line (no bnode) to refer to a 
value; in this simple mode,

<mary> <age> "10" .

says that Mary's age is a character string, and there's no arguing 
about it; if you assert some range info on <age> that disagrees with 
that, then you are just plain wrong.
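
To make the 'simple' reading concrete, here is a rough sketch in Python. 
It is only an illustration, not the model theory from the draft: the 
function names and the dictionary of range checks are made up for the 
example. A literal always denotes its own character string, so a range 
assertion on <age> that demands an integer simply fails.

```python
# Hypothetical sketch of the 'simple' reading: a literal denotes itself,
# i.e. a character string, whatever the property's range says.
def denote_literal(lexical_form: str) -> str:
    return lexical_form

# Hypothetical range checks (stand-ins for datatype value spaces).
def is_integer(value) -> bool:
    return isinstance(value, int) and not isinstance(value, bool)

def is_string(value) -> bool:
    return isinstance(value, str)

def satisfies_range(triple, ranges) -> bool:
    """True if the literal object's denotation passes the property's range check."""
    _subject, prop, literal = triple
    value = denote_literal(literal)
    check = ranges.get(prop)
    # No range asserted for this property means nothing can be violated.
    return check is None or check(value)

triple = ("mary", "age", "10")
age_is_integer = satisfies_range(triple, {"age": is_integer})
age_is_string = satisfies_range(triple, {"age": is_string})
```

Here age_is_integer comes out False and age_is_string comes out True: 
the integer range assertion is the thing that is wrong, not the triple.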

Or, there is another option, which handles all the other idioms in 
just the same way and gives them the same meaning, but where a bare 
in-line literal with no typing information is under-determined; it 
means pretty much the same as a bnode, by itself. This 'subtle' 
option handles all the datatyping idioms that have been proposed so 
far (I think), but it requires some complications. First, we have to 
allow graphs to be untidy on literal nodes (cost: need to extend 
N-triples notation in some way to indicate whether or not two 
occurrences of a literal are on the same node.). Second, it is almost 
impossible to make range-typing work on arbitrary datatypes unless we 
allow literals to be subjects. (Cost: literals as subjects, which 
some users will not like, e.g. it might break DAML compatibility.) Third, 
some queries and rules might turn out to have a stronger meaning than 
people currently expect, and so have some unexpected conclusions; cf. 
the discussion near the end of the document. (Cost: I don't think this last is a 
fatal problem, in fact, but it might ruffle some feathers.)

Finally, this proposal has some overall costs. It uses a new namespace:
http://www.coginst.uwf.edu/users/phayes/rdf-datatype-schema.html
and requires that it be used intelligently. (You really only need the 
datatype and type, the rest are for being fancy-schmancy.) And it 
requires imposing some semantic conditions that go beyond what can be 
stated using closure rules; which is another way of saying that this 
is a genuine semantic extension to RDF; it can't be thought of as 
just an abbreviated form of a whole lot of RDF triples (as RDFS can, 
for example.)

Datatyping this sensitive really does go slightly beyond what can be 
said *in* RDF, in a sense. I would argue that this is inevitable if 
we are going to deal properly with XML datatypes, but I can see that 
some folk will be worried by it.

Anyway, I offer it for consideration/comment/trashing/whatever.

I'll try to summarize what it does for all the 'issues' soon. 
However, a quick comment about issue B4 in V4. When writing things 
like

_:f <rdf:type> <film> .
_:f <dc:title> "10" .
<mary> <age> "10" .

you have to say whether those are the same literal NODE or not. 
Graphs can be untidy on literal nodes, so this is ambiguous (4 nodes 
or 5 in the graph?). And it matters, because one gets different 
entailments in the two cases. If those "10"s are the same literal 
node then this entails

_:x <dc:title> _:y .
_:z <age> _:y .

but if they are not the same node, and if nodes can be typed 
separately, then it doesn't, in general. Either way the inference is 
quite clear, but the discussion is confused because the N-triples 
*syntax* is ambiguous when literal nodes are not tidy.
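
A small Python sketch of the node-counting point, for what it's worth. 
It only models the graph structure (which occurrences share a node), 
not the datatype semantics, and the node representation, the helper 
names, and the predicate label on the first triple are all assumptions 
made up for the illustration.

```python
from itertools import count

_ids = count()

def literal_node(label):
    """Create a fresh literal node: same label, but its own identity."""
    return ("lit", label, next(_ids))

def build_graph(tidy):
    """Build the three-triple example; 'tidy' decides whether both "10"s share a node."""
    ten_a = literal_node("10")
    ten_b = ten_a if tidy else literal_node("10")
    f = ("bnode", "f")
    return [
        # The first predicate does not affect the count; "rdf:type" is assumed.
        (f, "rdf:type", ("uri", "film")),
        (f, "dc:title", ten_a),
        (("uri", "mary"), "age", ten_b),
    ]

def nodes(graph):
    """All distinct subject and object nodes in the graph."""
    return {n for s, _p, o in graph for n in (s, o)}

def shares_object(graph, p1, p2):
    """True if some single node is the object of both a p1 triple and a p2 triple."""
    objs1 = {o for _s, p, o in graph if p == p1}
    objs2 = {o for _s, p, o in graph if p == p2}
    return bool(objs1 & objs2)

tidy_graph = build_graph(tidy=True)
untidy_graph = build_graph(tidy=False)
```

The tidy graph has 4 nodes and the title and age triples share an 
object, which is the structural condition the two-triple pattern with 
the common _:y needs. The untidy graph has 5 nodes and no shared 
object, so the pattern is not matched on structure alone.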

Pat


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Monday, 4 February 2002 18:31:55 EST
