Re: Literals as subjects, labels for nodes from Pat Hayes on 2001-10-16 (w3c-rdfcore-wg@w3.org from October 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Tue, 16 Oct 2001 10:21:52 -0500
To: Dave Beckett <dave.beckett@bristol.ac.uk>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101023b7f1f2f8d0db@[205.160.76.193]>
>Rather than get into detailed N-Triples syntax issues, can we discuss
>what this change to the model means; an outline of why it is required
>and its implications.
>
>Consequent syntax changes can be discussed in another thread.

OK, fair enough (though it really is mostly a syntax issue, in fact). 
Let me try to summarize.

First, there are two separate issues/proposals which can be discussed 
separately, though they do fall naturally together:

1. Provide a way to allow two different occurrences of the same 
literal to be distinguished in the syntax. (must-do)

2. Allow literals as subjects. (optional)

Let me focus on the first as it is the most important.

In the current draft of the MT document, RDF graphs are required to 
be 'tidy', which is defined to mean that labels uniquely identify 
their node, i.e. no two nodes have the same label. "Label" here refers 
to both urirefs and literals; that means that, with this definition, 
any literal can occur in a graph only in one place, labelling one 
single node.

This is not acceptable if we are ever going to allow nontrivial 
datatyping, since (if we do) that would mean that we will want one 
occurrence of a literal to mean one thing, and a different occurrence 
to mean something else, so we cannot require that they be the same 
occurrence in the graph. (Exactly how the different occurrences of, 
say, "010701" are inferred to be of type integer, or string, or date, 
or whatever, is another issue that we can discuss separately; the 
present point is only that different occurrences of the same literal 
might be somehow treated differently, so cannot be forced to be 
syntactically identical.) So, if we have (or even contemplate the 
possibility of some extension of RDF ever having) nontrivial literal 
datatyping, then we have to relax the strict tidiness condition on 
literals, and allow the same literal label to appear at several 
places in the graph.
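
To make the tidiness condition concrete, here is a minimal Python 
sketch. It is only an illustration: the Node class, the is_tidy 
function, the ex: names and the per-occurrence "readings" are all 
hypothetical and are not taken from the MT document.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality: two nodes may share a label
class Node:
    label: str


def is_tidy(nodes):
    """Return True if no two distinct nodes carry the same label."""
    counts = Counter(node.label for node in nodes)
    return all(count == 1 for count in counts.values())


# Tidy: the literal labels exactly one node, shared by every triple using it.
shared = Node('"010701"')
tidy_nodes = [Node("ex:a"), Node("ex:b"), shared]

# Not tidy: the same literal label appears on two separate nodes.
first, second = Node('"010701"'), Node('"010701"')
untidy_nodes = [Node("ex:a"), Node("ex:b"), first, second]

# Only the untidy graph leaves room for per-occurrence treatment
# (hypothetical readings, however they come to be inferred).
reading = {first: "integer", second: "date"}
```

In the tidy graph there is only one node to attach a reading to, so 
both uses of "010701" are forced to mean the same thing.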

So far, no problem. The problem arises in describing these graphs in 
Ntriples syntax. In brief, you can't. That is, there is no way to 
tell, given an Ntriples document, which of several alternative RDF 
graphs it might be supposed to indicate, since if the same literal 
occurs in several different triples, you might or might not make 
them be the same node, and there's nothing in the Ntriples document 
that tells you which is intended. And moreover, it matters, since 
the different graphs will have different meanings in the model theory 
(if it is extended to allow complex datatyping). So we need a way to 
indicate, in an Ntriples document, which of the various occurrences 
of a literal are supposed to go onto the same node in the graph, and 
which are not. The proposed 'Ntriples++' extension is one way to do 
that, which uses the same node-indicating scheme as Ntriples already 
uses for blank nodes, but lets it be applied to literals as well 
(optionally).
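
The ambiguity can be sketched in Python as follows. This is an 
illustration under assumptions: triples are modelled as plain tuples 
rather than in any concrete syntax, and the node identifiers "n1" and 
"n2" are hypothetical stand-ins for whatever node-indicating scheme is 
adopted.

```python
# A document in which the same literal occurs in two triples.
doc = [
    ("ex:a", "ex:p", '"010701"'),
    ("ex:b", "ex:q", '"010701"'),
]


def literal_nodes_merged(triples):
    """Reading 1: every occurrence of a literal label is the same node."""
    return {obj for _, _, obj in triples}


def literal_nodes_split(triples):
    """Reading 2: every occurrence is its own node (keyed by position)."""
    return {(index, obj) for index, (_, _, obj) in enumerate(triples)}


# The same document with an explicit (hypothetical) node identifier
# attached to each literal occurrence.
labelled_doc = [
    ("ex:a", "ex:p", ("n1", '"010701"')),
    ("ex:b", "ex:q", ("n2", '"010701"')),
]


def literal_nodes_labelled(triples):
    """Occurrences share a node exactly when they share an identifier."""
    return {node_id for _, _, (node_id, _) in triples}
```

Both readings of doc are consistent with the document as written, and 
they yield graphs with different numbers of literal nodes. With the 
identifiers in labelled_doc there is only one reading.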

To emphasize, this is not a problem when literals are not typed (or 
if they all have the same type, e.g. string), since then we can safely 
impose tidiness on literals as well as on urirefs.  But I really do 
think that it would be a mistake to take this (slightly) easier way 
out, since the resulting language will not *permit* complex 
datatyping, even in any extension built on top of it. This would for 
example cause DAML+OIL to no longer be RDF-compatible.

OK, that's the pressing matter. The other matter is less pressing. If 
we were to allow literals as subjects, then this proposed extension 
to the Ntriples syntax could also be used to specify graphs in which 
datatyping information was specified directly, by simply asserting 
that a (particular occurrence of a) literal has a datatype as its 
rdf:type. (see 
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0277.html 
) This would provide a simple, obvious mechanism for declaring 
datatypes which would integrate smoothly into both the (extended) RDF 
syntax, and also the proposed model-theory treatment of datatypes. 
(sketched in 
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0164.html )
And I can see no rational reason to prohibit it, other than it 
possibly being beyond our charter. So this seems to me to be some 
extra support for the proposal to allow literals as subjects. 
(However, even if this proposal is rejected, the previous point 
stands on its own.)
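
As a rough sketch of how datatyping by rdf:type might look if literals 
were allowed as subjects, consider the following Python. Everything 
here is assumed for illustration: the tuple encoding, the node 
identifiers, and the datatype names ex:integer and ex:date are 
hypothetical.

```python
RDF_TYPE = "rdf:type"

# Each literal occurrence is a (node_id, label) pair; two of the triples
# use such an occurrence as their subject.
triples = [
    ("ex:a", "ex:p", ("n1", '"010701"')),
    ("ex:b", "ex:q", ("n2", '"010701"')),
    (("n1", '"010701"'), RDF_TYPE, "ex:integer"),
    (("n2", '"010701"'), RDF_TYPE, "ex:date"),
]


def declared_datatypes(triples):
    """Map each literal node identifier to the datatype asserted for it."""
    types = {}
    for subject, predicate, obj in triples:
        # Only literal occurrences are encoded as tuples in this sketch.
        if predicate == RDF_TYPE and isinstance(subject, tuple):
            node_id, _label = subject
            types[node_id] = obj
    return types
```

Here the two occurrences of the same literal receive different 
datatypes, which is possible only because they sit on different nodes.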

I hope this covers the issues adequately.

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Tuesday, 16 October 2001 11:21:48 UTC