RE: Subject literals from Pat Hayes on 2001-11-07 (w3c-rdfcore-wg@w3.org from November 2001)

From: Pat Hayes <phayes@ai.uwf.edu>
Date: Tue, 6 Nov 2001 20:38:56 -0600
To: Patrick.Stickler@nokia.com
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05101049b80e48f87ed9@[65.212.118.166]>
>  > Right. YOu need to extend the Ntriples notation slightly to be able
>>  to fully capture the structures that can be built. One proposal
>>  (still not adopted) is to allow nodeIds (the new name for the _:x
>>  labels) to identify not just blank nodes but also literal nodes. So
>>  one might write the graph I had in an earlier message:
>>
>>  aaa ---eg:prop--->10--rdf:type--->xsd:integer
>>
>>  could be written in Ntriples++ as:
>>
>>  aaa eg:prop _:1:"10" .
>>  _:1 rdf:type xsd:integer .
>
>Well, now I'm just gonzo confused (a common state for me these
>days is seems ;-)
>
>Exactly what is the difference between this "new"
>representation
>
>   aaa eg:prop _:1:"10" .
>   _:1 rdf:type xsd:integer .
>
>and
>
>   aaa eg:prop <genid:123> .
>   <genid:123> rdf:value "10" .
>   <genid:123> rdf:type xsd:integer .
>
>aside from the fact that the literal value is now part
>of the *unique* identifier?

The first one has three nodes and two edges; the second one has four 
nodes and three edges.

Graphs in ascii-art, respectively (view in Courier):

aaa ---eg:prop--> "10" ---rdf:type--->xsd:integer

aaa ---eg:prop-->[   ]---rdf:value--->"10"
                    |
                    '---rdf:type--->xsd:integer

The second graph has a blank node in the middle.

>And since the label of the node is now unique, why
>then not use a URI.

That gets into another debate, which we have had to exhaustion, and 
decided that literals and bnodes were to be permitted. Done deal.

But be careful with that 'label'. The nodeIDs in Ntriples are not in 
the graph itself: they are just used by Ntriples to keep track of 
which node is which in its lexicalization of the graph structure.

>I.e. why not just
>
>   aaa eg:prop <xsd:integer:10> .
>
>and be done with it?

Well, what's that in a graph? Is 'xsd:literal:10' a node label? If 
so, I tend to agree, that would certainly make everything a hell of a 
lot simpler (even if it does throw away several weeks work:-). 
Literals wear their datatype on their sleeves, they have a single 
globally fixed interpretation, are never ambiguous; end of story.

>
>Interpretation of literals is for applications above the RDF
>space anyway, right? So why not just use a self contained package
>of value and type, which doesn't get munged when binding to
>query variables employing inference based on subClass relations?

Right, good point.

>
>>  where the subject of the second triple is the same nodeID as the
>>  object of the first one. The general rule to make a graph from such a
>>  document is: make a separate graph for each triple, then merge all
>>  nodes with the same nodeID or uriref label; then erase the nodeIDs.
>>
>>  Now, the examples given above might look like this:
>>  _:1:"fi" rdf:type <urn:iso:3166_1> .
>>  _:2:"fi" rdf:type <urn:iso:639> .
>>  <urn:foo> xyz:someProperty _:1:"fi" .
>
>Well, that's *alot* different than the earlier examples
>which had the object nodes labled identically. This treatment
>seems the same to me as the current "genid:" approach
>which of course is required in order to get to triples.
>
>Each bNode has a "system" identity, and statements are
>expressed using that system identity as the subject. And
>in essence, that system identity is a kind of "local URI".

I'm lost. I really don't follow what you are saying here.

>So your label really *is* the same as a URI, but it's
>the URI of a resource node (or bNode) not the literal itself,
>and properties (arcs) hung on that node are properties of
>the object for that particular statement, not the literal.

Think of the graph as follows: its the NODES that denote things. 
Nodes with a uriref label denote the resource with that uriref. Blank 
nodes denote things, but we don't have names for them. Literal nodes 
(in my understanding) are like uriref nodes in that they denote 
through their labels, but literal labels denote things by a different 
route than urirefs; their meaning is determined by a datatyping 
scheme rather than by an interpretation.

Now, nodeIDs ('_:2' and so on) are not mentioned, because they are 
not in the graph at all; they are only used by an Ntriples parser to 
keep track of the correspondence between labels in triples and nodes 
in the graph.

>
>Thus, to that end there is no functional difference
>between
>
>  _:1:"fi" rdf:type <urn:iso:3166_1> .
>
>and
>
>  _:1 rdf:type <urn:iso:3166_1> .
>  _:1 rdf:value "fi" .

The first is two nodes and an edge; the second is three nodes and two 
edges. There is no edge labelled 'rdf:value' in the first graph. The 
nodeID in the first triple is redundant, by the way.

>except that in the latter case, the literal itself is
>"visible" according to generic semantics, without the
>need to parse a system-specific identifier.

The literal is just as "visible" in both cases: it is a label on a node.

>Sorry, still not "getting it"...
>
>>  >To treat literals as node labels is to introduce that
>>  >ambiguity into the graph. Why? How is that any more
>>  >flexible or useful than bNodes?
>>
>>  But literals are node labels in ALL the proposals. How else could
>>  they be in the graph at all?
>
>Firstly, I missed the point that every occurrence of a literal
>gets its own node. Fair enough.
>
>The problem is that a literal is not a globally unique identifier.
>So to that end, if you are using a literal as the sole identity
>of a bNode, you are losing the connection between its statement
>context and the value itself.

Right, but we are not so using it. There is a global assumption that 
urirefs uniquely identify nodes, but not that literals do.

>
>A resource is a resource is a resource no matter where it occurs
>but that is not so with literals. They are contextual, and their
>interpretation is specific to each occurrence (even though there
>may be common intepretations applied frequentely).
>
>If one has multiple nodes with the same URI label, one can presume
>that the total knowledge defined for that resource is the union
>of all such nodes. The same is *not* true of all nodes labeled
>with the same literal. Right?

Right. It is not in general valid to merge two distinct literal 
nodes, even if they have the same label.

>
>So, you must give each occurrence of each literal (bNode) a
>unique label in the graph, so that you don't lose the context of
>the statements in which those literals serve as objects, right,

Right.

>and if so, then how is that anything different than the present
>"genid:" or "online:" tricks used at present?

It uses fewer nodes, for one thing; but more significantly (IMHO), it 
allows the datatype 'context' to be inferred from other parts of the 
graph by using RDFS reasoning. However, I confess that the issue you 
have raised about inappropriate bindings has got me more worried 
about this than I was previously.

Pat

-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Tuesday, 6 November 2001 21:39:12 UTC