RE: Subject literals

From: <Patrick.Stickler@nokia.com>
Date: Wed, 7 Nov 2001 11:20:11 +0200
Message-ID: <2BF0AD29BC31FE46B78877321144043114C072@trebe003.NOE.Nokia.com>
To: phayes@ai.uwf.edu
Cc: w3c-rdfcore-wg@w3.org

From: ext Pat Hayes [mailto:phayes@ai.uwf.edu]
Sent: 07 November, 2001 04:39
To: Stickler Patrick (NRC/Tampere)
Cc: w3c-rdfcore-wg@w3.org
> Right. You need to extend the Ntriples notation slightly to be able
> to fully capture the structures that can be built. One proposal
> (still not adopted) is to allow nodeIds (the new name for the _:x
> labels) to identify not just blank nodes but also literal nodes. So
> one might write the graph I had in an earlier message:
> aaa ---eg:prop--->10--rdf:type--->xsd:integer
> could be written in Ntriples++ as:
> aaa eg:prop _:1:"10" .
> _:1 rdf:type xsd:integer .

Well, now I'm just gonzo confused (a common state for me these
days, it seems ;-)

Exactly what is the difference between this "new"

  aaa eg:prop _:1:"10" .
  _:1 rdf:type xsd:integer .

and

  aaa eg:prop <genid:123> .
  <genid:123> rdf:value "10" .
  <genid:123> rdf:type xsd:integer .

aside from the fact that the literal value is now part
of the *unique* identifier?

The first one has three nodes and two edges; the second one has four nodes
and three edges.

Graphs in ascii-art, respectively (view in Courier):

aaa ---eg:prop--> "10" ---rdf:type--->xsd:integer

aaa ---eg:prop-->[   ]---rdf:value--->"10"

The second graph has a blank node in the middle.
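
As a quick check on the node and edge counts above, here is a minimal sketch that writes each graph as a set of triples and counts. The node names are illustrative only: 'lit:"10"' stands in for the literal node and "_:b" for the blank node.

```python
# Each graph is a set of (subject, predicate, object) triples.
# 'lit:"10"' and "_:b" are made-up names for the literal and blank nodes.
labelled_literal = {
    ("aaa", "eg:prop", 'lit:"10"'),
    ('lit:"10"', "rdf:type", "xsd:integer"),
}
blank_node_form = {
    ("aaa", "eg:prop", "_:b"),
    ("_:b", "rdf:value", 'lit:"10"'),
    ("_:b", "rdf:type", "xsd:integer"),
}


def size(triples):
    """Return (number of nodes, number of edges) for a set of triples."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return len(nodes), len(triples)


print(size(labelled_literal))  # (3, 2)
print(size(blank_node_form))   # (4, 3)
```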

So labels on bNodes are just a means of compression, in the case of
literals, to avoid the extra rdf:value arc? And how are labels represented
in e.g. a set of triples describing that compressed subgraph? You'd anyway
have to expand that out to some kind of arc (statement) based on the node's
identity, so what exactly does it buy us?

Sorry I missed the bNode discussions, and I don't want to open up a closed
issue; I just would like to at least understand the key benefits of the
label representation as opposed to the former anonymous node representation.

And since the label of the node is now unique, why
not then use a URI?

That gets into another debate, which we have had to exhaustion, and decided
that literals and bnodes were to be permitted. Done deal.

But be careful with that 'label'. The nodeIDs in Ntriples are not in the
graph itself: they are just used by Ntriples to keep track of which node is
which in its lexicalization of the graph structure.

I.e. why not just

  aaa eg:prop <xsd:integer:10> .

and be done with it?

Well, what's that in a graph? Is 'xsd:integer:10' a node label?

It's a URI, and hence a resource. Thus it's a uriref and it is a label. But
the typing is "built in".

If so, I tend to agree, that would certainly make everything a hell of a
lot simpler (even if it does throw away several weeks' work :-). Literals wear
their datatype on their sleeves, they have a single globally fixed
interpretation, are never ambiguous; end of story.

Exactly. That's the point of the URV encoding. No questions about data type
ever, even beyond RDF space. 
Not that I wouldn't hate to see weeks of work thrown away ;-) 
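
To illustrate the "built in" typing, here is a minimal sketch that pulls the datatype and the lexical form back out of a URV. It assumes the shape used in this thread, prefix:type:lexical, with everything after the second colon being the lexical form; split_urv is a hypothetical helper, not part of any RDF toolkit.

```python
def split_urv(urv):
    """Split a URV such as 'xsd:integer:10' into (datatype, lexical form).

    Assumes the form prefix:type:lexical; colons inside the lexical
    form are kept because the string is split at most twice.
    """
    prefix, type_name, lexical = urv.split(":", 2)
    return prefix + ":" + type_name, lexical


print(split_urv("xsd:integer:10"))  # ('xsd:integer', '10')
print(split_urv("xsd:token:foo"))   # ('xsd:token', 'foo')
```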

Interpretation of literals is for applications above the RDF
space anyway, right? So why not just use a self contained package
of value and type, which doesn't get munged when binding to
query variables employing inference based on subClass relations?

Right, good point.

> where the subject of the second triple is the same nodeID as the
> object of the first one. The general rule to make a graph from such a
> document is: make a separate graph for each triple, then merge all
> nodes with the same nodeID or uriref label; then erase the nodeIDs.
> Now, the examples given above might look like this:
> _:1:"fi" rdf:type <urn:iso:3166_1> .
> _:2:"fi" rdf:type <urn:iso:639> .
> <urn:foo> xyz:someProperty _:1:"fi" .
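
The merge rule quoted above can be sketched roughly as follows. This is only an illustration: the _:id:"literal" term syntax is the proposal under discussion, not adopted Ntriples, and build_graph is a hypothetical helper. It also assumes terms contain no spaces.

```python
import re

# Hypothetical term syntax: _:id, _:id:"literal", <uriref>, or a bare qname.
NODE_ID = re.compile(r'^_:(\w+)(?::"(.*)")?$')


def build_graph(lines):
    """Merge terms that share a nodeID or uriref, then drop the nodeIDs."""
    key_to_node = {}  # merge key -> node index (nodeIDs live only here)
    labels = []       # node index -> label in the graph (None if blank)
    edges = []        # (subject index, predicate, object index)

    def node(term):
        m = NODE_ID.match(term)
        if m:
            key, label = ("id", m.group(1)), m.group(2)
            if label is not None:
                label = '"' + label + '"'
        else:
            key, label = ("uri", term), term
        if key not in key_to_node:
            key_to_node[key] = len(labels)
            labels.append(label)
        elif label is not None and labels[key_to_node[key]] is None:
            # The literal label may first appear on a later mention.
            labels[key_to_node[key]] = label
        return key_to_node[key]

    for line in lines:
        s, p, o = line.rstrip(" .").split()
        edges.append((node(s), p, node(o)))
    return labels, edges


doc = [
    '_:1:"fi" rdf:type <urn:iso:3166_1> .',
    '_:2:"fi" rdf:type <urn:iso:639> .',
    '<urn:foo> xyz:someProperty _:1:"fi" .',
]
labels, edges = build_graph(doc)
print(labels)  # ['"fi"', '<urn:iso:3166_1>', '"fi"', '<urn:iso:639>', '<urn:foo>']
print(edges)   # [(0, 'rdf:type', 1), (2, 'rdf:type', 3), (4, 'xyz:someProperty', 0)]
```

The two "fi" literals end up as distinct nodes with the same label, and the nodeIDs themselves appear nowhere in the returned graph.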

Well, that's *a lot* different from the earlier examples
which had the object nodes labeled identically. This treatment
seems the same to me as the current "genid:" approach
which of course is required in order to get to triples.

Each bNode has a "system" identity, and statements are
expressed using that system identity as the subject. And
in essence, that system identity is a kind of "local URI".

I'm lost. I really don't follow what you are saying here.

So your label really *is* the same as a URI, but it's
the URI of a resource node (or bNode) not the literal itself,
and properties (arcs) hung on that node are properties of
the object for that particular statement, not the literal.

Think of the graph as follows: it's the NODES that denote things. Nodes with
a uriref label denote the resource with that uriref. Blank nodes denote
things, but we don't have names for them. Literal nodes (in my
understanding) are like uriref nodes in that they denote through their
labels, but literal labels denote things by a different route than urirefs;
their meaning is determined by a datatyping scheme rather than by an

Now, nodeIDs ('_:2' and so on) are not mentioned, because they are not in
the graph at all; they are only used by an Ntriples parser to keep track of
the correspondence between labels in triples and nodes in the graph.

OK, things are becoming clearer. The literal itself does not constitute the
identity of the node, only its lexical representation, just as a URI may
represent the lexical identity of a global resource in the serialization.
Right. Got it. But still, the fact that a uriref label is used as a nodeID
while a literal label is not seems a little messy (for lack of a more
technical term ;-)
I guess that was why I was equating a literal label with a nodeID.
It's interesting that, if a URV approach were adopted wholesale, then
literals as they are now could be eliminated entirely, interpreting untyped
content data values in XML/RDF or NTriples serializations as implicit
definitions of <xsd:anySimpleType:*>. I.e.
becomes <xsd:anySimpleType:foo> in the graph. If someone wants
interpretation of the value according to some other data type, then they
have to declare it locally using an explicit URV encoding, e.g.
   <someProperty rdf:resource="xsd:token:foo"/>
and we have the locally defined <xsd:token:foo> as the object node of the
property in the graph.
Then, label would always equate to nodeID and in fact label could be
discarded and we'd just have nodeIDs, all of which are urirefs. Eh? Or is
that a bit too radical ;-)
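
A minimal sketch of that defaulting rule, assuming the URV shape used in this thread; object_node is a hypothetical helper, and the xsd:anySimpleType default is the suggestion above, not an adopted convention.

```python
def object_node(value, datatype="xsd:anySimpleType"):
    """Return the uriref label for a content value.

    Untyped values default to xsd:anySimpleType (the suggestion above);
    an explicit datatype overrides it locally.
    """
    return "<" + datatype + ":" + value + ">"


print(object_node("foo"))               # <xsd:anySimpleType:foo>
print(object_node("foo", "xsd:token"))  # <xsd:token:foo>
```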

It uses fewer nodes, for one thing;  

Fair enough. I'm all for more efficient representations.

but more significantly (IMHO), it allows the datatype 'context' to be
inferred from other parts of the graph by using RDFS reasoning. However, I
confess that the issue you have raised about inappropriate bindings has got
me more worried about this than I was previously.

Well, hopefully I've not worried you needlessly. I do think one issue needs
to be addressed: RDF does not provide any kind of compilation of lexical
forms into canonical representations, yet a descriptive interpretation of
rdfs:range presumes such a canonical representation. Otherwise we run the
risk of bindings that cannot be reliably interpreted according to the
inferred data type.
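
To make the concern concrete, here is a small sketch of what can go wrong without canonical forms. It assumes xsd:integer as the inferred type and uses Python's int() as a stand-in for the datatype's lexical-to-value mapping, which is only an approximation (int() is more lenient than the xsd:integer lexical space). canonical_integer is a hypothetical helper.

```python
def canonical_integer(lexical):
    """Map a lexical form to a canonical decimal string.

    Uses Python's int() as an approximation of xsd:integer parsing.
    """
    return str(int(lexical))


forms = ["10", "010", "+10"]

# Compared as raw labels, the three forms are all different strings...
print(len(set(forms)))  # 3

# ...but after canonicalization they collapse to a single value.
print({canonical_integer(f) for f in forms})  # {'10'}
```

A binding that matches on the raw label would treat these as three different values, even though a range of xsd:integer says they denote the same one.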
