Post format from Brian McBride on 2001-11-07 (w3c-rdfcore-wg@w3.org from November 2001)

From: Brian McBride <bwm@hplb.hpl.hp.com>
Date: Wed, 07 Nov 2001 11:06:16 +0000
To: rdf core <w3c-rdfcore-wg@w3.org>
Message-ID: <3BE915A8.8060900@hplb.hpl.hp.com>
I note that the copy of the post in the archive:

   http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0184.html

is not the same as the version I received in my inbox.  The archive version has 
lost some of the "this was copied from a previous message" structure.  I suspect 
this may be due to the original post being in HTML or some such.

Could folks please stick to using plain text?  It's kinda old-fashioned, but it 
works.

Brian


Patrick.Stickler@nokia.com wrote:

>  
> 
>     -----Original Message-----
>     From: ext Pat Hayes [mailto:phayes@ai.uwf.edu]
>     Sent: 07 November, 2001 04:39
>     To: Stickler Patrick (NRC/Tampere)
>     Cc: w3c-rdfcore-wg@w3.org
>     Subject: RE: Subject literals
> 
>>     > Right. You need to extend the Ntriples notation slightly to be able
>>     > to fully capture the structures that can be built. One proposal
>>     > (still not adopted) is to allow nodeIds (the new name for the _:x
>>     > labels) to identify not just blank nodes but also literal nodes. So
>>     > one might write the graph I had in an earlier message:
>>     >
>>     > aaa ---eg:prop--->10--rdf:type--->xsd:integer
>>     >
>>     > could be written in Ntriples++ as:
>>     >
>>     > aaa eg:prop _:1:"10" .
>>     > _:1 rdf:type xsd:integer .
>>
>>     Well, now I'm just gonzo confused (a common state for me these
>>     days, it seems ;-)
>>
>>     Exactly what is the difference between this "new"
>>     representation
>>
>>       aaa eg:prop _:1:"10" .
>>       _:1 rdf:type xsd:integer .
>>
>>     and
>>
>>       aaa eg:prop <genid:123> .
>>       <genid:123> rdf:value "10" .
>>       <genid:123> rdf:type xsd:integer .
>>
>>     aside from the fact that the literal value is now part
>>     of the *unique* identifier?
> 
> 
>     The first one has three nodes and two edges; the second one has four
>     nodes and three edges.
> 
> 
>     Graphs in ascii-art, respectively (view in Courier):
> 
> 
>     aaa ---eg:prop--> "10" ---rdf:type--->xsd:integer
> 
> 
>     aaa ---eg:prop-->[   ]---rdf:value--->"10"
> 
>                        |
> 
>                        '---rdf:type--->xsd:integer
> 
> 
>     The second graph has a blank node in the middle.
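
The node and edge counts quoted above can be checked with a small sketch. It assumes, purely for illustration, that each node is identified by its label string and that the blank node is written `_:b`; neither convention comes from the thread.

```python
def nodes_and_edges(triples):
    """Return (node count, edge count) for a list of (s, p, o) triples.

    Assumption: two terms with the same string are the same node.
    """
    nodes = set()
    for subject, _predicate, obj in triples:
        nodes.add(subject)
        nodes.add(obj)
    return len(nodes), len(triples)


# First graph: the literal node itself carries the rdf:type arc.
first = [
    ("aaa", "eg:prop", '"10"'),
    ('"10"', "rdf:type", "xsd:integer"),
]

# Second graph: a blank node (hypothetical name "_:b") sits in the middle.
second = [
    ("aaa", "eg:prop", "_:b"),
    ("_:b", "rdf:value", '"10"'),
    ("_:b", "rdf:type", "xsd:integer"),
]

print(nodes_and_edges(first))
print(nodes_and_edges(second))
```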
> 
> So labels on bNodes are just a means of compression, in the case of 
> literals, to avoid the extra rdf:value arc?  
>  
> 
> And how are labels represented in e.g. a set of triples describing that 
> compressed subgraph?
> 
> You'd anyway have to expand that out to some kind of arc (statement) 
> based on the node's identity, so what exactly does it buy us?
> 
>  
> 
> Sorry I missed the bNode discussions, and I don't want to open up a 
> closed issue; I would just like to at least understand the key benefits 
> of the label representation as opposed to the former anonymous node 
> representation.
> 
>  
> 
>>     And since the label of the node is now unique, why
>>     then not use a URI.
> 
> 
>     That gets into another debate, which we have had to exhaustion, and
>     decided that literals and bnodes were to be permitted. Done deal.
> 
> 
>     But be careful with that 'label'. The nodeIDs in Ntriples are not in
>     the graph itself: they are just used by Ntriples to keep track of
>     which node is which in its lexicalization of the graph structure.
> 
> 
>>     I.e. why not just
>>
>>       aaa eg:prop <xsd:integer:10> .
>>
>>     and be done with it?
> 
> 
>     Well, what's that in a graph? Is 'xsd:integer:10' a node label?  
> 
>      
> 
> It's a URI, and hence a resource. Thus it's a uriref and it is a label. 
> But the typing is "built in".
> 
>       
> 
>      If so, I tend to agree, that would certainly make everything a hell
>     of a lot simpler (even if it does throw away several weeks work:-).
>     Literals wear their datatype on their sleeves, they have a single
>     globally fixed interpretation, are never ambiguous; end of story.
> 
> Exactly. That's the point of the URV encoding. No questions about data 
> type ever, even beyond RDF space. 
>  
> 
> Not that I wouldn't hate to see weeks of work thrown away ;-) 
> 
>>
>>     Interpretation of literals is for applications above the RDF
>>     space anyway, right? So why not just use a self contained package
>>     of value and type, which doesn't get munged when binding to
>>     query variables employing inference based on subClass relations?
> 
> 
>     Right, good point.
> 
>>
>>     > where the subject of the second triple is the same nodeID as the
>>     > object of the first one. The general rule to make a graph from
>>     > such a document is: make a separate graph for each triple, then
>>     > merge all nodes with the same nodeID or uriref label; then erase
>>     > the nodeIDs.
>>     >
>>     > Now, the examples given above might look like this:
>>     > _:1:"fi" rdf:type <urn:iso:3166_1> .
>>     > _:2:"fi" rdf:type <urn:iso:639> .
>>     > <urn:foo> xyz:someProperty _:1:"fi" .
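
The merge rule described above (one graph per triple, merge nodes sharing a nodeID or uriref, then erase the nodeIDs) might be sketched as follows. This is only an illustration under stated assumptions: the token syntax `_:id:"literal"` follows the proposed "Ntriples++" examples in this thread, the function name `build_graph` and the integer node handles are hypothetical, and giving every bare literal its own fresh node is a guess, not something the thread settles.

```python
import itertools


def build_graph(triples):
    """Build (labels, edges) from (s, p, o) token triples.

    labels maps an internal node handle to its label (uriref, literal,
    or None for a blank node); edges holds (handle, predicate, handle).
    NodeIDs are used only as merge keys and do not appear in the result.
    """
    node_of = {}               # merge key (nodeID or uriref) -> handle
    labels = {}                # handle -> label
    fresh = itertools.count()  # source of new handles
    edges = []

    def node(token):
        if token.startswith("_:"):
            # "_:1" or "_:1:\"fi\"": nodeID, optionally with a literal label.
            node_id, _, literal = token[2:].partition(":")
            key, label = "_:" + node_id, literal or None
        elif token.startswith('"'):
            # Assumption: a bare literal is never merged with another node.
            key, label = None, token
        else:
            key, label = token, token  # uriref: merged by its label
        if key is not None and key in node_of:
            handle = node_of[key]
        else:
            handle = next(fresh)
            labels[handle] = None
            if key is not None:
                node_of[key] = handle
        if label is not None:
            labels[handle] = label
        return handle

    for subject, predicate, obj in triples:
        edges.append((node(subject), predicate, node(obj)))
    return labels, edges


labels, edges = build_graph([
    ('_:1:"fi"', "rdf:type", "<urn:iso:3166_1>"),
    ('_:2:"fi"', "rdf:type", "<urn:iso:639>"),
    ("<urn:foo>", "xyz:someProperty", '_:1:"fi"'),
])
print(len(labels), "nodes,", len(edges), "edges")
```

Under these assumptions the two "fi" literals stay distinct nodes because their nodeIDs differ, while both occurrences of `_:1` collapse into one node.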
>>
>>     Well, that's *a lot* different than the earlier examples
>>     which had the object nodes labeled identically. This treatment
>>     seems the same to me as the current "genid:" approach
>>     which of course is required in order to get to triples.
>>
>>     Each bNode has a "system" identity, and statements are
>>     expressed using that system identity as the subject. And
>>     in essence, that system identity is a kind of "local URI".
> 
> 
>     I'm lost. I really don't follow what you are saying here.
> 
> 
>>     So your label really *is* the same as a URI, but it's
>>     the URI of a resource node (or bNode) not the literal itself,
>>     and properties (arcs) hung on that node are properties of
>>     the object for that particular statement, not the literal.
> 
> 
>     Think of the graph as follows: it's the NODES that denote things.
>     Nodes with a uriref label denote the resource with that uriref.
>     Blank nodes denote things, but we don't have names for them. Literal
>     nodes (in my understanding) are like uriref nodes in that they
>     denote through their labels, but literal labels denote things by a
>     different route than urirefs; their meaning is determined by a
>     datatyping scheme rather than by an interpretation.
> 
> 
>     Now, nodeIDs ('_:2' and so on) are not mentioned, because they are
>     not in the graph at all; they are only used by an Ntriples parser to
>     keep track of the correspondence between labels in triples and nodes
>     in the graph.
> 
> OK, things are becoming clearer. The literal itself does not constitute 
> the identity of the node, only its lexical representation, just as a URI 
> may represent the lexical identity of a global resource in the 
> serialization. Right. Got it. But still, the fact that a uriref label is 
> used as a nodeID but a literal label is not, seems a little messy (for 
> lack of a more technical term ;-)
>  
> 
> I guess that was why I was equating a literal label with a nodeID.
> 
>  
> 
> It's interesting that, if a URV approach were adopted wholesale, then 
> literals as they are now could be eliminated entirely, interpreting 
> untyped content data values in XML/RDF or NTriples serializations as 
> implicit definitions of <xsd:anySimpleType:*>  I.e.
> 
>  
> 
>    <someProperty>foo</someProperty>
> 
>  
> 
> becomes <xsd:anySimpleType:foo> in the graph. If someone wants 
> interpretation of the value according to some other data type, then they 
> have to declare it locally using an explicit URV encoding, e.g.
> 
>  
> 
>    <someProperty rdf:resource="xsd:token:foo"/>
> 
>  
> 
> and we have the locally defined <xsd:token:foo> as the object node of 
> the property in the graph.
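
As a rough illustration of the URV idea above, the sketch below splits a label of the form `prefix:type:value` into its datatype and lexical parts, and wraps untyped content as `xsd:anySimpleType`. The three-part colon-separated form is taken only from the examples in this message; the function names are hypothetical and no actual URV specification is assumed.

```python
def split_urv(urv):
    """Split 'xsd:integer:10' into ('xsd:integer', '10').

    Assumption: the first two colon-separated fields name the datatype
    and everything after the second colon is the lexical value.
    """
    prefix, datatype, lexical = urv.split(":", 2)
    return f"{prefix}:{datatype}", lexical


def untyped_to_urv(text):
    """Wrap untyped content as an implicit xsd:anySimpleType value."""
    return f"xsd:anySimpleType:{text}"


print(split_urv("xsd:token:foo"))
print(split_urv(untyped_to_urv("foo")))
```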
> 
>  
> 
> Then, label would always equate to nodeID and in fact label could be 
> discarded and we'd just have nodeIDs, all of which are urirefs. Eh? Or 
> is that a bit too radical ;-)
> 
>  
> 
>     It uses fewer nodes, for one thing;  
> 
>      
> 
> Fair enough. I'm all for more efficient representations.
> 
>      but more significantly (IMHO), it allows the datatype 'context' to
>     be inferred from other parts of the graph by using RDFS reasoning.
>     However, I confess that the issue you have raised about
>     inappropriate bindings has got me more worried about this than I was
>     previously.
> 
>  
> 
> Well, hopefully I've not worried you needlessly. I think the issue does 
> need to be addressed: RDF provides no compilation of lexical forms into 
> canonical representations, yet a descriptive interpretation of 
> rdfs:range presumes such a canonical representation. Otherwise we risk 
> bindings that cannot be reliably interpreted according to the inferred 
> data type.
> 
>  
> 
> Cheers,
> 
>  
> 
> Patrick
> 
>  
> 
>  
>
Received on Wednesday, 7 November 2001 06:10:54 UTC