- From: <Patrick.Stickler@nokia.com>
- Date: Wed, 7 Nov 2001 11:20:11 +0200
- To: phayes@ai.uwf.edu
- Cc: w3c-rdfcore-wg@w3.org
- Message-ID: <2BF0AD29BC31FE46B78877321144043114C072@trebe003.NOE.Nokia.com>
-----Original Message-----
From: ext Pat Hayes [mailto:phayes@ai.uwf.edu]
Sent: 07 November, 2001 04:39
To: Stickler Patrick (NRC/Tampere)
Cc: w3c-rdfcore-wg@w3.org
Subject: RE: Subject literals

> Right. You need to extend the Ntriples notation slightly to be able
> to fully capture the structures that can be built. One proposal
> (still not adopted) is to allow nodeIds (the new name for the _:x
> labels) to identify not just blank nodes but also literal nodes. So
> one might write the graph I had in an earlier message:
>
> aaa ---eg:prop--->10--rdf:type--->xsd:integer
>
> could be written in Ntriples++ as:
>
> aaa eg:prop _:1:"10" .
> _:1 rdf:type xsd:integer .

Well, now I'm just gonzo confused (a common state for me these days, it seems ;-)

Exactly what is the difference between this "new" representation

   aaa eg:prop _:1:"10" .
   _:1 rdf:type xsd:integer .

and

   aaa eg:prop <genid:123> .
   <genid:123> rdf:value "10" .
   <genid:123> rdf:type xsd:integer .

aside from the fact that the literal value is now part of the *unique* identifier?

The first one has three nodes and two edges; the second one has four nodes and three edges. Graphs in ascii-art, respectively (view in Courier):

   aaa ---eg:prop--> "10" ---rdf:type--->xsd:integer

   aaa ---eg:prop-->[ ]---rdf:value--->"10"
                     |
                     '---rdf:type--->xsd:integer

The second graph has a blank node in the middle.

So labels on bNodes are just a means of compression, in the case of literals, to avoid the extra rdf:value arc? And how are labels represented in e.g. a set of triples describing that compressed subgraph? You'd anyway have to expand that out to some kind of arc (statement) based on the node's identity, so what exactly does it buy us?

Sorry I missed the bNode discussions, and I don't want to open up a closed issue; I just would like to at least understand the key benefits of the label representation as opposed to the former anonymous node representation. And since the label of the node is now unique, why then not use a URI?
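The node and edge counts quoted above can be checked with a small sketch. It assumes whitespace-separated terms with no spaces inside literals, treats the not-yet-adopted `_:1:"10"` form as "nodeID plus literal label", and identifies every other node by its written label. The names `parse_term` and `build_graph` are hypothetical and are not part of any Ntriples tool.

```python
def parse_term(token):
    """Return (node_key, literal_label_or_None) for one written term.

    Assumption: `_:1:"10"` means the node with nodeID `_:1` carrying
    the literal label 10. Any other term is keyed by its own text.
    """
    if token.startswith("_:") and ':"' in token:
        node_id, rest = token.split(':"', 1)
        return node_id, rest.rstrip('"')
    return token, None


def build_graph(text):
    """Build (nodes, edges) from lines of the form `subject predicate object .`."""
    nodes = {}     # node key -> literal label, or None if no literal label seen
    edges = set()  # (subject key, predicate, object key)
    for line in text.strip().splitlines():
        subj, pred, obj = line.split()[:3]
        keys = []
        for token in (subj, obj):
            key, literal = parse_term(token)
            nodes.setdefault(key, None)  # terms with the same key share one node
            if literal is not None:
                nodes[key] = literal
            keys.append(key)
        edges.add((keys[0], pred, keys[1]))
    return nodes, edges


labelled = """
aaa eg:prop _:1:"10" .
_:1 rdf:type xsd:integer .
"""

expanded = """
aaa eg:prop <genid:123> .
<genid:123> rdf:value "10" .
<genid:123> rdf:type xsd:integer .
"""

for name, text in (("labelled", labelled), ("expanded", expanded)):
    nodes, edges = build_graph(text)
    print(name, len(nodes), "nodes,", len(edges), "edges")
# labelled 3 nodes, 2 edges
# expanded 4 nodes, 3 edges
```

The only structural difference the sketch finds is the extra node and the rdf:value arc in the expanded form.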
That gets into another debate, which we have had to exhaustion, and decided that literals and bnodes were to be permitted. Done deal. But be careful with that 'label'. The nodeIDs in Ntriples are not in the graph itself: they are just used by Ntriples to keep track of which node is which in its lexicalization of the graph structure.

I.e. why not just

   aaa eg:prop <xsd:integer:10> .

and be done with it?

Well, what's that in a graph? Is 'xsd:literal:10' a node label?

It's a URI, and hence a resource. Thus it's a uriref and it is a label. But the typing is "built in".

If so, I tend to agree, that would certainly make everything a hell of a lot simpler (even if it does throw away several weeks' work :-). Literals wear their datatype on their sleeves, they have a single globally fixed interpretation, are never ambiguous; end of story.

Exactly. That's the point of the URV encoding. No questions about data type ever, even beyond RDF space. Not that I wouldn't hate to see weeks of work thrown away ;-)

Interpretation of literals is for applications above the RDF space anyway, right? So why not just use a self-contained package of value and type, which doesn't get munged when binding to query variables employing inference based on subClass relations?

Right, good point.

> where the subject of the second triple is the same nodeID as the
> object of the first one. The general rule to make a graph from such a
> document is: make a separate graph for each triple, then merge all
> nodes with the same nodeID or uriref label; then erase the nodeIDs.
>
> Now, the examples given above might look like this:
> _:1:"fi" rdf:type <urn:iso:3166_1> .
> _:2:"fi" rdf:type <urn:iso:639> .
> <urn:foo> xyz:someProperty _:1:"fi" .

Well, that's *a lot* different from the earlier examples, which had the object nodes labeled identically. This treatment seems the same to me as the current "genid:" approach, which of course is required in order to get to triples.
Each bNode has a "system" identity, and statements are expressed using that system identity as the subject. And in essence, that system identity is a kind of "local URI".

I'm lost. I really don't follow what you are saying here.

So your label really *is* the same as a URI, but it's the URI of a resource node (or bNode), not the literal itself, and properties (arcs) hung on that node are properties of the object for that particular statement, not the literal.

Think of the graph as follows: it's the NODES that denote things. Nodes with a uriref label denote the resource with that uriref. Blank nodes denote things, but we don't have names for them. Literal nodes (in my understanding) are like uriref nodes in that they denote through their labels, but literal labels denote things by a different route than urirefs; their meaning is determined by a datatyping scheme rather than by an interpretation. Now, nodeIDs ('_:2' and so on) are not mentioned, because they are not in the graph at all; they are only used by an Ntriples parser to keep track of the correspondence between labels in triples and nodes in the graph.

OK, things are becoming clearer.

The literal itself does not constitute the identity of the node, only its lexical representation, just as a URI may represent the lexical identity of a global resource in the serialization.

Right.

Got it. But still, the fact that a uriref label is used as a nodeID but a literal label is not seems a little messy (for lack of a more technical term ;-) I guess that was why I was equating a literal label with a nodeID.

It's interesting that, if a URV approach were adopted wholesale, then literals as they are now could be eliminated entirely, interpreting untyped content data values in XML/RDF or NTriples serializations as implicit definitions of <xsd:anySimpleType:*>. I.e.

   <someProperty>foo</someProperty>

becomes

   <xsd:anySimpleType:foo>

in the graph.
If someone wants interpretation of the value according to some other data type, then they have to declare it locally using an explicit URV encoding, e.g.

   <someProperty rdf:resource="xsd:token:foo"/>

and we have the locally defined <xsd:token:foo> as the object node of the property in the graph. Then, label would always equate to nodeID and in fact label could be discarded and we'd just have nodeIDs, all of which are urirefs. Eh? Or is that a bit too radical ;-)

It uses fewer nodes, for one thing;

Fair enough. I'm all for more efficient representations.

but more significantly (IMHO), it allows the datatype 'context' to be inferred from other parts of the graph by using RDFS reasoning. However, I confess that the issue you have raised about inappropriate bindings has got me more worried about this than I was previously.

Well, hopefully I've not worried you needlessly. I think that the issue of RDF not providing any kind of compilation of lexical forms into canonical representations, and of a descriptive interpretation of rdfs:range presuming such a canonical representation, does need to be addressed. Otherwise we risk bindings that cannot be reliably interpreted according to the inferred data type.

Cheers,

Patrick
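As a rough illustration of the URV idea discussed in this message, the sketch below splits a URV of the shape `prefix:type:value` into its datatype and lexical form, and wraps untyped content as `xsd:anySimpleType`. The three-part shape is inferred only from the examples in the message (`xsd:integer:10`, `xsd:token:foo`); the names `TypedValue`, `parse_urv` and `untyped_to_urv` are hypothetical, and no escaping rules are assumed.

```python
from typing import NamedTuple


class TypedValue(NamedTuple):
    datatype: str  # e.g. "xsd:integer"
    lexical: str   # e.g. "10"


def parse_urv(urv: str) -> TypedValue:
    """Split a URV such as <xsd:integer:10> into datatype and lexical form.

    Assumption: the first two colon-separated parts name the datatype and
    everything after the second colon is the lexical form, so values that
    themselves contain colons are kept whole.
    """
    if urv.startswith("<") and urv.endswith(">"):
        urv = urv[1:-1]  # drop the angle brackets used in the triples
    prefix, type_name, lexical = urv.split(":", 2)
    return TypedValue(f"{prefix}:{type_name}", lexical)


def untyped_to_urv(content: str) -> str:
    """Wrap untyped content as the implicit xsd:anySimpleType URV."""
    return f"<xsd:anySimpleType:{content}>"


print(parse_urv("<xsd:integer:10>"))
# TypedValue(datatype='xsd:integer', lexical='10')
print(parse_urv(untyped_to_urv("foo")))
# TypedValue(datatype='xsd:anySimpleType', lexical='foo')
```

Under these assumptions the datatype travels with the value, so a consumer never has to look elsewhere in the graph to decide how to read the lexical form.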
Received on Wednesday, 7 November 2001 04:20:30 UTC