- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Wed, 11 Sep 2002 14:57:54 +0300
- To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Jeremy,

I think you've touched on some very important points, though it appears that we are in fact not in agreement on how they should be addressed. That's a pity, as I thought we were both in favor of syntactically untidy and explicitly named inline literals. (I'm secretly hoping you're playing Devil's advocate here ;-)

Comments follow...

----- Original Message -----
From: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>
To: <w3c-rdfcore-wg@w3.org>
Sent: 11 September, 2002 12:26
Subject: Datatyping, reification, syntactic tidyness

> Proposal:
>
> The RDF specification explicitly says that implementations of the RDF graph
> may represent literal nodes with the same label as a single node or as
> multiple nodes; and that nothing in the specs allows these different
> implementations to be distinguished. Hence, an operation like:
>
>   RDFGraph.countLiteralNodes()
>
> cannot be defined in a way that conforms with our recommendation.

Well, that depends. If it is counting nodes specific to the internal, application-specific representation, then no, it can't, but if it is meant to reflect the number of nodes as defined for the abstract syntax, then it should.

I.e. it has to be clear whether the above function reflects the implementation graph or the abstract graph (and I can think of lots of utility for the latter, such as an implementation-neutral query API, etc.).

> ========================
>
> Consider
>
> <rdf:Description rdf:bagID="Reify">
>   <eg:p1 rdf:datatype="&xsd;int">10</eg:p1>
>   <eg:p2 rdf:datatype="&xsd;int">10</eg:p2>
>   <eg:p3>10</eg:p3>
>   <eg:p4>10</eg:p4>
> </rdf:Description>
>
> This creates a graph with:
>
>   four initial triples
>   sixteen triples reifying those four triples
>   five triples forming the bag

Do you mean according to the abstract syntax? Or some hypothetical implementation? Or perhaps ARP?

I'm presuming you mean the abstract graph here (but then, this thread is specifically about how many literal nodes there are in the abstract graph so...)
> This message is about:
> - how many Literal nodes are there?

3

> - do we care?

In the abstract graph? Absolutely. In some application's internal structures? Not at all.

> My preference is to be able to systematically say we do not care.

If we are to have generic, portable APIs which allow disparate RDF applications to interact consistently on the same knowledge base, I would argue that we should care a whole lot precisely how many nodes are in the abstract syntax.

As for the application syntax, we should explicitly not care nor ever impose any requirements on internal representations.

> There are at least two literal nodes, one labelled with an int 10, the other
> labelled with a RDF String Literal "10". Since these labels are different
> the nodes must be different.
>
> Of the twenty-five triples in the graph eight have literal objects, thus
> there are at most eight literal nodes.
>
> A syntactically tidy implementation would stop at two nodes.
>
> A thoroughly untidy one would have eight nodes.
> Some would argue that the object of the rdf:object triple in the reification
> is the same node as the object of the original triple. Thus an
> implementation following this rationale would get four literals.

I would suggest that in the abstract syntax (leaving semantics aside) there would be exactly three literal nodes. One node denoting the explicitly typed literal (xsd:integer, "10") and two nodes denoting the non-explicitly-typed literals, e.g. (_:x, "10") and (_:y, "10").

> Of course, sensible implementations could choose to treat datatyped literals
> tidily and RDF String Literals untidily (or vice versa) which suggests that
> maybe six is also a plausible number of literals.

Sensible implementations will be employing numerous mechanisms to maximize storage and processing efficiency. That is not our concern.
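The counts of two, four, eight and three discussed above can be reproduced with a small sketch. It assumes that a literal node can be modelled as a (datatype, lexical form) pair, that the bagID expansion yields one rdf:type triple plus one membership triple per statement, and that each node-sharing policy is just a different notion of node identity. The names build_graph and POLICIES, the blank-node labels, and the "sysid" tag are hypothetical and come from neither message.

```python
XSD_INT = "xsd:int"

# The four asserted triples of the example. A literal is modelled as a
# (datatype, lexical form) pair; datatype None means no rdf:datatype attribute.
asserted = [
    ("_:s", "eg:p1", (XSD_INT, "10")),
    ("_:s", "eg:p2", (XSD_INT, "10")),
    ("_:s", "eg:p3", (None, "10")),
    ("_:s", "eg:p4", (None, "10")),
]


def build_graph(asserted, bag="eg:Reify"):
    """Expand the bagID shorthand into asserted, reification and bag triples."""
    triples = [(bag, "rdf:type", "rdf:Bag")]
    occurrences = []  # one entry per literal object: (statement index, role, literal)
    for i, (s, p, lit) in enumerate(asserted, start=1):
        stmt = f"_:stmt{i}"
        triples.append((s, p, lit))
        triples.extend([
            (stmt, "rdf:type", "rdf:Statement"),
            (stmt, "rdf:subject", s),
            (stmt, "rdf:predicate", p),
            (stmt, "rdf:object", lit),
        ])
        triples.append((bag, f"rdf:_{i}", stmt))
        occurrences.append((i, "asserted", lit))
        occurrences.append((i, "reified", lit))
    return triples, occurrences


# Each policy maps a literal occurrence to a node identity; occurrences with
# the same identity are counted as one literal node.
POLICIES = {
    # Same label, same node.
    "tidy": lambda i, role, lit: lit,
    # rdf:object shares its node with the object of the original triple.
    "shared_with_reification": lambda i, role, lit: i,
    # Every occurrence is its own node.
    "untidy": lambda i, role, lit: (i, role),
    # Typed literals are tidy; each untyped literal gets one node per statement.
    "typed_tidy_untyped_per_statement":
        lambda i, role, lit: lit if lit[0] is not None else ("sysid", i),
}

triples, occurrences = build_graph(asserted)
counts = {
    name: len({key(i, role, lit) for i, role, lit in occurrences})
    for name, key in POLICIES.items()
}
print(len(triples), counts)
```

The last policy is one reading of the three-node suggestion: the two xsd:int occurrences collapse to a single node, while eg:p3 and eg:p4 each keep a separate node that is reused by the corresponding rdf:object triple.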
> If in fact, our normative serialization of the graph does not allow us to
> distinguish these cases then we do not need to, and in fact, SHOULD NOT say
> either way.

I would expect that N-Triples would explicitly and accurately reflect the abstract syntax, and that RDF/XML would implicitly yet accurately reflect the abstract syntax. Thus both normative serializations would say precisely how many literal nodes are in the abstract graph.

Whether that abstract syntax is used literally as the basis for some implementation is not our concern -- though one would expect and hope that generic APIs would reflect the abstract syntax, hiding all implementation-specific deviations from users.

> The model theory needs to reflect this inability to represent the two
> different cases and not depend on some hidden node identity that we cannot
> serialize (this only rules out certain types of untidiness in the model
> theory).

Or rather, the MT needs to reflect that all literal nodes have either a URIref or systemID prefix, and given that, they are all syntactically tidy. Explicitly typed literal nodes with URIref prefix are also semantically tidy and denote datatype values. It remains to be seen whether we say anything more about systemID prefixed literal nodes, as to whether they are semantically tidy (by string equality of the string literal) or untidy, with the systemID implicitly denoting a datatype.

As for serializing the "hidden" node identity, I would suggest that the attribute rdf:nodeID is precisely the correct means to do so. See below...

> In fact, we should explicitly say that we are not saying, and that this is
> deliberately underspecified, since nothing depends on it.
>
> I believe that these two RDF/XML documents are entirely equivalent:
>
> <rdf:Description rdf:bagID="Reify">
>   <eg:p1 rdf:datatype="&xsd;int">10</eg:p1>
> </rdf:Description>
>
>
> <rdf:Description rdf:nodeID="subj">
>   <eg:p1 rdf:datatype="&xsd;int">10</eg:p1>
> </rdf:Description>
> <rdf:Bag rdf:ID="Reify">
>   <rdf:li>
>     <rdf:Statement>
>       <rdf:subject rdf:nodeID="subj"/>
>       <rdf:predicate rdf:resource="&eg;p1"/>
>       <rdf:object rdf:datatype="&xsd;int">10</rdf:object>
>     </rdf:Statement>
>   </rdf:li>
> </rdf:Bag>

I agree. However, I do not consider the following two RDF/XML documents as equivalent (syntactically at least):

<rdf:Description rdf:bagID="Reify">
  <eg:p1>10</eg:p1>
</rdf:Description>

<rdf:Description rdf:nodeID="subj">
  <eg:p1>10</eg:p1>
</rdf:Description>
<rdf:Bag rdf:ID="Reify">
  <rdf:li>
    <rdf:Statement>
      <rdf:subject rdf:nodeID="subj"/>
      <rdf:predicate rdf:resource="&eg;p1"/>
      <rdf:object>10</rdf:object>
    </rdf:Statement>
  </rdf:li>
</rdf:Bag>

Though, the following would IMO be equivalent (if made legal):

<rdf:Description rdf:bagID="Reify">
  <eg:p1>10</eg:p1>
</rdf:Description>

<rdf:Description rdf:nodeID="subj">
  <eg:p1 nodeID="x">10</eg:p1>
</rdf:Description>
<rdf:Bag rdf:ID="Reify">
  <rdf:li>
    <rdf:Statement>
      <rdf:subject rdf:nodeID="subj"/>
      <rdf:predicate rdf:resource="&eg;p1"/>
      <rdf:object nodeID="x">10</rdf:object>
    </rdf:Statement>
  </rdf:li>
</rdf:Bag>

(in the case of the bagID reification, it's up to the parser to use the same systemID for the literal in both the eg:p1 statement and the rdf:object statement)

> And I buy Guha's point at the Bristol F2F that with untidy literal semantics
> rdf:object refers to the syntax of the triple not its semantics.

Well, I thought that was the official view. After all, a "stating" is about the expression of the statements, not the meaning, right? And an expression is captured in the syntax, not the MT.

Cheers,

Patrick
Received on Wednesday, 11 September 2002 07:57:56 UTC