- From: Bijan Parsia <bparsia@cs.man.ac.uk>
- Date: Tue, 22 Aug 2006 13:34:44 +0100
- To: Pat Hayes <phayes@ihmc.us>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Let me, for the sake of argument and my other deadlines, concede some version of this. Indeed, let me strengthen it with a quote from the RDF semantics:

"""Notice also that the blank nodes themselves are perfectly well-defined entities; they differ from other nodes only in not being assigned a denotation by an interpretation, reflecting the intuition that they have no 'global' meaning (i.e. outside the graph in which they occur). ...

This effectively treats all blank nodes as having the same meaning as existentially quantified variables in the RDF graph in which they occur, and which have the scope of the entire graph. In terms of the N-Triples syntax, this amounts to the convention that would place the quantifiers just outside, or at the outer edge of, the N-Triples document corresponding to the graph. This in turn means that there is a subtle but important distinction in meaning between the operation of forming the union of two graphs and that of forming the merge. The simple union of two graphs corresponds to the conjunction ('and') of all the triples in the graphs, maintaining the identity of any blank nodes which occur in both graphs. This is appropriate when the information in the graphs comes from a single source, or where one is derived from the other by means of some valid inference process, as for example when applying an inference rule to add a triple to a graph. Merging two graphs treats the blank nodes in each graph as being existentially quantified in that graph, so that no blank node from one graph is allowed to stray into the scope of the other graph's surrounding quantifier. This is appropriate when the graphs come from different sources and there is no justification for assuming that a blank node in one refers to the same entity as any blank node in the other."""

(None of this requires the, well, unusual graph syntax stuff.
BNodes could be lexical entities quite easily, with conditions on when you had to rewrite them in order to preserve meaning.)

Clearly, if you dork with the scope of the quantifier, the "lexical" distinctness or similarity (or BNode identity) will carry through all the formulae in the scope of the quantifier. So, yes, if you put the quantifier outside ...er...the source graph, query, and result set, you can have behavior in various variants as you've described. The scoping graph and scoping set are all there to manage this, though they are atrociously described.

However, let me point out three things: 1) we use the term "graph" with abandon in the SPARQL query spec, 2) results have a life outside the "query process", and 3) answer sets are subject to interpretation, and thus misinterpretation, by users.

1) In point of fact, the background graph, named graphs, basic graph pattern, and the construct template (at the least) are all either straightforwardly RDF graphs or have similar scoping conditions. Thus, the *decision* to, e.g., ignore the graph edges, at least some of the time, needs to be carefully examined and presented (e.g., adding the results of a construct back to the graph in order to, say, reach a fixed point of some sort does not seem a good default behavior). Given that the queried graphs and the query document are, typically, going to be very distinct things (e.g., they are more like the case of things coming from different sources), I think we should consider that a straightforward reading is going to scope things rather narrowly. Now, the common implementation scopes BNodes rather widely, but I'm sceptical that that is due to this rather subtle reading of BNodes. I'm still waiting for a case where the (narrowly scoped) existential reading is worth all this toil and misunderstanding. I really don't see it.

2) The particular situation we were discussing was the interpretation of DISTINCT.
Let me rephrase the issue: We definitely can think of the results of a CONSTRUCT as being RDF redundant, i.e., non-lean. Why can we not think of answer sets in exactly the same way? Yes, if we want to have sessions, we have to push the quantifiers out even further, but that's one reason I think it's not such a great idea. As a UMD rep, I pushed it, but I'll confess that I didn't expect *this* many ramifications and difficulties. As a UMan rep, I do not support it anymore. Personally, I think that for portally cases (which drove the UMD position) I'd have my portal coin URIs. It solves a world of problems.

3) My usability point is that people want to see BNodes (and URIs!) as being subject to a form of the UNA. Fred's examples confirm this. The specs should either embrace this (breaking with RDF) or firmly reject it and not make it so easy to get confused about it. Having a form of DISTINCT that leans the answer set goes a long way to doing that.

Finally, as a specification point, it would be very nice if BGP matching and the subsequent algebra were fairly distinct.

Cheers,
Bijan.
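
For reference, the union/merge distinction in the RDF semantics passage quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples, blank nodes are strings beginning with "_:", and the helper names (is_bnode, bnodes, union, merge) and the ex: terms are hypothetical, not taken from any spec or library.

```python
from itertools import count


def is_bnode(term):
    # Assumption: blank nodes are written as strings with a "_:" prefix.
    return isinstance(term, str) and term.startswith("_:")


def bnodes(graph):
    # All blank node labels occurring anywhere in the graph.
    return {t for triple in graph for t in triple if is_bnode(t)}


def union(g1, g2):
    # Simple union: a blank node label shared by both graphs stays one node.
    return set(g1) | set(g2)


def merge(g1, g2):
    # Merge: rename every blank node of g2 to a label unused in g1,
    # so no blank node of g2 strays into g1's quantifier scope.
    used = bnodes(g1)
    fresh = (f"_:m{i}" for i in count())
    renaming = {}

    def rename(term):
        if not is_bnode(term):
            return term
        if term not in renaming:
            label = next(fresh)
            while label in used:
                label = next(fresh)
            renaming[term] = label
        return renaming[term]

    renamed_g2 = {tuple(rename(t) for t in triple) for triple in g2}
    return set(g1) | renamed_g2


# Two graphs that happen to use the same blank node label.
g1 = {("_:x", "ex:name", "Alice")}
g2 = {("_:x", "ex:age", "30")}

print(len(bnodes(union(g1, g2))))  # 1: one thing with a name and an age
print(len(bnodes(merge(g1, g2))))  # 2: two things, not known to be the same
```

The union says something has both a name and an age; the merge says only that something has a name and something has an age. Which of the two a query engine effectively applies across source graph, query, and result set is the scoping decision at issue.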
Received on Tuesday, 22 August 2006 12:35:39 UTC