- From: Richard Cyganiak <richard@cyganiak.de>
- Date: Wed, 9 Mar 2011 08:02:24 +0000
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: public-rdf-wg@w3.org
On 8 Mar 2011, at 18:43, Andy Seaborne wrote:

> """
> The same blank node cannot occur in two graphs at the same time.
> """
>
> If there is knowledge it's the same blank node why not allow it to be the same? As long as the nodes aren't accidentally equated.

I understand where you're coming from but am unconvinced.

The reason for wanting blank node scoped to the graph is that they need to be scoped somehow, and the obvious alternative (scoping them to the dataset) just pushes the problem slightly further out -- you'd run into the same questions again in case you wanted to have multiple overlapping datasets that contain the same graphs (e.g., a public endpoint and an access-protected one).

Scoping blank nodes to the graph has the nice property of making it possible to move graphs around between stores without anything surprising happening.

I'm uncomfortable with the notion of some not further specified “knowledge” that the same blank node occurs in multiple places. How does this “cross-graph blank node knowledge” fit with SPARQL 1.0 and SPARQL 1.1? Can I somehow construct graphs with such overlap given just the graph management features and update features found in those specs? Are there implementations that allow blank nodes to occur in multiple graphs, and if so, then how does the knowledge get into the store?

> As in the default-graph-as-union and the base+inference cases, there are uses for the subgraph relationship and then it is the same blank node.

I don't see how it's relevant to default-graph-as-union, you can have that no matter how you scope the blank nodes. But it's true that we have use cases that perhaps require it *if* blank nodes occur in certain places in the data: “Slicing datasets according to multiple dimensions” and “Tracing inference results.”

> For TriG and N-quads, I suggest blank node labels are scoped to the document, and across graphs. It's confusing to see two _:a to mean different things without much stronger scoping intuitions (esp. N-Quads); it makes it possible to record when you do know they are the same bnode (one graph a subgraph of another).

Especially for N-Quads I would argue against this. We've found the ability to sensibly “merge” N-Quads files just by concatenating them, as well as other ad hoc string/line based operations, quite handy. If _:a in two different graphs means the same thing, then that's no longer possible, and we'd have to do the “standardize apart” dance.

Also we use N-Quads a lot for storing results of web crawls, where the notion of a blank node shared between graphs is counter-intuitive, and where ensuring uniqueness of blank node labels across hundreds of millions of graphs would be expensive in various ways.

This whole discussion just shows again what a bloody pain blank nodes are. I guess my position is that “blank nodes should have less magic”, and blank nodes shared between graphs in a dataset under certain circumstances just adds more magic that will trip people over and cause headaches for future users and implementers (and spec writers). When you're down a hole, the first thing to do is stop digging.

Best,
Richard

>
> Andy
>
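
The concatenation argument above can be illustrated with a small sketch. It assumes a simplified N-Quads subset (IRIs and blank nodes only, no literals or comments). The helper names `parse` and `blank_nodes`, the `scope` parameter, and the example.org IRIs are hypothetical and chosen only for illustration; this is not a conforming N-Quads parser.

```python
import re

# Simplified term pattern: an IRI in angle brackets or a blank node label.
# Literals are deliberately left out to keep the sketch short.
TERM = r'(<[^>]*>|_:[A-Za-z0-9]+)'
QUAD = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')


def parse(text):
    """Parse simplified N-Quads text into (subject, predicate, object, graph) tuples."""
    quads = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = QUAD.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        quads.append(match.groups())
    return quads


def blank_nodes(quads, scope):
    """Return the set of distinct blank nodes under a given scoping rule.

    scope == "graph":    a label is only meaningful within its graph,
                         so identity is the pair (graph, label).
    scope == "document": a label means the same node across all graphs
                         in the file, so identity is the label alone.
    """
    nodes = set()
    for s, _p, o, g in quads:
        for term in (s, o):
            if term.startswith('_:'):
                nodes.add((g, term) if scope == 'graph' else term)
    return nodes


# Two independently produced files that both happen to use the label _:a.
file_a = '_:a <http://example.org/p> <http://example.org/o1> <http://example.org/g1> .\n'
file_b = '_:a <http://example.org/p> <http://example.org/o2> <http://example.org/g2> .\n'

# "Merging" by plain string concatenation.
merged = parse(file_a + file_b)

graph_scoped = blank_nodes(merged, 'graph')
document_scoped = blank_nodes(merged, 'document')
```

With graph scoping, `graph_scoped` holds two distinct nodes, so the concatenated file still means what the two inputs meant. With document scoping, `document_scoped` holds a single node, so the two unrelated `_:a` labels are equated unless the labels are renamed first (the “standardize apart” step).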
Received on Wednesday, 9 March 2011 08:02:55 UTC