- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Wed, 09 Mar 2011 09:07:17 +0000
- To: Richard Cyganiak <richard@cyganiak.de>
- CC: public-rdf-wg@w3.org
On 09/03/11 08:02, Richard Cyganiak wrote:
> On 8 Mar 2011, at 18:43, Andy Seaborne wrote:
>> """ The same blank node cannot occur in two graphs at the same
>> time. """
>>
>> If there is knowledge it's the same blank node why not allow it to
>> be the same? As long as the nodes aren't accidentally equated.
>
> I understand where you're coming from but am unconvinced.
>
> The reason for wanting blank node scoped to the graph is that they
> need to be scoped somehow, and the obvious alternative (scoping them
> to the dataset) just pushes the problem slightly further out -- you'd
> run into the same questions again in case you wanted to have multiple
> overlapping datasets that contain the same graphs (e.g., a public
> endpoint and an access-protected one). Scoping blank nodes to the
> graph has the nice property of making it possible to move graphs
> around between stores without anything surprising happening.
>
> I'm uncomfortable with the notion of some not further specified
> “knowledge” that the same blank node occurs in multiple places.
>
> How does this “cross-graph blank node knowledge” fit with SPARQL 1.0
> and SPARQL 1.1? Can I somehow construct graphs with such overlap
> given just the graph management features and update features found in
> those specs?
>
> Are there implementations that allow blank nodes to occur in multiple
> graphs, and if so, then how does the knowledge get into the store?

Yes.

>> As in the default-graph-as-union and the base+inference cases,
>> there are uses for the subgraph relationship and then it is the
>> same blank node.
>
> I don't see how it's relevant to default-graph-as-union, you can have
> that no matter how you scope the blank nodes. But it's true that we
> have use cases that perhaps require it *if* blank nodes occur in
> certain places in the data: “Slicing datasets according to multiple
> dimensions” and “Tracing inference results.”

If it's a union of graphs (not RDF merge), then it's the same blank
node. That's what set union gives you, and it is the effect of ignoring
the 4th column in a quad store (you have to ensure distinctness).

>
>> For TriG and N-quads, I suggest blank node labels are scoped to the
>> document, and across graphs. It's confusing to see two _:a to mean
>> different things without much stronger scoping intuitions (esp.
>> N-Quads); it makes it possible to record when you do know they are
>> the same bnode (one graph a subgraph of another).
>
> Especially for N-Quads I would argue against this. We've found the
> ability to sensibly “merge” N-Quads files just by concatenating them,
> as well as other ad hoc string/line based operations, quite handy. If
> _:a in two different graphs means the same thing, then that's no
> longer possible, and we'd have to do the “standardize apart” dance.

You can't concatenate even N-Triples files if you want to merge graphs.
Concatenation is union.

> Also we use N-Quads a lot for storing results of web crawls, where
> the notion of a blank node shared between graphs is
> counter-intuitive, and where ensuring uniqueness of blank node labels
> across hundreds of millions of graphs would be expensive in various
> ways.

There are schemes like UUIDs that provide uniqueness without a central
authority. A UUID is 16 bytes (128 bits). The chance of even V4 UUIDs
clashing (they are 122-bit random numbers) is so remote you should
worry more about disasters hitting the data-centre and backup at the
same time. If you have access to a MAC address, and non-Byzantine
software, V1 is even more robust and cheap to allocate.

>
> This whole discussion just shows again what a bloody pain blank nodes
> are. I guess my position is that “blank nodes should have less
> magic”, and blank nodes shared between graphs in a dataset under
> certain circumstances just adds more magic that will trip people over
> and cause headaches for future users and implementers (and spec
> writers). When you're down a hole, the first thing to do is stop
> digging.

Respectfully, I disagree. This is the simple route. Once parsed and
carefully kept apart, bNodes can be treated as things - later inference
or application can decide whether to smush, lean or whatever.

	Andy

>
> Best, Richard
>
>
>>
>> Andy
>>
>
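To make the union/merge distinction above concrete, here is a minimal sketch. It assumes graphs are modelled as plain Python sets of (subject, predicate, object) string triples, with blank nodes written as strings starting with "_:". The helper names (union, standardize_apart, merge) and the example.org predicates are illustrative only and do not come from any RDF library. Fresh labels are drawn from V4 UUIDs, as suggested above.

```python
import uuid


def is_bnode(term):
    # Assumption: blank nodes are strings with the "_:" label prefix.
    return term.startswith("_:")


def union(g1, g2):
    # Plain set union: a blank node appearing in both graphs stays one node.
    return g1 | g2


def standardize_apart(graph):
    # Give every blank node in this graph a fresh UUID-based label,
    # consistently within the graph.
    mapping = {}

    def rename(term):
        if not is_bnode(term):
            return term
        if term not in mapping:
            # Leading "b" keeps the label starting with a letter.
            mapping[term] = "_:b" + uuid.uuid4().hex
        return mapping[term]

    return {(rename(s), p, rename(o)) for (s, p, o) in graph}


def merge(g1, g2):
    # Merge in the RDF sense: rename blank nodes apart, then take the union.
    return standardize_apart(g1) | standardize_apart(g2)


def bnodes(graph):
    # Collect the distinct blank nodes used as subject or object.
    return {t for (s, p, o) in graph for t in (s, o) if is_bnode(t)}


g1 = {("_:a", "<http://example.org/name>", '"Alice"')}
g2 = {("_:a", "<http://example.org/age>", '"42"')}

u = union(g1, g2)   # both triples describe the same blank node
m = merge(g1, g2)   # the two "_:a" nodes are kept apart
```

With these inputs, the union holds one blank node carrying both properties, while the merge holds two distinct blank nodes with one property each. Concatenating two files that both use _:a behaves like the first case, not the second.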
Received on Wednesday, 9 March 2011 09:07:55 UTC