Re: Linked Data, Blank Nodes and Graph Names

Hi Nathan,

On Thu, 2011-04-07 at 18:45 +0100, Nathan wrote: 
> Hi All,
> 
> To cut a long story short, blank nodes are a bit of a PITA to work with, 
> they make data management more complex, new comers don't "get" them 
> (lest presented as anonymous objects), and they make graph operations 
> much more complex than they need be, because although a graph is a set 
> of triples, you can't (easily) do basic set operations on non-ground 
> graphs, which ultimately filters down to making things such as graph 
> diff, signing, equality testing, checking if one graph is a super/sub 
> set of another very difficult. Safe to say then, on one side of things 
> Linked Data / RDF would be a whole lot simpler without those blank nodes.
> 
> It's probably worth asking then, in a Linked Data + RDF environment:
> 
> - would you be happy to give up blank nodes?

Happy, no.

From the point of view of data modelling and management I could live
without them, though I do find them helpful.

From the point of view of managing legacy, and having to maintain tool
chains that do that, then "no". Maybe if they had never existed that
might have been better, but the cost of putting the genie back in the
bottle is too great. Just imagine, for example, the cost of re-specifying
OWL so it could be encoded in an RDF-without-blank-nodes, and that's just
one example.

> - just the [] syntax?

? What's the syntax got to do with it?

> - do you always have a "name" for your graphs? (for instance when 
> published on the web, the URL you GET, and when in a store, the ?G of 
> the quad?

Nope.

> I'm asking because there are multiple things that could be done:
> 
> 1) change nothing

+1

> 2) remove blank nodes from RDF

-1

> 3) create a subset of RDF which doesn't have blank nodes and only deals 
> with ground graphs

-1

That may be the worst of both worlds. Then you would have tools which
only deal with "ground" RDF and tools that support and use blank nodes.
The simpler tools wouldn't even be able to parse large tracts of
existing data including the normative encoding of OWL. I don't see such
fragmentation as healthy.

> 4) create a subset of RDF which does have a way of differentiating blank 
> nodes from URI-References, where each blank node is named persistently 
> as something like ( graph-name , _:b1 ), which would allow the subset to 
> be effectively "ground" so that all the benefits of stable names and set 
> operations are maintained for data management, but where also it can be 
> converted (one way) to full RDF by removing those persistent names.

How does that solve anything?

Assuming the semantics is retained, then to do any graph comparisons or
deltas you will still need to do the equivalent of graph isomorphism; it's
just that now you are matching nodes with an external arbitrary label
instead of ones which just have an internal arbitrary label. It doesn't
change *any* of the problems you list, and even complicates things by having
one more concept to explain to people.
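
To make that concrete, here is a minimal Python sketch (an illustration
only, not any particular toolkit's API). It assumes triples are plain
tuples, blank nodes are strings starting with "_:", and the qualify()
naming scheme is hypothetical. Plain set equality fails on two graphs that
differ only in blank node labels, and prefixing the labels with a graph
name does not change that; some form of isomorphism check is still needed.

```python
from itertools import permutations

def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")

def bnodes(graph):
    # All blank node labels used anywhere in the graph, in a stable order.
    return sorted({t for triple in graph for t in triple if is_bnode(t)})

def isomorphic(g1, g2):
    """Brute-force check: try every bijection between blank node labels."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        relabelled = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if relabelled == g2:
            return True
    return False

def qualify(graph, graph_name):
    # Hypothetical persistent naming: prefix each blank node label
    # with the name of the graph it came from.
    return {
        tuple(f"_:{graph_name}/{t[2:]}" if is_bnode(t) else t for t in triple)
        for triple in graph
    }

g1 = {("ex:alice", "ex:knows", "_:b1"), ("_:b1", "ex:name", '"Bob"')}
g2 = {("ex:alice", "ex:knows", "_:x9"), ("_:x9", "ex:name", '"Bob"')}

print(g1 == g2)            # False: labels differ
print(isomorphic(g1, g2))  # True: same graph up to relabelling

q1, q2 = qualify(g1, "graphA"), qualify(g2, "graphB")
print(q1 == q2)            # False: the external labels are just as arbitrary
print(isomorphic(q1, q2))  # True: isomorphism is still required
```

The brute-force search is factorial in the number of blank nodes, so it is
only suitable for tiny examples like this one.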

The one thing this approach would facilitate is essentially round
tripping back to the same graph in the same store. If you get a query
result containing leaf nodes, you would then be guaranteed to be able to
ask for more about those leaves. I can see benefit in that, and it could
be made to coexist with the current RDF, but it doesn't touch the other
problems.
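
A rough Python sketch of that round-tripping case, under stated
assumptions: the in-memory store, the objects() and describe() helpers and
the example.org graph name are all hypothetical, and blank nodes are held
as (graph-name, label) pairs as in the proposal.

```python
# Hypothetical in-memory store: graph name -> set of triples.
# Blank nodes are kept as (graph_name, label) pairs, so their names
# stay stable for as long as they live in this store.
G1 = "http://example.org/g1"
store = {
    G1: {
        ("ex:alice", "ex:address", (G1, "_:b1")),
        ((G1, "_:b1"), "ex:city", '"Bristol"'),
    }
}

def objects(graph_name, subject, predicate):
    # First query: objects of matching triples in one named graph.
    return [o for (s, p, o) in store[graph_name] if s == subject and p == predicate]

def describe(node):
    # Follow-up query: every triple in the store with this node as subject.
    return [t for triples in store.values() for t in triples if t[0] == node]

leaf = objects(G1, "ex:alice", "ex:address")[0]  # a blank node leaf
print(describe(leaf))  # the persistent name lets us ask for more about it
```

The follow-up only works against the same store, because the pair is
meaningful only where that graph name and label were assigned.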

Dave

Received on Thursday, 7 April 2011 21:29:55 UTC