Re: Blank Node Identifiers and RDF Dataset Normalization

[ TL;DR - stop messing ]

Some systems (specifically 4store and 5store that I'm aware of, but I expect others) use the fact that graph labels have to be URIs as a source of optimisation.

For example:

SELECT * WHERE {
   ?g dc:date ?d .
   GRAPH ?g { ?x a foaf:Person }
}

Under current RDF semantics you can restrict the search for values of ?g to URIs. Often you would want to bind dc:date first - e.g. if dc:date predicates with URI subjects in the "default graph" were rarer than graphs containing foaf:Person instances.
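
As a rough illustration of that plan (a Python sketch only, not how 4store or 5store are actually implemented; the classes, the in-memory data layout and the function name are all made up for the example), binding dc:date first and discarding any subject that isn't a URI looks something like this:

```python
from dataclasses import dataclass

# Hypothetical term types, just enough to tell URIs from other nodes.
@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class BNode:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

DC_DATE = URI("http://purl.org/dc/elements/1.1/date")
RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
FOAF_PERSON = URI("http://xmlns.com/foaf/0.1/Person")

def graphs_with_dated_people(default_graph, named_graphs):
    """Bind ?g via dc:date first, then probe the named graph for foaf:Person.

    default_graph: iterable of (subject, predicate, object) triples.
    named_graphs:  dict mapping a URI graph label to a list of triples.
    """
    results = []
    for s, p, o in default_graph:
        if p != DC_DATE:
            continue
        # Graph labels must be URIs, so a bNode or literal subject can
        # never bind ?g; skip it without touching the graph index at all.
        if not isinstance(s, URI):
            continue
        for x, p2, o2 in named_graphs.get(s, ()):
            if p2 == RDF_TYPE and o2 == FOAF_PERSON:
                results.append((s, o, x))  # (?g, ?d, ?x)
    return results
```

The point is only the isinstance check: with URI-only graph labels the store can throw away candidates early, and the graph index never needs a slot for anything else.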

Specifically 5store has no index space for quads where the graph label isn't a URI - this again is an optimisation (but 4store doesn't do that). Changing that would involve a significant amount of effort, and is not something we would commit to for a feature that would be of no benefit. SPARQL explicitly states that graph labels must be URIs, so this is legit. 

Also, it's highly subjective whether having bNodes as graph identifiers is a "good thing". I have evidence from 3store that it's not: users found it generally weird (this was in the days before graph-spanning bNodes were common, perhaps that's a factor?), and didn't like the fact that there were graphs without stable identifiers. You *can* preserve bNode labels between (de)serialisations, but not many systems do, and you're not required to.

However, neither of those is really the issue - I think people in the community should recognise that RDF is now a deployed system with many implementations. I believe it does serious harm to RDF's image as a "real" technology if we go about making deep changes like this for no particularly good reason.

We should have moved way beyond the time where RDF is an "emerging tech" only suitable for early-stage startups and academics. Let's start to act like we believe that.

</rant>

- Steve

On 2013-02-25, at 11:21, William Waites <ww@styx.org> wrote:

> Some RDF databases use the fact that the number of different
> predicates will be small compared to the number of different nodes in
> the subject or object position as a source of optimisation. Allowing
> blank nodes as predicates, though it would be convenient and in some
> respects more elegant would tend to break this assumption to the
> detriment of the databases that are affected. This is a very real
> concern.
> 
> Allowing blank nodes in the graph position would not, as far as I am
> aware, have a similar impact on existing implementations. My
> impression from the previous discussion is that it's an easy patch to
> the standards documents as well.

-- 
Steve Harris
Experian
+44 20 3042 4132
Registered in England and Wales 653331 VAT # 887 1335 93
80 Victoria Street, London, SW1E 5JL

Received on Monday, 25 February 2013 12:27:37 UTC