Re: bNodes as graph identifiers

On Thu, May 30, 2013 at 1:21 PM, Charles Greer <cgreer@marklogic.com> wrote:

> Hi Steve,
>
> (This is not a formal response.)
>
> As a database practitioner and implementer I can appreciate your position,
> but of late I've not seen a conflict between an RDF data model that
> supports bnodes as graph labels, and a database technology that does not
> and cannot use them internally.  The RDF data model in this case applies to
> possible serialization formats and parsing, but it seems to me that the
> internal representation of bnodes inside of a database will always be tied
> to some kind of more universal identifier.
>
> We know that bnode identifiers are meaningless outside of the context of a
> given document.  Consequently we can't require that bnode identifiers be
> preserved when copying/merging datasets; so the requirement that your
> system be able to ingest (and skolemize) bnodes doesn't seem to require
> that RDF itself not use bnodes as graph labels.
>
> In other words, I don't see why this change would necessarily affect your
> implementation, except as far as ingestion goes.
>
> Does that at all start to mitigate your objection?


Kinda makes my point for me. Blank nodes as graph labels are effectively
useless: you can't exchange them, because every parse creates not just new
graphs with the same labels (as with ordinary blank nodes in graphs, which
you can test for isomorphism), but a totally new dataset. And unlike with
graphs, we haven't defined how datasets are isomorphic (or even whether they
can be!), so you can't even tell that it's an equivalent dataset... ewww.
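To make the exchange problem concrete, here is a minimal Python sketch. It assumes a toy whitespace-separated quad format (not real N-Quads) and a hypothetical `parse_quads` function standing in for a parser that relabels blank nodes on every parse, which RDF parsers are free to do because blank-node labels are local to a document. Parsing the same text twice yields datasets that differ only in their blank-node graph labels, and the label written in the source no longer names any graph:

```python
import itertools

# Hypothetical stand-in for a parser's blank-node allocator.
_fresh = itertools.count(1)


def parse_quads(text):
    """Parse toy 'subject predicate object graph' lines into a set of tuples.

    Blank-node labels are scoped to a single parse: each distinct _:label in
    this document is replaced by a fresh internal label.
    """
    scope = {}

    def term(tok):
        if tok.startswith("_:"):
            if tok not in scope:
                scope[tok] = f"_:b{next(_fresh)}"
            return scope[tok]
        return tok

    quads = set()
    for line in text.strip().splitlines():
        s, p, o, g = line.split()
        quads.add((term(s), p, term(o), term(g)))
    return quads


doc = ":s :p :o _:abc"
first = parse_quads(doc)
second = parse_quads(doc)

# Same document, but the graph label differs between the two parses...
print(first == second)  # False
# ...and the label used in the source no longer identifies any graph.
print(any(g == "_:abc" for (_, _, _, g) in first))  # False
```

Deciding that `first` and `second` are "the same" dataset would need a definition of dataset isomorphism, which this sketch deliberately does not attempt.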

Cheers,
Gavin


> Charles
>
>
>
> On 05/29/2013 10:47 AM, Steve Harris wrote:
>
>> [ as a side note I find it bizarre that I'm having to advocate NOT
>> changing a 14-year-old, industrially deployed spec, at the 11th hour of
>> the standardisation process, to add a feature that's used by a tiny
>> minority of deployed systems - if anything were to strike an outsider as
>> peculiar about this WG's process, it would surely be this feature ]
>>
>> TL;DR: don't mess.
>>
>> We know that bNode graph identifiers are possible (I've designed a system
>> myself that had them) and that there are use cases that they address, but
>> I've not heard anything yet that can't already be addressed using
>> RDF/SPARQL as it stands. Some people are of the opinion that bNodes as
>> graph identifiers address those use cases better, in some way, but that's
>> another matter.
>>
>> There are, however, some costs to extending RDF (datasets) to require that
>> bNodes be usable as graph identifiers:
>>
>> * We (Experian) have invested millions of dollars in our RDF engine; it's
>> very tightly optimised to the current specs, and opening up the space of
>> graph identifiers from a single class (URIs) to two classes (URIs and
>> bNodes) would have a significant engineering and storage cost. Put simply,
>> we wouldn't do it, and would just step away from later RDF specs, becoming
>> an RDF/SPARQL-flavoured graph database.
>>
>> * RDF is already too complex for newcomers to learn easily. Every time we
>> add a new feature to the language we raise the barrier to entry.
>>
>> * There's no practical way to refer to long-lived bNodes in SPARQL
>> (without enforced skolemisation). People will import datasets with bNode
>> graphs and then realise they can't isolate their data (presumably after
>> posting on Stack Overflow or similar :) ). The following will not
>> retrieve your original data*, and this will just promote more confusion:
>>
>>         Data:
>>         _:abc { :s :p :o }
>>
>>         Query:
>>         SELECT * WHERE { GRAPH _:abc { :s ?p ?o } }
>>
>> You could possibly do something like:
>>
>>         SELECT * WHERE { GRAPH ?g { :s ?p ?o } FILTER(STR(?g) = "abc") }
>>
>> That's pretty inconvenient in many ways, and isn't required to work by
>> SPARQL 1.1. It is only possible at all in systems that preserve bNode
>> labels, which is not required.
>>
>> * Confusion between bNodes-in-graphs and bNodes-as-graph-identifiers: the
>> discussion seems to assume that they're separate kinds of thing, maybe
>> with identifier bNodes not being existential variables? Whichever way it
>> goes, the relationship between bNodes-in-graphs and
>> bNodes-identifying-graphs is going to be complex.
>>
>> * Of all the features implemented as extensions by a small number of
>> systems, this seems like an odd one to pick. IMHO there are far more
>> serious problems with RDF. There is a cost (to this group, and to the
>> wider community) of any change, so let's pick our battles wisely.
>>
>> * There's very little implementation experience compared to the other
>> things we're standardising: URI quads, bNode skolemisation, Turtle,
>> N-Quads. It's not clear how far the existential variable-ness should
>> extend. Do we sanction graph leaning? Do URI-identified graphs entail
>> identical graphs identified by bNodes? If not, why not? What do bNodes
>> with a given label, in graphs identified by a bNode with a different
>> label, refer to? And so on.
>>
>>         _:abc {
>>         }
>>         _:def {
>>         }
>>
>> One graph, or two, or undefined? I don't think we know the right answer
>> yet.
>>
>> So, in summary, I think the cost is high, and the benefit is vanishingly
>> small. Nothing stops people who feel they really need them from adding
>> them to RDF systems, as they have in the past. One counter-argument is
>> that JSON-LD will do it anyway, but that's fine: if it is widely used, it
>> can be adopted into RDF 1.2, with plenty of implementation experience. In
>> the meantime, JSON-LD serialisers can skolemise when transforming JSON-LD
>> into RDF; there are other places where the transform is lossy anyway, as
>> far as I understand it.
>>
>> - Steve
>>
>> * This was possibly an error in the SPARQL 1.0 spec, but sadly the
>> bNodes-as-variables feature is quite widely used, and many people argued
>> in favour of it.
>>
>> --
>> Steve Harris
>> Experian
>> +44 20 3042 4132
>> Registered in England and Wales 653331
>> VAT # 887 1335 93
>> 80 Victoria Street, London, SW1E 5JL
>>
>
> --
> Charles Greer
> Senior Engineer
> MarkLogic Corporation
> charles.greer@marklogic.com
> Phone: +1 707 408 3277
> www.marklogic.com
>
>
>
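Charles and Steve both point to skolemising blank nodes at ingestion. A minimal Python sketch of that step follows, assuming quads are plain string tuples and using the `/.well-known/genid/` path that RDF 1.1 suggests for Skolem IRIs; the `skolemize` function and the `http://example.com` authority are hypothetical. Every blank node, including a blank-node graph label, is replaced by a fresh IRI, consistently within one ingest:

```python
import uuid


def skolemize(quads, authority="http://example.com"):
    """Replace every blank node in a set of (s, p, o, g) string tuples with a Skolem IRI.

    `authority` is a hypothetical placeholder; a real store would use its own.
    Within one call, the same blank-node label always maps to the same IRI.
    """
    mapping = {}

    def term(t):
        if t.startswith("_:"):
            if t not in mapping:
                # Fresh, globally unique IRI under the well-known genid path.
                mapping[t] = f"<{authority}/.well-known/genid/{uuid.uuid4().hex}>"
            return mapping[t]
        return t

    skolemized = {tuple(term(t) for t in quad) for quad in quads}
    return skolemized, mapping


# The graph from Steve's example, plus a blank node inside it.
quads = {(":s", ":p", "_:x", "_:abc"), ("_:x", ":q", ":o", "_:abc")}
stored, labels = skolemize(quads)

# The graph label is now an IRI that a query can name directly.
graph_iri = labels["_:abc"]
print(graph_iri)
```

Once stored this way, the graph can be named as `GRAPH <...>` using the IRI from `labels`, which avoids the `GRAPH _:abc` problem above, at the cost of the store no longer holding a blank node at all.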

Received on Thursday, 30 May 2013 20:28:27 UTC