Re: shared bnodes (Re: RDF dataset semantics again)

On 2012-08-29, at 10:39, Andy Seaborne wrote:
> On 29/08/12 10:08, Steve Harris wrote:
>> On 2012-08-28, at 17:55, Andy Seaborne wrote:
>>> On 28/08/12 16:49, Steve Harris wrote:
>>>>> If you don't have a syntax or protocol (such as an RDF API) for constructing graphs with shared bnodes, then, yes, you need to indicate that some kind of unification is desired/appropriate.
>>>>>> It seems to me the simple, obvious, and appropriate way to handle this for most use cases is to allow blank node labels to be shared between different parts of a multi-graph document.
>>>> It's very easy in the case where you want to indicate that the bNodes are shared - but there is some cost to it - when you want to produce the multi-graph document you need to ensure that the labels for distinct bNodes are kept distinct.
>>>> Consequently you can't do tricks like:
>>>> ( for i in *.ttl; do echo "<$i> {" ; cat $i ; echo "}" ; done ) > foo.trig
>>>> I've never done anything exactly like that, and I have no feel for
>>>> how common a use case it is, but it's worth noting that in RDF-2004 it
>>>> would be "safe", and in RDF 1.1 it might result in shared bNodes,
>>>> depending on how lucky you were.
>>> @prefixes ?
>> Sure, you'd need to do something tricksier if your data used a
>> fuller range of Turtle syntax. You could add some grep and grep -v into
>> the mix, but I'm not going to advocate that people parse Turtle with
>> text processing tools.
>>
>> Perhaps
>>    ( for i in *.nt; do echo "<$i> {" ; cat $i ; echo "}" ; done ) > foo.trig
>> would be a better example?
> Yes - although the relative URI for the <$i> is not portable.

Sure, it works in raptor, so passes the Tuffield Test for "real RDF".

I could equally have done
   ( for i in *.rdf; do echo "<$i> {" ; rapper -o ntriples $i ; echo "}" ; done ) > foo.trig

Which would definitely have undesirable effects, as rapper starts numbering bNode genids from 1 for each file.
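For illustration, here is a minimal sketch of how the labels could be kept apart when concatenating N-Triples files. The function name nt_to_trig and the per-file "fNx" label prefix are made up for this example. It assumes no literal in the data contains whitespace followed by "_:", because sed cannot tell a literal from a bNode term, so treat it as a sketch rather than a recommendation:

```shell
#!/bin/sh
# Wrap each N-Triples file in a TriG graph block, prefixing every bNode
# label with a per-file tag (f1x, f2x, ...) so that labels coming from
# different files cannot collide.
# Assumption: "_:" only starts a bNode term when it is at the start of a
# line or follows whitespace; a literal containing " _:" would be mangled.
nt_to_trig() {
    n=0
    for f in "$@"; do
        n=$((n + 1))
        echo "<$f> {"
        sed -e "s/^_:/_:f${n}x/" \
            -e "s/\([[:space:]]\)_:/\1_:f${n}x/g" "$f"
        echo "}"
    done
}
```

Usage would be something like "nt_to_trig *.nt > foo.trig". The relative <$f> graph names have the same portability caveat as before.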

> N-Triples can be turned into N-Quads quite easily ... with the same issues of shared bNode labels.

N-Quads is potentially a different situation, as it has different reasons for its existence. I'm not sure I'd be comfortable having different bNode label scopes for N-Quads and TriG, but it is possible.

>> Historically (and currently) what did/does Jena do with bNode labels
>> shared between graphs in Trig? Have users ever commented one way or the
>> other? I can't remember anyone asking for 4store, but the user base is a
>> lot smaller.
> Graphs can share bNodes.

Yes, no argument about that.

>  This can happen because:
> 1/ One graph can be a subgraph of another
> 2/ One graph can be the base data for another graph which has inference.
> 3/ The app copied a statement from one model to another
>   e.g. generalization of graph being related

And typos ;)

> Because of this, when working with the data, the app may switch between which graph it is looking in for the same bNode so we do not rename them apart when they move between graphs.
> Putting the base data and the inference graph in a SPARQL dataset is something that happens.
> So shared bNodes matter in Jena.  The TriG parser scopes bNode labels to the document (although it is policy driven and the other policies exist but are not exposed in the API).

OK, interesting, and that's not caused any issues for people?

The only way you can import shared bNodes into (and export from) 4store is using Skolem URIs. I know some people have used those, but I don't have any idea of how many.
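As a rough sketch of what Skolemizing N-Triples data could look like: the function name skolemize and the base IRI below are placeholders for this example, not necessarily the form 4store uses. It assumes RDF-2004 style alphanumeric bNode labels and, as with any sed approach, that no literal contains whitespace followed by "_:":

```shell
#!/bin/sh
# Hypothetical Skolem base; use an IRI space you control.
SKOLEM_BASE="http://example.org/.well-known/genid/"

# Read N-Triples on stdin and replace each bNode term with a Skolem IRI
# built from its label, so the same label always maps to the same IRI.
# Assumption: labels are alphanumeric only (RDF-2004 N-Triples).
skolemize() {
    sed -e "s|^_:\([A-Za-z0-9]*\)|<${SKOLEM_BASE}\1>|" \
        -e "s|\([[:space:]]\)_:\([A-Za-z0-9]*\)|\1<${SKOLEM_BASE}\2>|g"
}
```

Since the same label always maps to the same IRI, data run through this can be loaded into different graphs and still refer to the same node, which is the effect shared bNodes would otherwise give you.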

- Steve

Steve Harris, CTO
Garlik, a part of Experian
+44 7854 417 874
Registered in England and Wales 653331 VAT # 887 1335 93
Registered office: Landmark House, Experian Way, Nottingham, Notts, NG80 1ZZ

Received on Wednesday, 29 August 2012 10:34:14 UTC