
Re: shared bnodes (Skolems, SPARQL)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 29 Aug 2012 16:19:47 +0100
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-Id: <31DDFBBB-3FB4-4B53-877A-EC757701C7CC@garlik.com>
To: Sandro Hawke <sandro@w3.org>

On 2012-08-29, at 15:59, Sandro Hawke wrote:
> On 08/29/2012 08:43 AM, Steve Harris wrote:
>> On 2012-08-29, at 13:15, Sandro Hawke wrote:
>>> On 08/28/2012 11:52 AM, Steve Harris wrote:
>>>> On 2012-08-24, at 16:52, Sandro Hawke wrote:
>>>> … snip …
>>>>>> And sub/union graphs in general.
>>>>>> Union graphs for those systems that already make one graph the union of all others.  Whether we like it or not, those systems are common, maybe even the majority, and have been for several years.
>>>>>> It is the compromise of the context point-of-view and the multiple-graphs point-of-view.  In the context POV,
>>>>>> (this is not advocacy, more like 'history')
>>>>> agreed.
>>>>> To put that slightly differently: shared bnodes are also required for the SPARQL dump & restore use case.
>>>> Yup, that was one of the motivations for Skolem URIs.
>>> How would that work, if there was already Skolemized RDF in the dataset?    (And there will be, if your dataset comes from crawling other people's data sources, and those data sources emit Skolemized RDF, as we're expecting they will sometimes.)
>>> I guess you could make a new Skolem prefix (eg http://example.com/.well-known/genid/backup-20120829T081103/) and genid your bnodes to new URLs starting with that string -- and then pass that string along with the backup file.     But keeping those together might be difficult, and if you're going to do that, there's no need for any sort of standard format for Skolems.
>> Well, in 4store (for example) the Skolem URIs generally look like:
>>    http://4store.org/.well-known/genid/[UUID_for_DB]/[ID_number]
>> so the store can recognise its own bNodes, and convert them back into internal IDs if it gets them back.
> I'm not quite sure what a DB is, but it seems like it would be kind of hard to control whether the nodes are de-Skolemized on database-restore -- users would have to understand whether they were loading it into the "same" DB or not.

DB = Database / graphstore / instance / whatever term you want to use.

The users don't need to understand it; that case was covered by my second paragraph. If the data did come from this store, then the store will recognise the prefix+UUID when it gets it back. That case is easy.
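
As a rough illustration of that check, here is a minimal Python sketch. It is not 4store's actual code: the function name parse_own_skolem, the db_uuid parameter, and the assumption that the ID number is written in decimal are all hypothetical. Only the URI layout comes from the pattern quoted above.

```python
import re

# Prefix taken from the 4store pattern quoted above.
GENID_PREFIX = "http://4store.org/.well-known/genid/"


def parse_own_skolem(iri, db_uuid):
    """Return the internal ID if `iri` was minted by the store identified
    by `db_uuid`, otherwise None.

    Assumption: the trailing [ID_number] is a decimal integer.
    """
    if not iri.startswith(GENID_PREFIX):
        return None
    parts = iri[len(GENID_PREFIX):].split("/")
    if len(parts) != 2:
        return None
    store_uuid, id_number = parts
    # A Skolem URI from a different store (different UUID) is not ours.
    if store_uuid.lower() != db_uuid.lower():
        return None
    if re.fullmatch(r"[0-9]+", id_number) is None:
        return None
    return int(id_number)
```

A URI carrying another store's UUID returns None here, so it would stay an ordinary URI (or go through whatever separate handling the store has for foreign Skolem URIs).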

>> Other Skolem URIs in .well-known form can also be converted into new internal bNode identifiers, in the same way bNode labels are, but they're globally unique, so you can safely map any Skolem URI to the same bNode ID across graphs. I don't know for sure if 4store does this or not, but it could.
> Is it okay for RDF client software to silently and automatically turn Skolem IRIs back into blank nodes?    (That will change the results of some SPARQL queries on that data.)   If it does this, how long does it have to keep the IRI-bnode map around?   For as long as it has that blank node?

That's a good question. http://www.w3.org/TR/2011/WD-rdf11-concepts-20110830/#section-skolemization says you can turn bNodes into Skolem URIs, which also changes the results of SPARQL queries… unless ISBLANK(<http://example.com/.well-known/genid/1>) is true, which I believe it is not.

It doesn't say that you can turn Skolem URIs into internal bNode identifiers (not actually the same thing as a bNode, but that's more-or-less an implementation detail). It is something we discussed at the F2F though, the rationale being that internal bNode identifiers are more efficient to store.
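
To make the idea concrete, here is a minimal Python sketch of such a mapping. The class SkolemMap and its method names are hypothetical, and nothing here is specified by the draft. It only shows that, because Skolem URIs are globally unique, the same URI can always map to the same internal identifier, and that the map has to be retained for as long as the URI might need to be emitted again.

```python
from itertools import count

# Path marker for Skolem IRIs in .well-known form.
GENID_MARKER = "/.well-known/genid/"


class SkolemMap:
    """Hypothetical two-way map between Skolem IRIs and internal bNode IDs."""

    def __init__(self):
        self._ids = count(1)   # source of fresh internal identifiers
        self._to_id = {}       # Skolem IRI -> internal ID
        self._to_iri = {}      # internal ID -> Skolem IRI

    def intern(self, iri):
        """Return the internal ID for a Skolem IRI, allocating one if needed."""
        if GENID_MARKER not in iri:
            raise ValueError("not a .well-known/genid IRI: " + iri)
        if iri not in self._to_id:
            new_id = next(self._ids)
            self._to_id[iri] = new_id
            self._to_iri[new_id] = iri
        return self._to_id[iri]

    def iri_for(self, internal_id):
        """Recover the original Skolem IRI for an internal ID."""
        return self._to_iri[internal_id]
```

Whether a store may do this silently, and whether it should hand back the original URI or a blank node on output, is exactly the open question; the reverse lookup is included only to show what keeping the map around would involve.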

> I don't know the best answer to these questions, or even if we have to answer them, I guess.

The one thing we need to agree on is what happens if you see:

<http://example.com/a> { _:a a <Foo> }
<http://example.com/b> { _:a a <Bar> } 

i.e. is there one bNode shared by the two graphs, or are there two bNodes, one in each graph?
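
The two readings can be spelled out in a small Python sketch. It takes no position on which is right; the resolve function, its scope parameter, and the tuple encoding of the two statements are all hypothetical, chosen only to show how label scoping changes the number of bNodes.

```python
# The two statements from the example, as (graph, subject, predicate, object).
QUADS = [
    ("http://example.com/a", "_:a", "rdf:type", "Foo"),
    ("http://example.com/b", "_:a", "rdf:type", "Bar"),
]


def resolve(quads, scope):
    """Replace bNode labels with internal IDs under a given label scope.

    scope="document": a label denotes the same bNode in every graph.
    scope="graph":    a label is local to the graph it appears in.
    Returns the rewritten quads and the number of distinct bNodes.
    """
    if scope not in ("document", "graph"):
        raise ValueError("scope must be 'document' or 'graph'")
    ids = {}
    resolved = []
    for graph, subject, predicate, obj in quads:
        key = (subject,) if scope == "document" else (graph, subject)
        # Allocate the next integer ID the first time a key is seen.
        node_id = ids.setdefault(key, len(ids))
        resolved.append((graph, node_id, predicate, obj))
    return resolved, len(ids)
```

Under scope="document" the example yields one bNode that is both a Foo and a Bar across the two graphs; under scope="graph" it yields two unrelated bNodes.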

- Steve

>>  To be honest bNode sharing in Trig hasn't come up in user requests, so I don't think anyone's tried it.
> *nod*
>      -- Sandro

Steve Harris, CTO
Garlik, a part of Experian
+44 7854 417 874  http://www.garlik.com/
Registered in England and Wales 653331 VAT # 887 1335 93
Registered office: Landmark House, Experian Way, Nottingham, Notts, NG80 1ZZ
Received on Wednesday, 29 August 2012 15:20:23 UTC
