Re: shared bnodes (Skolems, SPARQL) from Sandro Hawke on 2012-08-29 (public-rdf-wg@w3.org from August 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 29 Aug 2012 14:50:11 -0400
To: Steve Harris <steve.harris@garlik.com>
CC: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-ID: <503E6463.80401@w3.org>
On 08/29/2012 11:19 AM, Steve Harris wrote:
> On 2012-08-29, at 15:59, Sandro Hawke wrote:
>> On 08/29/2012 08:43 AM, Steve Harris wrote:
>>> On 2012-08-29, at 13:15, Sandro Hawke wrote:
>>>> On 08/28/2012 11:52 AM, Steve Harris wrote:
>>>>> On 2012-08-24, at 16:52, Sandro Hawke wrote:
>>>>>
>>>>> … snip …
>>>>>
>>>>>>> And sub/union graphs in general.
>>>>>>>
>>>>>>> Union graphs for those systems that already make one graph the union of all others. Whether we like it or not, those systems are common, maybe even the majority, and have been for several years.
>>>>>>>
>>>>>>> It is the compromise of the context point-of-view and the multiple-graphs point-of-view.  In the context POV,
>>>>>>>
>>>>>>> (this is not advocacy, more like 'history')
>>>>>>>
>>>>>> agreed.
>>>>>>
>>>>>> To put that slightly differently: shared bnodes are also required for the SPARQL dump & restore use case.
>>>>> Yup, that was one of the motivations for Skolem URIs.
>>>> How would that work, if there was already Skolemized RDF in the dataset? (And there will be, if your dataset comes from crawling other people's data sources, and those data sources emit Skolemized RDF, as we're expecting they will sometimes.)
>>>>
>>>> I guess you could make a new Skolem prefix (eg http://example.com/.well-known/genid/backup-20120829T081103/) and genid your bnodes to new URLs starting with that string -- and then pass that string along with the backup file.     But keeping those together might be difficult, and if you're going to do that, there's no need for any sort of standard format for Skolems.
>>> Well, in 4store (for e.g.) the Skolem URIs generally look like:
>>>     http://4store.org/.well-known/genid/[UUID_for_DB]/[ID_number]
>>> so the store can recognise its own bNodes, and convert them back into internal IDs if it gets them back.
>> I'm not quite sure what a DB is, but it seems like it would be kind of hard to control whether the nodes are de-Skolemized on database-restore -- users would have to understand whether they were loading it into the "same" DB or not.
> DB = Database / graphstore / instance / whatever term you want to use.
>
> The users don't need to understand it; that case was covered by my 2nd para. If the data did come from this DB, then the store will recognise the prefix+UUID when it gets it back. That case is easy.

But how does a user know when two 4store instances are the "same" DB?
Sometimes I want to move a database from one instance to another (or is
it the same one? I dunno) without anything about it being changed. It
sounds like with the Skolem approach, I couldn't do that. (Although,
yes, the change would be at a level that we might be considering noise.)
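
To make the concern concrete, here is a rough Python sketch of the
prefix+UUID recognition described above. It is not 4store's actual code:
the base IRI, the function names, and the UUID values are all assumptions
made up for illustration.

```python
from typing import Optional

# Hypothetical Skolem base; a real store would use its own authority.
GENID_BASE = "http://example.org/.well-known/genid/"


def skolemize(db_uuid: str, local_id: int) -> str:
    """Turn an internal bnode ID into a Skolem IRI scoped to one DB."""
    return f"{GENID_BASE}{db_uuid}/{local_id}"


def deskolemize(iri: str, db_uuid: str) -> Optional[int]:
    """Return the internal ID if the IRI was minted by this DB, else None."""
    prefix = f"{GENID_BASE}{db_uuid}/"
    if not iri.startswith(prefix):
        return None  # foreign Skolem IRI (or not a Skolem IRI at all)
    tail = iri[len(prefix):]
    # Only plain ASCII digits count as one of our own ID numbers.
    return int(tail) if tail.isascii() and tail.isdigit() else None


# Two instances with different (made-up) UUIDs.
DB_A = "11111111-1111-1111-1111-111111111111"
DB_B = "22222222-2222-2222-2222-222222222222"

dumped = skolemize(DB_A, 42)
restored_same = deskolemize(dumped, DB_A)   # recognised as its own bnode
restored_other = deskolemize(dumped, DB_B)  # not recognised, stays an IRI
```

Under these assumptions, a dump restored into the instance with the same
UUID gets its internal ID back, while the same dump loaded into an instance
with a different UUID keeps the node as an IRI, which is the difference a
user would have to know about.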

>>> Other Skolem URIs in .well-known form can also be converted into new internal bNode identifiers, in the same way bNode labels are, but they're globally unique, so you can safely map any Skolem URI to the same bNode ID across graphs. I don't know for sure if 4store does this or not, but it could.
>> Is it okay for RDF client software to silently and automatically turn Skolem IRIs back into blank nodes?    (That will change the results of some SPARQL queries on that data.)   If it does this, how long does it have to keep the IRI-bnode map around?   For as long as it has that blank node?
> That's a good question. http://www.w3.org/TR/2011/WD-rdf11-concepts-20110830/#section-skolemization says you can turn bNodes into Skolem URIs, which also changes SPARQL queries… unless ISBLANK(<http://example.com/.well-known/genid/1>) is true, which I believe it is not.
>
> It doesn't say that you can turn Skolem URIs into internal bNode identifiers (not actually the same thing as a bNode, but that's more-or-less an implementation detail). It is something we discussed at the F2F though, the rationale being that internal bNode identifiers are more efficient to store.
>
>> I don't know the best answer to these questions, or even if we have to answer them, I guess.
> The one thing we need to agree is what happens if you see:
>
> <http://example.com/a> { _:a a <Foo> }
> <http://example.com/b> { _:a a <Bar> }
>
> i.e. is there one bNode shared by the two graphs, or two bNodes, one in each graph?

Exactly.   This is ISSUE-21 ("Can Node-IDs be shared between parts of a 
quad/multigraph format?")
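
The two readings of Steve's example can be sketched in a few lines of
Python. This is only an illustration under assumed names (the quad tuples,
the "rdf:type" shorthand, and distinct_bnodes are all hypothetical), not
any parser's real behaviour.

```python
from typing import Iterable, Set, Tuple

# A quad as (graph, subject, predicate, object), with "_:" marking bnode labels.
Quad = Tuple[str, str, str, str]

QUADS = [
    ("http://example.com/a", "_:a", "rdf:type", "Foo"),
    ("http://example.com/b", "_:a", "rdf:type", "Bar"),
]


def distinct_bnodes(quads: Iterable[Quad], shared: bool) -> Set:
    """Collect the distinct blank nodes under one of the two scoping rules.

    shared=True:  a label names the same bnode throughout the whole document.
    shared=False: a label is scoped to the graph in which it appears.
    """
    nodes = set()
    for graph, subj, _pred, obj in quads:
        for term in (subj, obj):
            if term.startswith("_:"):
                nodes.add(term if shared else (graph, term))
    return nodes


one_shared = distinct_bnodes(QUADS, shared=True)
two_scoped = distinct_bnodes(QUADS, shared=False)
```

With document-wide labels the example contains a single bnode that is both
a Foo and a Bar; with graph-scoped labels it contains two unrelated bnodes.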

We could do a strawpoll on that here and now.

My vote, not surprising anyone, would be:

+1 (shared bnodes are needed for several use cases and are simpler than 
using Skolem nodes)

    -- Sandro



> - Steve
>
>>>   To be honest, bNode sharing in TriG hasn't come up in user requests, so I don't think anyone's tried it.
>> *nod*
>>
>>       -- Sandro
>>
>>
>>
Received on Wednesday, 29 August 2012 18:50:21 UTC