Re: RDF-ISSUE-17 (graph merge): How are RDF datasets to be merged? [RDF Graphs] from Steve Harris on 2011-03-29 (public-rdf-dawg@w3.org from January to March 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Tue, 29 Mar 2011 14:59:42 +0100
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: Axel Polleres <axel.polleres@deri.org>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <E24BFFE3-B779-4337-BCDC-910AFD441030@garlik.com>

On 2011-03-29, at 14:35, Andy Seaborne wrote:
>>  b) The second example that one might think would make sense would be to have ADD not preserve bnodes... what worries me a bit is that different named graphs may use overlapping bnode labels, and that an ADD (likewise any INSERT that transfers data between graphs in the graph store) may result in unexpected new co-references. Example:
>> 
>> 
>>  graph<a>    _:b1 :p _:b2 .
>>  graph<b>    _:b2 :p _:b1 .
>> 
>> Now note that
>>    ADD<a>  TO<b>
>> will result in:
>> 
>>  graph<a>    _:b1 :p _:b2 .
>>  graph<b>    _:b2 :p _:b1 . _:b1 :p _:b2 .
>> 
>> that is, bnode labels matter, since we have now created a co-reference in graph<b> which wouldn't have happened if ADD relied on MERGE, i.e. where the result would be something like:
>> 
>>  graph<a>    _:b1 :p _:b2 .
>>  graph<b>    _:b3 :p _:b4 . _:b5 :p _:b6 .
>> 
>> Opinions?
> 
> Operations within the store should not name apart.  Either they already are apart, in which case there is no problem, or they are not, in which case something intentional was done to make it so.

Well... intentional comes in degrees.

As a user I'm not sure that I would expect either

ADD <a> TO <b>

or

INSERT {
  GRAPH <G2> { ?x ?y ?z }
}
WHERE {
  GRAPH <G1> { ?x ?y ?z }
}

to necessarily copy bnodes across as-is.

It makes it easy to get into a situation where bNodes are shared between more than one graph, which I think wasn't previously possible with the standards.

It's not necessarily a bad thing, but it's also not necessarily expected.
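To make the distinction concrete, here is a minimal Python sketch (not taken from any SPARQL implementation) contrasting the two readings of ADD <a> TO <b> on Axel's example: copying triples with their bnode labels as-is, versus RDF-merge style, where the copied bnodes are renamed apart from those already in the target. Graphs are modelled as sets of string triples; the function names and the _:m1, _:m2 labelling scheme are hypothetical, chosen only for illustration. Renaming just the copied bnodes is enough to keep them apart, even though Axel's sketch relabels both sides.

```python
from itertools import count
from typing import Dict, Set, Tuple

# A triple is (subject, predicate, object); blank nodes are written "_:label".
Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def bnodes(graph: Set[Triple]) -> Set[str]:
    """All blank-node labels occurring in subject or object position."""
    return {t for s, _, o in graph for t in (s, o) if is_bnode(t)}


def add_by_union(target: Set[Triple], source: Set[Triple]) -> Set[Triple]:
    """Copy triples verbatim: equal bnode labels become the same node."""
    return target | source


def add_by_merge(target: Set[Triple], source: Set[Triple]) -> Set[Triple]:
    """RDF-merge style: rename the copied bnodes apart from those in target."""
    used = bnodes(target) | bnodes(source)
    counter = count(1)

    def fresh() -> str:
        # Hypothetical labelling scheme: _:m1, _:m2, ... skipping labels in use.
        while True:
            label = f"_:m{next(counter)}"
            if label not in used:
                used.add(label)
                return label

    mapping: Dict[str, str] = {}

    def rename(term: str) -> str:
        # Each distinct source bnode gets one fresh label; IRIs pass through.
        if not is_bnode(term):
            return term
        if term not in mapping:
            mapping[term] = fresh()
        return mapping[term]

    return target | {(rename(s), p, rename(o)) for s, p, o in source}


graph_a = {("_:b1", ":p", "_:b2")}
graph_b = {("_:b2", ":p", "_:b1")}

union_b = add_by_union(graph_b, graph_a)  # labels kept: _:b1 and _:b2 now co-refer
merge_b = add_by_merge(graph_b, graph_a)  # copied bnodes renamed apart
print(len(union_b), len(bnodes(union_b)))  # 2 2
print(len(merge_b), len(bnodes(merge_b)))  # 2 4
```

Both results hold two triples, but the label-preserving copy ends up with only two distinct bnodes (the new co-reference Axel describes), while the merge-style copy keeps four.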

As an implementor I see the internal logic, but I'm not sure to what degree users see a difference between

LOAD <G1> INTO <G2>

ADD <G1> TO <G2>

> Whenever a file is read, the bNodes don't clash; it takes active measures to make them the same, and that means application intent.

Except if someone uses ADD, or INSERT?

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Tuesday, 29 March 2011 14:00:24 UTC