Re: Dealing with distributed nature of Linked Data and SPARQL from David Booth on 2016-06-08 (public-declarative-apps@w3.org from June 2016)

From: David Booth <david@dbooth.org>
Date: Wed, 8 Jun 2016 11:17:48 -0400
To: Martynas Jusevičius <martynas@graphity.org>, Mikel Egaña Aranguren <mikel.egana.aranguren@gmail.com>
Cc: public-lod <public-lod@w3.org>, public-declarative-apps@w3.org, James Anderson <james@dydra.com>, Arto Bendiken <arto@dydra.com>
Message-ID: <5758371C.30805@dbooth.org>

On 06/08/2016 08:55 AM, Martynas Jusevičius wrote:
> So I think it would be
> wrong to ignore the "older" description -- or any "other" description
> in general.

This gets into the whole question of which data you choose to believe.  Some 
data is just plain wrong, and lots of data is "correct" (i.e. usable) 
for some uses but wrong for others and will cause inconsistency when 
merged.  Very little data is universally "correct".

I think it is inescapable that when merging data from multiple sources 
you need to be careful about which data you choose to include.  Putting 
data from each source in its own named graph is one good way to help 
keep track of where it came from, and this is useful in deciding whether 
to include it.  But that provides only coarse-grained control: you 
either include a source's graph or leave it out.  You may well need to 
eliminate only a few triples from some source data in order to make it 
merge without causing inconsistencies, and it can be tedious to figure 
out which triples to drop.
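
As a rough illustration of the idea above, here is a minimal sketch in 
plain Python (no RDF library).  Everything in it is hypothetical: the 
"ex:" and "graph:" names, the sample data, and the assumption that 
ex:birthYear is single-valued.  It models quads as 4-tuples so that each 
triple carries the name of its source graph, merges only the graphs you 
choose, and lets you drop individual quads when a whole-graph decision 
is too coarse.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object, source_graph).
QUADS = [
    ("ex:alice", "ex:birthYear", "1980", "graph:sourceA"),
    ("ex:alice", "ex:birthYear", "1981", "graph:sourceB"),
    ("ex:alice", "ex:name", "Alice", "graph:sourceB"),
    ("ex:bob", "ex:name", "Bob", "graph:sourceC"),
]


def merge(quads, include_graphs, drop=frozenset()):
    """Return the set of triples from the chosen graphs,
    leaving out any individually dropped quads."""
    return {
        (s, p, o)
        for (s, p, o, g) in quads
        if g in include_graphs and (s, p, o, g) not in drop
    }


def conflicts(triples, single_valued):
    """Return (subject, predicate) pairs that have more than one value
    for predicates we assume should have only one."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in single_valued:
            values[(s, p)].add(o)
    return {key: objs for key, objs in values.items() if len(objs) > 1}


# Coarse-grained choice: include sources A and B, leave out C.
merged = merge(QUADS, {"graph:sourceA", "graph:sourceB"})
found = conflicts(merged, {"ex:birthYear"})

# Fine-grained choice: drop the single quad we decide not to believe.
distrusted = {("ex:alice", "ex:birthYear", "1981", "graph:sourceB")}
cleaned = merge(QUADS, {"graph:sourceA", "graph:sourceB"}, drop=distrusted)
```

The conflict check only points at candidate triples.  Deciding which of 
the conflicting values to drop is still a judgement call that the code 
cannot make for you.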

Bottom line: I don't think there is any simple answer to the question of 
which data to include.  It requires a judgement call.

David Booth

Received on Wednesday, 8 June 2016 15:18:25 UTC