Dealing with the distributed nature of Linked Data and SPARQL

Hey all,

we are developing software that consumes data both from Linked Data
and SPARQL endpoints.

Most of the time, these technologies complement each other. We've come
across an issue though, which occurs when an RDF description of the
same resource is available through both of them.

Let's take the resource http://data.semanticweb.org/person/andy-seaborne
as an example. Its RDF description is available in at least 2
locations:
- on a SPARQL endpoint:
http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
- as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
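
For reference, here is a minimal sketch of how the SPARQL URL above can
be built from the resource URI. It assumes the endpoint accepts the query
as a plain "query" GET parameter (as the URL above suggests); the
function name describe_url is made up for illustration.

```python
from urllib.parse import quote


def describe_url(endpoint: str, resource: str) -> str:
    """Build a GET URL asking `endpoint` to DESCRIBE `resource`.

    Hypothetical helper: assumes the endpoint takes the query text in a
    `query` URL parameter.
    """
    query = f"DESCRIBE <{resource}>"
    # safe="" percent-encodes every reserved character, including ":" and "/".
    return f"{endpoint}?query={quote(query, safe='')}"


RESOURCE = "http://data.semanticweb.org/person/andy-seaborne"
print(describe_url("http://xmllondon.com/sparql", RESOURCE))
```

The Linked Data description needs no query at all: the agent simply
dereferences the resource URI (or the /rdf document URL listed above).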

These descriptions could be identical (I haven't checked), but it is
more likely than not that they're out of sync, complementary, or
possibly even contradicting each other, if reasoning is considered.

If a software agent has access to both the SPARQL endpoint and Linked
Data resource, what should it consider as the resource description?
There are at least 3 options:
1. prioritize SPARQL description over Linked Data
2. prioritize Linked Data description over SPARQL
3. merge both descriptions

I am leaning towards #3 as the sensible solution. But then I think the
end-user should be informed which part of the description came from
which source. This would be problematic if the descriptions are
triples only, but should be doable with quads. That leads to another
problem, however: both LD and SPARQL responses are under-specified
in terms of quads.
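
To make option #3 concrete, here is a rough sketch of merging with
provenance, using plain Python tuples rather than any RDF library. The
graph names (urn:source:...) and the example triples are placeholders I
made up, not the actual content of either description. The idea is that
the agent picks a graph name per source itself, since the responses do
not supply one.

```python
from collections import defaultdict

# Hypothetical graph names chosen by the agent, one per source.
SPARQL_GRAPH = "urn:source:sparql"
LD_GRAPH = "urn:source:linked-data"


def to_quads(triples, graph):
    """Turn (s, p, o) triples into (s, p, o, graph) quads."""
    return {(s, p, o, graph) for (s, p, o) in triples}


def merge(sparql_triples, ld_triples):
    """Option #3: union of both descriptions, each triple tagged with its source."""
    return to_quads(sparql_triples, SPARQL_GRAPH) | to_quads(ld_triples, LD_GRAPH)


def sources_by_triple(quads):
    """Map each triple to the set of graphs (sources) that assert it."""
    result = defaultdict(set)
    for s, p, o, g in quads:
        result[(s, p, o)].add(g)
    return dict(result)


# Placeholder data for illustration only.
S = "http://data.semanticweb.org/person/andy-seaborne"
from_sparql = {(S, "ex:p1", "in both"), (S, "ex:p2", "SPARQL only")}
from_ld = {(S, "ex:p1", "in both"), (S, "ex:p3", "LD only")}

merged = merge(from_sparql, from_ld)
for triple, graphs in sorted(sources_by_triple(merged).items()):
    print(triple[1], triple[2], sorted(graphs))
```

With triples only, the two copies of the shared statement would collapse
into one and the origin would be lost; with quads they stay distinct, so
the UI can show which source said what.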

What do you think? Maybe this is a well-known issue, in which case
please enlighten me with some articles :)


Martynas
atomgraph.com
@atomgraphhq

Received on Wednesday, 8 June 2016 12:07:14 UTC