Re: Dealing with distributed nature of Linked Data and SPARQL from Martynas Jusevičius on 2016-06-08 (public-lod@w3.org from June 2016)

From: Martynas Jusevičius <martynas@graphity.org>
Date: Wed, 8 Jun 2016 14:55:15 +0200
To: Mikel Egaña Aranguren <mikel.egana.aranguren@gmail.com>
Cc: public-lod <public-lod@w3.org>, public-declarative-apps@w3.org, James Anderson <james@dydra.com>, Arto Bendiken <arto@dydra.com>
Message-ID: <CAE35VmwRcPHSGHjc8ebpvz=0SU=j4n5cCUBN_qiZGaLa_aRvww@mail.gmail.com>

Mikel, a lot of them do, but they are not required to. Both data
sources work as expected on their own; it is only when one tries to
combine them that this situation arises.

I agree that each of the descriptions could go into separate named
graphs, where the graph name could be the source URI. That is why I
mentioned quads.
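A minimal sketch of that idea, in Python with plain tuples rather than a real RDF library. The function names are illustrative, and the sample triples are hypothetical placeholders, not the actual content of either source. Each source's description is kept as quads whose fourth element is the source URI. The merge therefore loses nothing, and every statement can be traced back to where it came from.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]       # (subject, predicate, object)
Quad = Tuple[str, str, str, str]    # (subject, predicate, object, graph name)


def merge_as_quads(descriptions: Dict[str, Set[Triple]]) -> List[Quad]:
    """Merge per-source descriptions, using each source URI as the graph name."""
    quads: List[Quad] = []
    for source_uri, triples in descriptions.items():
        for s, p, o in sorted(triples):
            quads.append((s, p, o, source_uri))
    return quads


def sources_of(quads: List[Quad], triple: Triple) -> Set[str]:
    """Return the graph names (i.e. source URIs) that assert the given triple."""
    return {g for (s, p, o, g) in quads if (s, p, o) == triple}


if __name__ == "__main__":
    andy = "http://data.semanticweb.org/person/andy-seaborne"
    sparql_src = ("http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2F"
                  "data.semanticweb.org%2Fperson%2Fandy-seaborne%3E")
    ld_src = "http://data.semanticweb.org/person/andy-seaborne/rdf"
    # Hypothetical sample triples; the example.org predicates are placeholders.
    descriptions = {
        sparql_src: {(andy, "http://example.org/ex#p1", '"a"')},
        ld_src: {(andy, "http://example.org/ex#p1", '"a"'),
                 (andy, "http://example.org/ex#p2", '"b"')},
    }
    quads = merge_as_quads(descriptions)
    print(len(quads))  # 3: the shared triple is kept once per source graph
    print(sources_of(quads, (andy, "http://example.org/ex#p2", '"b"')) == {ld_src})  # True
```

Keeping the shared triple once per graph is deliberate. A UI can then show that both sources agree on it, while the second triple is attributed only to the Linked Data document.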

Alasdair, by provenance do you mean PROV? I'm afraid it is not
available in the general case. HTTP headers could possibly be used to
extract Last-Modified dates, etc. But according to RDF semantics, isn't
it the case that assertions are never removed (i.e. entailment is
monotonic)? So I think it would be wrong to ignore the "older"
description, or any "other" description in general.
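The following sketch shows one way to use Last-Modified without discarding anything. It assumes the response headers have already been collected per source; the HTTP fetching is omitted, and the function names are illustrative. The date is parsed with the standard library's `email.utils.parsedate_to_datetime`. It is used only to order sources for display, while the merged description remains the union of all triples.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def last_modified(headers: Dict[str, str]) -> Optional[datetime]:
    """Parse a Last-Modified header (case-insensitive lookup); None if absent or invalid."""
    value = next((v for k, v in headers.items() if k.lower() == "last-modified"), None)
    if value is None:
        return None
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, 3.10+ ValueError
        return None
    # Treat naive results (e.g. a "-0000" offset) as UTC so all values are comparable.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def merge_keeping_all(descriptions: Dict[str, Set[Triple]],
                      headers: Dict[str, Dict[str, str]]) -> Tuple[Set[Triple], List[str]]:
    """Union all descriptions; order sources newest first, for display only."""
    merged: Set[Triple] = set().union(*descriptions.values())
    oldest = datetime.min.replace(tzinfo=timezone.utc)  # undated sources sort last
    order = sorted(descriptions,
                   key=lambda src: last_modified(headers.get(src, {})) or oldest,
                   reverse=True)
    return merged, order


if __name__ == "__main__":
    # Hypothetical sources and triples, for illustration only.
    old_src, new_src = "http://example.org/old", "http://example.org/new"
    t1 = ("http://example.org/s", "http://example.org/p", '"old"')
    t2 = ("http://example.org/s", "http://example.org/p", '"new"')
    merged, order = merge_keeping_all(
        {old_src: {t1}, new_src: {t2}},
        {old_src: {"Last-Modified": "Tue, 07 Jun 2016 10:00:00 GMT"},
         new_src: {"Last-Modified": "Wed, 08 Jun 2016 10:00:00 GMT"}})
    print(len(merged), order[0] == new_src)  # 2 True
```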

On Wed, Jun 8, 2016 at 2:31 PM, Mikel Egaña Aranguren
<mikel.egana.aranguren@gmail.com> wrote:
> Hi Martynas;
>
> I thought that the majority of Linked Data servers work like Pubby, i.e.,
> they serve Linked Data resources by doing a DESCRIBE on a triple store,
> and therefore serve the same triples. But it seems you have encountered
> the opposite (different triples being served) in many systems. Do you have
> data on how prevalent this issue is?
>
> Cheers
>
> 2016-06-08 14:06 GMT+02:00 Martynas Jusevičius <martynas@graphity.org>:
>>
>> Hey all,
>>
>> we are developing software that consumes data both from Linked Data
>> and SPARQL endpoints.
>>
>> Most of the time, these technologies complement each other. We've come
>> across an issue, though, which occurs when RDF descriptions of the
>> same resources are available from both of them.
>>
>> Let's take the resource http://data.semanticweb.org/person/andy-seaborne
>> as an example. Its RDF description is available in at least 2
>> locations:
>> - on a SPARQL endpoint:
>>
>> http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
>> - as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
>>
>> These descriptions could be identical (I haven't checked), but it is
>> more likely than not that they are out of sync, complementary, or
>> possibly even contradictory once reasoning is taken into account.
>>
>> If a software agent has access to both the SPARQL endpoint and Linked
>> Data resource, what should it consider as the resource description?
>> There are at least 3 options:
>> 1. prioritize SPARQL description over Linked Data
>> 2. prioritize Linked Data description over SPARQL
>> 3. merge both descriptions
>>
>> I am leaning towards #3 as the sensible solution. But then I think the
>> end user should be informed of which part of the description came from
>> which source. This would be problematic if the descriptions are
>> triples only, but should be doable with quads. That leads to another
>> problem, however: both LD and SPARQL responses are under-specified
>> in terms of quads.
>>
>> What do you think? Maybe this is a well-known issue, in which case
>> please enlighten me with some articles :)
>>
>>
>> Martynas
>> atomgraph.com
>> @atomgraphhq
>>
>
>
>
> --
> Mikel Egaña Aranguren, Ph.D.
>
> http://mikeleganaaranguren.com
>
>

Received on Wednesday, 8 June 2016 12:55:49 UTC