Re: Dealing with distributed nature of Linked Data and SPARQL from Daniel Herzig • SearchHaus GmbH on 2016-06-08 (public-lod@w3.org from June 2016)

From: Daniel Herzig • SearchHaus GmbH <herzig@searchhaus.net>
Date: Wed, 8 Jun 2016 15:29:37 +0200
To: Martynas Jusevičius <martynas@graphity.org>
Cc: Mikel Egaña Aranguren <mikel.egana.aranguren@gmail.com>, public-lod <public-lod@w3.org>, public-declarative-apps@w3.org, James Anderson <james@dydra.com>, Arto Bendiken <arto@dydra.com>
Message-Id: <C4094B5B-5C18-41DF-BB30-1D618647B07C@searchhaus.net>

Hi Martynas,


We worked on that problem in [0] and used a merging strategy to consolidate entities.
In [1] you will find a more detailed description, with a screenshot [2] of how this was presented to the user.
In essence, the user saw that the resulting entity was merged from several separate co-references and could open a drop-down to see the individual entities and their sources.
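
A minimal sketch of that merge-with-sources idea, not the implementation from [0] or [1]: each description is tagged with the URI it was fetched from (a quad, as Martynas suggests), and identical triples are then collapsed while keeping the list of sources. The function names and the FOAF triples are made up for illustration; only the resource and source URIs come from this thread.

```python
from collections import defaultdict

# Source URIs taken from the thread; the triples below are illustrative only.
SPARQL_SOURCE = "http://xmllondon.com/sparql"
LD_SOURCE = "http://data.semanticweb.org/person/andy-seaborne/rdf"
PERSON = "http://data.semanticweb.org/person/andy-seaborne"


def to_quads(descriptions):
    """Turn {source URI: [(s, p, o), ...]} into (s, p, o, graph) quads,
    using the source URI as the graph name."""
    return [
        (s, p, o, source)
        for source, triples in descriptions.items()
        for (s, p, o) in triples
    ]


def merge_with_sources(quads):
    """Collapse identical triples and remember which sources asserted each."""
    merged = defaultdict(set)
    for s, p, o, graph in quads:
        merged[(s, p, o)].add(graph)
    return {triple: sorted(graphs) for triple, graphs in merged.items()}


# Hypothetical descriptions of the same resource from the two sources.
descriptions = {
    SPARQL_SOURCE: [
        (PERSON, "http://xmlns.com/foaf/0.1/name", "Andy Seaborne"),
        (PERSON, "http://xmlns.com/foaf/0.1/based_near", "Bristol"),
    ],
    LD_SOURCE: [
        (PERSON, "http://xmlns.com/foaf/0.1/name", "Andy Seaborne"),
        (PERSON, "http://xmlns.com/foaf/0.1/homepage", "http://example.org/andy"),
    ],
}

merged = merge_with_sources(to_quads(descriptions))
for triple, sources in merged.items():
    print(triple, "<-", sources)
```

Nothing is discarded here: a triple found in only one source is kept with that single source, which is what the drop-down per merged entity needs to show.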


Cheers
Daniel



[0] http://www.aifb.kit.edu/images/9/90/82180161-federated-entity-search-using-on-the-fly-consolidation.pdf

[1] http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/2938247

[2] direct link to page 171 of [1]
https://books.google.de/books?id=umg4AwAAQBAJ&lpg=PR1&ots=cp4-zPB_HH&dq=info%3AC0NKXwkks_gJ%3Ascholar.google.com&lr&pg=PA171#v=onepage&q&f=false


--
Dr. Daniel Herzig
SearchHaus GmbH

GraphScope - the smart graphsearch engine
https://graphscope.io


> On 08.06.2016, at 14:55, Martynas Jusevičius <martynas@graphity.org> wrote:
> 
> Mikel, a lot of them do, but they are not required to. Both
> datasources work as expected, it is only when trying to combine both
> of them that one runs into this situation.
> 
> I agree that each of the descriptions could go into separate named
> graphs, where the graph name could be the source URI. That is why I
> mentioned quads.
> 
> Alasdair, with provenance do you mean PROV? I'm afraid that it is not
> available in the general case. HTTP headers could possibly be used to
> extract Last-Modified dates etc. But according to RDF semantics, isn't
> it the case that assertions are never removed? So I think it would be
> wrong to ignore the "older" description -- or any "other" description
> in general.
> 
> On Wed, Jun 8, 2016 at 2:31 PM, Mikel Egaña Aranguren
> <mikel.egana.aranguren@gmail.com> wrote:
>> Hi Martynas;
>> 
>> I thought that the majority of Linked Data servers work like Pubby, i.e.,
>> they serve Linked Data resources by doing a DESCRIBE on a Triple Store,
>> therefore serving the same triples. But it seems like you have encountered
>> the opposite (Different triples served) in many systems, do you have data on
>> how prevalent this issue is?
>> 
>> Cheers
>> 
>> 2016-06-08 14:06 GMT+02:00 Martynas Jusevičius <martynas@graphity.org>:
>>> 
>>> Hey all,
>>> 
>>> we are developing software that consumes data both from Linked Data
>>> and SPARQL endpoints.
>>> 
>>> Most of the time, these technologies complement each other. We've come
>>> across an issue though, which occurs in situations where RDF
>>> descriptions of the same resource are available through both of them.
>>> 
>>> Let's take a resource http://data.semanticweb.org/person/andy-seaborne
>>> as an example. Its RDF description is available in at least 2
>>> locations:
>>> - on a SPARQL endpoint:
>>> 
>>> http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
>>> - as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
>>> 
>>> These descriptions could be identical (I haven't checked), but it is
>>> more likely than not that they're out of sync, complementary, or
>>> possibly even contradicting each other, if reasoning is considered.
>>> 
>>> If a software agent has access to both the SPARQL endpoint and Linked
>>> Data resource, what should it consider as the resource description?
>>> There are at least 3 options:
>>> 1. prioritize SPARQL description over Linked Data
>>> 2. prioritize Linked Data description over SPARQL
>>> 3. merge both descriptions
>>> 
>>> I am leaning towards #3 as the sensible solution. But then I think the
>>> end-user should be informed which part of the description came from
>>> which source. This would be problematic if the descriptions are
>>> triples only, but should be doable with quads. That leads to another
>>> problem however, that both LD and SPARQL responses are under-specified
>>> in terms of quads.
>>> 
>>> What do you think? Maybe this is a well-known issue, in which case
>>> please enlighten me with some articles :)
>>> 
>>> 
>>> Martynas
>>> atomgraph.com
>>> @atomgraphhq
>>> 
>> 
>> 
>> 
>> --
>> Mikel Egaña Aranguren, Ph.D.
>> 
>> http://mikeleganaaranguren.com
>> 
>> 
> 
>

Received on Wednesday, 8 June 2016 14:34:02 UTC