Re: Dealing with distributed nature of Linked Data and SPARQL from Paul Houle on 2016-06-08 (public-lod@w3.org from June 2016)

From: Paul Houle <ontology2@gmail.com>
Date: Wed, 8 Jun 2016 09:52:03 -0400
To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
Cc: Martynas Jusevičius <martynas@graphity.org>, public-lod <public-lod@w3.org>, "public-declarative-apps@w3.org" <public-declarative-apps@w3.org>, James Anderson <james@dydra.com>, Arto Bendiken <arto@dydra.com>
Message-ID: <CAE__kdTB0uHC30NHjWW8e0aZ-cf4NK8vg0V5GhvH0o5ZM=YZ7Q@mail.gmail.com>

The vanilla RDF answer is that the data gathering module ought to pack all
of the graphs it got into named graphs that are part of a dataset and then
pass that dataset on to the consumer.

You can union the named graphs for a primitive but effective kind of
"merge", or put in some module downstream that composites the graphs in some
arbitrary manner, such as one that converts statements about people
to the foaf: vocabulary, producing a graph that can be piped downstream
to a foaf: consumer.
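
A minimal sketch of the packing and union steps in plain Python, with no RDF library involved. The dataset is modelled as a dict from graph name to a set of triples. The helper names `pack` and `union`, the choice of graph names, and the triples themselves are assumptions made for illustration; only the two source URLs come from the thread.

```python
# Hypothetical sketch: a dataset modelled as {graph_name: set of (s, p, o) triples}.
ANDY = "http://data.semanticweb.org/person/andy-seaborne"
SPARQL_SRC = "http://xmllondon.com/sparql"  # assumed graph name for the endpoint result
LD_SRC = ANDY + "/rdf"                      # assumed graph name for the Linked Data document


def pack(sources):
    """Pack each gathered graph into a named graph keyed by its source."""
    return {name: set(triples) for name, triples in sources.items()}


def union(dataset):
    """Primitive merge: the set union of all named graphs."""
    merged = set()
    for triples in dataset.values():
        merged |= triples
    return merged


# Placeholder triples, not real data from either source.
dataset = pack({
    SPARQL_SRC: [(ANDY, "foaf:name", "Andy Seaborne")],
    LD_SRC: [
        (ANDY, "foaf:name", "Andy Seaborne"),
        (ANDY, "foaf:homepage", "http://example.org/andy"),
    ],
})
merged = union(dataset)
```

The union drops the duplicate statement, while the dataset itself still records which source asserted what.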

The named graphs give you sufficient anchor points to fill up another
dataset with metadata about what happened during processing, so you
can trace "who is responsible for fact X?" back past the initial data
transformations.
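
As a rough illustration of that lookup, here is a plain-Python sketch that keeps quads plus a separate metadata table keyed by graph name. The `urn:example:` graph names, the `source` and `step` keys, the `responsible_for` helper, and the triples are all hypothetical; a real setup would more likely record this metadata as RDF in its own graph.

```python
# Hypothetical sketch: quads are (subject, predicate, object, graph_name) tuples.
ANDY = "http://data.semanticweb.org/person/andy-seaborne"
G_SPARQL = "urn:example:graph:sparql"       # assumed graph names
G_LD = "urn:example:graph:linked-data"

# Placeholder quads, not real data from either source.
quads = [
    (ANDY, "foaf:name", "Andy Seaborne", G_SPARQL),
    (ANDY, "foaf:name", "Andy Seaborne", G_LD),
    (ANDY, "foaf:homepage", "http://example.org/andy", G_LD),
]

# Separate metadata keyed by graph name: where each graph came from and how.
graph_metadata = {
    G_SPARQL: {"source": "http://xmllondon.com/sparql", "step": "DESCRIBE query"},
    G_LD: {"source": ANDY + "/rdf", "step": "HTTP GET"},
}


def responsible_for(triple, quads, graph_metadata):
    """Return the recorded sources of every named graph asserting `triple`."""
    return sorted(
        graph_metadata[g]["source"]
        for s, p, o, g in quads
        if (s, p, o) == triple
    )


homepage_sources = responsible_for(
    (ANDY, "foaf:homepage", "http://example.org/andy"), quads, graph_metadata
)
```

Because the graph name travels with every statement, the same lookup still works after a downstream module has merged or rewritten the graphs, as long as that module records its own output graph in the metadata table.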

On Wed, Jun 8, 2016 at 8:29 AM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
wrote:

> Hi
>
> Option 3 seems sensible, particularly if you keep them in separate graphs.
>
> However, shouldn’t you consider the provenance of the sources and
> prioritise them by how recently they were updated?
>
> Alasdair
>
> On 8 Jun 2016, at 13:06, Martynas Jusevičius <martynas@graphity.org>
> wrote:
>
> Hey all,
>
> we are developing software that consumes data both from Linked Data
> and SPARQL endpoints.
>
> Most of the time, these technologies complement each other. We've come
> across an issue though, which occurs in situations where an RDF
> description of the same resource is available through both of them.
>
> Let's take a resource http://data.semanticweb.org/person/andy-seaborne
> as an example. Its RDF description is available in at least 2
> locations:
> - on a SPARQL endpoint:
>
> http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
> - as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
>
> These descriptions could be identical (I haven't checked), but it is
> more likely than not that they're out of sync, complementary, or
> possibly even contradicting each other, if reasoning is considered.
>
> If a software agent has access to both the SPARQL endpoint and Linked
> Data resource, what should it consider as the resource description?
> There are at least 3 options:
> 1. prioritize SPARQL description over Linked Data
> 2. prioritize Linked Data description over SPARQL
> 3. merge both descriptions
>
> I am leaning towards #3 as the sensible solution. But then I think the
> end-user should be informed which part of the description came from
> which source. This would be problematic if the descriptions are
> triples only, but should be doable with quads. That leads to another
> problem however, that both LD and SPARQL responses are under-specified
> in terms of quads.
>
> What do you think? Maybe this is a well-known issue, in which case
> please enlighten me with some articles :)
>
>
> Martynas
> atomgraph.com
> @atomgraphhq
>
>
> Alasdair J G Gray
> Fellow of the Higher Education Academy
> Assistant Professor in Computer Science,
> School of Mathematical and Computer Sciences
> (Athena SWAN Bronze Award)
> Heriot-Watt University, Edinburgh UK.
>
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
>



-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   ontology2@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275

Received on Wednesday, 8 June 2016 13:52:34 UTC