- From: Paul Houle <ontology2@gmail.com>
- Date: Wed, 8 Jun 2016 14:11:23 -0400
- To: Rob Davidson <rob.les.davidson@gmail.com>
- Cc: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, Martynas Jusevičius <martynas@graphity.org>, public-lod <public-lod@w3.org>, "public-declarative-apps@w3.org" <public-declarative-apps@w3.org>, James Anderson <james@dydra.com>, Arto Bendiken <arto@dydra.com>
- Message-ID: <CAE__kdSSQDN3kURonWEV4ugAp7bytPXAMkZ+T2j5UQ8ZMHWMhg@mail.gmail.com>
You've got it! What matters is what your system believes is owl:sameAs based on its own viewpoint, which could depend on whom you trust to assert owl:sameAs. If you are worried about "inference crashes", pruning this data is the place to start.

You might want to apply algorithm X to a graph, but data Y fails to have property Z, which is necessary for X to succeed. This is a general problem whenever you are sending a product downstream. A processing module can massage a dataset so that the output graph Y always has property Z, or it fails and cries bloody murder if Z does not hold. It can also emit warning messages that you could use to sweep for bad spots.

On Wed, Jun 8, 2016 at 1:50 PM, Rob Davidson <rob.les.davidson@gmail.com> wrote:

> I'm not sure if I'm following exactly, so bear with me...
>
> If we have the same entity served up by two different sources, then in an
> ideal world we might expect an owl:sameAs or skos:exactMatch link between
> the two.
>
> If we have the same entity served by the same provider but via two
> different endpoints, then we might expect something a bit like a
> dcat:distribution link relating the two.
>
> Of course we might not have these specific links, but I'm just trying to
> define the likely scenarios/use cases.
>
> In either case, it's possible that the descriptions would be out of date
> and/or contradictory - this might cause inference crashes, or simply be
> confusing if we tried to merge them too closely.
>
> Prioritising description fields based on the distribution method seems a
> little naive: I might run either endpoint for a while, realise my users
> prefer the alternative, and thus change technology in a direction unique
> to my users - not in a predictable fashion.
>
> So the only way I can see around this is to pool the descriptions but
> keep them distinguished using the other metadata that indicates they come
> from different endpoints/sources/authors - keeping the descriptions in
> different graphs, I suppose.
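A minimal sketch of this trust-based pruning of owl:sameAs statements, assuming quads modeled as plain `(subject, predicate, object, graph)` tuples, with the graph name standing in for the source; all names here are invented for illustration:

```python
# Trust-based pruning: keep owl:sameAs statements only when they come
# from a graph (source) that we have decided to trust. Ordinary
# statements pass through untouched.

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

def prune_sameas(quads, trusted_graphs):
    """Drop owl:sameAs quads whose graph is not in trusted_graphs."""
    return [
        (s, p, o, g)
        for (s, p, o, g) in quads
        if p != OWL_SAMEAS or g in trusted_graphs
    ]

quads = [
    ("ex:a", OWL_SAMEAS, "ex:b", "graph:trusted"),
    ("ex:a", OWL_SAMEAS, "ex:c", "graph:untrusted"),
    ("ex:a", "ex:name", "Alice", "graph:untrusted"),
]

pruned = prune_sameas(quads, {"graph:trusted"})
# The untrusted owl:sameAs link is gone; the trusted link and the
# ordinary triple survive.
```

The same shape of module could enforce any "property Z" check: validate the output graph, and either repair it or fail loudly with a warning that points at the bad spot.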
>
> On 8 June 2016 at 14:52, Paul Houle <ontology2@gmail.com> wrote:
>
>> The vanilla RDF answer is that the data-gathering module ought to pack
>> all of the graphs it got into named graphs that are part of a dataset,
>> and then pass that towards the consumer.
>>
>> You can union the named graphs for a primitive but effective kind of
>> "merge", or put in some module downstream that composites the graphs in
>> some arbitrary manner - for instance, something that converts statements
>> about people to the foaf: vocabulary, producing a graph that could be
>> piped downstream to a foaf: consumer.
>>
>> The named graphs give you sufficient anchor points to fill another
>> dataset with metadata about what happened during processing, so you can
>> follow "who is responsible for fact X?" back past the initial data
>> transformations.
>>
>> On Wed, Jun 8, 2016 at 8:29 AM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
>> wrote:
>>
>>> Hi
>>>
>>> Option 3 seems sensible, particularly if you keep them in separate
>>> graphs.
>>>
>>> However, shouldn't you consider the provenance of the sources and
>>> prioritise them based on how recently they were updated?
>>>
>>> Alasdair
>>>
>>> On 8 Jun 2016, at 13:06, Martynas Jusevičius <martynas@graphity.org>
>>> wrote:
>>>
>>> Hey all,
>>>
>>> we are developing software that consumes data both from Linked Data
>>> and SPARQL endpoints.
>>>
>>> Most of the time, these technologies complement each other. We've come
>>> across an issue though, which occurs in situations where an RDF
>>> description of the same resources is available using both of them.
>>>
>>> Let's take the resource http://data.semanticweb.org/person/andy-seaborne
>>> as an example.
>>> Its RDF description is available in at least two locations:
>>> - on a SPARQL endpoint:
>>> http://xmllondon.com/sparql?query=DESCRIBE%20%3Chttp%3A%2F%2Fdata.semanticweb.org%2Fperson%2Fandy-seaborne%3E
>>> - as Linked Data: http://data.semanticweb.org/person/andy-seaborne/rdf
>>>
>>> These descriptions could be identical (I haven't checked), but it is
>>> more likely than not that they are out of sync, complementary, or
>>> possibly even contradicting each other, if reasoning is considered.
>>>
>>> If a software agent has access to both the SPARQL endpoint and the
>>> Linked Data resource, what should it consider to be the resource
>>> description? There are at least three options:
>>> 1. prioritize the SPARQL description over Linked Data
>>> 2. prioritize the Linked Data description over SPARQL
>>> 3. merge both descriptions
>>>
>>> I am leaning towards #3 as the sensible solution. But then I think the
>>> end-user should be informed which part of the description came from
>>> which source. This would be problematic if the descriptions are triples
>>> only, but should be doable with quads. That leads to another problem,
>>> however: both LD and SPARQL responses are under-specified in terms of
>>> quads.
>>>
>>> What do you think? Maybe this is a well-known issue, in which case
>>> please enlighten me with some articles :)
>>>
>>>
>>> Martynas
>>> atomgraph.com
>>> @atomgraphhq
>>>
>>>
>>> Alasdair J G Gray
>>> Fellow of the Higher Education Academy
>>> Assistant Professor in Computer Science,
>>> School of Mathematical and Computer Sciences
>>> (Athena SWAN Bronze Award)
>>> Heriot-Watt University, Edinburgh UK.
>>>
>>> Email: A.J.G.Gray@hw.ac.uk
>>> Web: http://www.macs.hw.ac.uk/~ajg33
>>> ORCID: http://orcid.org/0000-0002-5711-4872
>>> Office: Earl Mountbatten Building 1.39
>>> Twitter: @gray_alasdair

--
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype    ontology2@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
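The named-graph approach above - Martynas's option 3, with provenance kept per source - might be sketched as follows. This is hypothetical illustration code, not from the thread: quads are plain tuples, and the `urn:src:*` graph names and foaf: triples are invented:

```python
# Option 3 with provenance: merge descriptions from both access methods,
# but keep each one in its own named graph so a consumer can still ask
# which source contributed which triple.

def merged_dataset(descriptions):
    """descriptions: {graph_name: iterable of (s, p, o) triples} -> quads."""
    return [
        (s, p, o, graph)
        for graph, triples in descriptions.items()
        for (s, p, o) in triples
    ]

def union_graph(dataset):
    """The primitive 'merge': union of all named graphs, provenance dropped."""
    return {(s, p, o) for (s, p, o, _g) in dataset}

def sources_of(dataset, triple):
    """Answer 'who is responsible for fact X?'."""
    return {g for (s, p, o, g) in dataset if (s, p, o) == triple}

andy = "http://data.semanticweb.org/person/andy-seaborne"
dataset = merged_dataset({
    "urn:src:sparql": [(andy, "foaf:name", "Andy Seaborne")],
    "urn:src:linked-data": [(andy, "foaf:name", "Andy Seaborne"),
                            (andy, "foaf:homepage", "http://example.org/andy")],
})
```

The union collapses the duplicate foaf:name triple, while `sources_of` still reports that both endpoints asserted it - which is exactly the information a UI would need to show the end-user where each part of the description came from.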
Received on Wednesday, 8 June 2016 18:11:53 UTC