Re: RDF Datasets with provenance data from Tobias Kuhn on 2016-09-23 (semantic-web@w3.org from September 2016)

From: Tobias Kuhn <kuhntobias@gmail.com>
Date: Fri, 23 Sep 2016 19:37:56 +0200
To: semantic-web@w3.org
Message-ID: <76842f02-a241-87c4-3196-48963d9c985b@gmail.com>
Dear Kay,

For example, see here for nanopublication datasets, which come with
fine-grained provenance: https://datahub.io/organization/nanopublications

Best regards,
Tobias


On 23.09.2016 19:09, Laufer wrote:
>
>
> Hi Kay,
>
> Maybe nanopublications could help you:
> http://nanopub.org/
>
> Best Regards,
> Laufer
>
> On 23/09/2016 12:59, Maximilian Schich wrote:
>
>> This should ease the choice between David's named-graph solution and
>> Mark's RDF reification solution: potentially every (part of a)
>> statement is subject to a multiplicity of opinion. Therefore, every
>> (part of a) statement can eventually have multiple provenances, with
>> different probabilities in yet another layer. If your intention is
>> nothing more than keeping track of where you got your triples from,
>> then you should go with David's solution. If you want to
>> integrate/normalize all your statements while handling the
>> multiplicity of opinion, Mark's solution may prove less costly in the
>> long run.
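A minimal Python sketch of the normalization described above, assuming the triples were first loaded per source into named graphs: each distinct triple becomes one statement that carries every (source, probability) claim made about it. The `Claim` class, the `normalize` function, the example.org IRIs and the probability values are all hypothetical illustrations, not part of any library or of the posters' systems.

```python
from collections import defaultdict
from dataclasses import dataclass

EX = "http://example.org/"  # hypothetical namespace for all example IRIs


@dataclass(frozen=True)
class Claim:
    """One opinion about a statement: who asserted it and how strongly."""
    source: str
    probability: float  # hypothetical confidence layer on top of provenance


def normalize(quads, graph_claims):
    """Merge identical triples found in different named graphs.

    quads: iterable of (subject, predicate, object, graph) tuples.
    graph_claims: dict mapping each graph name to the Claim made by its source.
    Returns a dict mapping each distinct (s, p, o) triple to its list of Claims.
    """
    statements = defaultdict(list)
    for s, p, o, g in quads:
        statements[(s, p, o)].append(graph_claims[g])
    return dict(statements)


# Two sources assert the same triple with different (hypothetical) confidence.
located = (EX + "Leipzig", EX + "locatedIn", EX + "Germany")
quads = [
    (*located, EX + "graph/a"),
    (*located, EX + "graph/b"),
    (EX + "Leipzig", EX + "locatedIn", EX + "Saxony", EX + "graph/b"),
]
graph_claims = {
    EX + "graph/a": Claim(EX + "source/a", 0.9),
    EX + "graph/b": Claim(EX + "source/b", 0.6),
}
merged = normalize(quads, graph_claims)
for triple, claims in merged.items():
    print(triple[2], [(c.source, c.probability) for c in claims])
```

Keeping all claims in a list, rather than picking a winner, leaves the reconciliation of conflicting opinions to a separate, explicit step.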
>>
>> Hope this helps.
>>
>> Maximilian Schich
>>
>>
>>
>> Dr. Maximilian Schich
>> Associate Professor, Arts & Technology
>> Founding member, The Edith O'Donnell Institute of Art History
>>
>> The University of Texas at Dallas
>> 800 West Campbell Road, AT10
>> Richardson, Texas 75080 – USA
>> US phone: +1-214-673-3051
>> EU phone: +49-179-667-8041
>>
>> http://www.utdallas.edu/atec/schich/
>> http://www.schich.info
>> http://www.cultsci.net
>>
>> Current location: Dallas, Texas
>>
>>
>> On 2016-09-23 10:16, Mark Wallace wrote:
>>> I like David's guidance.
>>>
>>> We have projects that require provenance on individual facts/triples (as opposed to groups of them). As David mentions, one option is to use a named graph for each triple (the graph name then acts as a statement ID). Another option is to use RDF reification[1] to create a statement ID (a resource) to which provenance can be "attached." The reification approach requires many more triples, but in our case it has the advantage of leaving named graphs free for other uses. In such cases, the provenance triples can be ten times as large as the data set itself. For performance reasons, we sometimes put the provenance triples in a separate repository/store and query/join them (using federated queries) only when the provenance is needed.
>>>
>>> [1] https://www.w3.org/TR/rdf11-mt/#whatnot
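A minimal sketch of the reification approach, in plain Python with triples as tuples (no RDF library assumed). Each data triple gets a statement ID typed as `rdf:Statement` and linked back via `rdf:subject`, `rdf:predicate` and `rdf:object`; provenance is then attached to that ID. Data and provenance are kept in separate collections to mirror storing them in separate repositories and joining only on demand. The example.org statement-ID scheme, the use of PROV-O's `prov:wasDerivedFrom` as the provenance property, and the helper names `reify` and `sources_for` are assumptions for illustration.

```python
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"  # one possible property
EX = "http://example.org/"  # hypothetical namespace
SPO = (RDF + "subject", RDF + "predicate", RDF + "object")


def reify(records):
    """Split (triple, source) records into a data store and a provenance store.

    Reifying a triple does not assert it, so the plain triple is kept in the
    data store (a set, as in an RDF graph); the provenance store describes it
    through a statement ID.
    """
    data, provenance = set(), []
    for i, (triple, source) in enumerate(records):
        data.add(triple)
        stmt = f"{EX}stmt/{i}"  # hypothetical statement-ID scheme
        provenance.append((stmt, RDF + "type", RDF + "Statement"))
        provenance.extend((stmt, prop, value) for prop, value in zip(SPO, triple))
        provenance.append((stmt, PROV_DERIVED, source))
    return data, provenance


def sources_for(provenance, triple):
    """Join the provenance store back to one data triple, only when needed."""
    parts, sources = defaultdict(dict), defaultdict(list)
    for stmt, prop, value in provenance:
        if prop in SPO:
            parts[stmt][prop] = value
        elif prop == PROV_DERIVED:
            sources[stmt].append(value)
    wanted = dict(zip(SPO, triple))
    return sorted(src for stmt, spo in parts.items() if spo == wanted
                  for src in sources[stmt])


records = [
    ((EX + "Leipzig", EX + "locatedIn", EX + "Germany"), EX + "source/a"),
    ((EX + "Leipzig", EX + "locatedIn", EX + "Germany"), EX + "source/b"),
    ((EX + "Leipzig", EX + "locatedIn", EX + "Saxony"), EX + "source/a"),
]
data, provenance = reify(records)
print(len(data), "data triples,", len(provenance), "provenance triples")
print(sources_for(provenance, records[0][0]))
```

Even with a single provenance property this produces five provenance triples per statement; every additional metadata property adds one more, which is how the provenance can grow to a multiple of the data set's size.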
>>>
>>> --
>>> Mark Wallace
>>> PRINCIPAL ENGINEER, SEMANTIC APPLICATIONS
>>> MODUS OPERANDI, INC.
>>>
>>> -----Original Message-----
>>> From: David Booth [mailto:david@dbooth.org]
>>> Sent: Friday, September 23, 2016 10:45 AM
>>> To: Kay Müller <kay.mueller@informatik.uni-leipzig.de>; semantic-web@w3.org
>>> Cc: Johannes Frey <frey@informatik.uni-leipzig.de>; Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>> Subject: Re: RDF Datasets with provenance data
>>>
>>> On 09/23/2016 10:07 AM, Kay Müller wrote:
>>>> Dear Sir/Madam,
>>>>
>>>> My name is Kay Mueller and I am a researcher at the University of
>>>> Leipzig. We are currently planning to evaluate whether it is feasible
>>>> to store provenance and metadata for each triple in a graph. Hence we
>>>> are wondering whether you are aware of any dataset which either stores
>>>> data at the triple level or could be converted into this format (e.g.
>>>> Yago, Wikidata).
>>> The usual technique for associating provenance or other metadata with certain triples is to put those triples into a named graph, and make the provenance/metadata assertions about that named graph.  A named graph can hold any number of triples, so it could hold a single triple if you want to be that fine grained.  But triples are not usually created individually -- they are usually created in bunches -- so for efficiency one would usually create a named graph containing multiple triples that all have the same provenance.
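A minimal sketch of this grouping in plain Python, with triples and quads as tuples (no RDF library assumed): triples that share a source go into one named graph, and a single provenance triple about each graph name is kept for the default graph. The example.org graph names, the `to_named_graphs` helper, and the use of PROV-O's `prov:wasDerivedFrom` are assumptions for illustration. Giving every triple its own source label yields the one-triple-per-graph extreme.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"  # one possible property


def to_named_graphs(records):
    """Group (triple, source) records into one named graph per source.

    Returns (quads, provenance): quads are (s, p, o, graph) tuples, and
    provenance holds one (graph, prov:wasDerivedFrom, source) triple per
    graph, intended for the default graph.
    """
    by_source = defaultdict(list)
    for triple, source in records:
        by_source[source].append(triple)

    quads, provenance = [], []
    for i, source in enumerate(sorted(by_source)):
        graph = f"{EX}graph/{i}"  # hypothetical graph-naming scheme
        quads.extend((s, p, o, graph) for s, p, o in by_source[source])
        # Provenance is stated once per graph, not once per triple.
        provenance.append((graph, PROV_DERIVED, source))
    return quads, provenance


records = [
    ((EX + "Leipzig", EX + "locatedIn", EX + "Germany"), EX + "source/a"),
    ((EX + "Leipzig", EX + "locatedIn", EX + "Saxony"), EX + "source/a"),
    ((EX + "Dallas", EX + "locatedIn", EX + "Texas"), EX + "source/b"),
]
quads, provenance = to_named_graphs(records)
for quad in quads:
    print(quad)
print(provenance)
```

Here three data triples need only two provenance triples, because the cost scales with the number of groups rather than the number of triples.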
>>>
>>> All major "triplestores" -- quad stores really -- and SPARQL servers support named graphs.
>>>
>>> David Booth
>>>
>>>> We would be very grateful if you could give us any pointers to
>>>> datasets, related work, etc.
>>>>
>>>> Thank you very much in advance.
>>>> --
>>>> Kind regards / Mit freundlichem Gruß
>>>>
>>>> Kay Müller
>>>>
>>>> AKSW/KILT <http://aksw.org/Groups/KILT.html>
>>>> Office: InfAI e.V., Hainstr. 11, Room 101a, 04109 Leipzig, Germany
>>>> Homepage: http://aksw.org/KayMueller.html
>>>> Twitter: https://twitter.com/mullekay
>>>> LinkedIn: https://de.linkedin.com/in/mullerkay
>>>> Xing: https://www.xing.com/profile/Kay_Mueller12
>>>> GitHub: https://github.com/mullekay
>>>> Google Scholar: https://scholar.google.de/citations?user=8tFijv0AAAAJ
>>>>
>>>>
>>>
>>
>>
Received on Friday, 23 September 2016 17:38:31 UTC