Re: RDF Datasets with provenance data

There is also the Wikidata approach:
https://meta.wikimedia.org/wiki/Wikidata/Development/RDF#Statements_with_qualifiers

This paper compares different approaches:
http://ceur-ws.org/Vol-1457/SSWS2015_paper3.pdf
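
For comparison, here is a minimal sketch of how the approaches mentioned in this thread attach provenance to a single triple: a named graph per triple, RDF reification, and a Wikidata-like statement node. It only builds N-Triples/N-Quads lines in plain Python. All http://example.org/ IRIs, the helper names and the choice of prov:wasDerivedFrom are my own assumptions for illustration, and the statement-node variant is simplified rather than the exact Wikidata vocabulary.

```python
# Sketch only: every example.org IRI and every helper name here is hypothetical.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples and N-Quads require."""
    return "<" + value + ">"


def statement(*terms):
    """Join three terms (a triple) or four terms (a quad) into one line."""
    return " ".join(iri(t) for t in terms) + " ."


def named_graph_per_triple(s, p, o, graph, source):
    """One quad for the fact, plus provenance asserted about the graph name."""
    return [
        statement(s, p, o, graph),
        statement(graph, PROV + "wasDerivedFrom", source),
    ]


def reification(s, p, o, stmt, source):
    """The fact, four triples describing it, and provenance on the statement ID."""
    return [
        statement(s, p, o),
        statement(stmt, RDF + "type", RDF + "Statement"),
        statement(stmt, RDF + "subject", s),
        statement(stmt, RDF + "predicate", p),
        statement(stmt, RDF + "object", o),
        statement(stmt, PROV + "wasDerivedFrom", source),
    ]


def statement_node(s, p_statement, p_value, o, stmt, source):
    """Wikidata-like: subject -> statement node -> value, provenance on the node."""
    return [
        statement(s, p_statement, stmt),
        statement(stmt, p_value, o),
        statement(stmt, PROV + "wasDerivedFrom", source),
    ]


s, p, o = EX + "Leipzig", EX + "country", EX + "Germany"
source = EX + "legacy-dump"

for line in named_graph_per_triple(s, p, o, EX + "graph/1", source):
    print(line)
for line in reification(s, p, o, EX + "stmt/1", source):
    print(line)
for line in statement_node(s, EX + "p/country", EX + "ps/country", o,
                           EX + "stmt/1", source):
    print(line)
```

With one provenance assertion per fact, this produces 2 lines for the named-graph variant, 6 for reification and 3 for the statement node, which matches Mark's remark that reification requires lots more triples.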



On 23 Sep 2016, at 15:14, Michel Dumontier wrote:

> Hi Sebastian,
>  Bio2RDF provides its data as N-Quads, in which the graph name is
> annotated with dataset metadata.
>   See http://download.bio2rdf.org/release/3/drugbank/ , where the .nq
> file holds the provenance data, as an example.
>
> m.
> Michel Dumontier
> Associate Professor of Medicine (Biomedical Informatics), Stanford University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com
>
> On Fri, Sep 23, 2016 at 2:58 PM, <hellmann@informatik.uni-leipzig.de> wrote:
>> Hi David and Mark,
>> both of your answers were not helpful, sorry.
>> We are looking for triple datasets that have metadata, i.e. serialized
>> downloadable files in any format (N3, N-Quads, TriX, etc.) that come with
>> sensible metadata (provenance, last updated/update frequency) or, as an
>> alternative, triples converted from a legacy source where we could easily
>> extend the extractor software to spew out useful metadata per triple.
>>
>> An example would be the datasets in the meta section here:
>> http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/
>>
>> Thanks,
>> Sebastian
>>
>> On 23 September 2016 17:16:43 CEST, Mark Wallace
>> <mwallace@modusoperandi.com> wrote:
>>>
>>> I like David's guidance.
>>>
>>> We have projects which require provenance on individual facts/triples
>>> (as opposed to groups of them).  As David mentions, one alternative is
>>> to use a named graph for each triple (it acts like a statement ID in
>>> this case).  Another alternative is to use RDF Reification[1] to create
>>> a statement ID (resource) to which provenance can be "attached."  The
>>> reification approach requires lots more triples, but it has the
>>> advantage in our case of leaving named graphs for other uses.  In such
>>> cases, provenance triples can be 10x larger than the data set.  For
>>> performance reasons, we sometimes put the provenance triples in a
>>> separate repository/store, and query/join them (using federated
>>> queries) only when the provenance is needed.
>>>
>>> [1] https://www.w3.org/TR/rdf11-mt/#whatnot
>>>
>>> --
>>> Mark Wallace
>>> PRINCIPAL ENGINEER, SEMANTIC APPLICATIONS
>>> MODUS OPERANDI, INC.
>>>
>>> -----Original Message-----
>>> From: David Booth [mailto:david@dbooth.org]
>>> Sent: Friday, September 23, 2016 10:45 AM
>>> To: Kay Müller <kay.mueller@informatik.uni-leipzig.de>;
>>> semantic-web@w3.org
>>> Cc: Johannes Frey <frey@informatik.uni-leipzig.de>; Sebastian Hellmann
>>> <hellmann@informatik.uni-leipzig.de>
>>> Subject: Re: RDF Datasets with provenance data
>>>
>>> On 09/23/2016 10:07 AM, Kay Müller wrote:
>>>>
>>>>  Dear Sir/Madam,
>>>>
>>>>  My name is Kay Mueller and I am a researcher at the University of
>>>>  Leipzig. Currently we are planning to evaluate whether it is feasible
>>>>  to store provenance and metadata for each triple in a graph, hence we
>>>>  are wondering whether you are aware of any dataset which either stores
>>>>  data at the triple level or which could be converted into this format
>>>>  (e.g. Yago, Wikidata).
>>>
>>>
>>> The usual technique for associating provenance or other metadata with
>>> certain triples is to put those triples into a named graph, and make the
>>> provenance/metadata assertions about that named graph.  A named graph
>>> can hold any number of triples, so it could hold a single triple if you
>>> want to be that fine grained.  But triples are not usually created
>>> individually -- they are usually created in bunches -- so for efficiency
>>> one would usually create a named graph containing multiple triples that
>>> all have the same provenance.
>>>
>>> All major "triplestores" -- quad stores really -- and SPARQL servers
>>> support named graphs.
>>>
>>> David Booth
>>>
>>>>
>>>>  We would be very grateful, if you could give us any pointers to
>>>>  datasets, related work, etc.
>>>>
>>>>  Thank you very much in advance.
>>>>  --
>>>>  Kind regards
>>>>
>>>>  Kay Müller
>>>>
>>>>  AKSW/KILT <http://aksw.org/Groups/KILT.html>
>>>>   Office: InfAI e.V., Hainstr. 11, Room 101a, 04109 Leipzig, Germany
>>>>  Homepage: http://aksw.org/KayMueller.html My Twitter
>>>>  <https://twitter.com/mullekay> My LinkedIn
>>>>  <https://de.linkedin.com/in/mullerkay> My Xing
>>>>  <https://www.xing.com/profile/Kay_Mueller12> My GitHub
>>>>  <https://github.com/mullekay> My Google Scholar
>>>>  <https://scholar.google.de/citations?user=8tFijv0AAAAJ>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> This message was sent from my Android mobile phone with K-9 Mail.
>

Received on Friday, 23 September 2016 23:48:00 UTC