Re: RDF Datasets with provenance data from Michel Dumontier on 2016-09-23 (semantic-web@w3.org from September 2016)

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Fri, 23 Sep 2016 16:56:07 -0700
To: Chris Mungall <cjmungall@lbl.gov>
Cc: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, Mark Wallace <mwallace@modusoperandi.com>, David Booth <david@dbooth.org>, Kay Müller <kay.mueller@informatik.uni-leipzig.de>, "semantic-web@w3.org" <semantic-web@w3.org>, Johannes Frey <frey@informatik.uni-leipzig.de>
Message-ID: <CALcEXf4UaR6z7YW9ZcAgO8GBf-iPL2h4bd4oc9NdQ=c-h5dPrA@mail.gmail.com>

Yes, we've also done evaluations like this:

Exposing Provenance Metadata Using Different RDF Models
http://arxiv.org/abs/1509.02822

On Reasoning with RDF Statements about Statements using Singleton
Property Triples
http://arxiv.org/abs/1509.04513
Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com


On Fri, Sep 23, 2016 at 4:47 PM, Chris Mungall <cjmungall@lbl.gov> wrote:
> There is also the Wikidata approach:
> https://meta.wikimedia.org/wiki/Wikidata/Development/RDF#Statements_with_qualifiers
>
> This paper compares different approaches:
> http://ceur-ws.org/Vol-1457/SSWS2015_paper3.pdf
>
> On 23 Sep 2016, at 15:14, Michel Dumontier wrote:
>
>> Hi Sebastian,
>> Bio2RDF provides its data as N-Quads, in which the graph name is
>> annotated with dataset metadata. As an example, see
>> http://download.bio2rdf.org/release/3/drugbank/ , where the .nq file
>> is the provenance data.
>>
>> m.
>>
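To make the N-Quads layout mentioned above concrete, here is a minimal Python sketch that reads a few N-Quads statements, counts statements per graph name, and picks out statements whose subject is itself a graph name (that is, metadata about a graph). The sample data and all `example.org` IRIs are invented for illustration and are not taken from the Bio2RDF release; the regular expression is a simplified assumption that handles only IRIs, blank nodes, and simple literals, not the full N-Quads grammar.

```python
import re
from collections import Counter

# Simplified term pattern: an IRI, a quoted literal (optionally typed or
# language-tagged), or a blank node. This is NOT a conformant N-Quads parser.
TERM = r'(<[^>]*>|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?|_:\S+)'
QUAD = re.compile(
    r'^\s*' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'\s+(<[^>]*>)\s*\.\s*$'
)

# Hypothetical sample data: two data statements in one graph, plus one
# statement about that graph stored in a separate provenance graph.
sample = """\
<http://example.org/drug/1> <http://example.org/vocab/name> "Example drug" <http://example.org/graph/drugbank> .
<http://example.org/drug/1> <http://example.org/vocab/target> <http://example.org/target/9> <http://example.org/graph/drugbank> .
<http://example.org/graph/drugbank> <http://purl.org/dc/terms/created> "2016-09-23" <http://example.org/graph/provenance> .
"""

def parse_nquads(text):
    """Return a list of (subject, predicate, object, graph) string tuples."""
    quads = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = QUAD.match(line)
        if match is None:
            raise ValueError("unsupported line: " + line)
        quads.append(match.groups())
    return quads

quads = parse_nquads(sample)

# Number of statements stored under each graph name.
per_graph = Counter(graph for (_, _, _, graph) in quads)

# Statements whose subject is a graph name, i.e. metadata about that graph.
graph_metadata = [quad for quad in quads if quad[0] in per_graph]
```

For a real dump, a dedicated RDF library would be the safer choice; the point here is only that the fourth term of each statement gives a handle to which dataset-level metadata can be attached.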
>> On Fri, Sep 23, 2016 at 2:58 PM,  <hellmann@informatik.uni-leipzig.de>
>> wrote:
>>>
>>> Hi David and Mark,
>>> sorry, but neither of your answers was helpful.
>>> We are looking for triple datasets that have metadata, i.e. serialized
>>> downloadable files in any format (N3, N-Quads, TriX, etc.) that come with
>>> sensible metadata (provenance, last updated/update frequency), or, as an
>>> alternative, triples converted from a legacy source where we could easily
>>> extend the extractor software to spew out useful metadata per triple.
>>>
>>> An example would be the datasets in the meta section here:
>>>
>>> http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/
>>>
>>> Thanks,
>>> Sebastian
>>>
>>> On 23 September 2016 17:16:43 CEST, Mark Wallace
>>> <mwallace@modusoperandi.com> wrote:
>>>>
>>>>
>>>> I like David's guidance.
>>>>
>>>> We have projects which require provenance on individual facts/triples
>>>> (as opposed to groups of them).  As David mentions, one alternative is
>>>> to use a named graph for each triple (it acts like a statement ID in
>>>> this case).  An alternative is to use RDF Reification[1] to create a
>>>> statement ID (resource) to which provenance can be "attached."  The
>>>> reification approach requires lots more triples, but it has the
>>>> advantage in our case of leaving named graphs for other uses.  In such
>>>> cases, provenance triples can be 10x larger than the data set.  For
>>>> performance reasons, we sometimes put the provenance triples in a
>>>> separate repository/store, and query/join them (using federated
>>>> queries) only when the provenance is needed.
>>>>
>>>> [1] https://www.w3.org/TR/rdf11-mt/#whatnot
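The reification approach described above can be sketched in a few lines of Python. Each data triple gets a statement resource described by the standard `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` terms, and provenance is attached to that resource. The `example.org` IRIs, the `reify` helper, the use of a single `prov:wasDerivedFrom` property, and the in-memory lists standing in for two separate stores are all assumptions made for illustration; the resulting overhead depends entirely on how many provenance properties are recorded per statement.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical namespace

def reify(triple, statement_id):
    """Return the four reification triples describing `triple`."""
    subject, predicate, obj = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]

# The data set itself (hypothetical facts).
data = [
    (EX + "aspirin", EX + "treats", EX + "headache"),
    (EX + "aspirin", EX + "treats", EX + "fever"),
    (EX + "ibuprofen", EX + "treats", EX + "fever"),
]

# A separate list standing in for a separate provenance store.
store = []
for index, triple in enumerate(data):
    statement_id = EX + "stmt/" + str(index)
    store.extend(reify(triple, statement_id))
    # One provenance assertion per statement; real projects may record more.
    store.append((statement_id, PROV + "wasDerivedFrom", EX + "source/dumpA"))

# Provenance triples per data triple in this sketch.
overhead = len(store) / len(data)
```

With one provenance property per statement this sketch yields five provenance-side triples for every data triple; each additional property adds one more, which is how the ratio can grow toward the figure quoted in the message.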
>>>>
>>>> --
>>>> Mark Wallace
>>>> PRINCIPAL ENGINEER, SEMANTIC APPLICATIONS
>>>> MODUS OPERANDI, INC.
>>>>
>>>> -----Original Message-----
>>>> From: David Booth [mailto:david@dbooth.org]
>>>> Sent: Friday, September 23, 2016 10:45 AM
>>>> To: Kay Müller <kay.mueller@informatik.uni-leipzig.de>;
>>>> semantic-web@w3.org
>>>> Cc: Johannes Frey <frey@informatik.uni-leipzig.de>; Sebastian Hellmann
>>>> <hellmann@informatik.uni-leipzig.de>
>>>> Subject: Re: RDF Datasets with provenance data
>>>>
>>>> On 09/23/2016 10:07 AM, Kay Müller wrote:
>>>>>
>>>>>
>>>>>  Dear Sir/Madam,
>>>>>
>>>>>  My name is Kay Mueller and I am a researcher at the University of
>>>>>  Leipzig. Currently we are planning to evaluate whether it is feasible
>>>>>  to store provenance and metadata for each triple in a graph, hence we
>>>>>  are wondering whether you are aware of any dataset which either stores
>>>>>  data at the triple level or which could be converted into this format
>>>>>  (e.g. Yago, Wikidata).
>>>>
>>>>
>>>>
>>>> The usual technique for associating provenance or other metadata with
>>>> certain triples is to put those triples into a named graph, and make the
>>>> provenance/metadata assertions about that named graph.  A named graph
>>>> can hold any number of triples, so it could hold a single triple if you
>>>> want to be that fine grained.  But triples are not usually created
>>>> individually -- they are usually created in bunches -- so for efficiency
>>>> one would usually create a named graph containing multiple triples that
>>>> all have the same provenance.
>>>>
>>>> All major "triplestores" -- quad stores really -- and SPARQL servers
>>>> support named graphs.
>>>>
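As a rough illustration of the named-graph technique described above, the following Python sketch stores quads as plain tuples and attaches provenance to the graph name rather than to each triple. It uses only the standard library; the `example.org` IRIs, the `provenance_of` helper, and the choice of `prov:wasDerivedFrom` and `dcterms:modified` as provenance properties are assumptions made for illustration and do not come from the thread.

```python
EX = "http://example.org/"  # hypothetical namespace
PROV = "http://www.w3.org/ns/prov#"
DCT = "http://purl.org/dc/terms/"

# A quad is (subject, predicate, object, graph name).
quads = [
    (EX + "aspirin", EX + "treats", EX + "headache", EX + "graph/batch1"),
    (EX + "aspirin", EX + "treats", EX + "fever", EX + "graph/batch1"),
    (EX + "ibuprofen", EX + "treats", EX + "fever", EX + "graph/batch2"),
]

# Provenance is asserted once per graph: the subject is the graph name.
# The date is kept as a plain string to keep the sketch short.
provenance = [
    (EX + "graph/batch1", PROV + "wasDerivedFrom", EX + "source/dumpA"),
    (EX + "graph/batch1", DCT + "modified", "2016-09-23"),
    (EX + "graph/batch2", PROV + "wasDerivedFrom", EX + "source/dumpB"),
]

def provenance_of(triple, quads, provenance):
    """Return sorted (predicate, object) pairs for every graph holding `triple`."""
    graphs = {g for (s, p, o, g) in quads if (s, p, o) == triple}
    return sorted((p, o) for (s, p, o) in provenance if s in graphs)

result = provenance_of(
    (EX + "aspirin", EX + "treats", EX + "fever"), quads, provenance
)
```

Both triples in `graph/batch1` share the same two provenance statements, which is the efficiency argument for grouping. Putting a single triple in its own graph would give per-triple provenance with the same lookup.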
>>>> David Booth
>>>>
>>>>>
>>>>>  We would be very grateful if you could give us any pointers to
>>>>>  datasets, related work, etc.
>>>>>
>>>>>  Thank you very much in advance.
>>>>>  --
>>>>>  Kind regards
>>>>>
>>>>>  Kay Müller
>>>>>
>>>>>  AKSW/KILT <http://aksw.org/Groups/KILT.html>
>>>>>   Office: InfAI e.V., Hainstr. 11, Room 101a, 04109 Leipzig, Germany
>>>>>  Homepage: http://aksw.org/KayMueller.html My Twitter
>>>>>  <https://twitter.com/mullekay> My LinkedIn
>>>>>  <https://de.linkedin.com/in/mullerkay> My Xing
>>>>>  <https://www.xing.com/profile/Kay_Mueller12> My GitHub
>>>>>  <https://github.com/mullekay> My Google Scholar
>>>>>  <https://scholar.google.de/citations?user=8tFijv0AAAAJ>
>>>>
>>>
>>> --
>>> This message was sent from my Android mobile phone with K-9 Mail.
>>
>>
>

Received on Friday, 23 September 2016 23:56:57 UTC