- From: Michel Dumontier <michel.dumontier@gmail.com>
- Date: Fri, 23 Sep 2016 16:56:07 -0700
- To: Chris Mungall <cjmungall@lbl.gov>
- Cc: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>, Mark Wallace <mwallace@modusoperandi.com>, David Booth <david@dbooth.org>, Kay Müller <kay.mueller@informatik.uni-leipzig.de>, "semantic-web@w3.org" <semantic-web@w3.org>, Johannes Frey <frey@informatik.uni-leipzig.de>
Yes, we've also done evaluations like this:

Exposing Provenance Metadata Using Different RDF Models
http://arxiv.org/abs/1509.02822

On Reasoning with RDF Statements about Statements using Singleton Property Triples
http://arxiv.org/abs/1509.04513

Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com

On Fri, Sep 23, 2016 at 4:47 PM, Chris Mungall <cjmungall@lbl.gov> wrote:

> There is also the Wikidata approach:
> https://meta.wikimedia.org/wiki/Wikidata/Development/RDF#Statements_with_qualifiers
>
> This paper compares different approaches:
> http://ceur-ws.org/Vol-1457/SSWS2015_paper3.pdf
>
> On 23 Sep 2016, at 15:14, Michel Dumontier wrote:
>
>> Hi Sebastian,
>> Bio2RDF provides its data in nquads, in which the graph name is
>> annotated with dataset metadata.
>> See http://download.bio2rdf.org/release/3/drugbank/ , where the .nq
>> file is the provenance data, as an example.
>>
>> m.
>> Michel Dumontier
>> Associate Professor of Medicine (Biomedical Informatics), Stanford
>> University
>> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest
>> Group
>> http://dumontierlab.com
>>
>> On Fri, Sep 23, 2016 at 2:58 PM, <hellmann@informatik.uni-leipzig.de>
>> wrote:
>>>
>>> Hi David and Mark,
>>> both your answers were not helpful, sorry.
>>> We are looking for triple datasets that have metadata, i.e. serialized
>>> downloadable files in any format (N3, nquad, trix, etc) that come with
>>> sensible metadata (provenance, last updated/update frequency) or, as an
>>> alternative, triples converted from a legacy source where we could
>>> extend the extractor software easily to spew out useful metadata per
>>> triple.
>>>
>>> An example would be the datasets in the meta section here:
>>>
>>> http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/
>>>
>>> Thanks,
>>> Sebastian
>>>
>>> On 23 September 2016 17:16:43 CEST, Mark Wallace
>>> <mwallace@modusoperandi.com> wrote:
>>>>
>>>> I like David's guidance.
>>>>
>>>> We have projects which require provenance on individual facts/triples
>>>> (as opposed to groups of them). As David mentions, one alternative is
>>>> to use a named graph for each triple (it acts like a statement ID in
>>>> this case). An alternative is to use RDF Reification[1] to create a
>>>> statement ID (resource) to which provenance can be "attached." The
>>>> reification approach requires lots more triples, but it has the
>>>> advantage in our case of leaving named graphs for other uses. In such
>>>> cases, provenance triples can be 10x larger than the data set. For
>>>> performance reasons, we sometimes put the provenance triples in a
>>>> separate repository/store, and query/join them (using federated
>>>> queries) only when the provenance is needed.
>>>>
>>>> [1] https://www.w3.org/TR/rdf11-mt/#whatnot
>>>>
>>>> --
>>>> Mark Wallace
>>>> PRINCIPAL ENGINEER, SEMANTIC APPLICATIONS
>>>> MODUS OPERANDI, INC.
>>>>
>>>> -----Original Message-----
>>>> From: David Booth [mailto:david@dbooth.org]
>>>> Sent: Friday, September 23, 2016 10:45 AM
>>>> To: Kay Müller <kay.mueller@informatik.uni-leipzig.de>;
>>>> semantic-web@w3.org
>>>> Cc: Johannes Frey <frey@informatik.uni-leipzig.de>; Sebastian Hellmann
>>>> <hellmann@informatik.uni-leipzig.de>
>>>> Subject: Re: RDF Datasets with provenance data
>>>>
>>>> On 09/23/2016 10:07 AM, Kay Müller wrote:
>>>>>
>>>>> Dear Sir/Madam,
>>>>>
>>>>> My name is Kay Mueller and I am a researcher at the University of
>>>>> Leipzig. Currently we are planning to evaluate whether it is feasible
>>>>> to store provenance and meta data for each triple in a graph, hence we
>>>>> are wondering whether you are aware of any dataset which either stores
>>>>> data at the triple level or which could be converted into this format
>>>>> (e.g. Yago, Wikidata).
>>>>
>>>> The usual technique for associating provenance or other metadata with
>>>> certain triples is to put those triples into a named graph, and make
>>>> the provenance/metadata assertions about that named graph. A named
>>>> graph can hold any number of triples, so it could hold a single triple
>>>> if you want to be that fine grained. But triples are not usually
>>>> created individually -- they are usually created in bunches -- so for
>>>> efficiency one would usually create a named graph containing multiple
>>>> triples that all have the same provenance.
>>>>
>>>> All major "triplestores" -- quad stores really -- and SPARQL servers
>>>> support named graphs.
>>>>
>>>> David Booth
>>>>
>>>>>
>>>>> We would be very grateful, if you could give us any pointers to
>>>>> datasets, related work, etc.
>>>>>
>>>>> Thank you very much in advance.
>>>>> --
>>>>> Kind regards
>>>>>
>>>>> Kay Müller
>>>>>
>>>>> AKSW/KILT <http://aksw.org/Groups/KILT.html>
>>>>> Office: InfAI e.V., Hainstr. 11, Room 101a, 04109 Leipzig, Germany
>>>>> Homepage: http://aksw.org/KayMueller.html
>>>>> My Twitter <https://twitter.com/mullekay>
>>>>> My LinkedIn <https://de.linkedin.com/in/mullerkay>
>>>>> My Xing <https://www.xing.com/profile/Kay_Mueller12>
>>>>> My GitHub <https://github.com/mullekay>
>>>>> My Google Scholar
>>>>> <https://scholar.google.de/citations?user=8tFijv0AAAAJ>
>>>
>>> --
>>> This message was sent from my Android mobile phone with K-9 Mail.
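
The two per-triple approaches discussed in the thread above (one named graph per triple, versus RDF reification) can be sketched roughly as follows. This is only an illustration and not code from any of the participants: the http://example.org/ IRIs, the helper names, and the choice of dcterms:source as the provenance property are all assumptions. The output lines are plain N-Triples/N-Quads statements.

```python
# Minimal sketch: two ways to attach provenance to a single triple.
# All IRIs under http://example.org/ and all helper names are hypothetical;
# dcterms:source is just one possible provenance property.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"


def iri(value):
    """Wrap an IRI string in angle brackets, as N-Triples/N-Quads require."""
    return "<" + value + ">"


def named_graph_per_triple(s, p, o, graph, source):
    """Return one data quad plus one provenance statement about the graph."""
    return [
        # The data triple, placed in its own named graph.
        f"{iri(s)} {iri(p)} {iri(o)} {iri(graph)} .",
        # Provenance about the graph name (written to the default graph here).
        f"{iri(graph)} {iri(DCT + 'source')} {iri(source)} .",
    ]


def reified(s, p, o, stmt, source):
    """Return the data triple, four reification triples, and one provenance triple."""
    return [
        f"{iri(s)} {iri(p)} {iri(o)} .",
        f"{iri(stmt)} {iri(RDF + 'type')} {iri(RDF + 'Statement')} .",
        f"{iri(stmt)} {iri(RDF + 'subject')} {iri(s)} .",
        f"{iri(stmt)} {iri(RDF + 'predicate')} {iri(p)} .",
        f"{iri(stmt)} {iri(RDF + 'object')} {iri(o)} .",
        # Provenance is attached to the statement resource.
        f"{iri(stmt)} {iri(DCT + 'source')} {iri(source)} .",
    ]


# Print both encodings of the same (hypothetical) fact for comparison.
fact = (EX + "aspirin", EX + "treats", EX + "headache")
for line in named_graph_per_triple(*fact, EX + "graph/1", EX + "dataset/a"):
    print(line)
for line in reified(*fact, EX + "stmt/1", EX + "dataset/a"):
    print(line)
```

For one fact with one provenance statement, this sketch emits 2 statements in the named-graph form and 6 in the reified form, which is consistent with the remark above that reification needs many more triples. The actual overhead in a real dataset depends on how many provenance properties are recorded per statement.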
Received on Friday, 23 September 2016 23:56:57 UTC