- From: Chris Mungall <cjmungall@lbl.gov>
- Date: Fri, 23 Sep 2016 16:47:24 -0700
- To: "Michel Dumontier" <michel.dumontier@gmail.com>
- Cc: "Sebastian Hellmann" <hellmann@informatik.uni-leipzig.de>, "Mark Wallace" <mwallace@modusoperandi.com>, "David Booth" <david@dbooth.org>, "Kay Müller" <kay.mueller@informatik.uni-leipzig.de>, "semantic-web@w3.org" <semantic-web@w3.org>, "Johannes Frey" <frey@informatik.uni-leipzig.de>
There is also the Wikidata approach:
https://meta.wikimedia.org/wiki/Wikidata/Development/RDF#Statements_with_qualifiers

This paper compares different approaches:
http://ceur-ws.org/Vol-1457/SSWS2015_paper3.pdf

On 23 Sep 2016, at 15:14, Michel Dumontier wrote:

> Hi Sebastian,
> Bio2RDF provides its data in nquads, in which the graph name is
> annotated with dataset metadata.
> See http://download.bio2rdf.org/release/3/drugbank/ , where the .nq
> file is the provenance data, as an example.
>
> m.
> Michel Dumontier
> Associate Professor of Medicine (Biomedical Informatics), Stanford University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com
>
> On Fri, Sep 23, 2016 at 2:58 PM, <hellmann@informatik.uni-leipzig.de>
> wrote:
>> Hi David and Mark,
>> neither of your answers was helpful, sorry.
>> We are looking for triple datasets that have metadata, i.e.
>> serialized downloadable files in any format (N3, nquad, trix, etc.)
>> that come with sensible metadata (provenance, last updated/update
>> frequency) or, as an alternative, triples converted from a legacy
>> source where we could easily extend the extractor software to spew
>> out useful metadata per triple.
>>
>> An example would be the datasets in the meta section here:
>> http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/
>>
>> Thanks,
>> Sebastian
>>
>> On 23 September 2016 17:16:43 CEST, Mark Wallace
>> <mwallace@modusoperandi.com> wrote:
>>>
>>> I like David's guidance.
>>>
>>> We have projects which require provenance on individual
>>> facts/triples (as opposed to groups of them). As David mentions,
>>> one alternative is to use a named graph for each triple (it acts
>>> like a statement ID in this case). Another alternative is to use
>>> RDF Reification[1] to create a statement ID (resource) to which
>>> provenance can be "attached."
>>> The reification approach requires lots more triples, but in our
>>> case it has the advantage of leaving named graphs free for other
>>> uses. In such cases, the provenance triples can be 10x larger than
>>> the data set. For performance reasons, we sometimes put the
>>> provenance triples in a separate repository/store, and query/join
>>> them (using federated queries) only when the provenance is needed.
>>>
>>> [1] https://www.w3.org/TR/rdf11-mt/#whatnot
>>>
>>> --
>>> Mark Wallace
>>> PRINCIPAL ENGINEER, SEMANTIC APPLICATIONS
>>> MODUS OPERANDI, INC.
>>>
>>> -----Original Message-----
>>> From: David Booth [mailto:david@dbooth.org]
>>> Sent: Friday, September 23, 2016 10:45 AM
>>> To: Kay Müller <kay.mueller@informatik.uni-leipzig.de>;
>>> semantic-web@w3.org
>>> Cc: Johannes Frey <frey@informatik.uni-leipzig.de>; Sebastian
>>> Hellmann <hellmann@informatik.uni-leipzig.de>
>>> Subject: Re: RDF Datasets with provenance data
>>>
>>> On 09/23/2016 10:07 AM, Kay Müller wrote:
>>>>
>>>> Dear Sir/Madam,
>>>>
>>>> My name is Kay Mueller and I am a researcher at the University of
>>>> Leipzig. We are currently planning to evaluate whether it is
>>>> feasible to store provenance and metadata for each triple in a
>>>> graph, so we are wondering whether you are aware of any dataset
>>>> that either stores data at the triple level or could be converted
>>>> into this format (e.g. Yago, Wikidata).
>>>
>>>
>>> The usual technique for associating provenance or other metadata
>>> with certain triples is to put those triples into a named graph,
>>> and make the provenance/metadata assertions about that named graph.
>>> A named graph can hold any number of triples, so it could hold a
>>> single triple if you want to be that fine grained.
>>> But triples are not usually created individually -- they are
>>> usually created in bunches -- so for efficiency one would usually
>>> create a named graph containing multiple triples that all have the
>>> same provenance.
>>>
>>> All major "triplestores" -- quad stores really -- and SPARQL
>>> servers support named graphs.
>>>
>>> David Booth
>>>
>>>>
>>>> We would be very grateful if you could give us any pointers to
>>>> datasets, related work, etc.
>>>>
>>>> Thank you very much in advance.
>>>> --
>>>> Kind regards
>>>>
>>>> Kay Müller
>>>>
>>>> AKSW/KILT <http://aksw.org/Groups/KILT.html>
>>>> Office: InfAI e.V., Hainstr. 11, Room 101a, 04109 Leipzig, Germany
>>>> Homepage: http://aksw.org/KayMueller.html
>>>> My Twitter <https://twitter.com/mullekay>
>>>> My LinkedIn <https://de.linkedin.com/in/mullerkay>
>>>> My Xing <https://www.xing.com/profile/Kay_Mueller12>
>>>> My GitHub <https://github.com/mullekay>
>>>> My Google Scholar <https://scholar.google.de/citations?user=8tFijv0AAAAJ>
>>>
>>
>> --
>> This message was sent from my Android phone with K-9 Mail.
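The two per-triple options discussed in this thread are a named graph holding the fact, with provenance asserted about the graph (David's and Mark's suggestion, and the N-Quads layout Michel points to in Bio2RDF), and RDF reification with provenance attached to an rdf:Statement node (Mark's alternative). They can be contrasted in a small sketch. This is an illustrative, standard-library-only Python example, not anyone's production code. The example.org IRIs, the choice of prov:wasDerivedFrom and dct:modified as provenance properties, and the decision to put graph metadata in the default graph are all assumptions made for the illustration.

```python
"""Hypothetical sketch: two ways to attach provenance to a single triple.

All IRIs under http://example.org/ are made up for illustration.
"""

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"  # assumed choice of provenance vocabulary
DCT = "http://purl.org/dc/terms/"


def iri(value: str) -> str:
    """Render an IRI as an N-Triples / N-Quads term."""
    return f"<{value}>"


def named_graph_quads(triple, graph, provenance):
    """Named-graph approach, serialized as N-Quads.

    The fact goes into its own graph; the provenance statements are made
    about that graph IRI and (by assumption) placed in the default graph.
    `triple` is (s, p, o) as rendered terms; `provenance` maps predicate
    IRIs to rendered object terms.
    """
    s, p, o = triple
    lines = [f"{s} {p} {o} {iri(graph)} ."]
    for pred, obj in provenance.items():
        lines.append(f"{iri(graph)} {iri(pred)} {obj} .")
    return lines


def reification_triples(triple, statement, provenance):
    """Reification approach, serialized as N-Triples.

    A reified statement does not assert the original triple, so the
    triple itself is emitted too, followed by the four rdf:Statement
    triples and the provenance triples about the statement node.
    """
    s, p, o = triple
    st = iri(statement)
    lines = [
        f"{s} {p} {o} .",
        f"{st} {iri(RDF + 'type')} {iri(RDF + 'Statement')} .",
        f"{st} {iri(RDF + 'subject')} {s} .",
        f"{st} {iri(RDF + 'predicate')} {p} .",
        f"{st} {iri(RDF + 'object')} {o} .",
    ]
    for pred, obj in provenance.items():
        lines.append(f"{st} {iri(pred)} {obj} .")
    return lines


if __name__ == "__main__":
    fact = (
        iri("http://example.org/drug/D1"),
        iri("http://example.org/vocab/target"),
        iri("http://example.org/protein/P1"),
    )
    prov = {
        PROV + "wasDerivedFrom": iri("http://example.org/source/legacy-db"),
        DCT + "modified": '"2016-09-23"^^<http://www.w3.org/2001/XMLSchema#date>',
    }
    quads = named_graph_quads(fact, "http://example.org/graph/1", prov)
    reified = reification_triples(fact, "http://example.org/stmt/1", prov)
    print("\n".join(quads))
    print()
    print("\n".join(reified))
    print(f"named graph: {len(quads)} statements, reification: {len(reified)}")
```

For the sample data, the script reports 3 statements for the named-graph form and 7 for the reified form. The reified form always carries four extra rdf:Statement triples per fact, which is the overhead Mark mentions. The trade-off he describes is that it leaves the graph slot of each quad free for other uses.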
Received on Friday, 23 September 2016 23:48:00 UTC