- From: M. Scott Marshall <mscottmarshall@gmail.com>
- Date: Thu, 7 Oct 2010 15:17:16 +0200
- To: HCLS <public-semweb-lifesci@w3.org>, Anja Jentzsch <anja@anjeve.de>, oktie@cs.toronto.edu
- Cc: david@zepheira.com, Natasha Noy <noy@stanford.edu>, Paea LePendu <plependu@gmail.com>, Nigam Shah <nigam@stanford.edu>, Paul Groth <pgroth@few.vu.nl>, Jun Zhao <jun.zhao@zoo.ox.ac.uk>, "Eric Prud'hommeaux" <eric@w3.org>
I have two simple questions that tie into the discussion of federation below: What is the refresh rate for DrugBank and LinkedCT? Also, where can I find the scripts used to create the current RDF renderings of DrugBank and LinkedCT? Anja, Oktie: Can you help me with the above questions? I couldn't find the answers by looking at http://esw.w3.org/HCLSIG/LODD/Data .

------------------------------------------------------------------

[CC'ing a sampling of people interested in federation and provenance.]

The Linked Open Drug Data task force has made important biomedical contributions to the Linked Open Data cloud. Many of us, including those in the BioRDF task force, have run into recurring questions while building applications against these SPARQL endpoints and the RDF graphs behind them. Considering only the most basic provenance:

* Is it an ontology, or data that has been mapped to RDF (possibly a populated ontology)?
* When was the data last refreshed?
* Where is the original source data?
* What method(s) were used to produce it?
* Where is the software located (if a script or code was used)?
* Who made this resource (and the script)?
* Is it in OWL, SKOS, or another type of RDF?

In a Semantic Web (a federation of resources), we will eventually have to choose between many sources, depending on our needs. I know, for example, of several different versions of DrugBank in RDF being offered from multiple locations, as well as SNOMED in both SKOS and OWL. Ultimately, we would like to automate the selection of sources to federate, so that it can happen dynamically, based on a selection policy expressed in a Semantic Web language. However, the advantages of a dynamic federation will be lost if we must consult wiki pages or people in order to select data sources and formulate our queries. So, it is crucial to harmonize approaches to expressing such provenance in RDF and to make that information available when publishing any linked data. Making such bread crumbs available in RDF will enable people to carry out the whole process without having to leave SPARQL.

Hopefully, what I've written above doesn't seem controversial. Where we need to build consensus is on how best to represent such information, so that the many federations and linked data clouds now being built can interoperate (i.e. we can eventually query them to get basic information about their contents and origins). We also need consensus on where such information should reside. I like the idea of giving each repository/graph/context in a triplestore its own URI (as done for NCBO's SPARQL endpoint) so that it can include its own 'metadata' or provenance in the graph itself (a sketch of what that could look like follows the outline below). David Wood, CC'd, suggested that this is a popular approach. You could eventually aggregate such information from all the graphs behind an endpoint (e.g. with a 'crawler') and provide it from an information service (i.e. a type of SPARQL endpoint provenance index *graph*). I understand that the SPARQL 1.1 WG is considering methods of describing what is behind a SPARQL endpoint, for incorporation into SPARQL.

So, in an ideal scenario, finding a particular graph, starting from the SPARQL endpoint, might go something like this (ROUGH OUTLINE):

1) Query the SPARQL 1.1 endpoint to find out where its 'provenance index' is located. [based on whatever the SPARQL WG comes up with]
2) Query the 'provenance index' to find graphs that meet your provenance criteria.
3) Possibly query a selected graph for more provenance information that wasn't included in the index.
4) Query the selected graph.
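
To make the graph-level metadata idea concrete, here is a minimal sketch of what a publisher could record about a named graph, using Dublin Core terms and VoID. The graph URI, property choices, and values are purely illustrative (not a vocabulary proposal), and it assumes a store that supports the draft SPARQL 1.1 Update syntax:

  PREFIX dcterms: <http://purl.org/dc/terms/>
  PREFIX void:    <http://rdfs.org/ns/void#>
  PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

  # The graph describes itself: the named graph's URI is also the
  # subject of its own provenance triples.
  INSERT DATA {
    GRAPH <http://example.org/graph/drugbank> {
      <http://example.org/graph/drugbank>
        a void:Dataset ;
        dcterms:source     <http://www.drugbank.ca/> ;              # original source data
        dcterms:modified   "2010-10-01"^^xsd:date ;                 # last refresh
        dcterms:creator    <http://example.org/people/converter> ;  # who made it
        dcterms:conformsTo <http://www.w3.org/2002/07/owl#> ;       # OWL vs. SKOS etc.
        rdfs:seeAlso       <http://example.org/scripts/drugbank2rdf> . # conversion script
    }
  }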
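
Step 1 depends on whatever the SPARQL WG settles on, so I won't guess at its syntax. Step 2, though, could be an ordinary SPARQL query against the index graph. Again, the index graph URI and the criteria are only illustrations, and the date comparison in the FILTER assumes an engine that supports ordering on xsd:date values:

  PREFIX dcterms: <http://purl.org/dc/terms/>
  PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

  # Step 2: find graphs derived from DrugBank that were refreshed
  # in 2010, newest first.
  SELECT ?g ?modified
  WHERE {
    GRAPH <http://example.org/provenance-index> {
      ?g dcterms:source   <http://www.drugbank.ca/> ;
         dcterms:modified ?modified .
      FILTER (?modified >= "2010-01-01"^^xsd:date)
    }
  }
  ORDER BY DESC(?modified)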
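
Steps 3 and 4 are then plain queries against the graph selected above, e.g. (the drugbank: vocabulary URI below is made up for the example):

  # Step 3: any further provenance carried in the graph itself.
  SELECT ?p ?o
  WHERE {
    GRAPH <http://example.org/graph/drugbank> {
      <http://example.org/graph/drugbank> ?p ?o .
    }
  }

  # Step 4: finally, the data itself.
  PREFIX drugbank: <http://example.org/drugbank/vocab#>
  SELECT ?drug ?name
  WHERE {
    GRAPH <http://example.org/graph/drugbank> {
      ?drug drugbank:genericName ?name .
    }
  }

-Scott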
Received on Thursday, 7 October 2010 13:18:07 UTC