Re: inDataset (was Notes from today's meeting) from Alasdair J G Gray on 2013-06-04 (public-semweb-lifesci@w3.org from June 2013)

From: Alasdair J G Gray <Alasdair.Gray@manchester.ac.uk>
Date: Tue, 4 Jun 2013 10:36:56 +0100
To: Michel Dumontier <michel.dumontier@gmail.com>
Cc: Jerven Bolleman <me@jerven.eu>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Message-Id: <12236A2A-3A45-4ABB-B0DC-9B277EEC2FEB@manchester.ac.uk>

On 3 Jun 2013, at 17:51, Michel Dumontier <michel.dumontier@gmail.com> wrote:

> About void:inDataset: I personally don't like it. I suspect it would cost me a 13% growth in triple size for negligible benefit. It also means that the dataset description starts to affect the data. I could present it only in the REST / linked data interface and not in the SPARQL endpoint, but I am worried that I cannot put it into the FTP RDF data dump, as the data item concept does not map 1:1 onto an atomic set of triples.
> 
> 
> I'm not sure that I completely understand your objection. The primary use of void:inDataset is to link data items to the dataset description; as such, it supports linked data applications without their having to look at the graph for a potential, but not guaranteed, provenance description. Using void:inDataset is normal practice in the RDF / linked data community. It would be strange not to include it in an RDF dataset if you have the dataset description.
> 
> http://www.w3.org/TR/void/#backlinks
> 
>  
> e.g. someone can use just the UniProtKB sequences. Once they have done that, is it still the same dataset that I published? I don't think so. That means UniProt end users need to be careful to remove more triples, which is why I disagree with Alasdair's call for MUST.
> 
> 
> If one wanted to know which version/issue of UniProt the sequences came from, it would be necessary to provide access to the dataset description. If the void:inDataset predicate is used, the user need not even retrieve that description to store it locally, as you should provide resolution services for those dataset descriptions.

I also do not follow your objection. If you have created a file that contains a subset of the data, then you can declare this to be a subset of the parent dataset (in its particular version and format), ideally with some way of distinguishing the content of the dataset.
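
As a rough illustration of the subset declaration described above, here is a minimal Python sketch that emits N-Triples for a parent dataset, a subset of it, and back-links from data items via void:inDataset. The example.org URIs and the describe_subset helper are hypothetical and chosen only for illustration; only the VoID terms void:Dataset, void:subset and void:inDataset come from the vocabulary itself.

```python
# Namespace IRIs for the VoID vocabulary and rdf:type.
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(s, p, o):
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def describe_subset(parent, subset, items):
    """Return N-Triples lines declaring `subset` as a void:subset of `parent`
    and linking each data item back to the subset with void:inDataset.

    All arguments are IRIs supplied by the caller (hypothetical here).
    """
    lines = [
        triple(parent, RDF_TYPE, VOID + "Dataset"),
        triple(subset, RDF_TYPE, VOID + "Dataset"),
        # In VoID the parent dataset is the subject of void:subset.
        triple(parent, VOID + "subset", subset),
    ]
    # One back-link per data item, as discussed in this thread.
    lines += [triple(item, VOID + "inDataset", subset) for item in items]
    return lines


if __name__ == "__main__":
    for line in describe_subset(
        "http://example.org/dataset/2013_06",            # hypothetical versioned parent
        "http://example.org/dataset/2013_06/sequences",  # hypothetical subset
        ["http://example.org/item/P12345"],              # hypothetical data item
    ):
        print(line)
```

Someone redistributing only the sequences could mint their own subset IRI in this way, so the version of the parent dataset remains discoverable from each item.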

From all the scenarios I have encountered, scientists (not just in the healthcare and life sciences) care about where their data has come from and what version it is. As such, we need some way to allow for the linking of data back to the description of the data.

Alasdair

Dr Alasdair J G Gray
Research Associate
Alasdair.Gray@manchester.ac.uk
+44 161 275 0145

http://www.cs.man.ac.uk/~graya/


Received on Tuesday, 4 June 2013 09:37:27 UTC