- From: Jerven Bolleman <me@jerven.eu>
- Date: Tue, 4 Jun 2013 14:40:28 +0200
- To: Alasdair J G Gray <Alasdair.Gray@manchester.ac.uk>
- Cc: Michel Dumontier <michel.dumontier@gmail.com>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
- Message-ID: <CAHM_hUM1LXs4ua8DVBKg56t2OuVkfmkQfD4gerKOAeT9jCLoug@mail.gmail.com>
On Tue, Jun 4, 2013 at 11:36 AM, Alasdair J G Gray <Alasdair.Gray@manchester.ac.uk> wrote:

> On 3 Jun 2013, at 17:51, Michel Dumontier <michel.dumontier@gmail.com>
> wrote:
>
>> About void:inDataset I personally don't like it. I suspect it would cost
>> me a 13% growth in triple size for negligible benefits. This also means
>> that the dataset description starts to affect the data. Although I could
>> only present this in the rest / linked data interface and not in the sparql
>> endpoint. I am worried that I can not put it into the FTP data dump rdf. As
>> the data item concept does not map 1:1 on a set of triples that are atomic.
>
> i'm not sure that i completely understand your objection. the primary use
> of void:inDataset is to link data items to the dataset description, and as
> such supports linked data applications without looking at the graph for a
> potential, but un-guaranteed provenance description. Using void:inDataset
> is normal practice in the RDF / linked data community. It would be strange
> to not include it in any RDF dataset if you have the dataset description.
>
> http://www.w3.org/TR/void/#backlinks
>
>> e.g. someone can use just the UniProtKB sequences. Once they did that is
>> it still the same dataset that I published it as? I don't think so. Which
>> means uniprot end users need to be careful to remove more triples. Which
>> why I disagree with alasdair's call for MUST.
>
> if one wanted to know which version/issue of uniprot that the sequences
> came from, it would be necessary to provide access to the dataset
> description. if the void:inDataset predicate is used, the user need not
> even retrieve that to store locally, as you should provide resolution
> services to those dataset descriptions.
>
> I also do not follow your objection.
> If you have created a file that
> contains a subset of the data, then you can declare this to be a subset of
> the parent-versioned-formatted dataset, ideally with some way of
> distinguishing the content of the dataset.

I will try to explain my objections. The first is that the dataset is a set
of triples, while void:inDataset is a predicate on a single
resource/entity/subject. So, as I have 1.4 billion entities, I would add
1.4 billion void:inDataset triples, which to me seems like the incorrect
thing to do. Well, you may say I should only add them to the "important"
resources, and then we are down to 100 million of these statements. Yet for
users who use slices of our data, these void:inDataset triples are
annoying/misleading, especially if they merge them with their own sources,
e.g.

    uniref:UniRef100_ up:sequenceFor uniprot:P12345 .
    uniprot:P12345 a up:Protein ;
        void:inDataset dataset:uniprot .
    dataset:uniprot dcterms:license cc:by-sa-v3 .
    uniprot:P12345 roche:activatedBy secretdrugchemical:1000 .
    secretdrugchemical:1000 void:inDataset top:secret .

Given these triples, what is the license for the knowledge that
secretdrugchemical:1000 activates uniprot:P12345? The dataset description
is about a set of data, not single triples, so single back links seem to me
to be the incorrect solution.

> From all the scenarios I have encountered, scientists (not just in the
> healthcare and life sciences) care about where their data has come from and
> what version it is. As such, we need some way to allow for the linking of
> data back to the description of the data.

Of course I don't disagree with the use case. I disagree with the chosen
solution because it is at the wrong level of granularity.

> Alasdair
>
> Dr Alasdair J G Gray
> Research Associate
> Alasdair.Gray@manchester.ac.uk
> +44 161 275 0145
>
> http://www.cs.man.ac.uk/~graya/
>
> Please consider the environment before printing this email.

--
Jerven Bolleman
me@jerven.eu
Received on Tuesday, 4 June 2013 12:40:56 UTC