Re: dataset descriptions from Joachim Baran on 2014-02-14 (public-semweb-lifesci@w3.org from February 2014)

From: Joachim Baran <joachim.baran@gmail.com>
Date: Thu, 13 Feb 2014 19:05:00 -0500
To: Michel Dumontier <michel.dumontier@gmail.com>
Cc: w3c semweb hcls <public-semweb-lifesci@w3.org>, Alasdair Gray <alasdair.gray@gmail.com>
Message-ID: <etPan.52fd5dac.4db127f8.6641@Tiny-6.local>

I have added a few sentences under 6.5 (Statistics) as discussed during the last conf call.

Joachim

On February 12, 2014 at 8:55:50 PM, Michel Dumontier (michel.dumontier@gmail.com) wrote:

Hmmm... maybe every dataset, whether a proper subset or not, should be seen as its own dataset. That way we keep our focus on the versions and distributions of any dataset.

m.

Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com


On Mon, Feb 10, 2014 at 3:37 PM, Freimuth, Robert R., Ph.D. <Freimuth.Robert@mayo.edu> wrote:
Hi Michel,

As you know, I don’t attend this particular call.  However, if I understand the question properly, I’d like to risk weighing in.  Feel free to tell me that I’m off-base and I’ll go back to lurking. :)  Two points:

 

1. If arbitrary collections are supported, it must be assumed that (eventually) collections of collections will be created.  In addition, subsets of subsets will be created.  I assume this is supported.

2. Would the subsets be of the same type as the parent?  If not, problems may arise when one person’s set is another person’s subset.

 

These comments are based on my experiences developing the LS DAM, where we ran into this issue in a couple of places, especially when we tried to incorporate the ISA (Investigation Study Assay) model.  The distinction between the levels was somewhat arbitrary, which created difficulties because it was left to the user to decide how to model a given thing.

 

I hope this helps.

Thanks,

Bob

 

From: Michel Dumontier [mailto:michel.dumontier@gmail.com]
Sent: Monday, February 10, 2014 2:34 PM
To: w3c semweb hcls
Cc: Alasdair Gray
Subject: dataset descriptions

 

Hi all,

On today's call we got some feedback from Chris Mungall, Melissa Haendel, and Harry Hochheiser. Chris asked whether (and how) we could make arbitrary collections, for instance chembl-rdf as a dataset (without necessarily specifying the version). I wondered if perhaps we could generalize our "version level" to a "subset level", which could very well include version subsets.

 

https://docs.google.com/drawings/d/136kVhd2ffx8qauyT2qMJKgKcWu7O-uvZ2tuH6DejCQ4/edit

 

I also wondered whether this subset level description could point to the distribution level descriptions as the sources used in creating it, which would be more abstract than our previous distribution-to-distribution case.

 

https://docs.google.com/drawings/d/1qCG2Gl2ZtwuAO2clcya5q067FxPFs7UAHiIk18xzEcY/edit

 

What do you think?

 

m.

 




Received on Friday, 14 February 2014 00:05:31 UTC