Re: dataset descriptions from Michel Dumontier on 2014-02-13 (public-semweb-lifesci@w3.org from February 2014)

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Wed, 12 Feb 2014 17:53:02 -0800
To: "Freimuth, Robert R., Ph.D." <Freimuth.Robert@mayo.edu>
Cc: w3c semweb hcls <public-semweb-lifesci@w3.org>, Alasdair Gray <alasdair.gray@gmail.com>
Message-ID: <CALcEXf40mk1it2u+wmPyL1s3FixfWFPgdY0Qm3TzmB05y0UdTg@mail.gmail.com>

Hmmm... maybe every dataset, whether a proper subset or not, should be seen
as its own dataset. That way we keep our focus on the versions and
distributions of any dataset.
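
To make that concrete, here is a minimal sketch in Python of what "every
subset is its own dataset" could look like. It is only an illustration and
not a proposal for the actual vocabulary: the class names, the fields, the
"parent" link, the subset names, and the example.org URL are all
assumptions; only "chembl-rdf" comes from the discussion.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Distribution:
    """One concrete, downloadable form of a version (hypothetical fields)."""
    media_type: str
    url: str


@dataclass
class Version:
    """A versioned release of a dataset together with its distributions."""
    label: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Dataset:
    """Any dataset. A subset is just a Dataset that names a parent."""
    name: str
    parent: Optional["Dataset"] = None
    versions: List[Version] = field(default_factory=list)

    def ancestors(self) -> List[str]:
        """Return the names of the enclosing datasets, nearest first."""
        names = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# "chembl-rdf" is from the thread; the subset names below are made up.
chembl = Dataset("chembl-rdf")
subset = Dataset("chembl-rdf-targets", parent=chembl)
sub_subset = Dataset("chembl-rdf-human-targets", parent=subset)

# A subset carries its own versions and distributions, like any dataset.
sub_subset.versions.append(
    Version("1.0", [Distribution("text/turtle", "http://example.org/human.ttl")])
)

print(sub_subset.ancestors())
```

Because a subset has the same type as its parent, subsets of subsets need
no extra machinery, and versions and distributions attach at any level.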

m.

Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford
University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com


On Mon, Feb 10, 2014 at 3:37 PM, Freimuth, Robert R., Ph.D. <
Freimuth.Robert@mayo.edu> wrote:

>  Hi Michel,
>
>
>
> As you know, I don't attend this particular call.  However, if I
> understand the question properly, I'd like to risk weighing in.  Feel free
> to tell me that I'm off-base and I'll go back to lurking. :)  Two points:
>
>
>
> If arbitrary collections are supported, it must be assumed that
> (eventually) collections of collections will be created.  In addition,
> subsets of subsets will be created.  I assume this is supported.
>
>
>
> Would the subsets be of the same type as the parent?  If not, problems may
> arise when one person's set is another person's subset.
>
>
>
> These comments are based on my experiences developing the LS DAM, where we
> ran into this issue in a couple of places, especially when we tried to
> incorporate the ISA (Investigation Study Assay) model.  The distinction
> between the levels was somewhat arbitrary, which created difficulties as it
> was up to the user to decide (arbitrarily) how to model a given thing.
>
>
>
> I hope this helps.
>
>
>
> Thanks,
>
> Bob
>
>
>
> *From:* Michel Dumontier [mailto:michel.dumontier@gmail.com]
> *Sent:* Monday, February 10, 2014 2:34 PM
> *To:* w3c semweb hcls
> *Cc:* Alasdair Gray
> *Subject:* dataset descriptions
>
>
>
> Hi all,
>
> On today's call we got some feedback from Chris Mungall, Melissa
> Haendel, and Harry Hochheiser. Chris asked whether (and how) we could make
> arbitrary collections, for instance, chembl-rdf as a dataset (without
> necessarily specifying the version). I wondered if perhaps we could
> generalize our "version level" to a "subset level", which could very well
> include version subsets.
>
>
>
>
> https://docs.google.com/drawings/d/136kVhd2ffx8qauyT2qMJKgKcWu7O-uvZ2tuH6DejCQ4/edit
>
>
>
> I also wondered whether this subset-level description could point to the
> distribution-level descriptions as the sources used in creating it, which
> would be more abstract than our previous distribution-to-distribution case.
>
>
>
>
> https://docs.google.com/drawings/d/1qCG2Gl2ZtwuAO2clcya5q067FxPFs7UAHiIk18xzEcY/edit
>
>
>
> What do you think?
>
>
>
> m.

Received on Thursday, 13 February 2014 01:53:52 UTC