
Re: hcls dataset description comments

From: Joachim Baran <joachim.baran@gmail.com>
Date: Tue, 29 Jul 2014 11:55:44 -0700
Message-ID: <CAObSwHWuj0az_yUKJg+=vfAAapUmfk8e5vnn55g=6kSTa-Zzhw@mail.gmail.com>
To: Michael Miller <Michael.Miller@systemsbiology.org>
Cc: w3c semweb hcls <public-semweb-lifesci@w3.org>
Hi!

  Thanks for the suggestions. I have incorporated your minor edits.
Unbelievable how those still slipped through after so many re-readings.

  For other edits, please fork the repository and create a pull request
with your changes.

Best wishes,

Kim



On 23 July 2014 08:53, Michael Miller <Michael.Miller@systemsbiology.org>
wrote:

> hi kim,
>
>
>
> thanks for the pointer, i've updated my comments based on this newer draft
> below.  many fewer and i especially like the complete example in 10.1!
>
>
>
> cheers,
>
> michael
>
>
>
> Michael Miller
>
> Software Engineer
>
> Institute for Systems Biology
>
>
>
> general comments:
>
> ·         s4.4 'Dataset Linking': might mention also that datasets are
> derived from other datasets?
> 'A dataset may incorporate, or link to, data in other datasets, e.g. in
> the creation of a data mashup ' --> 'A dataset may incorporate, be
> derived from, or link to, data in other datasets, e.g. in the analysis of
> original datasets or in the creation of a data mashup '
>
> ·         s8: odd that some of the top sections (8.1-8.3,8.5-8.7) are
> individual organizations but three (8.4, 8.8, 8.9) have subsections for
> different organizations.  maybe organize so all top level sections define a
> type of organization with subsections beneath or make all top-level?
>
> ·         s8: some of the use cases could be more focused on how this
> note will help them (8.5-8.7)
>
> ·         s8.9: how do Data Catalogs fit into this note?  wasn't clear to
> me how this note is relevant to them
>
> our use case questions:
>
> ·         how to reference 3rd party datasets that aren't described by
> this standard, e.g. TCGA data from the DCC?  simply use 'pav:retrievedFrom'
> with the IRI being the URL into the repository?
>
> ·         we have a lot of intermediary files that we won't publish.  the
> software used to create our published datasets from their sources forms
> a (branching) workflow, with each step's input coming from the previous
> step(s) in the workflow.  how best to represent this?  this note doesn't
> seem to cover how the dataset is created, so any recommendations?
>
> minor edits:
>
> ·         there are two s6.2.3 sections
>
> ·         s8.8.1: '... what period it is updated. To know when to...'
> should be '...what period it is updated to know when to...'?
>
>
>
> *From:* Joachim Baran [mailto:joachim.baran@gmail.com]
> *Sent:* Tuesday, July 22, 2014 3:43 PM
> *To:* Michael Miller
> *Cc:* w3c semweb hcls
> *Subject:* Re: hcls dataset description comments
>
>
>
> Hello,
>
>
>
>   I believe you were looking at an old document. There is currently only
> one Figure in the note.
>
>
>
>   Please check the actual draft at:
> http://htmlpreview.github.io/?https://github.com/joejimbo/HCLSDatasetDescriptions/blob/master/Overview.html
>
>
>
> Best wishes,
>
>
>
> Kim
>
>
>
>
>
> On 22 July 2014 15:36, Michael Miller <Michael.Miller@systemsbiology.org>
> wrote:
>
> hi all,
>
>
>
> tremendous work, very clear and well-written.  my group at ISB, the
> Shmulevich lab, is looking to provide provenance for the analysis datasets
> we are producing for TCGA.  we're not sure if we'll be able to 'go all the
> way' but we want to make sure we have at hand all the information so that
> we could, at least in theory, be compliant.  since i was reading the
> document anyway, below are some notes.
>
>
>
> general comments:
>
> ·         s4.4 'Dataset Linking': might mention also that datasets are
> derived from other datasets?
> 'A dataset may incorporate, or link to, data in other datasets, e.g. in
> the creation of a data mashup ' --> 'A dataset may incorporate, be
> derived from, or link to, data in other datasets, e.g. in the analysis of
> original datasets or in the creation of a data mashup '
>
> ·         the chembl example in s5 is not compliant with the property table
> below.  it is probably only meant to show the relationship of the three
> terms, but that could be clarified
>
> ·         s6.2.12 could use the example filled in
>
> ·         6.3.2: not sure what an 'X level description' is
>
> ·         s8: odd that some of the top sections (8.1-8.3,8.5-8.7) are
> individual organizations but three (8.4, 8.8, 8.9) have subsections for
> different organizations.  maybe organize so all top level sections define a
> type of organization with subsections beneath or make all top-level?
>
> ·         s8: many of the use cases could be more focused on how this
> note will help them
>
> ·         s8.9: how do Data Catalogs fit into this note?  wasn't clear to
> me how this note is relevant to them
>
> ·         would be nice to have a 'complete' example put together,
> maybe based on chembl?
>
>
>
> our use case questions:
>
> ·         how to reference 3rd party datasets that aren't described by
> this standard, i.e. TCGA data from the DCC, simply use 'pav:retrievedFrom'
> with the IRI being the URL into the repository?
>
> ·         we have a lot of intermediary files that we won't publish, the
> software specified in creating our published datasets from its sources form
> a (branching) workflow with the input being from the previous step(s) in
> the workflow.  how best to represent this?  this note doesn't seem to cover
> how the dataset is created so any recommendations?
>
>
>
> text issues:
>
> ·         Figure 1: 'Overview of dataset description level metadata
> profiles and their relationships': reference not resolved, image doesn't
> show
>
> ·         Figure 2: 'Improve diagram. Multiple appearance of
> concepts/description levels unclear.': reference not resolved, image
> doesn't show.  add actual label
>
>
>
> minor edits:
>
> ·         bottom of s.3: 'placeholde' should be 'placeholder'
>
> ·         use straight quotes rather than slant quotes in s6.2.2 example
> (and elsewhere)?
>
> ·         the text runs out of the box in s6.2.3 'Description'
>
> ·         s6.2.3: 'Dates of Creation and Issuance': 'state the date the
> dataset was generated using dct:created and/or the date the dataset was
> made public using dct:created' should be 'state the date the dataset was
> generated using dct:created and/or the date the dataset was made public
> using dct:issued'?
>
> ·         there are two s6.2.3 sections
>
> ·         s6.2.4: 'Creation: ... The date of authorship' should be '...The
> date of creation' and 'Curation:... The date of authorship' should be '...The
> date of curation'?
>
> ·         s8.5: the author list has end parenthesis without beginning
> parenthesis
>
> ·         s8.8.1: '... what period it is updated. To know when to...'
> should be '...what period it is updated to know when to...'
>
>
>
> cheers,
>
> michael
>
>
>
> Michael Miller
>
> Software Engineer
>
> Institute for Systems Biology
>
>
Received on Tuesday, 29 July 2014 18:56:11 UTC
