Re: hcls dataset description comments

Hello,

  I believe you were looking at an old document. There is currently only
one Figure in the note.

  Please check the actual draft at:
http://htmlpreview.github.io/?https://github.com/joejimbo/HCLSDatasetDescriptions/blob/master/Overview.html

Best wishes,

Kim

On 22 July 2014 15:36, Michael Miller <Michael.Miller@systemsbiology.org>
wrote:

> hi all,
>
> tremendous work, very clear and well-written.  my group at ISB, the
> Shmulevich lab, is looking to provide provenance for the analysis datasets
> we are producing for TCGA.  we're not sure if we'll be able to 'go all the
> way', but we want to make sure we have at hand all the information we
> would need to be compliant, at least in theory.  since i was reading the
> document anyway, below are some notes.
>
> general comments:
>
> ·         s4.4 'Dataset Linking': might also mention that datasets can be
> derived from other datasets?
> 'A dataset may incorporate, or link to, data in other datasets, e.g. in
> the creation of a data mashup ' --> 'A dataset may incorporate, be
> derived from, or link to, data in other datasets, e.g. in the analysis of
> original datasets or in the creation of a data mashup '
>
> ·         the ChEMBL example in s5 is not compliant with the property table
> below; it is probably only meant to show the relationship between the three
> terms, but that could be clarified
>
> ·         s6.2.12: the example could be filled in
>
> ·         s6.3.2: not sure what an 'X level description' is
>
> ·         s8: it's odd that some of the top-level sections (8.1-8.3, 8.5-8.7)
> are individual organizations while three (8.4, 8.8, 8.9) have subsections
> for different organizations.  maybe organize them so that every top-level
> section defines a type of organization with subsections beneath it, or
> make them all top-level?
>
> ·         s8: many of the use cases could be more focused on how this
> note will help them
>
> ·         s8.9: how do Data Catalogs fit into this note?  it wasn't clear
> to me how this note is relevant to them
>
> ·         it would be nice to have a 'complete' example put together,
> maybe based on ChEMBL?
>
> our use case questions:
>
> ·         how should we reference 3rd-party datasets that aren't described
> by this standard, e.g. TCGA data from the DCC?  simply use
> 'pav:retrievedFrom' with the IRI being the URL into the repository?
>
> ·         we have a lot of intermediary files that we won't publish; the
> software used to create our published datasets from their sources forms
> a (branching) workflow in which each step takes its input from the
> previous step(s).  how best to represent this?  this note doesn't seem to
> cover how a dataset is created, so do you have any recommendations?
>
> text issues:
>
> ·         Figure 1: 'Overview of dataset description level metadata
> profiles and their relationships': reference not resolved, image doesn't
> show
>
> ·         Figure 2: 'Improve diagram. Multiple appearance of
> concepts/description levels unclear.': reference not resolved, image
> doesn't show.  the caption is an editorial note; add the actual label
>
> minor edits:
>
> ·         bottom of s3: 'placeholde' should be 'placeholder'
>
> ·         use straight quotes rather than slanted quotes in the s6.2.2
> example (and elsewhere)?
>
> ·         the text overflows the box in s6.2.3 'Description'
>
> ·         s6.2.3: 'Dates of Creation and Issuance': 'state the date the
> dataset was generated using dct:created and/or the date the dataset was
> made public using dct:created' should be 'state the date the dataset was
> generated using dct:created and/or the date the dataset was made public
> using dct:issued'?
>
> ·         there are two s6.2.3 sections
>
> ·         s6.2.4: 'Creation: ... The date of authorship' should be '...The
> date of creation' and 'Curation:... The date of authorship' should be '...The
> date of curation'?
>
> ·         s8.5: the author list has a closing parenthesis without an
> opening parenthesis
>
> ·         s8.8.1: '... what period it is updated. To know when to...'
> should be '...what period it is updated to know when to...'
>
> cheers,
>
> michael
>
> Michael Miller
>
> Software Engineer
>
> Institute for Systems Biology

Received on Tuesday, 22 July 2014 22:43:20 UTC