RE: HCLS IG Note on mapping and publishing life sciences RDF

From: Michael Miller <Michael.Miller@systemsbiology.org>
Date: Sun, 18 Mar 2012 12:08:55 -0700
Message-ID: <162a7dc36396bf654056a83fb1318f4f@mail.gmail.com>
To: "M. Scott Marshall" <mscottmarshall@gmail.com>
Cc: HCLS <public-semweb-lifesci@w3.org>, biohackathon@googlegroups.com, linkedlifedatapracticesnote@googlegroups.com, public-lod@w3.org, David Booth <david@dbooth.org>, Erich Gombocz <egombocz@io-informatics.com>
hi scott,

finally got a chance to go through the note and, yes, it is well put
together.  being naive on this subject, some of my comments may safely be

  * instead of being in the body of the text, shouldn't the explanation
for Figure 1 be a caption?
Section 2:
 * what is a "Linked Data interface"?  it doesn't seem to be a defined
standard, rather it seems like each RDF dataset would define its
own interface.  some clarity on what is meant by this term would help.
 * Q2
    grammar: "Also, it is often unnecessary to convert every table into a
class and can create scaling problems. "
    these points are mentioned but i didn't see any discussion about how
they affect the DB to RDF mapping (the specific case of data warehousing
is covered but that is but one way to denormalize): "RDB schemas can vary
in their level of normalization as quantified by normalized forms (Date
2009). " and "In practice, many databases are not normalized because the
overhead of working with the schema is not worth the extra reliability and
space savings that may result. "
  * Q3
    perhaps a comment on what in the original non-relational information
affects the quality of the RDF would be nice
  * Q5
    don't multiple FROM clauses also allow combining datasets, but from
different graphs?
    This sentence implies that "Structure descriptors" always link
datasets containing drugs and small molecules; i think this is supposed to
be more general: "Structure descriptors, such as SMILES strings, and InChi
identifiers may be used to establish links between datasets containing
drugs and small molecules. " should be: "Structure descriptors, such as
SMILES strings and InChi identifiers, may be used to establish links
between datasets."?
  * Q7
    not a sentence: " Use of the BioPortal for matching entities and their
URIs (including ontologies from Open Biomedical Ontology (OBO) Foundry
(OBO 2011))."
  * Q12
    since this is a note on "Mapping and linking life science data using
RDF", how does the following help one map their RDF data to the web (it's
an important point but seems a little off target in this note, maybe the
emphasis should be how one can use these tools in publishing their data)?
"An important part of improving the utility of the Web is by documenting
the reliability and performance of information services. In the area of
biomedical information services,..."
  * Q14
    grammar (delete 'a'?): "... and to use classes as a values in the
metadata for a graph;"
Section 4:
perhaps change "reflect the state of the art" to "reflect the current
state of the art"?


Michael Miller
Software Engineer
Institute for Systems Biology

> Thanks for the encouragement from you and Erich.
> About the use of a priori , a posteriori - I will mull that over. I
> was pretty happy with the way it seemed to communicate our thoughts, a
> little attached actually.. :(
> > 2. The intro mentions that "a query for Homo sapiens gene label "Alg2"
> > in Entrez Gene (http://www.ncbi.nlm.nih.gov/gene) returns multiple
> > results. Among them is one gene located in chromosome 5 (Entrez
> > ID:85365) and the other in chromosome 9 (Entrez ID:313231), each with
> > multiple aliases".  But the results that I see show ID:85365 as the ID
> > for the one on chromosome 9, and the other one (maybe?) has ID 10016:
> > http://www.ncbi.nlm.nih.gov/gene?term=Alg2[sym]%20homo%20sapiens
> Oops! Thanks for catching that. We had corrected id mixup in the
> article but forgot to correct it in the note.
> Thanks!,
> Scott
