Re: adding pubmed ids to BAMS from Mark Montgomery on 2007-04-20 (public-semweb-lifesci@w3.org from April 2007)

From: Mark Montgomery <markm@kyield.com>
Date: Fri, 20 Apr 2007 10:13:05 -0700
To: "Alan Ruttenberg" <alanruttenberg@gmail.com>, "Kei Cheung" <kei.cheung@yale.edu>
Cc: "Huajun Chen @ Zhejiang University" <huajunsir@gmail.com>, "Jonathan Rees" <jar@mumble.net>, "chris mungall" <cjm@fruitfly.org>, "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>, "Suzanna Lewis" <suzi@berkeleybop.org>, "Judith Blake" <jblake@informatics.jax.org>, "Barry Smith" <phismith@buffalo.edu>, "John Barkley" <jbarkley@nist.gov>
Message-ID: <027d01c7836f$29b429b0$a100a8c0@Inspiron>
I appreciate all of your recent debates and discussions - they have helped me, 
and no doubt others, think things through a bit. I suspect many of us are not 
chiming in, to keep the noise level down, having only a rare tidbit of value 
to add - in my case, for example, we are still primarily on the drawing board, 
in part until some of these issues can be resolved, at least internally.

For internal efforts we are leaning towards prefixed numerical identifiers 
simply because it substantially increases the options for future 
differentiation and value, and is (in my mind for now at least) easier to 
amend, which is of course necessary for survival for most tool makers and 
their customers. The nuanced definitions between cultures, whether 
corp/academic/gov/regional/industry etc. are in part what prevents sameness 
of thought and encourages creativity and innovation. I am however also 
constantly mindful of the economics relating to adoption, and therefore 
attempt to reduce the need for costly maintenance.

From my rudimentary understanding, the ontology languages in their current 
form, which evolved from the promise of XML, are appealing (OWL Full, for 
example), but my instincts suggest (for now at least) that we'll need to rely 
heavily on assistance from the purity of numbers in the translations and 
descriptions, as has often been the case.

I can see the potential for harm in attempting to zoom in too far on 
granularity in standardization efforts (understanding the appeal) and would 
therefore vote for a prudent equilibrium between adaptability and fixity. It 
is perhaps a bit messier to purposely engineer on the side of caution, but so 
too are most of the widely adopted institutions that come to mind. My .02 - MM



----- Original Message ----- 
From: "Alan Ruttenberg" <alanruttenberg@gmail.com>
To: "Kei Cheung" <kei.cheung@yale.edu>
Cc: "Huajun Chen @ Zhejiang University" <huajunsir@gmail.com>; "Jonathan 
Rees" <jar@mumble.net>; "chris mungall" <cjm@fruitfly.org>; 
"public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>; "Suzanna Lewis" 
<suzi@berkeleybop.org>; "Judith Blake" <jblake@informatics.jax.org>; "Barry 
Smith" <phismith@buffalo.edu>; "John Barkley" <jbarkley@nist.gov>
Sent: Friday, April 20, 2007 9:17 AM
Subject: Re: adding pubmed ids to BAMS



Yes, this is an issue. I actually see it as two separate issues:

1) Creating names that are stable
2) Creating names that are readable

1) For stable names, a URI based on the logical definition will do,
e.g. a Manchester syntax definition, or an md5 of that. This will be
unambiguous and deterministic, though perhaps aesthetically
unpleasing.
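
A minimal sketch of the md5 option, in Python. The base namespace
http://example.org/class/ and the helper names are hypothetical, and the
only normalization done here is collapsing whitespace; a real scheme would
also need a canonical ordering of the definition, which is assumed away.

```python
import hashlib

# Hypothetical base namespace, for illustration only.
BASE = "http://example.org/class/"


def normalize(definition: str) -> str:
    """Collapse runs of whitespace so layout changes do not alter the hash."""
    return " ".join(definition.split())


def stable_uri(definition: str, base: str = BASE) -> str:
    """Derive a deterministic URI from the text of a class definition."""
    digest = hashlib.md5(normalize(definition).encode("utf-8")).hexdigest()
    return base + digest


# Definition text taken from the HumanP53Protein example quoted further down.
definition = "HumanP53Protein and has_function some GO:0000739"
uri = stable_uri(definition)
print(uri)
```

The same definition always yields the same URI, but two logically
equivalent definitions written in a different order ("A and B" vs.
"B and A") would still get different names under this sketch.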

Note that other groups, OBO in particular, have chosen to only use
prefixed numerical identifiers for their class names, precisely to
make maintenance easier. Perhaps someone (Chris?) could explain their
rationale.

2) I think that URIs should function first as unique identifiers, and
only if possible, as elements of user interface. So for these classes
I would create a label and a comment that make them more user
friendly.
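
As a rough illustration of keeping the identifier opaque and putting the
readable part in a label and a comment, here is a sketch in Python. The
"EX" prefix, the example.org namespace and the helper names are all
made up for the example; only the rdfs:label and rdfs:comment property
URIs are standard. Triples are plain tuples, no RDF library is assumed.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "http://example.org/class/"  # hypothetical namespace


def class_uri(prefix: str, number: int) -> str:
    """Build a prefixed numerical identifier, zero-padded to seven digits."""
    return f"{BASE}{prefix}_{number:07d}"


def annotate(uri: str, label: str, comment: str) -> list:
    """Return (subject, predicate, object) triples carrying the readable text."""
    return [
        (uri, RDFS + "label", label),
        (uri, RDFS + "comment", comment),
    ]


uri = class_uri("EX", 42)
triples = annotate(
    uri,
    "AMPA in Dad in CA3 pyramidal neuron",
    "Dad AND (has_Receptors SOME AMPA), as part of a CA3 pyramidal neuron",
)

# Changing the wording later touches only the label, not the identifier.
relabelled = annotate(
    uri,
    "CA3 pyramidal neuron Dad with AMPA receptors",
    "Dad AND (has_Receptors SOME AMPA), as part of a CA3 pyramidal neuron",
)
```

Applications that refer to the class by its URI are unaffected by the
relabelling, which is the maintenance argument for this style of name.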

That said, I am seriously thinking about the criticisms Huajun and you 
raised.

-Alan

On 4/20/07, Kei Cheung <kei.cheung@yale.edu> wrote:
> Hi Alan et al,
>
> Although there are pros and cons for named classes vs. unnamed classes, I
> prefer the latter for ease of ontology maintenance. As Huajun pointed
> out, it's usually not a trivial issue to come up with standard (long)
> class names that will make both humans and machines happy. If these names
> are changed, the underlying applications will need to be changed. Just
> my 2-cent thought.
>
> Best,
>
> -Kei
>
> Alan Ruttenberg wrote:
>
> >
> > I'm uncertain about the status of comments on anonymous classes. At a
> > minimum, they present a challenge for current display tools,
> > though I suppose that any solution does. For my display tools I was
> > considering filtering out these named classes in certain types of
> > displays. I was able to retrieve the comments using SPARQL on an
> > unreasoned version of your file.
> >
> > I'm guessing that they will be considered to be outside OWL-DL, but
> > harmless (other than for lack of tool support). A similar trick -
> > naming some otherwise blank restriction nodes - caused Pellet to
> > complain. Jonathan speculated, and I think the reasoning is supported
> > by the documentation, that the only valid OWL-DL documents are ones
> > that can be specified in the abstract syntax, and the abstract syntax
> > doesn't have a place for annotations.
> >
> > There is no difference between semantics if the class is named or not,
> > so it is not surprising that the inconsistencies were found with the
> > anonymous classes.
> >
> > I admit I am torn on this issue. The most compatible way of doing what
> > we want to do is to use the named classes. I think there are also good
> > philosophical reasons for naming these classes - in some way they are
> > more valid than classes that we specify as ontology designers, since
> > they are actually referenced in a paper describing an experiment. In
> > current tools it may also be easier to pinpoint the source of an
> > inconsistency, since currently the whole class would be unsatisfiable,
> > whereas if the granularity was smaller it might be that some of the
> > components would be, and could therefore be removed from consideration
> > when debugging.
> >
> > In either case, some adjustment will have to be done to user interface
> > to support the technique. In the case of the named classes, some
> > mechanism to hide them in appropriate contexts is likely to be
> > desirable. In the case of the commented anonymous nodes, some method
> > of associating the evidence with a visible referent, in appropriate
> > contexts, will be needed.
> >
> > What do others think?
> >
> > -Alan
> >
> > ps. I have generally found "protege supports it" not to be a very good
> > guide as to whether something is actually in the spec, or a good idea.
> > Conversely, there are unsupported elements of the spec, and I worry
> > that the limited interface can discourage good ideas.
> >
> > On 4/20/07, Huajun Chen @ Zhejiang University <huajunsir@gmail.com>
> > wrote:
> >
> >> We met similar problems when we tried to relate publications to neurons.
> >>
> >> Check the research notes for CA3 pyramidal neuron:
> >> http://senselab.med.yale.edu/senselab/NeuronDB/ndbEavSum.asp?id=259&mo=4&re=
> >>
> >>
> >> All of the notes are about papers providing evidence for the
> >> presence or absence of a receptor/current/transmitter in a specific
> >> compartment.  The notes cannot simply be attached to the neurocell class,
> >> since they are actually annotations that should be attached to
> >> statements.  All of the notes should be attached to the
> >> owl:Restriction defined for that receptor/current/transmitter; see
> >> the definition of CA3 pyramidal neuron in old DL syntax below.
> >>
> >> If we take Alan's approach, we have to create a huge number of
> >> additional named classes. For example, for CA3, we have to create
> >> 30 extra named classes. And if we take that as an average for other cells,
> >> we have to create nearly 30*33=990 new named classes for all cells in
> >> NeuronDB. Previously we had fewer than 50 classes in total.
> >>
> >> We also have to come up with a new class name for each one, which tends
> >> to be long and somewhat odd. For example, the class name might be
> >> AMPA_in_DAD_in_CA3_pyramidal_neuron.
> >>
> >> Unnamed classes are one of the elegant and neat features of DL,
> >> especially in cases where we do not want to, or do not know how to,
> >> explicitly specify a name.
> >> Besides, I don't think unnamed classes prevent evidential
> >> inference from taking place.  For example, the inconsistencies we've just
> >> found were inferred from those unnamed classes.
> >>
> >> I've also found that Protégé does support annotating unnamed classes, but
> >> I'm not quite sure whether the OWL specification allows us to do that.
> >>
> >> Best all,
> >> Huajun
> >>
> >> Principal_Neuron AND
> >>
> >> ro:hasPart SOME [(Dad AND
> >>                   (has_Receptors SOME AMPA) AND
> >>                   (has_Receptors SOME NMDA) AND
> >>                   (has_Currents SOME I_p_q) AND
> >>                   (has_Currents SOME I_K)),
> >>                  (Dap AND
> >>                   (has_Receptors SOME Glutamate) AND
> >>                   (NOT (has_Currents SOME I_Na_t)) AND
> >>                   (has_Currents SOME I_K) AND
> >>                   (has_Currents SOME I_p_q)),
> >>                  (Soma AND
> >>                   (has_Receptors SOME AMPA) AND
> >>                   (has_Receptors SOME NMDA) AND
> >>                   (has_Receptors SOME GabaB) AND
> >>                   (has_Receptors SOME GabaA) AND
> >>                   (has_Receptors SOME mGluR) AND
> >>                   (has_Receptors SOME Gaba) AND
> >>                   (has_Currents SOME I_p_q) AND
> >>                   (has_Currents SOME I_K_Ca) AND
> >>                   (has_Currents SOME I_Na_t) AND
> >>                   (has_Currents SOME I_N) AND
> >>                   (has_Currents SOME I_A) AND
> >>                   (has_Currents SOME I_K) AND
> >>                   (has_Currents SOME I_IR_Q_h) AND
> >>                   (has_Currents SOME I_T_low_threshold) AND
> >>                   (has_Currents SOME I_L_high_threshold)),
> >>                  (Dam AND
> >>                   (has_Receptors SOME mGluR) AND
> >>                   (has_Receptors SOME GabaB) AND
> >>                   (has_Receptors SOME AMPA) AND
> >>                   (has_Receptors SOME Gaba) AND
> >>                   (has_Receptors SOME Glutamate) AND
> >>                   (has_Receptors SOME NMDA) AND
> >>                   (has_Receptors SOME GabaA) AND
> >>                   (has_Currents SOME I_L_high_threshold) AND
> >>                   (has_Currents SOME I_p_q) AND
> >>                   (has_Currents SOME I_T_low_threshold) AND
> >>                   (has_Currents SOME I_K)),
> >>                  (T AND
> >>                   (has_Receptors SOME NO) AND
> >>                   (has_Currents SOME I_N) AND
> >>                   (has_Transmitters SOME Glutamate)),
> >>                  (A AND
> >>                   (has_Currents SOME I_Na_t)),
> >>                  (AH AND
> >>                   (has_Currents SOME I_K) AND
> >>                   (has_Currents SOME I_Na_t))]
> >>
> >>
> >> On 4/18/07, Alan Ruttenberg <alanruttenberg@gmail.com> wrote:
> >> >
> >> > Here is an idea I am exploring. Perhaps you might mock this up:
> >> >
> >> > The essential idea is that evidence and other annotation is about
> >> > named classes. In those cases where one might think of annotating
> >> > some axiom, or piece of axiom, we would instead look for the class
> >> > that is the referent of the annotation and name that class.
> >> > Then, we can connect that class, using an annotation property,  to
> >> > whatever kind of annotation or evidence we think appropriate.
> >> >
> >> > Suppose we have a class HumanP53Protein, which we will define as:
> >> > Those proteins whose sequence of amino acids are described by the
> >> > sequence in the sequence information field of the Uniprot P53_Human
> >> > Record, or which are derived from such a protein. (I'm open to
> >> > discussion on what this definition should be, BTW, but I think we
> >> > should have one.)
> >> >
> >> > One gene ontology annotation to P53 is:
> >> > GO:0000739; Molecular function: DNA strand annealing activity
> >> > (inferred from direct assay from UniProtKB).
> >> >
> >> > GO:0000739 is defined, as in OBO, as a class, a subclass of function.
> >> >
> >> > We will say that the referent of this annotation is the class
> >> >
> >> > HumanP53ProteinWithFunctionDNAStrandAnnealing:  HumanP53Protein and
> >> > has_function some GO:0000739
> >> >
> >> > The annotation property itself might be called "ExistsAccordingTo",
> >> > by which we mean that this class has instances.
> >> >
> >> > The thing it exists according to is
> >> >
> >> > Inference001
> >> >    type InferredFromDirectAssay
> >> >    describedInPaper theArticlePMID1234Describes
> >> >
> >> > So our annotation is
> >> >
> >> > HumanP53ProteinWithFunctionDNAStrandAnnealing ExistsAccordingTo
> >> > Inference001
> >> >
> >> > Up to this point we have been conservative. We haven't made any
> >> > statement about P53 in general. Here, we will overstate (our only
> >> > choice, if we want to make a statement about biology from which some
> >> > useful inference can be done, given the evidence we have)
> >> >
> >> > HumanP53Protein subclassOf HumanP53ProteinWithFunctionDNAStrandAnnealing
> >> >
> >> > This may be wrong. For instance, it may be the case that only that
> >> > P53 phosphorylated in some way actually has this function.
> >> > I hope that by some other statement, a contradiction is inferred that
> >> > will force us (or the curators) to be more specific.
> >> >
> >> > ----
> >> >
> >> > What's nice about this?
> >> >
> >> >
> >> > 1) We are making statements about biology (better than making
> >> > statements about "terms")
> >> > 2) There is no RDF reification involved - the main contender for
> >> > representing this sort of thing.
> >> > 3) We have been (relatively) conservative about what we say there is
> >> > evidence for
> >> > 4) We are owning the fact that we are making an overstatement
> >> > 5) We are enabling some inference to take place.
> >> >
> >> > What's the cost?
> >> >
> >> > 1) One extra triple, in which we name the class
> >> > HumanP53ProteinInvolvedInDNADamageResponse
> >> > Where we previously would have used a restriction to introduce the
> >> > participation, we now use the named class.
> >> > 2) When querying about what the evidence is for, we need to query the
> >> > asserted (or told) assertions only. That's because after inference
> >> > has been done, new assertions may be known about
> >> > HumanP53ProteinWithFunctionDNAStrandAnnealing and we won't be able to
> >> > tell the difference between what was asserted and what is inferred,
> >> > given that we have associated only the class name with the
> >> > evidence.
> >> >
> >> > ---
> >> >
> >> > Taking this to BAMS, it means that we associate the paper with the
> >> > cell class, for which we already have a name.
> >> > For the "molecule is found in cell" cases, we create a named class
> >> > for the "cell contains some molecule" class, use that
> >> > class in place of the restriction, and associate the paper with that
> >> > named class.
> >> >
> >> > You can define
> >> >
> >> > Class(article :partial)
> >> > Class(pubmedRecord :partial)
> >> > ObjectProperty(definedByPMID inversefunctional)
> >> >
> >> > Represent the pubmed record as an instance of pubmedRecord named
> >> > http://purl.org/commons/pubmed/1234
> >> >
> >> > The last issue is the nature of the relationship between the paper
> >> > and the class. If we can't easily distinguish between whether
> >> > these annotations are evidence or simply discussion we could use the
> >> > relation "isMentionedBy", which we will mean to say that the class
> >> > (or some instances of the class) are discussed in the paper.
> >> >
> >> > ---
> >> >
> >> > Call me if you want to discuss this. Admittedly this may seem
> >> > involved and odd, since it is a new idea, though I will blame Chris
> >> > and Jonathan, who I bounced it off of, for not telling me straight
> >> > off it didn't make sense :)
> >> >
> >> > But how about we give it a go and see what it feels like. I'm
> >> > planning to use this translation for the GO annotations and the rest
> >> > of the similar sources, unless somebody comes forth with some
> >> > arguments about what would be a better idea.
> >> >
> >> > Best,
> >> > Alan
> >> >
> >> >
> >> > On Apr 18, 2007, at 3:49 PM, jbarkley@nist.gov wrote:
> >> >
> >> > >
> >> > >> From what Mihai sent me, the pubmed refs are about:
> >> > >
> >> > >> the cell and
> >> > >> the fact the molecule is found in cell
> >> > >
> >> > > Pending your recommendation, I had tentatively suggested the
> >> > > following for representing this:
> >> > >
> >> > > pubmedID has "<id>" or
> >> > > cell_has_molecule_within some (<cell> and (pubmedID has "<id>"))
> >> > >
> >> > > where one or more of these is associated with a cell. I was under
> >> > > the impression that you were thinking about a general representation
> >> > > that everyone would use for pubmedID. So, I haven't yet added these
> >> > > to the BAMS OWL version.
> >> > >
> >> > >> OK. Can you send me this for a quick look?
> >> > >
> >> > > I'm not sure what you are asking to see. Do you want to see the
> >> > > original tables Mihai sent me?
> >> > >
> >> > > thanks,
> >> > >
> >> > > jb
> >> > >
> >> > >
> >> > >
> >> > > Date:  Wed, 18 Apr 2007 12:30:17 -0400
> >> > > From:  Alan Ruttenberg <alanruttenberg@gmail.com>
> >> > > To:  John Barkley <jbarkley@nist.gov>
> >> > > Cc:  Jonathan A Rees <jar@mumble.net>
> >> > > Subject:  Re: adding pubmed ids to BAMS
> >> > > Quoting Alan Ruttenberg <alanruttenberg@gmail.com>:
> >> > >
> >> > >>
> >> > >> On Apr 13, 2007, at 1:51 PM, John Barkley wrote:
> >> > >>
> >> > >>> I have confirmed from Mihai that all of the pubmed references in
> >> > >>> BAMS are evidence for or elaboration about.
> >> > >>
> >> > >> OK. Can you send me this for a quick look?
> >> > >> Is it clear what they are about
> >> > >> i.e.
> >> > >>
> >> > >> the cell
> >> > >> the part
> >> > >> the fact that cell is located in part
> >> > >> the fact the molecule is found in cell
> >> > >> the fact the molecule is found in part
> >> > >> the fact the molecule is found in cell in part
> >> > >> etc.
> >> > >>
> >> > >> ?
> >> > >>
> >> > >>>
> >> > >>>
> >> > >>> ----- Original Message ----- From: "Alan Ruttenberg"
> >> > >>> <alanruttenberg@gmail.com>
> >> > >>>
> >> > >>>> Don't have time at this moment, but I think that generally you
> >> > >>>> want to state that the article is either evidence for, or
> >> > >>>> elaboration about, the scientific statement involving the cells,
> >> > >>>> molecules, etc. Then use the pubmed id in some standard URI
> >> > >>>> form (maybe neurocommons record url style) or
> >> > >>>> Jonathan's purl.org suggestion. In other words the pubmed id is
> >> > >>>> the identifier for a thing (the article, or the abstract,
> >> > >>>> depending on one's point of view).
> >> > >>>>
> >> > >>>> More details later.
> >> > >>>>
> >> > >>>> You could look and see how Gene ontology represents evidence.
> >> > >>>>
> >> > >>>> -Alan
> >> > >>>>
> >> > >>>> On Apr 11, 2007, at 3:46 PM, John Barkley wrote:
> >> > >>>>
> >> > >>>>> hi alan,
> >> > >>>>>
> >> > >>>>> I received spreadsheets from Mihai relating cells & pubmed ids,
> >> > >>>>> and cells, molecules, & pubmed ids. I wanted to consult with you
> >> > >>>>> about your preferences for how to integrate this into BAMS. I am
> >> > >>>>> thinking of something like defining a datatype property pubmedID
> >> > >>>>> from owl:Thing to string. Then for cells, you would have:
> >> > >>>>>
> >> > >>>>> pubmedID has "<id>"
> >> > >>>>>
> >> > >>>>> and for cells with molecules within, you would have:
> >> > >>>>>
> >> > >>>>> cell_has_molecule_within some (<cell> and (pubmedID has "<id>"))
> >> > >>>>>
> >> > >>>>> Please let me know.
> >> > >>>>>
> >> > >>>>> thanks,
> >> > >>>>>
> >> > >>>>> jb
> >> > >>>>>
> >> > >>>>
> >> > >>>
> >> > >>>
> >> > >>
> >> > >>
> >> > >
> >> > >
> >> >
> >> >
> >> >
> >>
> >
>
>
>
Received on Friday, 20 April 2007 17:14:30 UTC