
Re: adding pubmed ids to BAMS

From: William Bug <William.Bug@DrexelMed.edu>
Date: Thu, 19 Apr 2007 01:43:38 -0400
Message-Id: <98B858FD-0475-47E6-B2D7-F1D0ED661F73@DrexelMed.edu>
Cc: John Barkley <jbarkley@nist.gov>, Jonathan Rees <jar@mumble.net>, chris mungall <cjm@fruitfly.org>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>, Suzanna Lewis <suzi@berkeleybop.org>, Judith Blake <jblake@informatics.jax.org>, Barry Smith <phismith@buffalo.edu>
To: Alan Ruttenberg <alanruttenberg@gmail.com>
Hi Alan,

I like the look of this idea, and it falls in with approaches I've
been hoping we can take with our OWL/RDF representations of neuro
repositories such as the Mouse Brain Library, GeneNetwork, and the
Cell-Centered Database (and, of course, SenseLab, BAMS,
NeuroMorph.org, etc.), many of which contain either data-based or
citation-based evidence.  With all the effort that has been invested
in curating that evidence, we must find a way to include it in our
ontological representation of the information, something the GO
community has been doing with their GO & OBO formalisms for quite
some time.

Of course, as we now expect the assertions drawn from that evidence
to support inferencing (even basic classification, nothing fancy),
we'll also need a simple, minimally labor-intensive way to vet some
of this evidence.  As Matthias Samwald pointed out today in his
review of some of the SenseLab citations, they can be fraught with
"seeming" contradictions if you haven't fully specified the context
for the evidentiary assertion.  See his example of evidence tied to a
specific neuron subtype definition in NeuronDB that both confirmed
AND refuted the presence of fast, transient Na+ currents; on closer
inspection by Matthias, this merely required more detailed
qualification of the evidentiary or referent context.

With BAMS this is critical as well, as it is ALL built from
citation-based evidence, and interpretation of neuronal connectivity
studies can be very difficult to normalize across studies.  Mihail
Bota, who is both lead developer and lead curator on BAMS, has
created a complex model for tracking interpretation, but it is not
completely clear to me how one would map this into RDF, where you
could more easily query and reason over it with readily available
community tools.

I will read through this proposal and get back to you with some  
thoughts.

Cheers,
Bill

On Apr 19, 2007, at 12:24 AM, Alan Ruttenberg wrote:

>
> Here is an idea I am exploring. Perhaps you might mock this up:
>
> The essential idea is that evidence and other annotation is about  
> named classes. In those cases where one might think of annotating  
> some axiom, or piece of axiom, we would instead look for the class  
> that is the referent of the annotation and name that class.
> Then, we can connect that class, using an annotation property,  to  
> whatever kind of annotation or evidence we think appropriate.
>
> Suppose we have a class HumanP53Protein, which we will define as:
> Those proteins whose sequence of amino acids is described by the
> sequence in the sequence information field of the Uniprot P53_Human
> Record, or which are derived from such a protein. (I'm open to
> discussion on what this definition should be, BTW, but I think we
> should have one)
>
> One gene ontology annotation to P53 is:
> GO:0000739; Molecular function: DNA strand annealing activity  
> (inferred from direct assay from UniProtKB).
>
> GO:0000739 is defined as in OBO, as a class, a subclass of function.
>
> We will say that the referent of this annotation is the class
>
> HumanP53ProteinWithFunctionDNAStrandAnnealing:  HumanP53Protein and  
> has_function some GO:0000739
>
> The annotation property itself might be called "ExistsAccordingTo",  
> by which we mean that this class has instances
>
> The thing it exists according to is
>
> Inference001
>    type InferredFromDirectAssay
>    describedInPaper theArticlePMID1234Describes
>
> So our annotation is
>
> HumanP53ProteinWithFunctionDNAStrandAnnealing ExistsAccordingTo  
> Inference001
>
> Up to this point we have been conservative. We haven't made any  
> statement about P53 in general. Here, we will overstate (our only  
> choice, if we want to make a statement about biology from which  
> some useful inference can be done, given the evidence we have)
>
> HumanP53Protein subclassOf  
> HumanP53ProteinWithFunctionDNAStrandAnnealing
>
> This may be wrong. For instance, it may be the case that only that  
> P53 phosphorylated in some way actually has this function.
> I hope that by some other statement, a contradiction is inferred  
> that will force us (or the curators) to be more specific.
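
The steps above can be sketched as plain triples. This is only a minimal
illustration in Python, assuming a toy in-memory set of
(subject, predicate, object) tuples rather than any real OWL toolkit. The
`triple` and `evidence_for` helpers and the `equivalentTo` spelling are
hypothetical; the class, property, and individual names are the ones used in
the message.

```python
# Toy stand-in for an OWL/RDF store: a set of (subject, predicate, object)
# string tuples. This is an illustration, not a real OWL API.
told = set()


def triple(s, p, o):
    """Record one asserted ("told") statement."""
    told.add((s, p, o))


NAMED = "HumanP53ProteinWithFunctionDNAStrandAnnealing"

# 1) Name the class that is the referent of the annotation.
#    ("equivalentTo" is an assumed reading of the definition in the message.)
triple(NAMED, "equivalentTo", "HumanP53Protein and has_function some GO:0000739")

# 2) Describe the evidence as an individual.
triple("Inference001", "type", "InferredFromDirectAssay")
triple("Inference001", "describedInPaper", "theArticlePMID1234Describes")

# 3) Connect the named class to the evidence with the annotation property.
triple(NAMED, "ExistsAccordingTo", "Inference001")

# 4) The deliberate overstatement about P53 in general.
triple("HumanP53Protein", "subclassOf", NAMED)


def evidence_for(cls):
    """Return the evidence individuals attached to a named class."""
    return sorted(o for s, p, o in told if s == cls and p == "ExistsAccordingTo")


print(evidence_for(NAMED))
```

Note that the evidence hangs off the class name only; nothing here reifies
the subclass statement itself.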
>
> ----
>
> What's nice about this?
>
>
> 1) We are making statements about biology (better than making  
> statements about "terms")
> 2) There is no RDF reification involved - the main contender for  
> representing this sort of thing.
> 3) We have been (relatively) conservative about what we say there  
> is evidence for
> 4) We are owning the fact that we are making an overstatement
> 5) We are enabling some inference to take place.
>
> What's the cost?
>
> 1) One extra triple, in which we name the class
> HumanP53ProteinWithFunctionDNAStrandAnnealing.
> Where we previously would have used a restriction to introduce the
> function, we now use the named class.
> 2) When querying about what the evidence is for, we need to query  
> the asserted (or told) assertions only. That's because after  
> inference has been done, new assertions may be known about  
> HumanP53ProteinWithFunctionDNAStrandAnnealing and we won't be able  
> to tell the difference between what was asserted and what is  
> inferred, given that we have associated only the class name
> with the evidence
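
The second cost can be illustrated with a small sketch: keep the told
statements separate from whatever a reasoner adds, and ask evidence questions
of the told set only. This assumes a toy tuple store and a naive transitive
closure over `subclassOf` standing in for a real reasoner; the
`subclass_closure` and `about` helpers and the class
`PhosphorylatedHumanP53Protein` are hypothetical.

```python
NAMED = "HumanP53ProteinWithFunctionDNAStrandAnnealing"

# Statements exactly as asserted ("told").
told = {
    (NAMED, "ExistsAccordingTo", "Inference001"),
    ("HumanP53Protein", "subclassOf", NAMED),
    # Hypothetical extra class, present only to give the closure work to do.
    ("PhosphorylatedHumanP53Protein", "subclassOf", "HumanP53Protein"),
}


def subclass_closure(triples):
    """Naive reasoner stand-in: transitive closure over subclassOf only."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subs = [(s, o) for s, p, o in inferred if p == "subclassOf"]
        for a, b in subs:
            for c, d in subs:
                if b == c and (a, "subclassOf", d) not in inferred:
                    inferred.add((a, "subclassOf", d))
                    changed = True
    return inferred


def about(graph, cls):
    """All statements in a graph that mention the given class."""
    return {t for t in graph if cls in (t[0], t[2])}


inferred = subclass_closure(told)

# After inference, more is known about the named class than was asserted.
added = about(inferred, NAMED) - about(told, NAMED)
print(sorted(added))
```

Because the evidence is attached to the class name alone, a query over
`inferred` cannot tell the asserted statements from the derived ones; a query
over `told` can.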
>
> ---
>
> Taking this to BAMS, it means that we associate the paper with the
> cell class, for which we already have a name.
> For the "molecule is found in cell" cases, we create a named class
> for the "cell contains some molecule" class, use that named class
> in place of the restriction, and associate the paper with that
> named class.
>
> You can define
>
> Class(article :partial)
> Class(pubmedRecord :partial)
> ObjectProperty(definedByPMID inversefunctional)
>
> Represent the pubmed record as an instance of pubmedRecord named  
> http://purl.org/commons/pubmed/1234
>
> The last issue is the nature of the relationship between the paper  
> and the class. If we can't easily distinguish between whether
> these annotations are evidence or simply discussion we could use  
> the relation "isMentionedBy", which we will mean to say that the  
> class (or some instances of the class) are discussed in the paper.
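
A minimal sketch of the BAMS translation described above, assuming the same
toy tuple representation. The URI pattern is the one given in the message;
the helper names, the `<Cell>ContainingSome<Molecule>` naming scheme, the
`contains` property in the definition string, and the example cell and
molecule names are all hypothetical.

```python
PUBMED_BASE = "http://purl.org/commons/pubmed/"


def pubmed_record_uri(pmid):
    """Build the record URI for a PubMed id, following the pattern above."""
    pmid = str(pmid).strip()
    if not (pmid.isascii() and pmid.isdigit()):
        raise ValueError(f"not a numeric PubMed id: {pmid!r}")
    return PUBMED_BASE + pmid


def named_class_for(cell, molecule):
    """Hypothetical naming scheme for the 'cell contains some molecule' class."""
    return f"{cell}ContainingSome{molecule}"


def bams_triples(cell, pmid, molecule=None):
    """Triples linking a paper to a cell class or to a cell-with-molecule class."""
    record = pubmed_record_uri(pmid)
    triples = [(record, "type", "pubmedRecord")]
    if molecule is None:
        # The paper is about the cell, which already has a name.
        triples.append((cell, "isMentionedBy", record))
    else:
        # Name the class that would otherwise be an anonymous restriction,
        # then hang the paper off that name.
        named = named_class_for(cell, molecule)
        triples.append((named, "equivalentTo", f"{cell} and contains some {molecule}"))
        triples.append((named, "isMentionedBy", record))
    return triples


for t in bams_triples("ExampleCell", 1234, molecule="ExampleMolecule"):
    print(t)
```

If the references can be told apart as evidence rather than discussion, a
more specific property could replace `isMentionedBy` in the same position.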
>
> ---
>
> Call me if you want to discuss this. Admittedly this may seem  
> involved and odd, since it is a new idea, though I will blame Chris  
> and Jonathan, who I bounced it off of, for not telling me straight  
> off it didn't make sense :)
>
> But how about we give it a go and see what it feels like. I'm  
> planning to use this translation for the GO annotations and the  
> rest of the similar sources, unless somebody comes forth with some  
> arguments about what would be a better idea.
>
> Best,
> Alan
>
>
> On Apr 18, 2007, at 3:49 PM, jbarkley@nist.gov wrote:
>
>>
>>> From what Mihai sent me, the pubmed refs are about:
>>
>>> the cell and
>>> the fact the molecule is found in cell
>>
>> Pending your recommendation, I had tentatively suggested the
>> following for
>> representing this as:
>>
>> pubmedID has "<id>" or
>> cell_has_molecule_within some (<cell> and (pubmedID has "<id>"))
>>
>> where one or more of these is associated with a cell. I was under the
>> impression that you were thinking about a general representation  
>> that everyone
>> would use for pubmedID. So, I haven't yet added these to the BAMS  
>> OWL version.
>>
>>> OK. Can you send me this for a quick look?
>>
>> I'm not sure what you are asking to see. Do you want to see the  
>> original
>> tables Mihai sent me?
>>
>> thanks,
>>
>> jb
>>
>>
>>
>> Date:  Wed, 18 Apr 2007 12:30:17 -0400
>> From:  Alan Ruttenberg <alanruttenberg@gmail.com>
>> To:  John Barkley <jbarkley@nist.gov>
>> Cc:  Jonathan A Rees <jar@mumble.net>
>> Subject:  Re: adding pubmed ids to BAMS
>> Quoting Alan Ruttenberg <alanruttenberg@gmail.com>:
>>
>>>
>>> On Apr 13, 2007, at 1:51 PM, John Barkley wrote:
>>>
>>>> I have confirmed from Mihai that all of the pubmed references in
>>>> BAMS are evidence for or elaboration about.
>>>
>>> OK. Can you send me this for a quick look?
>>> Is it clear what they are about
>>> i.e.
>>>
>>> the cell
>>> the part
>>> the fact that cell is located in part
>>> the fact the molecule is found in cell
>>> the fact the molecule is found in part
>>> the fact the molecule is found in cell in part
>>> etc.
>>>
>>> ?
>>>
>>>>
>>>>
>>>> ----- Original Message ----- From: "Alan Ruttenberg"
>>>> <alanruttenberg@gmail.com>
>>>>
>>>>> Don't have time at this moment, but I think that generally you
>>>>> want to state that the article is either evidence for, or
>>>>> elaboration about, the scientific statement involving the cells,
>>>>> molecules, etc. Then use the pubmed id in some standard URI
>>>>> form (maybe neurocommons record url style, or
>>>>> Jonathan's purl.org suggestion). In other words the pubmed id is
>>>>> the identifier for a thing (the article, or the abstract,
>>>>> depending on one's point of view).
>>>>>
>>>>> More details later.
>>>>>
>>>>> You could look and see how Gene ontology represents evidence.
>>>>>
>>>>> -Alan
>>>>>
>>>>> On Apr 11, 2007, at 3:46 PM, John Barkley wrote:
>>>>>
>>>>>> hi alan,
>>>>>>
>>>>>> I received spreadsheets from Mihai relating cells & pubmed ids,
>>>>>> and cells, molecules, & pubmed ids. I wanted to consult with you
>>>>>> about your preferences for how to integrate this into BAMS. I am
>>>>>> thinking something like defining a datatype property pubmedID
>>>>>> from owl:thing to string. Then for cells, you would have:
>>>>>>
>>>>>> pubmedID has "<id>"
>>>>>>
>>>>>> and for cells with molecules within, you would have:
>>>>>>
>>>>>> cell_has_molecule_within some (<cell> and (pubmedID has "<id>"))
>>>>>>
>>>>>> Please let me know.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> jb
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Bill Bug
Senior Research Analyst/Ontological Engineer

Laboratory for Bioimaging  & Anatomical Informatics
www.neuroterrain.org
Department of Neurobiology & Anatomy
Drexel University College of Medicine
2900 Queen Lane
Philadelphia, PA    19129
215 991 8430 (ph)
610 457 0443 (mobile)
215 843 9367 (fax)


Please Note: I now have a new email - William.Bug@DrexelMed.edu
Received on Thursday, 19 April 2007 05:44:01 GMT
