RE: Ambiguous names. was: Re: URL +1, LSID -1

 
Here are two items which may serve as additional resources for the
discussion on protein identification, and a third item soliciting input
on the NIGMS/NIH Protein Structure Initiative to which you may wish to
respond directly (deadline today).

1. The Michigan Molecular Interaction database which deep-merges data
from various protein interaction databases.
http://mimi.ncibi.org/MiMI/home.jsp
http://mimi.ncibi.org/MiMI/faq.htm

2. "Requirements and ontology for a G protein-coupled receptor
oligomerization knowledge base": a paper discussing issues associated
with protein description and function based on oligomerization for a
major class of receptors.
http://www.biomedcentral.com/1471-2105/8/177 

3. A solicitation for input on the NIGMS/NIH Protein Structure
Initiative.
http://www.nigms.nih.gov/About/Council/PSIAssessment.htm
http://grants.nih.gov/grants/guide/notice-files/NOT-GM-07-108.html

Karen Skinner
NIDA/NIH

-----Original Message-----
From: Eric Jain [mailto:Eric.Jain@isb-sib.ch] 
Sent: Friday, July 20, 2007 11:56 AM
To: Alan Ruttenberg
Cc: Phillip Lord; Matthias Samwald; public-semweb-lifesci@w3.org
Subject: Re: Ambiguous names. was: Re: URL +1, LSID -1


Alan Ruttenberg wrote:
> "Remember that one of the reasons this came up was the claim that the 
> Uniprot URI should be used to identify a set of real things."

OK, I think that describes my current point of view.


> I get confused when I read statements that sound like "x means the
> same thing in all databases, except it might mean something
> different in a database that isn't Uniprot". I'm sure this isn't what 
> you mean. What do you mean?

"x means the same thing in all databases" -> not! What UniProt would
consider to be a "protein" likely differs a bit from what EMBL treats as
a "protein", which in turn differs from what John Doe considers a
"protein".

Since everyone seems to have their own idea of what's the best way to
make "sets of real things", there doesn't seem to be much of a point in
distinguishing between the sets and the "records" that describe the
sets?

Of course there are often going to be strong correspondences, which is
why mapping tools are really important, but to think that you could
create the one true system (TM) that has the "proper" concepts that
everyone should map to because their databases contain mere records
seems like a fallacy!


> I will read "protein" as "protein class", so as not to confuse the set
> with the individual member of the set, OK?

OK, "protein class". The individual member would be a real "protein
molecule" that exists somewhere for real, perhaps in a test tube :-)


> When someone makes a statement, such as the ones about the BAG-1
> isoforms I cite in another message to Phil, I don't think that we should
> say this is an artificial set of real things.  While it may be the case
> that there is a certain amount of ambiguity in exactly which set of
> proteins "BAG-1 p33" identifies, we know some things that I think would
> be profitable to be conveyed in OWL.

If someone mentions some name like BAG-1, it's not always clear what is
meant, and in fact this may depend on the field of research of the author.
Someone with more experience in text mining could probably comment on this.

The "namespace" for "BAG-1" here is the article (being conservative).
Ideally you'd want to map this to something that is more widely used/known,
such as HGNC [http://purl.uniprot.org/hgnc/HGNC:937] (specific for human
stuff), or perhaps even UniProt [http://purl.uniprot.org/uniprot/Q99933].
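
As a rough sketch of what such a mapping could look like, the Python
below treats a name as a (namespace, local name) pair and looks up more
widely known identifiers for it. The "article:example" namespace key and
the resolve() helper are made up for illustration; only the two URIs come
from the text above.

```python
from typing import Dict, List, Tuple

# (namespace, local name) -> more widely used identifiers.
# "article:example" is a hypothetical placeholder for the article in which
# the name "BAG-1" appears; it is not a real identifier scheme.
MAPPINGS: Dict[Tuple[str, str], List[str]] = {
    ("article:example", "BAG-1"): [
        "http://purl.uniprot.org/hgnc/HGNC:937",
        "http://purl.uniprot.org/uniprot/Q99933",
    ],
}


def resolve(namespace: str, name: str) -> List[str]:
    """Return the known mappings for a name in a namespace (empty if none)."""
    return MAPPINGS.get((namespace, name), [])
```

The same string "BAG-1" in a different namespace resolves to nothing
until someone supplies a mapping for it, which is the conservative
behaviour described above.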


> For example:
> 
> a) There is no protein that is both a member of the set "BAG-1 p33"
> identifies and also a member of the set "BAG-1 p29" identifies.
> 
> b) If it turns out at a later date that the properties (e.g. being able
> to inhibit apoptosis) ascribed to proteins in the set identified by
> "BAG-1 p33" only were true when the protein was phosphorylated, and some
> different, conflicting properties (e.g. not being able to inhibit
> apoptosis) became known of the unphosphorylated ones, then we would have
> to say that our original statements about "BAG-1 p33" needed to be
> modified to be statements about the set of proteins identified as e.g.
> "phospho BAG-1 p33". I.e. we would name a new set of things: "phospho
> BAG-1 p33", know it was a subset of the set of things identified as
> "BAG-1 p33", that it was also disjoint from the set of things identified
> by "BAG-1 p29". We would be able to answer the question: If we cause
> "BAG-1 p33" proteins to be overexpressed, but knock out the kinase that
> phosphorylates such proteins, do we expect (or do we have any evidence to
> support believing) apoptosis to be inhibited?

Received on Friday, 20 July 2007 17:56:59 UTC