Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1) from Darren Natale on 2007-07-19 (public-semweb-lifesci@w3.org from July 2007)

From: Darren Natale <dan5@georgetown.edu>
Date: Thu, 19 Jul 2007 17:03:08 -0400
To: June Kinoshita <junekino@media.mit.edu>
CC: Eric Jain <Eric.Jain@isb-sib.ch>, Alan Ruttenberg <alanruttenberg@gmail.com>, Chris Mungall <cjm@fruitfly.org>, Bijan Parsia <bparsia@cs.man.ac.uk>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>, Cathy Wu <wuc@georgetown.edu>, phismith@buffalo.edu, jblake@informatics.jax.org
Message-ID: <469FD18C.5090005@georgetown.edu>

Quite a nice example!  These are the sorts of issues that we must 
contend with while creating the PRO framework.  In fact, this addresses 
another issue of scope; that is, whether or not (in the long or short 
term) to also account for homodimers, trimers, and so on (currently, GO 
handles heteromeric complexes).  This also provides a good opportunity 
for me to mention that our most immediate goal is to provide a framework 
that can be built upon by others as well as us.  That is, we would 
encourage you to unfold your own corner of the protein world!  ;)

June Kinoshita wrote:
> If I may put forward a key protein in Alzheimer disease as an example 
> that we are grappling with, there is full-length APP (which itself has a 
> number of forms as well as mutations); various peptides derived from 
> cleavage of APP; and then multimeric forms of the peptides, particularly 
> Abeta42, which is known to form soluble dimer, trimer, tetramer, 
> hexamer, and dodecamer, each of which may have different functions or 
> toxicities, as well as "misfolded" protofibrillar and insoluble 
> fibrillar forms, and possibly a pore-like form consisting of 
> I-forget-how-many Abetas. In addition, proteins form complexes that have 
> functions that are different from those of the non-complexed protein. I 
> look forward to seeing how the Protein Ontology unfolds, so to speak! - 
> June
> 
> On Jul 19, 2007, at 11:23 AM, Darren Natale wrote:
> 
>>
>> We don't yet have formal definitions for many of the classes and 
>> relations (the effort only began in earnest a few months ago).  But, 
>> basically, there is a distinction made between the full-length (in 
>> terms of amino acid sequence) protein and the sub-length parts of 
>> proteins (commonly called domains by protein scientists, 
>> unfortunately).  The term "whole protein" is somewhat of a 
>> placeholder; it is used to signify the evolutionary classes (families) 
>> of full-length proteins as opposed to the evolutionary classes of 
>> domains.  Sequence form is again a placeholder term used to denote the 
>> initial translation product from an mRNA, which itself might be based 
>> on a "normal" gene or a mutant thereof, or which might be one of 
>> several possible alternatively spliced transcripts from the normal or 
>> mutant gene.  The cleaved or modified product is a further breakdown 
>> of those initial translation products, and allows one to distinguish 
>> between a phosphorylated version of a protein and the 
>> non-phosphorylated version (as an example).  The need for the latter 
>> derives from the fact that the two versions might have different 
>> functions.
>>
>> Eric Jain wrote:
>>> Darren Natale wrote:
>>>> We recently began a new Protein Ontology (PRO) effort geared 
>>>> precisely toward the formal definition of the "smaller entities" 
>>>> referred to by Alan.  By "we" I mean the PRO Consortium, comprising 
>>>> the PIs Cathy Wu of PIR (which is also a member organization of the 
>>>> UniProt Consortium), Barry Smith of SUNY Buffalo, and Judy Blake of 
>>>> Jackson Labs.  PRO is being developed within the framework of the 
>>>> OBO Foundry, and aims to specify protein entities at the level 
>>>> mentioned by Chris (accounting for splice variation and 
>>>> post-translational modification and cleavage). Where appropriate, 
>>>> PRO will indeed make reference to both other ontologies and to 
>>>> UniProt Knowledgebase (UniProtKB) records. Furthermore, we are also 
>>>> undertaking the "wildly ambitious" job of representing broader, 
>>>> more-inclusive classes of similar proteins based on evolutionary 
>>>> relatedness.
>>>>
>>>> A further description of PRO (with examples and link to a paper) can 
>>>> be found at http://pir.georgetown.edu/pro
>>> This will no doubt be interesting to quite a few people here! For the 
>>> sake of this discussion, could you elaborate a bit more on how the 
>>> different concepts in PRO are defined, i.e. what is a "protein", 
>>> "whole protein", "sequence form" and "cleaved and/or modified product"?
>>
>>
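
The levels described in the quoted messages above (protein, whole protein, sequence form, cleaved and/or modified product) can be pictured as a simple parent/child chain. What follows is only a minimal sketch, not PRO's actual schema: the `Entity` class, the strict ordering in `LEVELS`, and the example names "APP family" and "full-length APP" are all assumptions made for illustration. Only the level names and the APP/Abeta42 example come from the thread.

```python
from dataclasses import dataclass
from typing import List, Optional

# Level names are quoted from the thread; treating them as a strict
# broad-to-narrow ordering is an assumption of this sketch.
LEVELS = [
    "protein",
    "whole protein",                    # evolutionary class (family) of full-length proteins
    "sequence form",                    # initial translation product of an mRNA
    "cleaved and/or modified product",  # e.g. a cleavage peptide or phosphorylated form
]


@dataclass(frozen=True)
class Entity:
    """Hypothetical node type; not part of any PRO release."""
    name: str
    level: str
    parent: Optional["Entity"] = None

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")
        # A parent, if given, must sit at a strictly broader level.
        if self.parent is not None:
            if LEVELS.index(self.parent.level) >= LEVELS.index(self.level):
                raise ValueError("parent must be at a broader level than child")

    def lineage(self) -> List[str]:
        """Return entity names from the broadest ancestor down to this one."""
        names = []
        node: Optional["Entity"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


# Illustrative chain based on the APP example raised in the thread.
protein = Entity("protein", "protein")
app_family = Entity("APP family", "whole protein", protein)
app_full = Entity("full-length APP", "sequence form", app_family)
abeta42 = Entity("Abeta42", "cleaved and/or modified product", app_full)

print(" > ".join(abeta42.lineage()))
```

A single-parent chain is the simplest thing that fits the description. It does not cover the multimeric forms (dimer, trimer, and so on) raised above, which would need a separate has-part style relation.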

Received on Thursday, 19 July 2007 21:02:21 UTC