Re: Ambiguous names. was: Re: URL +1, LSID -1 from Alan Ruttenberg on 2007-07-16 (public-semweb-lifesci@w3.org from July 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Mon, 16 Jul 2007 12:15:37 -0400
To: Eric Jain <Eric.Jain@isb-sib.ch>
Cc: Phillip Lord <phillip.lord@newcastle.ac.uk>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-Id: <20E14C06-C978-41DF-AA97-B027DEF2253C@gmail.com>

I'm not advocating that we build definitions around protein  
sequences, just that we build definitions, period.
And that we don't confuse a page of HTML with a definition.

The UniProt curators are great! They know what they are looking for  
and they are skilled at finding it. Let's put work into formalizing  
whatever we can about what they know so that the fruits of their  
labor can be used effectively on the SW too!

We've got a SW language for making definitions - it's called OWL. If  
we have class names and definitions even for broad classes of  
proteins, then we can start to build new definitions by subclassing  
them, for instance into specific classes of sequence and  
post-translational variants. Lots of work goes on in the scientific  
community to characterize specifics about these subclasses and we  
need a place to anchor that knowledge in the SW.
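
A minimal sketch of the subclassing idea, written in Python rather than in OWL itself. The `ex:` class names are hypothetical and are not taken from UniProt or any published ontology; the triples only mimic rdfs:subClassOf statements, and the helper functions are illustrative, not part of any OWL toolkit.

```python
# Hypothetical class names for illustration only; the "ex:" prefix is made up.
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Each triple says: subject is a subclass of object.
triples = {
    ("ex:KinaseProtein", RDFS_SUBCLASS_OF, "ex:Protein"),
    ("ex:KinaseSequenceVariantA", RDFS_SUBCLASS_OF, "ex:KinaseProtein"),
    ("ex:PhosphorylatedKinaseVariantA", RDFS_SUBCLASS_OF, "ex:KinaseSequenceVariantA"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == RDFS_SUBCLASS_OF and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def is_subclass_of(sub, sup, triples):
    """True if sub equals sup or sup is reachable from sub (reflexive, transitive)."""
    return sub == sup or sup in superclasses(sub, triples)
```

Anything asserted about the broad class (here `ex:Protein`) then has a named place to attach, and the sequence and post-translational variants inherit it by position in the hierarchy. A real OWL definition would also carry the properties that determine membership, which this sketch leaves out.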

-Alan

On Jul 16, 2007, at 12:06 PM, Eric Jain wrote:

> Alan Ruttenberg wrote:
>> I'm confused. I think we all would agree that there are instances  
>> of proteins and we have a good idea of what they are. We also know  
>> that there are groups of proteins that are built off the same  
>> template and share certain properties. If we define classes using  
>> such properties, then we can, in principle, decide whether these  
>> proteins are members of a given class (subject to experimental  
>> limitations).  For instance we can define a class of proteins  
>> that have a certain primary structure (aa sequence), and then,  
>> via assay, measure what fraction of the proteins in some sample  
>> have that structure.
>
> One of the biggest (and perhaps most appreciated) jobs of our  
> curators is to review all the different sequences that have been  
> submitted, and figure out which is most likely to be the correct  
> sequence. But this means what you get is an interpretation of our  
> curators, which some may disagree with.
>
> Note that you can build a database around sequence identity, but it  
> seems that this is of limited use (see  
> http://beta.uniprot.org/uniparc/). In order to make something that's  
> more useful, we  
> aggregate information about minor (and sometimes less minor)  
> variations, and separate by organism (but not by strain, so far).  
> The aggregating, again, makes things a bit fuzzy...
>

Received on Monday, 16 July 2007 16:15:43 UTC