Re: Ambiguous names. was: Re: URL +1, LSID -1 from Eric Jain on 2007-07-16 (public-semweb-lifesci@w3.org from July 2007)

From: Eric Jain <Eric.Jain@isb-sib.ch>
Date: Mon, 16 Jul 2007 18:06:10 +0200
To: Alan Ruttenberg <alanruttenberg@gmail.com>
CC: Phillip Lord <phillip.lord@newcastle.ac.uk>, public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-ID: <469B9772.9090204@isb-sib.ch>

Alan Ruttenberg wrote:
> I'm confused. I think we all would agree that there are instances of 
> proteins and we have a good idea of what they are. We also know that 
> there are groups of proteins that are built off the same template and 
> share certain properties. If we define classes using such properties, 
> then we can in principle, decide whether these proteins are members of a 
> given class (subject to experimental limitations).  For instance we can 
> define a class of proteins that  have a certain primary structure (aa 
> sequence), and then, via assay, measure what fraction of the proteins in 
> some sample have that structure.

One of the biggest (and perhaps most appreciated) jobs of our curators is 
to review all the different sequences that have been submitted, and figure 
out which is most likely to be the correct sequence. But this means that what 
you get is our curators' interpretation, which some may disagree with.

Note that you can build a database around sequence identity, but it seems 
that this is of limited use (see http://beta.uniprot.org/uniparc/). In 
order to make something that's more useful, we aggregate information about 
minor (and sometimes less minor) variations, and separate by organism (but 
not by strain, so far). The aggregation, again, makes things a bit fuzzy...
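
To make the difference between the two groupings concrete, here is a rough 
sketch (not how UniParc or our curation actually works). The submission ids, 
organisms, gene labels and sequences are all invented, and the gene label is 
only a stand-in for the curators' judgement that two sequences are variants 
of the same protein.

```python
from collections import defaultdict

# Hypothetical submissions: (submission id, organism, gene label, sequence).
# All values are invented for illustration only.
SUBMISSIONS = [
    ("s1", "human", "geneA", "MKTAYIAK"),
    ("s2", "human", "geneA", "MKTAYIAR"),  # minor variant of s1
    ("s3", "mouse", "geneA", "MKTAYIAK"),  # same sequence as s1, other organism
    ("s4", "human", "geneB", "MSLLQPWE"),
]


def group_by_sequence(submissions):
    """Identity-based grouping: one entry per distinct sequence."""
    groups = defaultdict(list)
    for sub_id, _organism, _gene, seq in submissions:
        groups[seq].append(sub_id)
    return dict(groups)


def group_by_organism_and_gene(submissions):
    """Aggregated grouping: variants are merged, organisms are kept apart."""
    groups = defaultdict(list)
    for sub_id, organism, gene, _seq in submissions:
        groups[(organism, gene)].append(sub_id)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_sequence(SUBMISSIONS))
    print(group_by_organism_and_gene(SUBMISSIONS))
```

Grouping by sequence puts s1 and s3 together although they come from 
different organisms, and splits s1 from s2 over a single residue. Grouping 
by organism and gene label does the opposite, and the choice of that label 
is exactly where the interpretation (and the fuzziness) comes in.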

Received on Monday, 16 July 2007 16:06:26 UTC