Re: Ambiguous names. was: Re: URL +1, LSID -1 from Alan Ruttenberg on 2007-07-16 (public-semweb-lifesci@w3.org from July 2007)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Mon, 16 Jul 2007 11:37:30 -0400
To: Phillip Lord <phillip.lord@newcastle.ac.uk>
Cc: public-semweb-lifesci <public-semweb-lifesci@w3.org>
Message-Id: <0A05B149-8401-4E17-A0D5-2628635E447B@gmail.com>

On Jul 16, 2007, at 10:19 AM, Phillip Lord wrote:

>
>>>>>> "MK" == Marijke Keet <keet@inf.unibz.it> writes:
>
>   MK> Lack of sufficient knowledge about a particular (biological) entity is
>   MK> a sideshow, not an argument, to the issue of distinguishing real
>   MK> proteins from their records.
>
> I agree. The argument is that it's very hard to describe what you mean by a
> "protein". We almost certainly don't mean a protein molecule. We might mean
> a type of protein. But then we don't know whether two protein molecules are
> actually of a given type.

I'm confused. I think we would all agree that there are instances of
proteins and that we have a good idea of what they are. We also know
that there are groups of proteins that are built from the same template
and share certain properties. If we define classes using such
properties, then we can, in principle, decide whether these proteins
are members of a given class (subject to experimental limitations).
For instance, we can define a class of proteins that have a certain
primary structure (aa sequence) and then, via assay, measure what
fraction of the proteins in some sample have that structure.
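
To make the class-membership idea above concrete, here is a minimal
sketch in Python. It assumes a class is defined purely by an exact
amino-acid sequence and that an assay yields one observed sequence per
molecule. The names ProteinClass and fraction_in_class and the
sequences are hypothetical, chosen only for illustration; a real assay
would carry measurement error that this ignores.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProteinClass:
    """A class of proteins defined by a shared primary structure."""
    name: str
    sequence: str  # amino-acid sequence in one-letter codes (illustrative)

    def has_member(self, observed_sequence: str) -> bool:
        # Membership test: the observed molecule has exactly this sequence.
        return observed_sequence == self.sequence


def fraction_in_class(protein_class: ProteinClass, sample: Sequence[str]) -> float:
    """Return the fraction of observed molecules that belong to the class."""
    if not sample:
        return 0.0
    hits = sum(1 for seq in sample if protein_class.has_member(seq))
    return hits / len(sample)


# Hypothetical class and hypothetical assay readout, for illustration only.
example_class = ProteinClass(name="example-class", sequence="MKTAYIAK")
sample = ["MKTAYIAK", "MKTAYIAK", "MKTAYIAR", "MSTAYIAK"]
fraction = fraction_in_class(example_class, sample)
```

The point of the sketch is only that membership is decided by a
property of the molecule itself, not by anything in a database record.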

> My questions are how often do we want to refer to a protein, rather than a
> record about a protein?

Any time we want to make a scientific statement about proteins. In
my work, that means virtually all the time. For example, I have a
body of work that is the target of text mining at the moment. If the
text mining worked well enough to understand the articles, what
should it generate for semantic web consumption?

> And who is responsible for ascribing an ID to a specific type of
> protein? In practice, in bioinformatics, the answer to this is a)
> we don't and b) uniprot.

I agree with a): we mostly don't, and when we do, we do it in an
unclear and nonstandard way. I disagree with b): exactly what class
of proteins a uniprot record describes is not clear (though Eric
started to make a theory of what it could be). I have even seen
uniprot ids used to identify antibodies to a protein.

As for who is responsible, I would say that our community is  
responsible. I expect that there will be efforts along this line in  
the OBO Foundry and I would hope that there would be broad  
participation from the people who are interested in following this list.

> So, while distinguishing between a uniprot record and a protein seems like
> a good idea, I'm not convinced it brings you anything. What are you going
> to do with your protein ID?

I would like Invitrogen to be able to say that product xxxyyy is an
antibody to some specific class of phosphoproteins, in a way that
would let a semantic web agent do some shopping for me if I needed
such a reagent. I could go on and name a long list of such cases, but
I'm pretty sure you could do the same thing, notwithstanding your
playing dumb here.

-Alan

ps. Hi Phil - glad you're joining the party!

>
> Phil
>
>
> -- 
> Phillip Lord,                           Phone: +44 (0) 191 222 7827
> Lecturer in Bioinformatics,             Email: phillip.lord@newcastle.ac.uk
> School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
> Claremont Tower Room 909,               skype: russet_apples
> Newcastle University,
> NE1 7RU
>

Received on Monday, 16 July 2007 15:37:42 UTC