- From: Phillip Lord <phillip.lord@newcastle.ac.uk>
- Date: Mon, 16 Jul 2007 17:22:26 +0100
- To: Alan Ruttenberg <alanruttenberg@gmail.com>
- Cc: public-semweb-lifesci <public-semweb-lifesci@w3.org>
>>>>> "Alan" == Alan Ruttenberg <alanruttenberg@gmail.com> writes:

>> I agree. The argument is that it's very hard to describe what you mean by
>> a "protein". We almost certainly don't mean a protein molecule. We might
>> mean a type of protein. But then we don't know whether two protein
>> molecules are actually of a given type.

Alan> I'm confused. I think we all would agree that there are instances of
Alan> proteins and we have a good idea of what they are. We also know that
Alan> there are groups of proteins that are built off the same template and
Alan> share certain properties.

Take these rhetorical questions:

Is Red Opsin in human the same as Red Opsin in cattle?

Is Red Opsin in me necessarily the same as Red Opsin in you? What if they
have a polymorphism?

Are two isoforms from an alternate splice the same protein?

If a protein has been partly digested, is it still the same?

Are haemoglobin alpha and beta the same?

>> My questions are how often do we want to refer to a protein, rather than
>> a record about a protein?

Alan> Any time we want to make a scientific statement about proteins. In my
Alan> work, that means virtually all the time. For example, I have a body of
Alan> work that is the target of text mining at the moment - If the text
Alan> mining worked well enough to understand the articles, what should it
Alan> generate for semantic web consumption?

The point is that you can't deal with a protein computationally. You can't
resolve it or analyze it computationally. It's always second-hand
information that you want to deal with.

>> And who is responsible for ascribing a ID to a specific type of
>> protein. In practice, in bioinformatics, the answer to this is a) we
>> don't and b) uniprot.

Alan> I agree with a) - we mostly don't and when we do we do it in an
Alan> unclear and nonstandard way.
Alan> I disagree with b) Exactly what the class of proteins described by a
Alan> uniprot record is not clear (though Eric started to make a theory of
Alan> what it could be). I have seen uniprot ids used even to identify
Alan> antibodies to a protein.

Yes, exactly. A uniprot record defines a class of proteins extensionally.
So a uniprot ID used in that way means "antibodies to the proteins
described by OPSD_HUMAN" (for example).

Alan> As for who is responsible, I would say that our community is
Alan> responsible. I expect that there will be efforts along this line in
Alan> the OBO Foundry and I would hope that there would be broad
Alan> participation from the people who are interested in following this
Alan> list.

And I would say not. Uniprot are the people who understand proteins. They
are the people who already have defined procedures for determining whether
one protein is the same as another, who have answered the questions above,
and who will go back through the resource and update it as biological
knowledge changes. And it's a big job. There are 100 annotators working at
this. Moreover, Uniprot are the people who are trusted to make the right
decisions, not us.

It would be more satisfying for us to know intensionally what we mean by
"protein". It would be good to have a clear set of definitions. But,
ultimately, I think it would be mistaken. If we have the ability to express
"the class of protein molecules defined by the swissprot record
OPSD_HUMAN", then I think we have all we need.

If we make our own definitions, all that we have done is duplicate what the
uniprot team are already doing. And we will, almost inevitably, do it
somewhat differently. All we would do is create confusion. The only way
that we can ensure that we do the same thing as uniprot is to say "yeah,
what they said".

Unsatisfying, maybe. Clear definitions are important. But interoperability
and the lack of duplication are more so.
>> So, while distinguishing between a uniprot record and a protein seems
>> like a good idea, I'm not convinced it brings you anything. What are you
>> going to do with your protein ID?

Alan> I would like to be able to have Invitrogen be able to say that product
Alan> xxxyyy is an antibody to some specific class of phosphoproteins in a
Alan> way that a semantic web agent could do some shopping for me if I
Alan> needed such a reagent.

And yet, you just told me that you could buy an antibody with just a
swissprot ID. So, let me restate the question: what are you going to do
with a protein ID that you are not going to do with a swissprot ID, or "the
protein formally known as OPSD_HUMAN"?

Phil
Received on Monday, 16 July 2007 16:22:46 UTC