Re: Ambiguous names. was: Re: URL +1, LSID -1

>>>>> "Alan" == Alan Ruttenberg <alanruttenberg@gmail.com> writes:

  >> I agree. The argument is that it's very hard to describe what you mean by
  >> a "protein". We almost certainly don't mean a protein molecule. We might
  >> mean a type of protein. But then we don't know whether two protein
  >> molecules are actually of a given type.

  Alan> I'm confused. I think we all would agree that there are instances of
  Alan> proteins and we have a good idea of what they are. We also know that
  Alan> there are groups of proteins that are built off the same template and
  Alan> share certain properties. 

Take these rhetorical questions: 

Is Red Opsin in human the same as Red Opsin in cattle? 

Is Red Opsin in me necessarily the same as Red Opsin in you? 
What if one of us has a polymorphism? 

Are two isoforms from an alternate splice the same protein? 

If a protein has been partly digested, is it still the same? 

Are haemoglobin alpha and beta the same? 


  >> My questions are how often do we want to refer to a protein, rather than
  >> a record about a protein?

  Alan> Any time we want to make a scientific statement about proteins.  In my
  Alan> work, that means virtually all the time. For example, I have a body of
  Alan> work that is the target of text mining at the moment - If the text
  Alan> mining worked well enough to understand the articles, what should it
  Alan> generate for semantic web consumption?

The point is that you can't deal with a protein computationally. You can't
resolve it or analyze it computationally. It's always second-hand information
that you want to deal with. 


  >> And who is responsible for ascribing an ID to a specific type of
  >> protein? In practice, in bioinformatics, the answer to this is a) we
  >> don't and b) uniprot.

  Alan> I agree with a) - we mostly don't and when we do we do it in an
  Alan> unclear and nonstandard way. I disagree with b). Exactly what class
  Alan> of proteins a uniprot record describes is not clear (though Eric
  Alan> started to make a theory of what it could be). I have seen uniprot ids
  Alan> used even to identify antibodies to a protein.

Yes, exactly. A uniprot record defines a class of proteins extensionally. Used
that way, the ID means: antibodies to the proteins described by OPSD_HUMAN
(for example).


  Alan> As for who is responsible, I would say that our community is
  Alan> responsible. I expect that there will be efforts along this line in
  Alan> the OBO Foundry and I would hope that there would be broad
  Alan> participation from the people who are interested in following this
  Alan> list.

And I would say not. Uniprot are the people who understand proteins. They are
the people who already have defined procedures for determining whether one
protein is the same as another, who have answered the questions above, and who
will go back through the resource and update it as biological knowledge
changes. And it's a big job. There are 100 annotators working on this.
Moreover, Uniprot are the people who are trusted to make the right decisions,
not us.

It would be more satisfying for us to know intensionally what we mean by
"protein". It would be good to have a clear set of definitions. But,
ultimately, I think it would be mistaken. If we have the ability to express
"the class of protein molecules defined by the swissprot record OPSD_HUMAN",
then I think we have all we need. 
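
As a rough sketch of what I mean (Python, purely illustrative): the class of
proteins carries no definition of its own, only a pointer to the record that
defines it. The base URI, the "#protein-class" suffix and the names Record and
ProteinClass are all assumptions made up for this example; none of them is a
real uniprot or swissprot convention.

```python
from dataclasses import dataclass

# Hypothetical base URI, for illustration only; not a real uniprot address.
RECORD_BASE = "http://example.org/uniprot/"


@dataclass(frozen=True)
class Record:
    """A database record: the thing we can actually resolve and compute over."""
    entry_name: str

    @property
    def uri(self) -> str:
        return RECORD_BASE + self.entry_name


@dataclass(frozen=True)
class ProteinClass:
    """The class of protein molecules defined by a record.

    Membership is whatever the record's curators say it is; nothing is
    redefined here.
    """
    defined_by: Record

    @property
    def uri(self) -> str:
        # Hypothetical convention: derive the class name from the record name.
        return self.defined_by.uri + "#protein-class"


opsd_record = Record("OPSD_HUMAN")
opsd_proteins = ProteinClass(opsd_record)
```

Two people who build ProteinClass(Record("OPSD_HUMAN")) independently end up
with the same name for the same class, without either of them having to say
what a protein is.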

If we make our own definitions, all that we have done is duplicate what the
uniprot team are already doing. And we will, almost inevitably, do it somewhat
differently. All we would do is create confusion. The only way to ensure that
we do the same thing as uniprot is to say "yeah, what they said". 

Unsatisfying, maybe. Clear definitions are important. But interoperability
and the lack of duplication are more so. 

  >> So, while distinguishing between a uniprot record and a protein seems
  >> like a good idea, I'm not convinced it brings you anything. What are you
  >> going to do with your protein ID?

  Alan> I would like to be able to have Invitrogen be able to say that product
  Alan> xxxyyy is an antibody to some specific class of phosphoproteins in a
  Alan> way that a semantic web agent could do some shopping for me if I
  Alan> needed such a reagent.

And yet you just told me that you could buy an antibody with just a swissprot
ID. So, let me restate the question: what are you going to do with a protein
ID that you are not going to do with a swissprot ID, or "the protein formally
known as OPSD_HUMAN"? 

Phil

Received on Monday, 16 July 2007 16:22:46 UTC