Re: Ambiguous names. was: Re: URL +1, LSID -1

>>>>> "Alan" == Alan Ruttenberg <alanruttenberg@gmail.com> writes:

  Alan> Summary: Answering Phil's questions, and clarifying one thing he
  Alan> asserts about what I said.

  >> What if they have a polymorphism?
  Alan> No.
  >> Are two isoforms from an alternate splice the same protein?
  Alan> No.


In both of these you differ from UniProt.

  >> Unsatisfying, maybe. Clear definitions are important. But
  >> interoperability, and the lack of duplication are more so.

  Alan> Forgive my confusion, but how exactly will we achieve interoperability
  Alan> and lack of duplication if we don't have definitions? How would we
  Alan> know that we don't have duplication, for example?


If you create identifiers to describe proteins rather than protein records
(as UniProt does), then you have created a whole new set of IDs. When anyone
wants to talk about a protein, they will have to look up the ID.

  >> <snip>

  >> And, yet, you just told me that you could buy an antibody with just a
  >> swissprot ID. So, let me restate the question, what are you going to do
  >> with a protein ID that you are not going to do with a swissprot ID, or
  >> "the protein formally known as OPSD_HUMAN".

  Alan> I did not say that. I've said some people have identified antibodies
  Alan> by such ids. Unfortunately this information is of limited use when
  Alan> actually ordering an antibody, where I am interested in much more
  Alan> information, such as how specific it is, how it has been validated,
  Alan> and other properties related to how it behaves in certain experimental
  Alan> settings. I *want* to be able to have identifiers(URIs) that are up to
  Alan> the job of ordering reagents.

Well, I am not sure that you are going to achieve this with an identifier
alone. You need a significant amount of extra metadata.

My point here is simple. Separating out the informatics and the biology
conforms better to our notion of reality, sure. But you are talking about
modelling what makes a protein and, more, a type of protein. Work through your
scenarios and see whether you need a protein ID for this. If not, you are
introducing a layer of abstraction that you don't need.

Phil





-- 
Phillip Lord,                           Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics,             Email: phillip.lord@newcastle.ac.uk
School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
Claremont Tower Room 909,               skype: russet_apples
Newcastle University,                   
NE1 7RU

Received on Thursday, 19 July 2007 13:29:38 UTC