- From: Alan Ruttenberg <alanruttenberg@gmail.com>
- Date: Wed, 11 Jul 2007 05:15:53 -0400
- To: Eric Jain <Eric.Jain@isb-sib.ch>
- Cc: Michel_Dumontier <Michel_Dumontier@carleton.ca>, public-semweb-lifesci <public-semweb-lifesci@w3.org>, Mark Wilkinson <markw@illuminae.com>, Benjamin Good <goodb@interchange.ubc.ca>, Natalia Villanueva Rosales <naty.vr@gmail.com>
On Jul 11, 2007, at 3:53 AM, Eric Jain wrote:

> Alan Ruttenberg wrote:
>> On Jul 11, 2007, at 3:16 AM, Eric Jain wrote:
>>> http://purl.uniprot.org/uniprot/P12345 does not identify an RDF
>>> resource, it represents our concept of some protein.
>> What concept would that be? What are instances of the class of
>> proteins that this identifier denotes?
>> (serious question)
>
> Some resources are quite simple and straightforward to understand,
> e.g. http://purl.uniprot.org/uniparc/UPI00001328C5 represents a
> specific amino acid sequence,

The instances are sequences of letters? Qualities of a class of molecules? The molecules themselves?

> and e.g. http://purl.uniprot.org/taxonomy/9606 represents a
> specific organism (though there are some complications there, too...)

(indeed)

> The resources in the http://purl.uniprot.org/uniprot/ namespace are
> a bit more complicated, basically it's annotation for a sequence in
> an organism:

Are there sequences in organisms? Or are there polypeptides? Which do the records represent? If the proteins, then in all states: unfolded, folded, misfolded, phosphorylated, glycosylated, etc.? Does the set of sequences/proteins include common (in the organism's population) non-function-changing mutants?

> http://purl.uniprot.org/uniprot/P60484 (Human)
> http://purl.uniprot.org/uniprot/P60483 (same sequence, but Dog)

What is the same about them?

> ...but these resources may also include annotation for related
> sequences produced e.g. by alternative splicing:
>
> http://purl.uniprot.org/uniprot/P00750 (Human, 3 sequences)
>
> ...provided the function of the resulting sequences is not so
> different that they warrant resources of their own...

How different do they have to be?

These might seem to be silly questions ("everyone knows what they mean"), but I don't think so. Would you use these identifiers to identify a protein uniquely enough if your life depended on it?
I think that this is the standard we should be aiming for; after all, people's lives do (or will) depend on it.

What I'm trying to point out with these questions is that the UniProt records are not trivially interpretable as "concepts", and that it might be better not to even try in the first place. Rather, leave them be database records, and separately create an ontology of proteins that might use the records, or aspects of the records, as part of the formal definitions of those proteins.

-Alan
Received on Wednesday, 11 July 2007 09:16:00 UTC