Re: URL +1, LSID -1

On Jul 11, 2007, at 3:53 AM, Eric Jain wrote:

>
> Alan Ruttenberg wrote:
>> On Jul 11, 2007, at 3:16 AM, Eric Jain wrote:
>>> http://purl.uniprot.org/uniprot/P12345 does not identify an RDF  
>>> resource, it represents our concept of some protein.
>> What concept would that be? What are instances of the class of  
>> proteins that this identifier denotes?
>> (serious question)
>
> Some resources are quite simple and straightforward to understand,  
> e.g. http://purl.uniprot.org/uniparc/UPI00001328C5 represents a  
> specific amino acid sequence,
The instances are sequences of letters? Qualities of a class of  
molecules? The molecules themselves?

> and e.g. http://purl.uniprot.org/taxonomy/9606 represents a  
> specific organism (though there are some complications there, too...)
(indeed)
>
> The resources in the http://purl.uniprot.org/uniprot/ namespace are  
> a bit more complicated, basically it's annotation for a sequence in  
> an organism:
Are there sequences in organisms? Or are there polypeptides? Which do  
the records represent? If the proteins, then in all states -  
unfolded, folded, misfolded, phosphorylated, glycosylated etc?
Does the set of sequences/proteins include common (in the organism's
population) non-function-changing mutants?
>
> http://purl.uniprot.org/uniprot/P60484 (Human)
> http://purl.uniprot.org/uniprot/P60483 (same sequence, but Dog)
What is the same about them?
>
> ...but these resources may also include annotation for related  
> sequences produced e.g. by alternative splicing:
>
> http://purl.uniprot.org/uniprot/P00750 (Human, 3 sequences)
>
> ...provided the functions of the resulting sequences are not so  
> different that they warrant resources of their own...

How different do they have to be?

These might seem to be silly questions ("everyone knows what they
mean"), but I don't think they are. Would you trust these identifiers to
identify a protein uniquely enough if your life depended on it? I
think that this is the standard that we should be aiming for. After
all, people's lives do, or will, depend on it.

What I'm trying to point out with these questions is that the UniProt
records are not trivially interpretable as "concepts", and that it
might be better not to even try in the first place. Rather, leave them
as database records, and separately create an ontology of proteins
that might use the records, or aspects of the records, as part of the
formal definitions of those proteins.
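
As a rough illustration of that separation, here is a minimal sketch in
Python. It is only a sketch under stated assumptions: the class names,
field names, and the placeholder label are hypothetical and are not part
of UniProt or of any existing ontology. The only thing taken from this
thread is the record URI.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatabaseRecord:
    """A database entry, treated purely as a record (hypothetical type)."""
    uri: str
    organism: str


@dataclass(frozen=True)
class ProteinClass:
    """A separately maintained ontology term (hypothetical type).

    The term is not the record. It may cite records, or aspects of
    them, as part of its formal definition.
    """
    label: str
    defined_using: Tuple[DatabaseRecord, ...] = ()


# The URI below is the one quoted in this thread; the label is a placeholder.
human_record = DatabaseRecord(
    uri="http://purl.uniprot.org/uniprot/P60484",
    organism="Human",
)
term = ProteinClass(
    label="placeholder label for a protein class",
    defined_using=(human_record,),
)

# The ontology term and the database record remain distinct things.
print(isinstance(term, DatabaseRecord))
print(term.defined_using[0].uri)
```

The point of keeping two types is that a question such as "in all
states?" or "including common mutants?" is then answered in the
definition of the protein class, not guessed from the record.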

-Alan

Received on Wednesday, 11 July 2007 09:16:00 UTC