Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1) from Eric Jain on 2007-07-17 (public-semweb-lifesci@w3.org from July 2007)

From: Eric Jain <Eric.Jain@isb-sib.ch>
Date: Tue, 17 Jul 2007 10:33:02 +0200
To: Alan Ruttenberg <alanruttenberg@gmail.com>
CC: Chris Mungall <cjm@fruitfly.org>, Bijan Parsia <bparsia@cs.man.ac.uk>, public-semweb-lifesci hcls <public-semweb-lifesci@w3.org>, Darren Natale <dan5@georgetown.edu>
Message-ID: <469C7EBE.1020204@isb-sib.ch>

Alan Ruttenberg wrote:
> To clarify, no, I didn't mean this. I meant that the definitions of 
> Uniprot records are already broad in the sense that sometimes multiple 
> splice variants are included in a single record, as are population and 
> disease-causing variants, according to Eric. Basically I don't know what 
> set of proteins people currently intend to denote when they use a 
> uniprot id as a protein, and I'm not entirely certain what the curators 
> intend. So step one would be an English description of how to figure out 
> what the curators' intent is, and we could go on from there to define 
> OWL definitions based on that. I suspect that people currently using 
> Uniprot ids may be using them in both broader and narrow ways, but we 
> could leave the discovery of such cases to a reasoner once we had the 
> basics in place.

People do indeed use UniProtKB identifiers in both broad and narrow ways: 
The narrow way is to talk about the exact, main sequence that is shown...

In any case, I'm not too optimistic about being able to define our concepts 
in a strict yet meaningful way, as the decisions are often based on practical 
criteria. For example, here's what one of our curators has to say on this:

"[Usually] we have one entry per gene. We have several entries for a single 
gene when description of variations are too complicated to describe in FT 
lines (of course, this criteria depends on the annotator). For viruses, it 
is much more messy, due to ribosomal frameshifts."

Formalize that! :-)

Received on Tuesday, 17 July 2007 08:33:18 UTC