Re: ontology specs for self-publishing experiment

>>>>> "LH" == Larry Hunter <Larry.Hunter@uchsc.edu> writes:

  LH> On Mon, 2006-07-10 at 11:42 +0100, Phillip Lord wrote:

  >> 
  >> My own feeling is that the fly people got it right years
  >> ago. Their gene identifiers had meaning, but not too much. So,
  >> for example, sevenless is a mutant lacking the 7th cell in the
  >> eye. Clear, straightforward and memorable. And if the world
  >> changes under you, the name could be left the same because it
  >> doesn't really matter that much.

  LH> And hugely, miserably ambiguous.  The use of regular English
  LH> words to represent drosophila gene names has significantly held
  LH> back the application of information extraction technology for
  LH> that model organism, and wreaked all sorts of other havoc.  


This is a good point. I hadn't thought about information extraction. 


  LH> You can't even look up the "to" gene in NCBI -- it's filtered out
  LH> of the query as a stop word -- but it's in there:

You could search for "takeout", however, with more success. 

  LH> I would say that the yeast people got it right.  Unambiguous
  LH> identifiers that can clearly be recognized as such.  

Maybe. I used to work on yeast in my dim, distant youth. Following
any of the various talks on CDC mutants was a nightmare,
particularly given that all the numbers are different between pombe and
cerevisiae. 

Both the fly and pombe people actually identify their genes with a
standard "markup" in the sense that they put gene names into
italics. Obviously not that great for information extraction, but it
worked well for humans. In the ideal world, using unambiguous markup
to state that the entity was an identifier would solve the
problem. Even something crude like "gene:to", or "gene:takeout" would
solve your particular problem I think. 
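
To make the "gene:to" idea concrete, here is a rough sketch of how a
prefix might survive where a bare word does not. The stop-word list,
the "gene:" convention and the function names are all made up for
illustration; this is not how NCBI or any real search engine works.

```python
import re

# Hypothetical stop-word list, standing in for whatever a search
# engine silently drops from a query.
STOP_WORDS = {"to", "the", "a", "of", "in", "for", "and"}

# Assumed convention: a gene mention is written as "gene:" followed
# directly by the gene symbol.
GENE_TAG = re.compile(r"\bgene:([A-Za-z0-9_-]+)")


def naive_terms(text):
    """Split into lower-case words and drop stop words."""
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def tagged_genes(text):
    """Return the symbols of every explicitly marked-up gene mention."""
    return GENE_TAG.findall(text)


sentence = "Expression of gene:to and gene:takeout rises in starved flies."
print(naive_terms("the to gene"))  # the bare word "to" is filtered out
print(tagged_genes(sentence))      # the marked-up symbols are recovered
```

The point is only that the prefix tells the tool "this is an
identifier" before any stop-word filtering gets a look at it.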

  LH> The amazing thing was that the community agreed to use those
  LH> names in papers, rather than reserve "naming rights" for the
  LH> "discoverer" of the gene.  As usual, the trick is social, not
  LH> technological.

Couldn't agree more. Best technology in the world is useless if no one
uses it. Of course, community agreement doesn't mean that good
technology results -- the wide agreement that two-letter-prefixed
flat files are the best way to structure data is a case in point. 

Phil

Received on Wednesday, 12 July 2006 10:51:10 UTC