- From: Phillip Lord <phillip.lord@newcastle.ac.uk>
- Date: Wed, 12 Jul 2006 11:49:54 +0100
- To: larry.hunter@uchsc.edu
- Cc: w3c semweb hcls <public-semweb-lifesci@w3.org>
>>>>> "LH" == Larry Hunter <Larry.Hunter@uchsc.edu> writes:

  LH> On Mon, 2006-07-10 at 11:42 +0100, Phillip Lord wrote:
  >>
  >> My own feeling is that the fly people got it right years
  >> ago. Their gene identifiers had meaning, but not too much. So,
  >> for example, sevenless is a mutant lacking the 7th cell in the
  >> eye. Clear, straightforward and memorable. And if the world
  >> changes under you, the name could be left the same because it
  >> doesn't really matter that much.

  LH> And hugely, miserably ambiguous. The use of regular English
  LH> words to represent drosophila gene names has significantly held
  LH> back the application of information extraction technology for
  LH> that model organism, and wreaked all sorts of other havoc.

This is a good point. I hadn't thought about information extraction.

  LH> You can't even look up the "to" gene in NCBI -- it's filtered out
  LH> of the query as a stop word -- but it's in there:

You could search for "takeout", however, with more success.

  LH> I would say that the yeast people got it right. Unambiguous
  LH> identifiers that can clearly be recognized as such.

Maybe. I used to work on yeast in my dim, distant youth. Following
any of the various talks on CDC mutants was a nightmare, particularly
given that all the numbers are different between pombe and cerevisiae.

Both the fly and pombe people actually identify their genes with a
standard "markup", in the sense that they put gene names into
italics. Obviously not that great for information extraction, but it
worked well for humans.

In the ideal world, using unambiguous markup to state that the entity
was an identifier would solve the problem. Even something crude like
"gene:to" or "gene:takeout" would solve your particular problem, I
think.

  LH> The amazing thing was that the community agreed to use those
  LH> names in papers, rather than reserve "naming rights" for the
  LH> "discoverer" of the gene. As usual, the trick is social, not
  LH> technological.

Couldn't agree more. The best technology in the world is useless if no
one uses it. Of course, community agreement doesn't mean that good
technology results -- the wide agreement that two-letter-prefixed flat
files are the best way to structure data is a case in point.

Phil
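
To illustrate the crude "gene:to" style of markup suggested in the message, here is a minimal sketch in Python. It assumes a convention where every gene mention is written as "gene:" followed by the symbol. The pattern, the function name extract_gene_mentions and the example sentence are all hypothetical, not an existing standard or tool.

```python
import re

# Assumed convention (hypothetical): a gene mention is written as
# "gene:" followed by a symbol made of letters, digits, "_" or "-".
GENE_TAG = re.compile(r"\bgene:([A-Za-z0-9_-]+)")

def extract_gene_mentions(text):
    """Return the symbols marked with the assumed 'gene:' prefix, in order."""
    return GENE_TAG.findall(text)

# The plain English word "to" is ignored; only the tagged forms are picked up.
sentence = "Expression of gene:to is similar to that of gene:takeout in the fly."
print(extract_gene_mentions(sentence))
```

With an explicit prefix like this, a stop-word filter or an extraction tool has no need to guess whether "to" is a preposition or a gene name.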
Received on Wednesday, 12 July 2006 10:51:10 UTC