Re: BioRDF Telcon from Eric Jain on 2007-06-20 (public-semweb-lifesci@w3.org from June 2007)

From: Eric Jain <Eric.Jain@isb-sib.ch>
Date: Wed, 20 Jun 2007 10:49:02 +0200
To: public-semweb-lifesci@w3.org
Message-ID: <4678E9FE.8070602@isb-sib.ch>

Marc-Alexandre Nolin wrote:
> [...] What I wanted to point out with
> this is that people work with concepts. P19367 is the identifier used to
> access the information about the Hexokinase concept in the UniProt
> database. P19367 has no meaning in itself; it is only a string
> of digits with a letter prefix. P19367 only has a meaning in the
> context of the UniProt database. DBpedia or something else could
> provide a hub for the concept that links to the resources that define it,
> because there might be many resources from different places.

UniProt (and many other databases) use opaque strings like "P19367" rather 
than something more human readable like "Hexokinase" or "Hexokinase (Homo 
sapiens)" in order to keep things a bit more stable and easier to manage.

Also, if you want to represent the abstract "Hexokinase" concept, is it a 
good idea for your identifier to change when you decide that it's more often 
written "Hexo-kinase"? Should each synonym have its own identifier?

The organization of Wikipedia makes sense for a dictionary, where you want 
to describe words, but I'm not sure it's ideal for representing concepts 
(such as specific protein sequences) that are not defined by a single word!

You could argue that UniProt is more suitable to be a "hub" for such 
concepts because 1. it is more fine-grained (i.e. one entry per protein per 
organism), 2. it is far more complete, and 3. it is already connected to a 
large part of the relevant life sciences databases. That said, I don't 
think it is realistic to have one central hub, we'd never agree on that :-)

Received on Wednesday, 20 June 2007 08:49:09 UTC