Re: proposal for standard NCBI database URI from Tony Hammond on 2006-05-09 (public-semweb-lifesci@w3.org from May 2006)

From: Tony Hammond <t.hammond@nature.com>
Date: Tue, 09 May 2006 10:46:42 +0100
To: Matthias Samwald <samwald@gmx.at>, <public-semweb-lifesci@w3.org>
Message-ID: <C0862392.C953%t.hammond@nature.com>
Not sure of the relevance here, but you might also like to consider this:
    
http://info-uri.info/registry/OAIHandler?verb=GetRecord&metadataPrefix=reg&identifier=info:pmid/

PubMed identifiers are already registered under the INFO URI namespace (RFC
4452 - http://www.ietf.org/rfc/rfc4452.txt).

"e) info:pmid/12376099

   where "pmid" is the "namespace" component for a PubMed Identifier
   [PMID] namespace and "12376099" is the "identifier" component for an
   identifier of an information asset in that namespace.

   The information asset identified by the identifier "12376099" in the
   namespace for PubMed Identifiers is the metadata record in the PubMed
   database that describes the journal article

       "Wijesuriya SD, Bristow J, Miller WL. Localization and analysis
       of the principal promoter for human tenascin-X. Genomics. 2002
       Oct;80(4):443-52."
"
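
To make the structure in that example concrete, here is a minimal sketch of splitting an "info" URI into its "namespace" and "identifier" components. The function name parse_info_uri is hypothetical, and the sketch deliberately ignores fragments and percent-encoding normalization, so treat it as an illustration rather than a full RFC 4452 parser.

```python
def parse_info_uri(uri):
    """Split an info URI such as 'info:pmid/12376099' into
    (namespace, identifier). Hypothetical helper, not from any library."""
    scheme, colon, rest = uri.partition(":")
    # The scheme name is compared case-insensitively.
    if not colon or scheme.lower() != "info":
        raise ValueError(f"not an info URI: {uri!r}")
    # The namespace runs up to the first '/', the identifier follows it.
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace:
        raise ValueError(f"missing namespace or '/' in: {uri!r}")
    # Namespace names are lowercased here for comparison purposes.
    return namespace.lower(), identifier


if __name__ == "__main__":
    print(parse_info_uri("info:pmid/12376099"))
```

For the RFC example this yields the namespace "pmid" and the identifier "12376099".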

Cheers,

Tony


On 9/5/06 09:46, "Matthias Samwald" <samwald@gmx.at> wrote:

> 
> Hi Alan,
> 
>>  As far as I know there is no standard URI for a resource at NCBI. I
>>  would like to propose that there be one, since we will all need
>>  them to use when we refer to these resources  in our RDF. (and I
>>  need one *now*)
> 
> I think we should be aware that this could be a VERY important decision for
> the further development of RDF in the life sciences. The URI scheme we come
> up with during this project would probably become THE standard for referencing
> resources at the NCBI. I guess we should try to contact someone from the NCBI
> to make sure the solution we come up with is acceptable to them. Maybe they
> will soon realize the need for URIs themselves and start creating their own,
> conflicting URI scheme. The last thing the Semantic Web would need would be
> two different URIs for each of the many resources in the Entrez databases.
> 
> 
>>  Following other styles I've seen, I propose the following:
>> 
>> 
>>  1. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>/<IDENTIFIER_GOES_HERE>
>> 
>>  or
>> 
>> 
>>  2. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>#<IDENTIFIER_GOES_HERE>
> 
> We should have a look at how applications (especially triplestores) handle
> this. Do they know how to split namespace from identifier in the first case? I
> remember that the current version of the triplestore Sesame has some
> performance problems when handling URNs, because it splits namespace and
> identifier incorrectly (creating a new namespace for almost every
> resource). I know that, according to the RDF specification, the RDF ID is just
> an opaque string, but applications do handle that differently.
> 
>>  Rationale: can use owl:sameAs to make them the same if we need to.
>>  We can suggest a best practice if we want to preferentially use one
>>  numbering system versus another. (I like the alphanumeric ones,
>>  myself)
> 
> We would not be happy to have huge amounts of redundant resources linked with
> owl:sameAs. owl:sameAs is nice when it only needs to be used sparingly, but
> having two different naming schemes of a large protein database linked through
> owl:sameAs would 'pollute' the Semantic Web right from the beginning. We
> should seek to avoid this when we are still in the position to do so.
> 
> 
> kind regards,
> Matthias Samwald
> 
> 
> 
> http://neuroscientific.net
> 
> Section on Medical Expert and Knowledge-Based Systems
> Core Unit for Medical Statistics and Informatics
> Medical University of Vienna/Austria
> http://www.meduniwien.ac.at/mes/home_en.html
> 
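
On the namespace-splitting question quoted above, a rough sketch may help show why the choice of separator matters. It assumes a naive heuristic that splits a URI after its rightmost "#" or "/"; this is only an illustration and is not claimed to be how Sesame or any other triplestore actually works. The function name split_local_name, the database name "gene", the identifier "1234", and the "urn:example:" URN are all hypothetical placeholders.

```python
def split_local_name(uri, separators="#/"):
    """Split a URI into (namespace, local_name) after the rightmost
    separator character. Illustrative heuristic only."""
    idx = max(uri.rfind(sep) for sep in separators)
    if idx == -1:
        # No known separator: the whole URI ends up as its own namespace.
        return uri, ""
    return uri[: idx + 1], uri[idx + 1 :]


if __name__ == "__main__":
    base = "http://www.ncbi.nlm.nih.gov/2006/entrez/"
    examples = [
        base + "gene/1234",       # proposed form 1 (slash)
        base + "gene#1234",       # proposed form 2 (hash)
        "urn:example:gene:1234",  # a URN with no '#' or '/'
    ]
    for uri in examples:
        print(split_local_name(uri))
    # Adding ':' to the separators lets the URN split as well.
    print(split_local_name("urn:example:gene:1234", separators="#/:"))
```

Under this heuristic both proposed http forms yield one shared namespace per database, while the URN yields a distinct namespace for every resource unless ":" is also treated as a separator.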

Received on Tuesday, 9 May 2006 09:46:45 UTC