Re: proposal for standard NCBI database URI from Matthias Samwald on 2006-05-09 (public-semweb-lifesci@w3.org from May 2006)

From: Matthias Samwald <samwald@gmx.at>
Date: Tue, 9 May 2006 10:46:22 +0200
To: <public-semweb-lifesci@w3.org>
Message-ID: <200659104622.934320@cqueberel>

Hi Alan,

> As far as I know there is no standard URI for a resource at NCBI. I
> would like to propose that there be one, since we will all need
> them to use when we refer to these resources in our RDF. (and I
> need one *now*)

I think we should be aware that this could be a VERY important decision for the further development of RDF in the life sciences. The URI scheme we come up with during this project would probably become THE standard for referencing resources at the NCBI. I guess we should try to contact someone from the NCBI to make sure the solution we come up with is acceptable to them. Otherwise they may soon realize the need for URIs themselves and start creating their own, conflicting URI scheme. The last thing the Semantic Web needs is two different URIs for each of the many resources in the Entrez databases.


> Following other styles I've seen, I propose the following:
>
>
> 1. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>/<IDENTIFIER_GOES_HERE>
>
> or
>
>
> 2. http://www.ncbi.nlm.nih.gov/2006/entrez/<DATABASE_GOES_HERE>#<IDENTIFIER_GOES_HERE>

We should have a look at how applications (especially triplestores) handle this. Do they know how to split namespace from identifier in the first case? I remember that the current version of the triplestore Sesame has some performance problems when handling URNs, because it splits namespace and identifier incorrectly (creating a new namespace for almost every resource). I know that, according to the RDF specification, a resource's URI is just an opaque string, but applications do handle that differently.
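
To make the concern concrete, here is a minimal sketch of a naive splitter, assuming it cuts at the last '#' and otherwise at the last '/', and treats anything else as its own namespace. This is NOT Sesame's actual algorithm; the function names, the database name "gene", the identifiers and the "urn:example:" form are all hypothetical placeholders.

```python
from typing import Iterable, Tuple


def split_naive(uri: str) -> Tuple[str, str]:
    """Split a URI into (namespace, local name).

    Assumed behaviour of a naive store: cut after the last '#',
    else after the last '/'. If neither occurs (e.g. a URN), the
    whole URI becomes the namespace and the local name is empty.
    """
    for sep in ("#", "/"):
        idx = uri.rfind(sep)
        if idx != -1:
            return uri[: idx + 1], uri[idx + 1:]
    return uri, ""


def count_namespaces(uris: Iterable[str]) -> int:
    """Count the distinct namespaces the naive splitter produces."""
    return len({split_naive(u)[0] for u in uris})


# Hypothetical identifiers and database name, for illustration only.
ids = ["1001", "1002", "1003"]
base = "http://www.ncbi.nlm.nih.gov/2006/entrez/gene"

slash_uris = [f"{base}/{i}" for i in ids]           # proposal 1
hash_uris = [f"{base}#{i}" for i in ids]            # proposal 2
urn_uris = [f"urn:example:gene:{i}" for i in ids]   # made-up URN form

slash_count = count_namespaces(slash_uris)
hash_count = count_namespaces(hash_uris)
urn_count = count_namespaces(urn_uris)
```

Under these assumptions both proposed forms collapse into a single namespace per database, while the URN form yields one namespace per resource, which is the kind of behaviour I would want to check for in real triplestores.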

> Rationale: can use owl:sameAs to make them the same if we need to.
> We can suggest a best practice if we want to preferentially use one
> numbering system versus another. (I like the alphanumeric ones,
> myself)

We would not be happy with huge amounts of redundant resources linked with owl:sameAs. owl:sameAs is fine when it only needs to be used sparingly, but having two different naming schemes for a large protein database linked through owl:sameAs would 'pollute' the Semantic Web right from the beginning. We should seek to avoid this while we are still in a position to do so.
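
A rough sketch of what that cost looks like, using plain Python tuples as triples rather than any real RDF library. The database name "protein", the identifiers and the helper names are hypothetical; the alias lookup follows only one owl:sameAs hop, which is a simplification.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# 'protein' is a placeholder database name, not a confirmed NCBI path.
BASE = "http://www.ncbi.nlm.nih.gov/2006/entrez/protein"


def slash_uri(ident):
    return f"{BASE}/{ident}"


def hash_uri(ident):
    return f"{BASE}#{ident}"


def bridging_triples(idents):
    """One owl:sameAs triple per resource to link the two schemes."""
    return [(slash_uri(i), OWL_SAME_AS, hash_uri(i)) for i in idents]


def objects_of(triples, subject, predicate):
    """Plain lookup that knows nothing about owl:sameAs."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def aliases(triples, uri):
    """The URI plus everything directly linked to it by owl:sameAs."""
    same = {uri}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            if s == uri:
                same.add(o)
            elif o == uri:
                same.add(s)
    return same


idents = ["P1", "P2", "P3"]  # made-up identifiers
data = [(hash_uri("P1"), RDFS_LABEL, "example protein")]
bridge = bridging_triples(idents)
graph = data + bridge

# A statement made under one scheme is invisible under the other ...
plain = objects_of(graph, slash_uri("P1"), RDFS_LABEL)

# ... unless every lookup first expands the subject into its aliases.
resolved = set()
for alias in aliases(graph, slash_uri("P1")):
    resolved |= objects_of(graph, alias, RDFS_LABEL)
```

The number of bridging triples grows with the number of resources, and every consumer has to do the alias expansion (or run a reasoner) to get complete answers.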


kind regards,
Matthias Samwald



http://neuroscientific.net

Section on Medical Expert and Knowledge-Based Systems
Core Unit for Medical Statistics and Informatics
Medical University of Vienna/Austria
http://www.meduniwien.ac.at/mes/home_en.html

Received on Tuesday, 9 May 2006 09:24:51 UTC