- From: Marc-Alexandre Nolin <lotus@ieee.org>
- Date: Thu, 1 Nov 2007 02:31:58 -0400
- To: public-semweb-lifesci@w3.org
- Cc: "Jonathan Rees" <jar@creativecommons.org>
Hi,

The following are my comments on the TNS draft at http://sw.neurocommons.org/2007/uri-note/ and on the major remaining trouble spots from http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations

To begin with, my answer to the question about "Attitude Toward Nonlocators" in the major remaining trouble spots is that HTTP is OK. I use HTTP identifiers in Bio2RDF.org the same way Purl.org does, with a REST-like interface (http://purl.org/commons/xml/pmid/PM15548600 or http://bio2rdf.org/xml/pubmed:15548600). Also, many public ontologies such as RDF and OWL are HTTP-based, and we can already handle them. If we are to choose a string of characters as a URI to identify a life sciences item, I find it logical to get the retrieval method at the same time as the identifier.

Another major point is racine sharing with the #. I strongly discourage this practice for big knowledge bases. It is only usable with a small number of instances. Take PubChem, for example: with racine sharing, a URI would look like http://view.ncbi.nlm.nih.gov/pccompound#id. The problem is that there are 17 million ids, which take about 32 GB of gzipped XML. Because every id shares the same document, retrieval would be awfully long.

Since Jonathan's specific question is about what to put between the // and the first /, I would say that Purl.org is the best compromise: it has the infrastructure already in place, it is open, and it offers more neutral ground than other proxies like Bio2RDF.org because it belongs to Science Commons. Big data providers (Uniprot, NCBI, EBI, Kegg, etc.) can probably do without it because they are able to handle the data themselves (like Uniprot with http://purl.uniprot.org/uniprot/P19367.rdf; "purl" is in the URI, but as a subdomain of uniprot and not purl.org itself), but small providers might find the purl.org solution a convenient way to create and manage URIs.
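To illustrate the racine sharing concern raised earlier in this message, here is a minimal Python sketch. It relies on one fact: an HTTP client strips the fragment (the part after #) before sending the request, so every # URI with the same racine retrieves the same document. The compound ids 1 and 2 and the slash-style URIs are made up for illustration; only the pccompound racine comes from the PubChem example.

```python
from urllib.parse import urldefrag


def request_target(uri):
    """Return the URL an HTTP client would actually request.

    The fragment never goes over the wire; it is resolved client-side
    after the whole document has been downloaded.
    """
    url, _fragment = urldefrag(uri)
    return url


# Hypothetical hash-style URIs sharing one racine (ids are illustrative).
hash_uris = [
    "http://view.ncbi.nlm.nih.gov/pccompound#1",
    "http://view.ncbi.nlm.nih.gov/pccompound#2",
]

# Hypothetical slash-style URIs, one retrievable document per id.
slash_uris = [
    "http://view.ncbi.nlm.nih.gov/pccompound/1",
    "http://view.ncbi.nlm.nih.gov/pccompound/2",
]

# Both hash URIs collapse to a single request target: the full dataset.
hash_targets = {request_target(u) for u in hash_uris}

# Each slash URI keeps its own request target.
slash_targets = {request_target(u) for u in slash_uris}

print(len(hash_targets), len(slash_targets))
```

With hash URIs, looking up one compound means fetching the single shared document, whatever its size; with slash URIs the server can return just the requested record.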
Purl.org (or Bio2RDF.org for some data providers) is also a good way to retrieve RDF from providers that don't produce RDF themselves yet: if someone elsewhere does, we can redirect to it while waiting for the official source to do so.

But what is between the // and the first / isn't that important in the end. There will be many domains that provide RDF, be it as a proxy that gives RDF from a non-RDF source or as an LSID resolver like http://lsid.biopathways.org/resolver/. It is what comes after the first / that is the problem.

What I would really like to see is simply a web page on each data provider's web site explaining how people should refer to its content with URIs. The data provider would need to make some kind of commitment to keep these URIs as stable as possible. Such a page on Uniprot would look like this:

  To refer to a Uniprot item, write it this way:

  http://purl.uniprot.org/<database>/<id><.service>

  where database is one of {uniprot | citations | etc.}, id is the identifier of the item, and .service is what we want to receive for this id {xml | text | rdf | fasta | etc.}. The whole string must be in lowercase.

The same page from NCBI could look exactly like http://view.ncbi.nlm.nih.gov/ but in the verb slot, we would add different retrieval formats like rdf, xml, asn.1, etc. If another data provider publishes a similar page and uses the purl.org scheme instead of its own domain, so be it, as long as it is described in correct detail.

Now everyone who follows the rules about how to refer to an item from a specific data provider with a URI will connect together easily. This would render Bio2RDF mostly obsolete, because one of the added values of Bio2RDF is that it rewrites URIs into its own namespace so that they are consistent from one document to another, creating a web of linked data where there was none.
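The kind of per-provider rule described above could be sketched in Python as follows. This is only an illustration under stated assumptions: the UriRule class and its mint method are hypothetical names, and the lists of databases and services hold only the values named in this message (the "etc." entries are left out). The message asks for the whole string in lowercase, yet its own example keeps P19367 in uppercase, so this sketch lowercases only the database and the service and leaves the id untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UriRule:
    """Hypothetical encoding of a provider's published URI rule:
    <base>/<database>/<id><.service>
    """
    base: str
    databases: frozenset
    services: frozenset

    def mint(self, database, item_id, service=None):
        """Build a URI for one item, rejecting unknown databases or services."""
        db = database.lower()
        if db not in self.databases:
            raise ValueError(f"unknown database: {database!r}")
        uri = f"{self.base}/{db}/{item_id}"  # id kept as given (assumption)
        if service is not None:
            svc = service.lower()
            if svc not in self.services:
                raise ValueError(f"unknown service: {service!r}")
            uri += f".{svc}"
        return uri


# Values taken from the Uniprot example in the message; the real lists
# would come from the provider's own page.
UNIPROT = UriRule(
    base="http://purl.uniprot.org",
    databases=frozenset({"uniprot", "citations"}),
    services=frozenset({"xml", "text", "rdf", "fasta"}),
)

print(UNIPROT.mint("uniprot", "P19367", "rdf"))
```

Two triplestores that both build their URIs from the same published rule end up with identical strings for the same item, which is all that is needed for their data to link.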
For example, take this RDF document from Uniprot, http://purl.uniprot.org/uniprot/P19367.rdf, and look at the entry http://purl.uniprot.org/geneid/3098. If NCBI had published RDF URIs for their data, the URI here might be http://view.ncbi.nlm.nih.gov/gene/rdf/3098. This would create linked data without anything added in between, such as an LSID resolver, a 303 redirect or a #. That being said, I know that NCBI doesn't provide an RDF version of their data yet and that what I just wrote does not actually work, but in the context of the draft, which is a recommendation about best practices for minting URIs, it makes sense.

In conclusion:

- I support HTTP URIs.
- I strongly discourage racine sharing.
- We can't control what will be between the // and the first /, but as a recommendation for research centers without a big IT budget that want to create new URIs as soon as possible, I would recommend Purl.org.
- I'm for simple rules on a per-data-provider basis, available on each provider's web site (these rules could also be written in RDF; I don't see any problem with that).

When a data manager has to create a triplestore and knows it will contain PubMed papers and Uniprot proteins, he goes to these sites and sees how to refer to these entities with URIs. His triplestore is then already usable as linked data.

Thanks,

Marc-Alexandre Nolin

P.S.: I apologize for my poor English. I hope it has not blurred my reasoning. If clarification is needed, just ask me.

2007/10/29, Jonathan Rees <jar@creativecommons.org>:
> On Oct 23, 2007, at 9:58 AM, Marc-Alexandre Nolin wrote:
>
> > Currently, I'm waiting for the publication of Jonathan URI
> > recommendation to add it to the Bio2RDF system. Adding the support to
> > the standardization effort doesn't mean to throw away the previous
> > working system :)
> >
> > Marc-Alexandre
>
> I appreciate your confidence! I am hoping to release a draft of the
> URI note to HCLS at the end of this week.
> It would be extremely
> helpful to me if you would give your advice on common names for
> public database records. I think you have seen the science commons
> proposal, and your comments on that would be interesting. I have a
> "major issue" page on this topic:
> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations/PublicResources
>
> Since yours is the only other careful effort I know of along these
> lines, I'd be interested to know whether you recommend what you have
> for HCLS purposes, and what would be required to reconcile bio2rdf
> with purl.org/commons (besides finishing the implementation of the
> latter by making it yield RDF). I'm particularly interested in
> opinions on what goes between the // and the first /.
>
> Jonathan
Received on Thursday, 1 November 2007 06:32:09 UTC