Re: what would change for me? from Peter Ansell on 2007-11-01 (public-semweb-lifesci@w3.org from November 2007)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Fri, 2 Nov 2007 08:23:36 +1000
To: lotus@ieee.org
Cc: public-semweb-lifesci@w3.org, "Jonathan Rees" <jar@creativecommons.org>, p.roe@qut.edu.au, j.hogan@qut.edu.au
Message-ID: <a1be7e0e0711011523o3d928604u529762a67b282dca@mail.gmail.com>
Hi,

My main preferences are that:
* HTTP GET can be used to retrieve metadata; AND
* the metadata identifier is the default for identifiers used as
URIs in other documents (if people are worried that they won't be
able to interpret their documents, XSLT can be used to transform
them; the main advantage that bioguid.info has right now is its
development of these XSLT scripts); AND
* the metadata is available in RDF (XML vs Turtle etc. doesn't
matter to me, as it is the extensibility of RDF that makes it so useful
as a language for describing metadata; this is the main advantage of
bio2rdf.org)
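
To make the first and third points concrete, here is a minimal sketch in Python of what "HTTP GET for RDF metadata" could look like from the client side. It only prepares the request and does not send it. The function name build_metadata_request is hypothetical, the URI is the Bio2RDF example quoted further down in this thread, and asking for RDF through an Accept header is my assumption about one possible way to do it, not a description of how bio2rdf.org or purl.org actually behave.

```python
from urllib.request import Request


def build_metadata_request(uri: str, rdf_format: str = "application/rdf+xml") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF metadata.

    Assumption: the server honours the Accept header. Nothing in this
    thread confirms that any particular provider does.
    """
    return Request(uri, headers={"Accept": rdf_format}, method="GET")


# URI taken from the quoted message below; purely illustrative here.
req = build_metadata_request("http://bio2rdf.org/xml/pubmed:15548600")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing req to urllib.request.urlopen would perform the actual retrieval; whether RDF comes back depends entirely on the provider.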

I am a little concerned that PURL.org may simply be used as a wrapper
around data services by default rather than metadata, but I guess if I end
up in the minority on this point I will deal with it.

I think it is useful for the community to provide wrapper rules for
reformatting data into RDF. Why should everyone else be forced to do
this when someone has done it already? We definitely shouldn't
force providers to conform to a given RDF format, although it
would be nice for them to offer their data in RDF form once people settle
the issue of what the standard identifier forms are. Currently they
aren't able to provide anything because no one has decided on anything.

The remaining issues, about whether identifiers mean something, and
whether the documents at the web addresses, the data, or the actual
physical object is the "real" object/resource, are very confusing and
possibly hinder the overall process. If it is decided as a
philosophical rule that metadata does not identify objects properly,
the result is not likely to conform to the rule I propose about metadata
being the default for identifiers in other documents, which would be a
downside for the semantic bio web overall.

Peter


On 01/11/2007, Marc-Alexandre Nolin <lotus@ieee.org> wrote:
>
> Hi,
>
> The following are my comments about the TNS draft at
> http://sw.neurocommons.org/2007/uri-note/ and the "Major remaining trouble
> spots" from http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations
>
> To begin with, on the question about "Attitude Toward Nonlocators"
> in the major remaining trouble spots, my answer is that HTTP is OK. I use HTTP
> identifiers in Bio2RDF.org the same way Purl.org does: with a REST-like
> interface (http://purl.org/commons/xml/pmid/PM15548600 or
> http://bio2rdf.org/xml/pubmed:15548600). Also, many public ontologies
> like RDF and OWL are HTTP-based and we can already handle them. If we are
> to choose a string of characters to be a URI that identifies a life
> sciences item, I find it logical to get the method of retrieval
> at the same time as I get the identifier.
>
> Another major point is about Racine Sharing with the #. I strongly
> discourage this practice for big knowledge bases. It is only usable
> with a small number of instances. Take PubChem, for example: if we use Racine
> Sharing, a URI would look like
> http://view.ncbi.nlm.nih.gov/pccompound#id. The problem is, there are
> 17 million ids that take up about 32 GB of gzipped XML. The retrieval
> would be awfully long.
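
The reason retrieval gets so long can be shown in a few lines of Python: an HTTP client drops everything from the # onward before it sends the request, so every id that shares the same racine resolves to the same (whole) document. This is only a sketch. The helper name fetched_url and the two compound ids are made up for illustration; the base URI is the one from the paragraph above.

```python
from urllib.parse import urldefrag


def fetched_url(uri: str) -> str:
    """Return the URL a client actually requests: the URI minus its #fragment."""
    url, _fragment = urldefrag(uri)
    return url


# Two hypothetical compound ids sharing one racine.
a = fetched_url("http://view.ncbi.nlm.nih.gov/pccompound#2244")
b = fetched_url("http://view.ncbi.nlm.nih.gov/pccompound#5793")
print(a == b)  # both requests target the same document
```

With a slash-based URI per id, each request could instead target one record.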
>
> Since Jonathan's specific question is about what to put between
> the // and the first /, I would say that Purl.org is the best
> compromise because it has the infrastructure already in place, is open,
> and offers more neutral ground than other proxies like Bio2RDF.org
> because it's Science Commons. Big data providers (Uniprot, NCBI, EBI,
> Kegg, etc.) can probably do without it because they have the
> capability to handle the data themselves (like Uniprot:
> http://purl.uniprot.org/uniprot/P19367.rdf . Purl is in the URI, but
> as a sub-domain of uniprot and not purl.org itself), but small
> providers might find the purl.org solution a convenient way to
> create and manage URIs. Purl.org (or Bio2RDF.org for some data
> providers) is also a good way to retrieve RDF for providers that don't
> produce RDF themselves yet: maybe someone elsewhere does, and we can
> redirect to it while waiting for the official source to do it.
>
> But what is between the // and the first / isn't that important in the
> end. There will be many domains that provide RDF, be it as a proxy
> that gives RDF from a non-RDF source or as an LSID resolver like
> http://lsid.biopathways.org/resolver/. It is what comes after the
> first / that is the problem. What I would really like to see is simply a
> web page on each data provider's web site explaining how people should
> refer to its content with URIs. The data provider would need to
> provide some kind of commitment to keeping these URIs as stable as
> possible.
>
> A page like this on Uniprot would look like this:
>
> To refer to a Uniprot item, write it this way:
> http://purl.uniprot.org/<database>/<id><.service>
> where database is one of {uniprot | citations | etc.},
> id is the identifier of the item, and .service is what we want to receive
> for this id {xml | text | rdf | fasta | etc.}. The whole string
> must be in lowercase.
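
The pattern above is simple enough to sketch as a small Python helper. The function name uniprot_uri is hypothetical and this is not an official Uniprot API. One assumption to flag: the sketch lowercases only the database and service parts and leaves the id as given, because the thread's own example (P19367.rdf) keeps an uppercase id even though the rule says everything must be lowercase.

```python
def uniprot_uri(database: str, item_id: str, service: str) -> str:
    """Build a URI following http://purl.uniprot.org/<database>/<id><.service>.

    Assumption: the id keeps its original case (see the P19367 example in
    this thread); only database and service are normalised to lowercase.
    """
    database = database.lower()
    service = service.lower().lstrip(".")  # accept "rdf" or ".rdf"
    return f"http://purl.uniprot.org/{database}/{item_id}.{service}"


print(uniprot_uri("uniprot", "P19367", "rdf"))
# prints http://purl.uniprot.org/uniprot/P19367.rdf
```

A provider publishing such a rule would presumably also list the allowed database and service values, which this sketch does not check.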
>
> The same page from NCBI could look exactly like
> http://view.ncbi.nlm.nih.gov/ but in the verb slot, we would add
> different retrieval formats like rdf, xml, asn.1, etc.
>
> If another data provider publishes a similar page and uses the purl.org
> scheme instead of its own domain, so be it, as long as it is documented
> correctly.
>
> Now everyone who follows the rules about how to refer to an item
> from a specific data provider with a URI will connect together
> easily. This would render Bio2RDF mostly obsolete, because one of the
> added values that Bio2RDF gives is the rewriting of URIs into its own
> namespace, to be consistent from one document to another and to create a
> web of linked data where there was none.
>
> For example, take this RDF document from Uniprot,
> http://purl.uniprot.org/uniprot/P19367.rdf, and look at the entry
> http://purl.uniprot.org/geneid/3098. If NCBI had published RDF
> URIs for their data, the URI here might be
> http://view.ncbi.nlm.nih.gov/gene/rdf/3098. This, without anything
> added in between like an LSID resolver, a 303 redirect or a #, would create
> linked data.
>
> That being said, I know that NCBI doesn't provide an RDF version of their
> data yet and that what I just wrote does not actually work, but in the
> context of the draft, which is a recommendation about best
> practices for minting URIs, this makes sense.
>
> In conclusion, I support HTTP URIs. I strongly discourage Racine
> Sharing. We can't control what will be between the // and the first /,
> but as a recommendation for research centers without big IT budgets that
> need to create new URIs as soon as possible, I would recommend Purl.org. I'm
> for simple rules on a per-data-provider basis, available on their web
> sites (these rules could also be written in RDF; I don't see any
> problem with that). When a data manager has to create a triplestore
> and knows he will write PubMed papers and Uniprot proteins into it, he goes to
> these sites and sees how to refer to these entities with URIs. Now his
> triplestore is already usable as linked data.
>
> thanks,
>
> Marc-Alexandre Nolin
>
> P.S.: I apologize for my bad English. I hope my thinking wasn't blurred
> by it. If clarification is needed, just ask me for it.
>
> 2007/10/29, Jonathan Rees <jar@creativecommons.org>:
> > On Oct 23, 2007, at 9:58 AM, Marc-Alexandre Nolin wrote:
> >
> > > Currently, I'm waiting for the publication of Jonathan's URI
> > > recommendation to add it to the Bio2RDF system. Adding support for
> > > the standardization effort doesn't mean throwing away the previous
> > > working system :)
> > >
> > > Marc-Alexandre
> >
> > I appreciate your confidence!  I am hoping to release a draft of the
> > URI note to HCLS at the end of this week. It would be extremely
> > helpful to me if you would give your advice on common names for
> > public database records.  I think you have seen the science commons
> > proposal, and your comments on that would be interesting. I have a
> > "major issue" page on this topic:
> >    http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations/PublicResources
> >
> > Since yours is the only other careful effort I know of along these
> > lines, I'd be interested to know whether you recommend what you have
> > for HCLS purposes, and what would be required to reconcile bio2rdf
> > with purl.org/commons (besides finishing the implementation of the
> > latter by making it yield RDF). I'm particularly interested in
> > opinions on what goes between the // and the first /.
> >
> > Jonathan
> >
> >
>
>
Received on Thursday, 1 November 2007 22:23:50 UTC