Re: what would change for me? from Marc-Alexandre Nolin on 2007-10-23 (public-semweb-lifesci@w3.org from October 2007)

From: Marc-Alexandre Nolin <lotus@ieee.org>
Date: Tue, 23 Oct 2007 09:58:46 -0400
To: "Peter Ansell" <ansell.peter@gmail.com>
Cc: public-semweb-lifesci@w3.org, p.roe@qut.edu.au, j.hogan@qut.edu.au
Message-ID: <d6a9bb0d0710230658n5fe7d709i172c2f6c64137ce4@mail.gmail.com>
Hi,

I'm one of the Bio2RDF maintainers. While we still strongly believe in HTTP
URIs with a REST nomenclature and GET retrieval, we also believe that
some kind of standardization of how URIs should be created would be
beneficial for the health care and life science domain. These
recommendations are also intended for data providers that don't yet
publish their data in RDF and are looking for "best practices" on the
subject so they can do it correctly the first time.

Currently, Bio2RDF works by "translating" every entity into
the Bio2RDF namespaces, since the other data providers don't always
provide RDF or URIs.

PubMed --> http://bio2rdf.org/pubmed:<id>
Kegg Pathway --> http://bio2rdf.org/path:<id>
PDB --> http://bio2rdf.org/pdb:<id>
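The translation step above boils down to prepending a Bio2RDF namespace to each database's local identifier. A minimal Python sketch of that mapping, assuming only the three prefixes shown (the table and the `to_bio2rdf_uri` helper are hypothetical names, not part of Bio2RDF or myBio2RDF):

```python
# Namespace prefixes copied from the three examples above; a real
# deployment would need many more entries (this table is illustrative).
BIO2RDF_NAMESPACES = {
    "PubMed": "pubmed",
    "Kegg Pathway": "path",
    "PDB": "pdb",
}


def to_bio2rdf_uri(source: str, identifier: str) -> str:
    """Translate a (source database, local id) pair into a Bio2RDF URI.

    Hypothetical helper following the http://bio2rdf.org/<namespace>:<id>
    pattern; raises KeyError for a source not in the table.
    """
    namespace = BIO2RDF_NAMESPACES[source]
    return f"http://bio2rdf.org/{namespace}:{identifier}"
```

For example, `to_bio2rdf_uri("PubMed", "12345")` gives `http://bio2rdf.org/pubmed:12345`.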

Sure, I (or anybody using the myBio2RDF package) can generate an
infinite number of URIs in my own namespace, and Bio2RDF shows that it can
work this way. But what if data providers decided to provide URIs following
these same standards AND also used the same standards for their linkouts?
We would have something like this:

PubMed --> http://www.ncbi.nlm.nih.gov/pubmed:<id>
Kegg Pathway --> http://www.genome.jp/kegg/pathway:<id>
PDB --> http://www.rcsb.org/pdb:<id>

Suddenly, Bio2RDF would not be necessary anymore (not really true, in fact:
there are databases that won't provide any URIs for their data, and
Bio2RDF could still be a kind of hub for those data), because the links
between entities would actually work and we would have linked data
without the intermediate step of translating entities into the
local Bio2RDF system.
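Under the scenario above, a client could prefer a provider's own URI whenever one is published and fall back to Bio2RDF as the hub for everything else. A sketch in Python; the provider URL patterns are the hypothetical ones from the list above (no provider is assumed to actually serve them), and `resolve` is an illustrative name:

```python
# Hypothetical provider-hosted patterns, copied from the "what if" list.
PROVIDER_PATTERNS = {
    "pubmed": "http://www.ncbi.nlm.nih.gov/pubmed:{id}",
    "path": "http://www.genome.jp/kegg/pathway:{id}",
    "pdb": "http://www.rcsb.org/pdb:{id}",
}


def resolve(namespace: str, identifier: str) -> str:
    """Return the provider's own URI if one is known, else a Bio2RDF URI."""
    pattern = PROVIDER_PATTERNS.get(namespace)
    if pattern is not None:
        return pattern.format(id=identifier)
    # Fallback: Bio2RDF acts as the hub for databases without their own URIs.
    return f"http://bio2rdf.org/{namespace}:{identifier}"
```

The design choice is simply "provider first, hub second", so adding a provider to the table changes which URI is used without breaking the fallback.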

Currently, I'm waiting for the publication of Jonathan's URI
recommendations to add them to the Bio2RDF system. Adding support for
the standardization effort doesn't mean throwing away the previous,
working system :)

Marc-Alexandre

2007/10/21, Peter Ansell <ansell.peter@gmail.com>:
>
> Hi all,
>
> I have been using the Bio2RDF markup system and I personally do not
> see what all the fuss is about, but there must be something, so here are
> my opinions, based solely on the requirements document:
>
> http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations/Requirements
>
>
> # For our own resources, what URIs to mint and what contracts to
> adhere to regarding well-definedness and documentation
>
> Publicly retrievable metadata for one's personally produced or published
> information (if not the data as well) should be available using URIs
> matched to one's institution/organisation, with relevant owl:sameAs
> and rdfs:seeAlso statements to specify their relationships to other known
> URIs.
>
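As an illustration of the proposal above, here is a small Python sketch that serialises a locally minted URI's owl:sameAs and rdfs:seeAlso links as N-Triples. The `link_triples` helper and the example.org URIs are hypothetical; the owl and rdfs property URIs are the standard W3C ones. The sketch assumes the inputs are already well-formed absolute URIs and does no escaping.

```python
from typing import Iterable

# Standard W3C property URIs.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def link_triples(local_uri: str,
                 same_as: Iterable[str] = (),
                 see_also: Iterable[str] = ()) -> str:
    """Return N-Triples lines linking a local URI to other known URIs."""
    lines = []
    for other in same_as:
        lines.append(f"<{local_uri}> <{OWL_SAME_AS}> <{other}> .")
    for other in see_also:
        lines.append(f"<{local_uri}> <{RDFS_SEE_ALSO}> <{other}> .")
    return "\n".join(lines)
```

For instance, `link_triples("http://example.org/pubs/42", same_as=["http://bio2rdf.org/pubmed:12345"])` yields one owl:sameAs triple pointing from the institutional URI to the Bio2RDF one.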
> Advantages: One does not need to negotiate with the original author in
> order to augment their definition, and people who actually want to
> know things have clear, unambiguous ways of getting to their goal.
> It follows the way knowledge is developed, i.e. someone comes
> up with an idea and develops it themselves with citations to outside
> publications. In the case that their following published information
>
> Disadvantages: SPARQL queries are not simple, but I use programmatic-level
> access and retrieve sameAs items through code, which then abstracts
> queries so that they use all known identifiers. People don't actually
> want to write SPARQL queries themselves; they are biologists or doctors
> who just want to click on a button and have it work for them. Whether the
> program runs one query or three is basically inconsequential to them.
>
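The programmatic approach described above (collect every identifier linked by owl:sameAs, then query all of them at once) can be sketched as follows. This is an assumed reconstruction, not Peter's code: sameAs links are treated as symmetric, the closure is computed by breadth-first search, and the generated query uses a plain `FILTER` with `||`, which SPARQL 1.0 already supports. All URIs in the usage are placeholders.

```python
from collections import deque
from typing import Iterable, Set, Tuple


def same_as_closure(start: str, pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """All identifiers reachable from `start` via (symmetric) sameAs links."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def expand_query(start: str, pairs: Iterable[Tuple[str, str]],
                 predicate: str) -> str:
    """Build one SPARQL query that covers every known alias of `start`."""
    aliases = sorted(same_as_closure(start, pairs))
    filter_expr = " || ".join(f"?s = <{uri}>" for uri in aliases)
    return (
        "SELECT ?s ?o WHERE {\n"
        f"  ?s <{predicate}> ?o .\n"
        f"  FILTER ({filter_expr})\n"
        "}"
    )
```

The user only ever supplies one identifier; the code silently widens the query, which is the "one or three queries is inconsequential" point made above.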
> # What particular URIs to use for resources related to public
> databases (esp. database records) (>4 proposals on table)
>
> Admittedly this is an issue, but so far I like being able to have the
> best of LSIDs and http: URIs with the Bio2RDF markup schemata. URIs
> whose text does not match are inconsequential if one has metadata
> identifying the two URIs as identical.
>
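When the LSID and http: forms of a Bio2RDF identifier differ only textually, the equivalence can be derived mechanically. The sketch below assumes an LSID pattern of `urn:lsid:bio2rdf.org:<namespace>:<id>`; that pattern and the helper names are assumptions for illustration, and explicit owl:sameAs metadata remains the general mechanism.

```python
HTTP_PREFIX = "http://bio2rdf.org/"
LSID_PREFIX = "urn:lsid:bio2rdf.org:"  # assumed LSID authority/pattern


def http_to_lsid(uri: str) -> str:
    """Rewrite http://bio2rdf.org/<ns>:<id> as an (assumed) LSID."""
    if not uri.startswith(HTTP_PREFIX):
        raise ValueError(f"not a Bio2RDF http URI: {uri}")
    return LSID_PREFIX + uri[len(HTTP_PREFIX):]


def canonical(uri: str) -> str:
    """Map the LSID form onto the http: form; other URIs pass through."""
    # The "urn:lsid:" prefix is case-insensitive, so compare it lowercased.
    if uri.lower().startswith(LSID_PREFIX):
        return HTTP_PREFIX + uri[len(LSID_PREFIX):]
    return uri


def same_resource(a: str, b: str) -> bool:
    """True if the two identifiers name the same Bio2RDF record."""
    return canonical(a) == canonical(b)
```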
>     * What entity is responsible for choosing and maintaining these URIs
>
> What is wrong with the simple scheme that bio2rdf.org uses? With my
> local myBio2RDF installation I populate my database from the
> original supplier. Why do the metadata records need to be preprocessed
> and maintained by another entity?
>
> What is the difference between their scheme and any other, apart from
> prejudice against a particular opening identifier, which people can
> translate and use anyway without relying on the actual organisation
> continuing to exist?
>
> # How to get stuff
>
> Personally, I would stick with HTTP GET here.
>
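A plain HTTP GET needs nothing beyond the standard library. This Python sketch only builds the request; the `Accept: application/rdf+xml` header is an assumption (content negotiation is not discussed in this thread), and `rdf_get_request` is an illustrative name.

```python
import urllib.request


def rdf_get_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a plain HTTP GET for a resource URI.

    Asking for RDF/XML via the Accept header is an assumption; the
    server is free to ignore it.
    """
    return urllib.request.Request(
        uri,
        headers={"Accept": "application/rdf+xml"},
        method="GET",
    )
```

Sending it would be `urllib.request.urlopen(rdf_get_request(uri))`; what comes back for a given Accept header is up to the server.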
>     * How to use a URI to get metadata (RDF) about an identified resource
>
> I have no problem with getting metadata using the explicit URI object
> reference and then having to follow another URL to find the actual
> data. It is pretty much the way things work in society: you find the
> identifying information before you find the data, so when you find the
> data you know what you were looking for and that you actually wanted
> to expend resources to get it.
>
> I.e., I would never follow the following URLs until I had verified that
> http://bio2rdf.org/identifier described what I wanted to know.
>
> http://bio2rdf.org/data/identifier
> http://bio2rdf.org/html/identifier
> http://bio2rdf.org/image/identifier
>
> This assumes one knows what "html" and "image" mean, as basic
> information types, for one's own goal.
>
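The metadata-first habit described above can be sketched as a function that dereferences the identifier URI, checks the result, and only then hands out the data/html/image URLs. The `fetch_metadata` and `is_wanted` callbacks are hypothetical stand-ins for a real HTTP client and a relevance check; the URL layout follows the examples above.

```python
from typing import Callable, Dict, Optional

# Representation path segments copied from the example URLs above.
REPRESENTATIONS = ("data", "html", "image")


def representation_urls(
    identifier: str,
    fetch_metadata: Callable[[str], str],
    is_wanted: Callable[[str], bool],
    base: str = "http://bio2rdf.org/",
) -> Optional[Dict[str, str]]:
    """Check the metadata first; only then return the heavier URLs."""
    metadata = fetch_metadata(base + identifier)
    if not is_wanted(metadata):
        return None  # not what we were looking for: fetch nothing else
    return {kind: f"{base}{kind}/{identifier}" for kind in REPRESENTATIONS}
```

Only one request is made up front; the data, html and image URLs are never touched for records that fail the check.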
>     * How to use a URI to retrieve the bits of an information resource
>
> Not sure what the difficulties are here. I spent a week building a
> perfectly good browser page for Bio2RDF information on top of my local
> database, which assumed that the browser already knew how to follow
> HTTP standards... and it works so far.
>
> Essentially, given all of that, I have an adaptable system which
> utilises what I see as the best of the distributed semantic web (Web
> 3.0) with personal touches (Web 2.0).
>
> What would change if people all decided, for instance, to use only LSIDs
> and deprecated http:// URIs? Essentially, I could continue with my
> personal methods, as LSIDs are already included in my RDF data.
>
> What would change if people decided to access data by default with
> object references instead of metadata? Bio2RDF already allows for this
> internally (i.e., http://bio2rdf.org#rdfdata), although it is designed
> with what I see as a more intuitive metadata-by-default approach.
>
> Is there any other change that would break my way of doing things? And
> does everyone need to decide on one standard, as opposed to utilising
> common elements well enough to combine them? Personally, I do not like
> the idea of anonymous elements, i.e. bnodes, in RDF describing real
> scientific or medical data, but that is a minor issue, I guess.
>
> Peter
>
> PhD student
> Faculty of Information Technology
> Queensland University of Technology
> Brisbane, Australia
Received on Tuesday, 23 October 2007 13:58:57 UTC