what would change for me? from Peter Ansell on 2007-10-21 (public-semweb-lifesci@w3.org from October 2007)

From: Peter Ansell <ansell.peter@gmail.com>
Date: Mon, 22 Oct 2007 09:44:03 +1000
To: public-semweb-lifesci@w3.org
Cc: p.roe@qut.edu.au, j.hogan@qut.edu.au
Message-ID: <a1be7e0e0710211644r7f51fcf6s4fbb99924113510b@mail.gmail.com>

Hi all,

I have been using the Bio2Rdf markup system, and I personally do not
see what all the fuss is about. There must be something, though, so
here are my opinions, based solely on the requirements document:

http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations/Requirements

# For our own resources, what URIs to mint and what contracts to
adhere to regarding well-definedness and documentation

Publicly retrievable metadata for one's personally produced/published
information (if not the data as well) should be available using URIs
matched to one's institution/organisation, with relevant owl:sameAs
and rdfs:seeAlso statements to specify their relationships to other
known URIs.
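
To make the linking idea concrete, here is a minimal sketch that emits
N-Triples statements tying a locally minted URI to other known URIs. The
function name and the example.org URI are hypothetical, chosen only for
illustration; http://bio2rdf.org/identifier is the placeholder form used
later in this message.

```python
# Standard property URIs from the OWL and RDF Schema vocabularies.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def link_triples(own_uri, same_as=(), see_also=()):
    """Return N-Triples lines linking own_uri to other known URIs.

    own_uri is a URI minted under one's own institution (hypothetical here).
    """
    lines = []
    for other in same_as:
        lines.append(f"<{own_uri}> <{OWL_SAME_AS}> <{other}> .")
    for other in see_also:
        lines.append(f"<{own_uri}> <{RDFS_SEE_ALSO}> <{other}> .")
    return lines


if __name__ == "__main__":
    # Hypothetical institutional URI linked to the placeholder Bio2RDF URI.
    for line in link_triples(
        "http://example.org/mylab/identifier",
        same_as=["http://bio2rdf.org/identifier"],
        see_also=["http://bio2rdf.org/html/identifier"],
    ):
        print(line)
```

Nothing here needs agreement from the original publisher: the statements
live alongside one's own metadata.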

Advantages: One does not need to negotiate with the original author in
order to augment their definition, and people who actually want to
know things have clear, unambiguous ways of getting to their goal.
This follows the process of how knowledge is developed, i.e., someone
comes up with an idea and develops it themselves with citations to
outside publications. In the case that their following published information

Disadvantages: SPARQL queries are not simple, but I use programmatic
access and retrieve sameAs items through code, which then abstracts
queries to utilise all known identifiers when querying. People don't
actually want to write SPARQL queries themselves; they are biologists
or doctors who just want to click on a button and have it work for
them. Whether the program does one or three queries is basically
inconsequential to them.
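
The sameAs expansion described above could look roughly like the
following. This is only a sketch under stated assumptions, not my actual
code: the function names, the query template, and the in-memory list of
sameAs pairs are all hypothetical. It collects every identifier reachable
through sameAs links and produces one query per identifier.

```python
from collections import defaultdict

# Hypothetical query template; doubled braces survive str.format().
TEMPLATE = "SELECT ?p ?o WHERE {{ <{uri}> ?p ?o }}"


def same_as_closure(pairs, start):
    """Return all URIs equivalent to start, following sameAs both ways."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def expand_query(pairs, uri, template=TEMPLATE):
    """Build one query string per known identifier for uri."""
    return [template.format(uri=u) for u in sorted(same_as_closure(pairs, uri))]


if __name__ == "__main__":
    # Hypothetical sameAs statements between three identifiers.
    pairs = [
        ("http://example.org/mylab/identifier", "http://bio2rdf.org/identifier"),
        ("http://bio2rdf.org/identifier", "urn:lsid:example.org:ns:identifier"),
    ]
    for query in expand_query(pairs, "http://example.org/mylab/identifier"):
        print(query)
```

The user sees a single lookup; whether it fans out into one query or
three happens behind the button.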

# What particular URI's to use for resources related to public
databases (esp. database records) (>4 proposals on table)

Admittedly this is an issue, but so far I like being able to have the
best of LSID and http: URIs with the bio2rdf markup scheme. Simple
text URIs not matching is inconsequential if one has metadata
identifying the two URIs as identical.

* What entity is responsible for choosing and maintaining these URIs

What is wrong with the simple scheme that "bio2rdf.org" uses? With my
local "myBio2Rdf" installation I populate my database from the
original supplier. Why do the metadata records need to be preprocessed
and maintained by another entity?

What is the difference between their scheme and any other, apart from
prejudice against a particular opening identifier, which people can
translate and use without relying on the actual organisation to exist
anyway?

# How to get stuff

Personally, I would stick with HTTP GET here.

* How to use a URI to get metadata (RDF) about an identified resource

I have no problem with getting metadata using the explicit URI object
reference and then having to follow another URL to find the actual
data. It is pretty much the way things work in society: you find the
identifying information before you find the data, so when you find the
data you know what you were looking for and that you actually wanted
to expend resources to get it.

I.e., I would never follow the following URLs until I had verified
that http://bio2rdf.org/identifier described what I wanted to know:

http://bio2rdf.org/data/identifier
http://bio2rdf.org/html/identifier
http://bio2rdf.org/image/identifier

This assumes one knows what "html" and "image" mean, as basic
information types, for one's goal.
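
As a sketch of this metadata-first pattern, the URLs above can be derived
from a single identifier. The helper names are hypothetical and no
request is actually made; the point is only that the data, html and
image URLs are handed out after the metadata has been checked.

```python
BASE = "http://bio2rdf.org"


def metadata_url(identifier):
    """URL of the metadata record, looked at first."""
    return f"{BASE}/{identifier}"


def representation_urls(identifier, kinds=("data", "html", "image")):
    """URLs of the other representations, keyed by information type."""
    return {kind: f"{BASE}/{kind}/{identifier}" for kind in kinds}


def urls_to_follow(identifier, metadata_is_relevant):
    """Return follow-up URLs only once the metadata has been verified.

    metadata_is_relevant is the caller's own judgement (a hypothetical flag)
    that the metadata record describes what they wanted to know.
    """
    if not metadata_is_relevant:
        return {}
    return representation_urls(identifier)


if __name__ == "__main__":
    print(metadata_url("identifier"))
    print(urls_to_follow("identifier", metadata_is_relevant=True))
```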

* How to use a URI to retrieve the bits of an information resource

Not sure what the difficulties are here. I spent a week making up a
perfectly good browser page for bio2rdf information using my local
database which assumed that the browser already knew how to follow
HTTP standards... and it works so far.

Essentially, given all of that, I have an adaptable system which
utilises what I see as the best of the distributed semantic web (Web
3.0) with personal touches (Web 2.0).

What would change if people all decided, for instance, to only use
LSID and deprecated http:// URIs? Essentially, I could continue my
personal methods, as LSID is already included in my RDF data.

What would change if people decided to access data by default with
object references instead of metadata? Bio2RDF already allows for this
within itself (i.e., http://bio2rdf.org#rdfdata), although it is
designed with what I see to be a more intuitive metadata-by-default
approach.

Is there any other change that would break my way of doing things? And
does everyone need to decide on one standard, as opposed to utilising
common elements well enough to combine them? Personally I do not like
the idea of anonymous elements, i.e. bnodes, in RDF describing
realistic scientific or medical data, but that is a minor issue I
guess.

Peter

PhD student
Faculty of Information Technology
Queensland University of Technology
Brisbane, Australia

Received on Tuesday, 23 October 2007 02:36:04 UTC