RE: URL +1, LSID -1 from Michel_Dumontier on 2007-07-11 (public-semweb-lifesci@w3.org from July 2007)

From: Michel_Dumontier <Michel_Dumontier@carleton.ca>
Date: Wed, 11 Jul 2007 10:22:10 -0400
To: public-semweb-lifesci <public-semweb-lifesci@w3.org>
Cc: Mark Wilkinson <markw@illuminae.com>, Benjamin Good <goodb@interchange.ubc.ca>, Natalia Villanueva Rosales <naty.vr@gmail.com>
Message-id: <AB349814F1ECB143A5D4CD29C7A64569017BF158@CCSEXB10.CUNET.CARLETON.CA>

> On Jul 10, 2007, at 1:13 PM, Michel_Dumontier wrote:
> 
> > The use of a location free identifier such as an LSID provides me
> > with the capability to make statements about resources that I care
> > about. LSIDs and URLs can live together just fine. Using owl:sameAs
> > predicate to bind them together is one easy way of doing this.  Just
> > make sure you're talking about the same thing.
> 
> What it doesn't do, is provide the courtesy that has been requested
> by other semantic web practitioners, that, based on the identifier,
> one can discover something about the resource by "following your
> nose".

[Michel_Dumontier] Sure it does... I can make statements using
unambiguous LSIDs, and if a resource is equivalent to one identified
by a URL, I can instruct my semantic web application to follow that URL.
However, the suggestion that my semantic web application should read
HTML and follow its nose to find a REL LINK to an RDF document
(potentially not the right one) is an interesting but more complex
resolution mechanism, one that requires more sophisticated knowledge
about possible content presentation.
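
To make the extra work concrete, here is a minimal sketch of the
"read HTML, find the REL LINK" step. It is only an illustration: the
sample page, its paths, and the names RdfLinkFinder and find_rdf_links
are assumptions made up for this example, and a real client would also
have to fetch the page and resolve relative links.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Collect href values of <link> elements that advertise RDF/XML."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values may be None (e.g. a bare attribute), so normalise.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_rdf = attr.get("type", "").lower() == "application/rdf+xml"
        if ("alternate" in rels or "meta" in rels) and is_rdf and attr.get("href"):
            self.rdf_links.append(attr["href"])


def find_rdf_links(html_text):
    """Return every RDF/XML link advertised in the given HTML text."""
    finder = RdfLinkFinder()
    finder.feed(html_text)
    return finder.rdf_links


# Hypothetical page: note that it advertises two different RDF documents.
SAMPLE_PAGE = """
<html><head>
<title>Protein P12345</title>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rdf+xml" href="/rdf/P12345.rdf">
<link rel="meta" type="application/rdf+xml" href="/rdf/site-metadata.rdf">
</head><body>...</body></html>
"""

links = find_rdf_links(SAMPLE_PAGE)
print(links)
```

The sample page yields two candidate RDF documents, and nothing in the
markup says which one describes the resource itself. That is the
"potentially not the right one" problem.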

> The cost of using an http identifier, and providing a 303 and a
> pointer to more information, instead of using an LSID, seems a small
> cost to satisfy this community.
> 
[Michel_Dumontier] Ok - here's a use case to consider:
I would like to transform third party data (unstructured / text file)
into RDF/OWL, because the provider has no intention of making it
available in that format at this time. Which URI should I assign to the
resources? Here are the things I need to consider:
1) Many people may want to make statements about those resources and
need stable, unchanging identifiers to do so. 
 a) Imagine the problem of mapping multiple identifiers if everybody
assigned their own URI! I'm not interested in recreating the identifier
problem that has forever plagued bioinformatics.
 b) What if I have no intention of providing the content at a URL, but
rather as a downloadable document?
 c) As an interesting aside, by what mechanism should statements made
about the resource, but published at different locations, be retrieved?
(I'm very interested in learning about this!). One option, maybe, is for
data providers to register with a directory by providing the URL of the
resource they resolve.

2) Chimezie and Jonathan suggest that we might use emerging (not yet W3C
recommended) technologies to embed/extract/transform structured data.
This might be plausible, but requires fairly sophisticated approaches to
content management and application design, and requires standardization
across data providers. Otherwise, trying to figure out who does what,
and how, will be difficult and possibly overwhelming.
Don't get me wrong, I like the idea of embedding more explicit semantics
in HTML documents, but is this really the behavior we want for resources
defined in non-HTML documents?
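
As a rough illustration of the directory idea in point 1c, the sketch
below keeps an in-memory map from a resource identifier to the URLs of
providers that publish statements about it. Everything here is
hypothetical: the class ResolverDirectory, its methods, and the
example identifiers and URLs are assumptions for this example, not an
existing service or API.

```python
from collections import defaultdict


class ResolverDirectory:
    """Toy directory: resource identifier -> URLs of providers describing it."""

    def __init__(self):
        self._providers = defaultdict(set)

    def register(self, resource_id, provider_url):
        """Record that provider_url publishes statements about resource_id."""
        self._providers[resource_id].add(provider_url)

    def lookup(self, resource_id):
        """Return the known provider URLs for resource_id, sorted for stability."""
        return sorted(self._providers.get(resource_id, ()))


# Hypothetical identifier and provider URLs.
LSID = "urn:lsid:example.org:protein:P12345"

directory = ResolverDirectory()
directory.register(LSID, "http://b.example.net/annotations/P12345")
directory.register(LSID, "http://a.example.org/rdf/P12345")

print(directory.lookup(LSID))
```

A client holding only the LSID could ask such a directory where to
look, then retrieve and merge the statements from each location. How
providers would be trusted or kept up to date is left open here.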

> While you are correct about LSIDs and URLs being able to be bound
> together using sameAs, I don't see why one would, in new designs,
> choose to employ both.
> 
[Michel_Dumontier] Hopefully you sympathize with my need to have
unambiguous identifiers - the recent change of UniProt from LSID to URL
clearly demonstrates the consequences of arbitrarily changing the
identifier of a named resource. If anybody made statements using those
LSIDs, those identifiers are no longer defined in the RDF documents
provided by UniProt.
Resource identification and resource presentation are two really
different things.
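
A small sketch of the consequence, and of how an owl:sameAs statement
can repair it. Triples are plain Python tuples, and the identifiers,
the vocabulary URI, and the helper names equivalents and
statements_about are all made up for illustration; none of them are
actual UniProt identifiers or part of any library.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def equivalents(triples, identifier):
    """Return every name linked to identifier by owl:sameAs, in either direction."""
    same = {identifier}
    changed = True
    while changed:
        changed = False
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            if s in same and o not in same:
                same.add(o)
                changed = True
            elif o in same and s not in same:
                same.add(s)
                changed = True
    return same


def statements_about(triples, identifier):
    """Collect statements whose subject is identifier or any sameAs equivalent."""
    names = equivalents(triples, identifier)
    return [t for t in triples if t[0] in names and t[1] != OWL_SAME_AS]


# Hypothetical identifiers: an old LSID and the URL that replaced it.
OLD = "urn:lsid:example.org:protein:P12345"
NEW = "http://example.org/protein/P12345"

third_party = [(OLD, "http://example.org/vocab#interactsWith",
                "urn:lsid:example.org:protein:Q67890")]
provider = [(NEW, "http://www.w3.org/2000/01/rdf-schema#label", "Example protein")]
bridge = [(OLD, OWL_SAME_AS, NEW)]

without_bridge = statements_about(third_party + provider, NEW)
with_bridge = statements_about(third_party + provider + bridge, NEW)
print(len(without_bridge), len(with_bridge))
```

Without the bridging statement, a query on the new URL finds only the
provider's own triple; the third party statement made with the old
LSID is orphaned until someone publishes the mapping.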

-=Michel=-

Received on Wednesday, 11 July 2007 14:22:33 UTC