RE: [BioRDF] URI Resolution from Booth, David (HP Software - Boston) on 2007-02-02 (public-semweb-lifesci@w3.org from February 2007)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Fri, 2 Feb 2007 01:47:32 -0500
To: "Jonathan Rees" <jonathan.rees@gmail.com>, "public-semweb-lifesci" <public-semweb-lifesci@w3.org>
Cc: "Susie Stephens" <susie.stephens@oracle.com>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C201FD7687@tayexc19.americas.cpqcorp.net>
Re: http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Documents?action=AttachFile&do=get&target=getting-information.txt

My overall comment: Yes!  I believe a URI resolution ontology could
significantly help address these problems, while still permitting URIs
to be based on the http scheme, thus facilitating bootstrapping and
minimizing barriers to adoption.

Some specific comments follow.

> 	 URI Resolution: Finding Information About a Resource
> 	   Jonathan Rees, Alan Ruttenberg, Matthias Samwald
> . . .
> 
> Problem statement
> . . .

Nice problem description!

> What is the received wisdom?
> 
>   - Don't mint non-URL URI's. (TimBL)
>       [good as far as it goes, but we may not be in a position to
>       choose]

Meaning what?  Others might ignore this advice?  I think we should still
advise what we think is best.

> 
>   - Mint URL's whose hostname specifies a long-lived server that will
>     maintain the resource at the given URL in perpetuity.  (Publishers,
>     libraries, and universities are in good positions to do this.)
>       [good as far as it goes, but user may not be in control, or may
>       find quality name management to be beyond his/her grasp]

Good, but of course the long-lived server could also host a pointer to
the resource, and perhaps some other metadata about it, rather than the
resource itself.

> 
>   - Use a web cache such as Apache or Squid, and a proxy configuration
>     on the client, to deliver the correct content when a URL is
>     presented that can't or shouldn't be used directly.
>     (Dan Connolly)
>       [this is a possible solution... see below]

Nice idea!  This sounds functionally equivalent to a special purpose
protocol resolver, except that the resolving smarts are factored out of
the client/agent that is requesting the data, which seems to me like a
distinct advantage.

> 
>   - Use LSID's.  LSID resolvers are very similar to web caches in that
>     an intermediate server is deployed to map URIs.
>       [requires maintenance of an LSID resolver; not all problematic
>       URI's are LSID's]

I think you should separate the question of using LSIDs (as identifiers)
from the question of using LSID resolution (as a protocol).  There is no
need to use LSIDs (as identifiers) in order to use LSID resolution.  As
I have described in "Converting New URI Schemes or URN Sub-Schemes to
HTTP"
http://dbooth.org/2006/urn2http/ ,
you can instead use specialized http prefixes that are resolved using
LSID resolvers by agents that know about LSID resolution, and resolved
using HTTP by other agents as a fallback.  This allows good old HTTP to
act as a best-attempt bootstrapping mechanism for locating basic
metadata about the resource (and potentially about LSID resolution).
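To make the fallback idea concrete, here is a minimal sketch of how an agent
might choose between a specialized resolver and plain HTTP.  The prefix, the
registry, and the function name are all hypothetical, invented for
illustration; this is not the exact mapping defined in the urn2http document.

```python
# Hypothetical registry mapping a specialized http prefix to the name of
# the protocol that can resolve it.  The prefix is invented for illustration.
SPECIAL_PREFIXES = {
    "http://lsid.example.org/": "lsid",
}


def choose_resolution(uri, known_protocols):
    """Return (protocol, identifier) describing how to resolve the URI.

    known_protocols is the set of special protocols this agent implements.
    """
    for prefix, protocol in SPECIAL_PREFIXES.items():
        if uri.startswith(prefix) and protocol in known_protocols:
            # The agent knows the special protocol: pass the remainder of
            # the URI to that resolver.
            return protocol, uri[len(prefix):]
    # Otherwise fall back to an ordinary HTTP GET on the URI itself.
    return "http", uri
```

An agent that knows LSID resolution would call
choose_resolution(uri, {"lsid"}); any other agent passes an empty set and
simply gets the http fallback.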

> 
>   - If the type of the representation is unusable, use content
>     negotiation and/or GRDDL to get the right type of resource.
>       [can Alan say more about why he dislikes content negotiation?]

I admit that I am much more drawn to the explicitness of GRDDL than the
invisible hokey-pokey of content negotiation.  However, I'm not sure
that I understand the intent of this item.  If you are merely talking
about equivalent data/metadata being served using different media types,
then content negotiation seems fine.  But if you are talking about the
representation being unusable because it contains different information
than what you need (e.g., you need more metadata), then GRDDL sounds
more appropriate.
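For the first case (same information, different media type), a rough sketch
of what content negotiation looks like from the client side, using Python's
standard library.  The URI and the choice of application/rdf+xml are
assumptions for illustration; the request is only constructed here, not
sent.

```python
from urllib.request import Request


def rdf_request(uri):
    """Build a GET request that asks the server for an RDF/XML representation."""
    # The Accept header is the whole mechanism: the URI stays the same and
    # the server picks a representation matching the requested media type.
    return Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request("http://example.org/foo")
```

Nothing in the URI reveals which representation will come back, which is the
"invisible" part I mentioned above.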

> . . .
>   - To relate a non-information-resource to information about it,
>     mint URI's of the form http://example.org/foo#bar to name the
>     resource, with the convention that the URI http://example.org/foo
>     will name an information resource that describes it.
>       [obscure hack, probably too late to take hold, e.g.
>       ontology http://xmlns.com/foaf/0.1/ doesn't use #]

I'm surprised to see this characterized as an "obscure hack", since I
thought it was accepted practice in the RDF world.
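As a small illustration of the convention (a sketch only, with a function
name of my own choosing), the describing document's URI is obtained by
dropping the fragment:

```python
from urllib.parse import urldefrag


def describing_document(uri):
    """Return the URI of the information resource that describes `uri`.

    Under the hash convention this is the URI with its fragment removed.
    """
    document_uri, _fragment = urldefrag(uri)
    return document_uri
```

For http://example.org/foo#bar this gives http://example.org/foo.  A URI
with no fragment, such as one under http://xmlns.com/foaf/0.1/, comes back
unchanged, so the convention by itself says nothing about where its
description lives.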

> 
> What would a good solution be like?
> 
> Observation: We need information in order to find information.
> . . .

Good.

> Proposal: A URI resolution ontology.

Yes!

> . . .
>     . Retrieval methods: direct; URI transformation; SPARQL; web
>       service

Yes, also: Rules for associating specialized http URI prefixes with
special protocol resolvers, such as LSID resolvers, as described in
http://dbooth.org/2006/urn2http/#multiple-owners .

> . . .
>   - Disadvantage: you need an OWL engine to interpret resolution
>     information represented in this way, and not all applications have
>     an OWL engine.  [so why not get one and link it in?]

Perhaps a proxy such as Squid could do this automatically.  So if a
client/agent discovers a new URI and does an HTTP GET on it (through the
proxy) to learn about the associated resource, then the response could
include some OWL resolution information intended for the proxy.  Of
course, the proxy would have to be able to recognize when it should
intercept this information as opposed to merely passing it through to
the client/agent.  I suppose this might be done using HTTP headers, but
perhaps there are other ways to do it that would require less server
configuration.  Would there be safe ways for the proxy to intercept this
information if it is in the body of the response?  Sniffing the body
seems risky.
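
A rough sketch of the header-based variant, to show why it avoids looking at
the body.  The header name is hypothetical (I am not aware of a standard
one), and the function works on a plain dictionary of headers rather than on
any real proxy API.

```python
# Hypothetical header name, invented for illustration; not a standard header.
RESOLUTION_HEADER = "x-resolution-info"


def split_response_headers(headers):
    """Separate resolution hints meant for the proxy from the other headers.

    Returns (hints, passthrough): hints is a list of header values for the
    proxy to consume; passthrough holds the headers forwarded to the client.
    """
    hints = []
    passthrough = {}
    for name, value in headers.items():
        if name.lower() == RESOLUTION_HEADER:
            hints.append(value)  # kept by the proxy
        else:
            passthrough[name] = value  # forwarded to the client/agent
    return hints, passthrough
```

The body is never inspected, so the proxy cannot misread ordinary content as
resolution information; the cost is that the server has to be configured to
emit the header.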

David Booth
Received on Friday, 2 February 2007 06:48:53 UTC