RE: RDF for molecules, using InChI from Booth, David (HP Software - Boston) on 2007-08-06 (public-semweb-lifesci@w3.org from August 2007)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Mon, 6 Aug 2007 14:59:16 -0400
To: <ogbujic@ccf.org>, "Alan Ruttenberg" <alanruttenberg@gmail.com>
Cc: "Egon Willighagen" <egon.willighagen@gmail.com>, "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>, "Michel_Dumontier" <Michel_Dumontier@carleton.ca>, "Jonathan A Rees" <jar@mumble.net>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C203028CF9@tayexc19.americas.cpqcorp.net>

Hi Chimezie,

I think you're partially correct, but I think you've left out an
important element that has emerged more recently in the evolution of
thought around URIs.  

In the development of the Web, the allowance of non-HTTP schemes was
initially an important feature, because it enabled other protocols like
FTP to be readily used and absorbed into the Web.  And it was an
important extensibility point for future protocols.  This made sense at
the time, because people did not yet realize that the HTTP scheme could
do it all -- with greater benefit -- even if other *protocols* are also
used for retrieval.  

What has caused this thinking to shift more recently is the realization
that: (1) HTTP URIs can be used as globally unambiguous names (or
"identifiers") of things other than web pages (or "information
resources"); and (2) the use of an HTTP URI does *not* imply that HTTP
must be used to retrieve a representation or other information about it.
In other words, it has to do with more clearly understanding the dual
use of an HTTP URI both as a locator and as a location-independent name.

The realization that HTTP URIs can do it all has been steadily growing
in the past 2-3 years, but it still is not what I would call widespread.
I suggest reading the following document:
http://dbooth.org/2006/urn2http/
Ostensibly, that document describes how *any* URN scheme or sub-scheme
can be converted to using http URIs while retaining nearly all of the
features of the original URN scheme or sub-scheme.  But the purpose of
writing it was not primarily to enable people to do such conversion.
Rather, the purpose was to clearly show the *superiority* of HTTP URIs
to *all* URN schemes or sub-schemes, without having to quibble about the
details of any particular URN scheme or sub-scheme.  Though not a formal
proof, it can be read as an informal proof-by-construction that shows
how the capabilities of http URIs are a direct superset of the
capabilities of URNs (and LSIDs) in nearly all ways.  AFAICT, the *only*
exceptions are:
[[
    * URI Length.  HTTP URIs will generally be longer.

    * Governing Authority.  New URI schemes must be registered with
      IANA, whereas specialized HTTP prefixes may be defined by any URI
      owner.  This may be a concern, both because IANA may be perceived
      as being more reputable than other organizations, and because IANA
      provides a single place to look for scheme definitions.  However,
      if this concern is important enough, a registry of specialized
      HTTP prefixes could be created by a reputable organization --
      potentially even IANA.

    * Expectations.  Users discovering an xyzscheme URI expect it to be
      governed by a separate specification, whereas users discovering an
      HTTP URI with a specialized prefix may not realize that there is a
      separate specification governing it (over and above the http
      scheme specification).  This can be mitigated by educating users,
      and one good way to do so is to serve useful metadata (indirectly)
      via the URI, as described above.
]]

This does *not* mean that the LSID *protocol* is not better than the
HTTP *protocol* in some ways.  Indeed it is!  The designers of LSIDs did
a lot of good work on figuring out what features are needed and how they
can be achieved.  But this work occurred before this realization of how
HTTP URIs could be used had permeated the URI-design community.

Bottom line: even if folks want to use the LSID *protocol*, they should
still use HTTP URIs, because it gives those URIs the potential to be
used naively as HTTP URIs *and* in more sophisticated ways by clients
who happen to be LSID protocol-aware. 
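
To make that concrete, here is a minimal sketch in Python of how one
identifier could serve both kinds of client.  It is only an illustration
under stated assumptions: the prefix http://lsid.example.org/ and the
function names are hypothetical, and nothing here is taken from the LSID
specification or from the urn2http document.  A naive client simply
treats the result as an ordinary HTTP URI; an LSID-aware client
recognizes the prefix and recovers the embedded LSID.

```python
from urllib.parse import quote, unquote

# Hypothetical prefix, used only for illustration.  A real deployment
# would use a prefix that its owner has documented as carrying LSIDs.
LSID_PREFIX = "http://lsid.example.org/"


def lsid_to_http(lsid, prefix=LSID_PREFIX):
    """Wrap an LSID URN in an HTTP URI under the given prefix."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    # Keep ':' readable; percent-encode other characters that are not
    # unreserved in a URI.
    return prefix + quote(lsid, safe=":")


def http_to_lsid(uri, prefix=LSID_PREFIX):
    """Return the embedded LSID if the URI uses the prefix, else None."""
    if not uri.startswith(prefix):
        return None
    candidate = unquote(uri[len(prefix):])
    if candidate.lower().startswith("urn:lsid:"):
        return candidate
    return None
```

An LSID-aware client would call http_to_lsid() first and fall back to a
plain HTTP GET when it returns None; a client that knows nothing about
LSIDs never needs to look inside the URI at all.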


David Booth, Ph.D.
HP Software
+1 617 629 8881 office  |  dbooth@hp.com
http://www.hp.com/go/software

Opinions expressed herein are those of the author and do not represent
the official views of HP unless explicitly stated otherwise.
 

> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org 
> [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of 
> Chimezie Ogbuji
> Sent: Monday, August 06, 2007 9:09 AM
> To: Alan Ruttenberg
> Cc: Egon Willighagen; public-semweb-lifesci hcls; 
> Michel_Dumontier; Jonathan A Rees
> Subject: Re: RDF for molecules, using InChI
> 
> 
> On Sun, 2007-08-05 at 01:25 -0400, Alan Ruttenberg wrote:
> > I don't think it is likely that the HCLS recommendations will suggest
> > using INFO uri's.
> 
> Is the 'recommendation' of a particular URI scheme over others on the
> agenda? I would hope not.  I've yet to understand the motivation for
> considering the use of a particular URI scheme over another as a 'best
> practice' (the most common such suggestion being HTTP).
> 
> Note that the recent TAG finding [1] in this regard made a (guarded)
> argument for how HTTP schemes can facilitate location independence,
> persistence, etc.  This should not be confused for a recommendation
> *for* HTTP as the preferred URI scheme.  I would consider such
> recommendations as dangerous and perhaps a misunderstanding of AWWW
> and the URI mechanism: the fact that the URI syntax allows for the use
> of arbitrary URI schemes is a feature, not a bug.
> 
> > They haven't been championed by anyone, urn schemes  
> > are generally discouraged by the W3C TAG,
> 
> Where exactly?
> 
> >   and in our discussions  
> > thus far haven't seen any advantages to using them while noting  
> > difficulties. 
> 
> I don't think URI schemes were meant to be thought of that way (as
> mutually exclusive)
> 
> > Too many URN schemes lead to difficulties on the part  
> > of clients, 
> 
> Not true, especially if the intent of a particular scheme has more to
> do with identity management than network resolution (LSIDs are still
> useful even without a resolution mechanism, mostly because they have a
> very precise identification scheme - non-collidable UUIDs, etc.).
> Consider that you can perform inference over an RDF graph which
> consists of a merge of *both* ABox assertions (and other
> instance-level data) and TBox assertions (ontology assertions) without
> the need of a resolution mechanism.  Being able to build such a merged
> graph "on-the-fly" (i.e., "Follow-your-nose") makes such an RDF Graph
> Hypertext-Web friendly but this is not a necessary criterion.
> 
> The HTTP scheme (as I understand it) is made for the Hypertext-Web;
> not every information space maps well to the Hypertext-Web, and for
> those where resolution is not a necessary component, it is (a bit)
> redundant.
> 
> > which is why there is still a lot of discord about LSIDs,  
> 
> Most of which seems to follow the tone of the URN Registries finding
> (i.e., some of the problems solved by URN schemes can be solved with
> the HTTP scheme - once again this should not be confused as a
> recommendation for a URI scheme monopoly)
> 
> > which are certainly in line before INFOs. Finally, there are better
> > alternatives.
> > 
> > Just a heads up.
> > 
> > -Alan
> 
> [1] http://www.w3.org/2001/tag/doc/URNsAndRegistries-50.html
> 
> -- 
> Chimezie Ogbuji
> Lead Systems Analyst
> Thoracic and Cardiovascular Surgery
> Cleveland Clinic Foundation
> 9500 Euclid Avenue/ W26
> Cleveland, Ohio 44195
> Office: (216)444-8593
> ogbujic@ccf.org
> 
> 
> ===================================
> 
> Cleveland Clinic is ranked one of the top hospitals
> in America by U.S. News & World Report (2007).  
> Visit us online at http://www.clevelandclinic.org for
> a complete listing of our services, staff and
> locations.
> 
> 
> Confidentiality Note:  This message is intended for use
> only by the individual or entity to which it is addressed
> and may contain information that is privileged,
> confidential, and exempt from disclosure under applicable
> law.  If the reader of this message is not the intended
> recipient or the employee or agent responsible for
> delivering the message to the intended recipient, you are
> hereby notified that any dissemination, distribution or
> copying of this communication is strictly prohibited.  If
> you have received this communication in error,  please
> contact the sender immediately and destroy the material in
> its entirety, whether electronic or hard copy.  Thank you.
> 
> 
> 
> 
>
Received on Monday, 6 August 2007 19:00:20 UTC