Re: RDF for molecules, using InChI from Chimezie Ogbuji on 2007-08-07 (public-semweb-lifesci@w3.org from August 2007)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Tue, 07 Aug 2007 11:47:36 -0400
To: "Alan Ruttenberg" <alanruttenberg@gmail.com>
cc: "Egon Willighagen" <egon.willighagen@gmail.com>, "public-semweb-lifesci hcls" <public-semweb-lifesci@w3.org>, "Michel_Dumontier" <Michel_Dumontier@carleton.ca>, "Jonathan A Rees" <jar@mumble.net>
Message-ID: <1186501656.25142.63.camel@otherland>
On Mon, 2007-08-06 at 17:20 -0400, Alan Ruttenberg wrote:
> Second, the view offered was my own and was perhaps too strongly  
> stated. But I made it because I saw that recommendations were being  
> made about decisions related to URIs and because I felt that given  
> that we have an ongoing activity to work together to come up with  
> some recommendations, that although we haven't come to a decision  
> about exactly what the recommendations will say, it certainly is the  
> case that URI schemes have been part of the discussion. I felt Egon  
> needed to know about this - I don't know how long he has been  
> following the list.

Fair enough.

> I'm quite happy that, as a result of that, you've chimed in,  
> Chimezie. I'd say that the starting point for the recommendations  
> will be a set of principles that we agree are important for our  
> domain (they may be important for other domains, but our scope is  
> deliberately the hcls community). It may or may not turn out that as  
> a consequence of these principles we may recommend that certain  
> schemes not be used, and recommend that others better serve those  
> principles. Please connect with Jonathan to make sure that he has a  
> good idea of what you think is important so that the recommendations  
> can reflect your views.

I will do so.

> It would be surprising if we misunderstand the AWWW. We've said  
> before that we think it doesn't do well on a number of issues. That  
> the possibility that there be other URI schemes, I consider a  
> feature. However we are talking about a specific use - HCLS/Semantic  
> Web, and I don't conclude from anything that I've seen that this  
> makes it obligatory that we are neutral wrt/ schemes. We want to find  
> something that works well for this community.

Well, it's one thing to be neutral wrt schemes and another to make a
statement about the use of certain schemes (HTTP) as preferred for
authors who are in the business of minting URIs.  I'll just echo
Michel's sentiments about not alienating communities and focusing on a
clear definition of the problem statement.

> I can't find you references right now, as I'm on a plane, but I'll  
> try to get some for you. Certainly in discussions I've had  
> with them they have discouraged the use of new URN schemes for the  
> reason I mentioned.

I wasn't being coy in asking, I was quite serious.  If there is some
authoritative TAG discussion that suggests this, I would like to direct
appropriate comments, since I don't think blanket HTTP URI scheme
advocacy is constructive.

> If you consider LSIDs without resolution then I don't think they have  
> any value over any other URI scheme. They are simply strings and any  
> old scheme will do. 

Not necessarily true.  Consider schemes which have a specific (and
useful) formal semantics for parts of their strings.  The tag [1] scheme
incorporates a date component and a mechanism to guarantee uniqueness
while minting tag-based URIs.  The LSID scheme has specific portions of
the string for revision. These are all parts of the URI that the author
has to contend with, rather than beginning with a clean slate: an
arbitrary string whose naming convention is picked by the author.
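
To make the tag example concrete, here is a minimal sketch of minting and
parsing a tag URI of the form tag:<authority>,<date>:<specific>.  The
function names and the simplified authority check are assumptions made
for illustration (RFC 4151 is stricter), and example.org is a placeholder
authority.

```python
import re
from datetime import date

# Simplified authority check: a DNS name or an email address.
# This pattern is an assumption for brevity, not the full RFC 4151 grammar.
_AUTHORITY = re.compile(r"^[A-Za-z0-9.\-]+(@[A-Za-z0-9.\-]+)?$")

# tag:<authority>,<date>:<specific>, where the date is YYYY, YYYY-MM or YYYY-MM-DD.
_TAG = re.compile(r"^tag:([^,]+),(\d{4}(?:-\d{2}(?:-\d{2})?)?):(.*)$")


def mint_tag_uri(authority: str, minted_on: date, specific: str) -> str:
    """Build a tag URI from an authority, a minting date and a local name."""
    if not _AUTHORITY.match(authority):
        raise ValueError(f"unexpected authority: {authority!r}")
    return f"tag:{authority},{minted_on.isoformat()}:{specific}"


def parse_tag_uri(uri: str) -> dict:
    """Split a tag URI into the components the scheme itself defines."""
    match = _TAG.match(uri)
    if match is None:
        raise ValueError(f"not a tag URI: {uri!r}")
    authority, minted_on, specific = match.groups()
    return {"authority": authority, "date": minted_on, "specific": specific}


uri = mint_tag_uri("example.org", date(2007, 8, 7), "molecule/benzene")
print(uri)
print(parse_tag_uri(uri))
```

The point of the sketch is that the date is recoverable from the string by
anyone who knows the scheme, without a side agreement on naming.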

> As an aside, I don't know what you mean about  
> them having a precise identification scheme (any more so than any  
> URI) or what you mean by "non-collidable UUIDs". Could you say a bit  
> more?

This was my mistake.  I assumed (incorrectly) that the ObjectID
component of the LSID scheme needed to be a UUID, but in reading
further, it appears there is no specific structure required of the ObjectID.  By
precise identification scheme, I mostly meant that they had a specific
(defined) structure for the portions of the URI that have nothing to do
with authority and resolution but mostly have to do with identification.
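
As a rough illustration of that defined structure, the sketch below splits
an LSID of the form urn:lsid:<authority>:<namespace>:<object>[:<revision>]
into its parts.  The class and function names are hypothetical, the
example identifiers are made up, and the ObjectID is treated as an opaque
string (no UUID requirement is assumed).

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str           # who issued the identifier
    namespace: str           # authority-defined grouping
    object_id: str           # opaque to the scheme
    revision: Optional[str]  # optional version component


def parse_lsid(uri: str) -> LSID:
    """Split an LSID URN into the components named by the scheme."""
    parts = uri.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {uri!r}")
    # The "urn:lsid" prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {uri!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {uri!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifiers, for illustration only.
print(parse_lsid("urn:lsid:example.org:molecules:benzene"))
print(parse_lsid("urn:lsid:example.org:molecules:benzene:2"))
```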

> I think follow-your-nose is unworkable for the sort of work we mostly  
> do, 

Precisely! In fact I was going to lay out an example where the
restriction of working within an enterprise network severely weakens the
argument for minting HTTP URIs for terms coined by employees within that
enterprise who do not have control over their webspace.  So,
follow-your-nose (which strikes me as one of the primary arguments for
HTTP scheme monopoly) works for the open web but not the 'enterprise'
web.

> but because of the rest of web usage, we have thus far considered  
> it a goal to enable the sort of discovery that is commonly expected.  
> Personally, I consider it a matter of courtesy to give people a way  
> to find out more.

Right, but direct URI dereference of terms is *one* such way to extend
that courtesy.  I've argued (long ago) that this is not necessarily the
most effective way if 'finding out more' is meant to lead to inference.

> If one is only interested in doing local computation with RDF/OWL,  
> then the URI scheme doesn't matter - the reasoners don't typically  
> dereference anything. But we have been assuming that the usage we are  
> targeting is sharing names of resources and facts about those  
> resources with a wider community. For this purpose more is required.

Yes, absolutely.  However (again), it is just the suggestion of how to
go about this that I don't agree with.

> That is a misunderstanding. The HTTP scheme is explicitly (e.g.  
> httpRange-14) being recommended for purposes other than for hypertext  
> - range-14 talks about the fact that some things identified by HTTP  
> URIs are not information resources.

Right, and as you know, this is not without its problems (as witnessed by
the Phoenix-like uber TAG thread on this point).

> We're looking for concrete examples of this. If you have any it would  
> be very helpful if you could share the information.

I will collect my thoughts on this and do so.

> Indeed.
> 
> >  (i.e., some of the problems solved by URN schemes can be solved  
> > with the HTTP scheme - once again this should not be confused as  
> > recommendation for a URI scheme monopoly)

> I'm still waiting for an example that *can't* be solved using a HTTP  
> scheme. Do you have any? 

The HTTP scheme:

http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

does not have any formal components for identity management (dates and
versions).  A person (or group) who may not have control over webspace
but has a coherent theory they wish to express in OWL will probably find
these aspects of the tag (and lsid) schemes useful for guiding their
naming convention rather than inventing one (with perhaps a bogus
authority - such as 'example.com').

One could suggest a best practice for minting HTTP URIs which takes into
account provenance dates and revision; however, there are URI schemes
that already have a well-defined structure for representing such things.
It would be more productive (IMHO) to review where alternative schemes
get this right, how this may be replicated in HTTP (consider W3C
specification URIs) - i.e., clarify the problem statement rather than
proposing *a* solution (perhaps out of context).
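
For what replicating this in HTTP might look like, here is a minimal
sketch of a dated path convention loosely modeled on the pattern of W3C
specification URIs (a year segment followed by a name ending in
-YYYYMMDD).  The convention, the function names and the example.org host
are all assumptions for illustration.  Nothing in the HTTP scheme itself
defines or enforces this structure, which is the gap described above.

```python
import re
from datetime import date, datetime
from urllib.parse import urlsplit

# Assumed convention: /<base>/<year>/<name>-<YYYYMMDD>, optional trailing slash.
_DATED_PATH = re.compile(
    r"^/(?P<base>.+)/(?P<year>\d{4})/(?P<name>.+)-(?P<stamp>\d{8})/?$"
)


def mint_dated_http_uri(host: str, base: str, name: str, minted_on: date) -> str:
    """Mint an HTTP URI whose path carries the minting date by convention."""
    return f"http://{host}/{base}/{minted_on:%Y}/{name}-{minted_on:%Y%m%d}"


def parse_dated_http_uri(uri: str) -> dict:
    """Recover the date from a URI that follows the assumed convention."""
    parts = urlsplit(uri)
    match = _DATED_PATH.match(parts.path)
    if parts.scheme != "http" or match is None:
        raise ValueError(f"does not follow the dated convention: {uri!r}")
    stamp = datetime.strptime(match.group("stamp"), "%Y%m%d").date()
    if stamp.year != int(match.group("year")):
        raise ValueError(f"year segment and date stamp disagree: {uri!r}")
    return {
        "host": parts.netloc,
        "base": match.group("base"),
        "name": match.group("name"),
        "date": stamp,
    }


uri = mint_dated_http_uri("example.org", "ontologies", "thorax", date(2007, 8, 7))
print(uri)
print(parse_dated_http_uri(uri))
```

A consumer can only parse such a URI if it already knows the publisher's
convention, whereas with tag or lsid the structure comes with the scheme.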

[1] http://www.taguri.org/
-- 
Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org


Received on Tuesday, 7 August 2007 15:47:58 UTC