Re: My conversation with Sean Martin about LSIDs from Ivan Herman on 2006-07-27 (public-semweb-lifesci@w3.org from July 2006)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 27 Jul 2006 09:30:35 +0200
To: noah_mendelsohn@us.ibm.com
CC: www-tag@w3.org, public-semweb-lifesci@w3.org, sjmm@us.ibm.com
Message-ID: <44C86B9B.3030101@w3.org>
Noah,

thanks for this message, it was very useful and interesting (at least
for me...).

I hope the meeting between the TAG and the SWHCLS IG on Monday will be
fruitful. This is an important issue on which we should reach some sort
of equilibrium as soon as possible...

Thanks again

Ivan


======== Noah Mendelsohn writes ======================


This (long) note is an attempt to summarize some insights gained from a
conversation I had with Sean Martin this afternoon.

First of all, a bit of background and a few caveats.  There have been some
threads springing up recently that have to do with the tradeoffs between
having LSIDs as URNs as they are now, vs. achieving similar function
using the http URI scheme.  Others on the TAG have been active in this
discussion, and I've been more or less lurking.  At the end of today's TAG
call, I was asked to seek out Sean Martin for a chat, in part because we
work in the same building and it seemed like a convenient way of trying to
cut through the confusion by meeting face to face (though ironically, when
I reached Sean he was out of the building, so we spoke by phone anyway.)
The main caveats are that I don't bill myself as any sort of expert on
LSID, its history, or on the component technologies such as DDDS on which
it is built.  My main TAG focus is on other things and I have not made the
time to read all the pertinent specs in detail.  Also, what follows is
definitely not the opinion of anyone on the TAG other than myself, and it
is not necessarily reliable in its transcription of positions I may
attribute to Sean.  All of that said, I hope this goes some way toward at
least clarifying the issues.

My personal opinion is that, if we're going to make smooth progress, we
all need to take a bit more sympathetic look at each other's concerns. So,
let me try to play back points that seemed important to Sean, along with
occasional comments from me.  Sean is of course encouraged to correct any
of the following that misrepresents what he said:

* The LSID scheme was built in a good faith effort to follow best
practices and established RFCs and standards as understood at the time the
work was done.  Not just URN, but DDDS and a variety of other RFCs were
used, and a conscious effort was made to invent from scratch only a few
things for which existing technologies were not available.

* The strong advice to use only the http scheme seems relatively new, and
was not nearly as visible when LSID was being specified.  Indeed, even
now, there isn't a lot in the way of formal specifications to suggest that
http is the "one true scheme" on the Web (even for "documenty" things),
and that use of other schemes should be discouraged.

* LSID is designed to provide a variety of features that aren't directly
standardized for the http scheme.  Maybe they're achievable in principle
(or maybe not), but there don't seem to be interoperable standards for
encoding version metadata, guaranteeing that representations are 1:1 with
names, supporting replicated deployment, etc.  While there may be good
reason that "http: only" is evolving as a focus for some on the TAG, it
needs to be supported with clear guidance, detailed specifications, etc.
Just saying "if you're careful you can do these things with the http
scheme" is different from saying "these are the specifications for doing
{replication, protocol switching, changing ownership, maintaining 1:1
between URI and representation, ensuring that metadata such as versioning
information is retrievable from the URI}"; i.e., here's how these things
are being widely done with the http scheme, and in standard ways.

* While it is true that certain forms of replication and distribution are
commonly done using http (failover servers, Akamai caching, etc.), that
isn't an existence proof that the particular desired forms of replication,
distribution and name management are practical using the HTTP scheme.

I personally think these are important observations, and I think the TAG
would really do well to try and address them with care.  In particular,
Sean seems to be among several non-TAG reviewers who feel that the current
draft of URNs, Namespaces and Registries [1] is not sufficient to convince
sceptics on many of its points.  For example, just saying "http: URIs
are not locations" isn't nearly enough to show that software and
conventions are in place that allow one to avoid, for example, the need to
deploy a server at example.org (which admittedly may be replicated using
IP and DNS trickery) to support URIs of the form http://example.org/x.

Another couple of points came up in our discussion that I am listing
separately because they are more logistical than technical:

* Obviously, a lot of work was done to get LSID as far as it is.  Some of
that work involved the sorts of difficult negotiations that are common in
standards work, and so anything that reopens discussion is naturally
painful.  Accordingly, it would help the LSID community to get a sense
that the TAG is sensitive to the need to set the bar much higher in asking
for changes or even reconsideration now, than it would have been set had
these discussions been happening some years ago.

* Sean also mentioned that he thought it would help make the case for
using the http scheme if the W3C could invest in some work that would show
how to achieve >in standard ways< some of the qualities of service,
metadata encodings, management characteristics, etc. that they feel they
already have with LSID.  I don't think Sean is asking anyone else to
rebuild LSID, but rather to specify the building blocks that would show
how, for example, non-HTTP protocols can be dispatched in standard ways,
how resources can be deployed without the need to run a central server at,
for example, lsid.org, etc.  My own view is that this is an interesting
suggestion for the medium- to long-term, but that first we need to improve
the analysis in [1], so that we can get to the point where all concerned
agree that it covers the important issues in a way that's convincing.

* Although resolution is mediated by DDDS, LSID registration ultimately
traces back to DNS registration, at least in most cases.  Nobody is
promoting LSID as a scheme for getting rich on managing new registries;
you can get your LSID registrations done by getting new DNS entries.
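
To make the DNS point above concrete, here is a minimal sketch of how the
authority part of an LSID leads to a DNS name to look up. It assumes the
usual urn:lsid:authority:namespace:object[:revision] layout, and it assumes
that resolution starts from a DNS SRV record named with the "_lsid._tcp"
prefix; that label should be checked against the LSID resolution spec
before relying on it. The function names and the example.org identifier are
hypothetical, and no network lookup is performed.

```python
def lsid_authority(lsid: str) -> str:
    """Return the authority field of an LSID URN, lower-cased."""
    parts = lsid.split(":")
    # Expect at least urn, lsid, authority, namespace, object.
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID URN: %r" % lsid)
    if not parts[2]:
        raise ValueError("empty authority in %r" % lsid)
    return parts[2].lower()


def srv_query_name(lsid: str) -> str:
    """Build the DNS name a resolver might query for SRV records.

    The "_lsid._tcp" prefix is an assumption about the LSID
    resolution convention, not something stated in this thread.
    """
    return "_lsid._tcp." + lsid_authority(lsid)


if __name__ == "__main__":
    # Hypothetical identifier, used only for illustration.
    print(srv_query_name("urn:lsid:example.org:proteins:P12345"))
```

The only registry involved in this sketch is DNS itself: whoever controls
the authority's domain name controls where its LSIDs resolve.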

I should also say that I took some trouble to convey to Sean some of what
I think is important in the TAG's (well, many TAG members') positions.
These include:

* New schemes and new protocols create very damaging de facto divisions in
the Web.  Architectures like LSID implicitly put a premium on achieving
great flexibility and highest fidelity in a more limited application
domain, but at the cost of greatly reduced interoperability with the rest
of the Web.  If I put a URN:LSID in an email on the TAG mailing list and
say "check out this cool protein", the chances that an arbitrary reader's
browser (or search spider) will do something useful with the link are
orders of magnitude lower than for http: URIs.  This is hugely important.

* If LSID were the only new URN, then we might say 'fine, add a handler to
all the browsers and spiders', but it's a slippery slope.  If it's OK for
LSID then it's OK for tens, hundreds or thousands of other URNs.  We don't
want to be in a place where Google or Firefox is having to add tens of new
handlers per month just to make sure links work everywhere.

* There are existence proofs in many sophisticated Web sites for many of
the techniques the TAG is advocating.  If you register DNS name lsid.org,
for example, you have a great deal of control over the URI policies within
that authority.  If you want to guarantee that all such URIs are 1:1 with
their representations, make it so.  If you want digital signatures in the
URI to help guarantee it, then put them in all your URIs.
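
As an illustration of the "control over the URI policies" point above, the
following is a minimal sketch of one possible mapping from an LSID onto an
http-scheme URI under a single DNS authority. Nothing here comes from the
LSID specs or from the TAG draft: the path layout, the host name
lsid.example.org and the function names are all hypothetical, and the only
thing assumed about LSIDs is the urn:lsid:authority:namespace:object[:revision]
layout.

```python
from typing import NamedTuple, Optional
from urllib.parse import quote


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID URN into its colon-separated fields."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID URN: %r" % text)
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError("empty field in %r" % text)
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return Lsid(authority.lower(), namespace, object_id, revision)


def to_http_uri(lsid: Lsid, host: str = "lsid.example.org") -> str:
    """Map an LSID to an http URI under a hypothetical host.

    Each field becomes one path segment, so the mapping is
    reversible and the revision stays visible in the URI.
    """
    segments = [lsid.authority, lsid.namespace, lsid.object_id]
    if lsid.revision is not None:
        segments.append(lsid.revision)
    path = "/".join(quote(s, safe="") for s in segments)
    return "http://" + host + "/" + path


if __name__ == "__main__":
    # Hypothetical identifier, used only for illustration.
    print(to_http_uri(parse_lsid("urn:lsid:example.org:proteins:P12345:2")))
```

A mapping like this only shows that the name can be carried in an http URI;
it does not by itself address the replication, ownership-transfer or
1:1-representation guarantees raised earlier in this note.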

My own (tentative) position probably comes through from the above.  I
think it would be highly desirable in principle for http-scheme URIs to be
used for LSIDs, but I think there's enough investment in the existing
URNs that the bar for justifying it now needs to be quite high (though
not impossibly high.)  I think there's a tremendous opportunity for each
side to learn from the other's needs and concerns, and that's going to be a
lot easier if everyone has confidence that practical considerations will
be given suitable weight.  So, I think it would also help the TAG to know
that the LSID community was giving careful consideration to the points
raised above in favor of the http scheme, even if that only influences
work down the road.  The benefits to the Life Sciences community could be
substantial, IMO.  I also think that it would be a great yardstick for the
draft TAG finding [1] if we could get it to the point where informed
members of the LSID community (and similar communities that may be
sceptical) can say:  "that's a fair and convincing analysis".  I don't
think we're quite there yet.

Anyway, I'm mostly going back to my other TAG assignments which are in
other areas, but I hope this is of some help in framing the discussion. At
the very least, I hope Sean recognizes it as about right in capturing the
spirit of our chat.  Thank you, Sean, for taking the time.

Noah


[1] http://www.w3.org/2001/tag/doc/URNsAndRegistries-50.xml

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------


-- 

Ivan Herman, W3C Semantic Web Activity Lead
C/o W3C Benelux Office at CWI, Kruislaan 413
1098SJ Amsterdam, The Netherlands
tel: +31-20-5924163; mobile: +31-641044153;
URL: http://www.w3.org/People/Ivan/
PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf#Me
Received on Thursday, 27 July 2006 07:54:03 UTC