My conversation with Sean Martin about LSIDs

This (long) note is an attempt to summarize some insights gained from a
conversation I had with Sean Martin this afternoon.

First of all, a bit of background and a few caveats.  There have been some 
threads springing up recently that have to do with the tradeoffs between 
having LSIDs as URNs, as they are now, vs. achieving similar function
using the http URI scheme.  Others on the TAG have been active in this 
discussion, and I've been more or less lurking.  At the end of today's TAG 
call, I was asked to seek out Sean Martin for a chat, in part because we 
work in the same building and it seemed like a convenient way of trying to 
cut through the confusion by meeting face to face (though ironically, when 
I reached Sean he was out of the building, so we spoke by phone anyway).
The main caveats are that I don't bill myself as any sort of expert on 
LSID, its history, or on the component technologies such as DDDS on which 
it is built.  My main TAG focus is on other things and I have not made the 
time to read all the pertinent specs in detail.  Also, what follows is 
definitely not the opinion of anyone on the TAG other than myself, and it 
is not necessarily reliable in its transcription of positions I may 
attribute to Sean.  All of that said, I hope this goes some way toward at 
least clarifying the issues.

My personal opinion is that, if we're going to make smooth progress, we 
all need to take a somewhat more sympathetic look at each other's concerns. So,
let me try to play back points that seemed important to Sean, along with 
occasional comments from me.  Sean is of course encouraged to correct any 
of the following that misrepresents what he said:

* The LSID scheme was built in a good faith effort to follow best 
practices and established RFCs and standards as understood at the time the 
work was done.  Not just URN, but DDDS and a variety of other RFCs were 
used, and a conscious effort was made to invent from scratch only a few 
things for which existing technologies were not available.

* The strong advice to use only the http scheme seems relatively new, and 
was not nearly as visible when LSID was being specified.  Indeed, even 
now, there isn't a lot in the way of formal specifications to suggest that 
http is the "one true scheme" on the Web (even for "documenty" things), 
and that use of other schemes should be discouraged. 

* LSID is designed to provide a variety of features that aren't directly 
standardized for the http scheme.  Maybe they're achievable in principle 
(or maybe not), but there don't seem to be interoperable standards for 
encoding version metadata, guaranteeing that representations are 1:1 with 
names, supporting replicated deployment, etc.  While there may be good 
reason that "http: only"  is evolving as a focus for some on the TAG, it 
needs to be supported with clear guidance, detailed specifications, etc. 
Just saying "if you're careful you can do these things with the http 
scheme" is different from saying "these are the specifications for doing 
{replication, protocol switching, changing ownership, maintaining 1:1 
between URI and representation, ensuring that metadata such as versioning 
information is retrievable from the URI}"; i.e., "here's how these things
are being widely done with the http scheme, and in standard ways."

* While it is true that certain forms of replication and distribution are 
commonly done using http (failover servers, Akamai caching, etc.), that 
isn't an existence proof that the particular desired forms of replication, 
distribution and name management are practical using the HTTP scheme.

I personally think these are important observations, and I think the TAG 
would really do well to try and address them with care.   In particular, 
Sean seems to be among several non-TAG reviewers who feel that the current 
draft of URNs, Namespaces and Registries [1] is not sufficient to convince 
sceptics on many of its points.   For example, just saying "http: URIs
are not locations" isn't nearly enough to show that software and 
conventions are in place that allow one to avoid, for example, the need to 
deploy a server at example.org (which admittedly may be replicated using 
IP and DNS trickery) to support URIs of the form http://example.org/x.

Another couple of points came up in our discussion that I am listing 
separately because they are more logistical than technical:

* Obviously, a lot of work was done to get LSID as far as it is.  Some of 
that work involved the sorts of difficult negotiations that are common in 
standards work, and so anything that reopens discussion is naturally 
painful.  Accordingly, it would help the LSID community to get a sense 
that the TAG is sensitive to the need to set the bar much higher in asking 
for changes or even reconsideration now, than it would have been set had 
these discussions been happening some years ago.

* Sean also mentioned that he thought it would help make the case for 
using the http scheme if the W3C could invest in some work that would show 
how to achieve >in standard ways< some of the qualities of service, 
metadata encodings, management characteristics, etc. that they feel they 
already have with LSID.  I don't think Sean is asking anyone else to 
rebuild LSID, but rather to specify the building blocks that would show 
how, for example, non-HTTP protocols can be dispatched in standard ways, 
how resources can be deployed without the need to run a central server at,
for example, lsid.org, etc.  My own view is that this is an interesting
suggestion for the medium- to long-term, but that first we need to improve 
the analysis in [1], so that we can get to the point where all concerned 
agree that it covers the important issues in a way that's convincing.

* Although resolution is mediated by DDDS, LSID registration ultimately 
traces back to DNS registration, at least in most cases.  Nobody is 
promoting LSID as a scheme for getting rich on managing new registries; 
you can get your LSID registrations done by getting new DNS entries.
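
To make the DNS connection concrete, here is a minimal Python sketch. It
assumes the commonly described LSID layout of
urn:lsid:authority:namespace:object with an optional trailing revision.
The names Lsid and parse_lsid and the sample identifier are illustrative
assumptions of mine, not anything taken from the LSID specification or from
Sean. The point is only that the authority field is typically an ordinary
DNS name.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str            # typically a DNS name owned by the issuer
    namespace: str
    object_id: str
    revision: Optional[str]   # optional version component


def parse_lsid(urn: str) -> Lsid:
    """Split an LSID URN into its colon-separated components.

    Assumes the layout urn:lsid:authority:namespace:object[:revision].
    """
    parts = urn.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {urn!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {urn!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty required field in: {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


# Hypothetical identifier, for illustration only.
lsid = parse_lsid("urn:lsid:example.org:proteins:p12345:2")
print(lsid.authority)
```

Whoever controls the DNS registration for the authority name is, in this
reading, the party who controls the identifiers issued under it.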

I should also say that I took some trouble to convey to Sean some of what 
I think is important in the TAG's (well, many TAG members') positions. 
These include:

* New schemes and new protocols create very damaging de facto divisions in 
the Web.  Architectures like LSID implicitly put a premium on achieving 
great flexibility and highest fidelity in a more limited application 
domain, but at the cost of greatly reduced interoperability with the rest 
of the Web.  If I put a URN:LSID in an email on the TAG mailing list and 
say "check out this cool protein", the chances that an arbitrary reader's 
browser (or search spider) will do something useful with the link are 
orders of magnitude lower than for http: URIs.  This is hugely important.

* If LSID were the only new URN, then we might say 'fine, add a handler to 
all the browsers and spiders', but it's a slippery slope.  If it's OK for 
LSID then it's OK for tens, hundreds or thousands of other URNs.  We don't 
want to be in a place where Google or Firefox is having to add tens of new 
handlers per month just to make sure links work everywhere.

* There are existence proofs in many sophisticated Web sites for many of 
the techniques the TAG is advocating.  If you register the DNS name lsid.org,
for example, you have a great deal of control over the URI policies within
that authority.  If you want to guarantee that all such URIs are 1:1 with
their representations, make it so.  If you want digital signatures in the 
URI to help guarantee it, then put them in all your URIs.
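
As a rough illustration of that last point, here is a minimal Python sketch
of one way an authority might tie an http URI to exactly one
representation. It uses a plain SHA-256 content digest in the path rather
than a true digital signature, which is a simplification on my part. The
base URI, the path layout and the function names are all hypothetical; this
is not an existing lsid.org convention.

```python
import hashlib

# Hypothetical URI layout chosen by the authority; an assumption for
# illustration only.
BASE = "http://lsid.org/data"


def uri_for(content: bytes) -> str:
    """Mint a URI whose final path segment is the SHA-256 of the content."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{BASE}/sha256/{digest}"


def matches(uri: str, content: bytes) -> bool:
    """Check that retrieved bytes are the representation the URI names."""
    return uri == uri_for(content)


record = b"MKTAYIAKQRQISFVKSHFSRQ"   # made-up sample bytes
uri = uri_for(record)
print(uri)
print(matches(uri, record))          # True
print(matches(uri, record + b"X"))   # False
```

A client holding such a URI can detect that the bytes it received differ
from the ones the URI was minted for, whichever server or replica supplied
them. A digest alone does not say who minted the URI; that would need a
real signature.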

My own (tentative) position probably comes through from the above.  I 
think it would be highly desirable in principle for http-scheme URIs to be 
used for LSIDs, but I think there's enough investment in the existing 
URNs that the bar for justifying it now needs to be quite high (though
not impossibly high).  I think there's a tremendous opportunity for each
side to learn from the other's needs and concerns, and that's going to be a
lot easier if everyone has confidence that practical considerations will 
be given suitable weight.  So, I think it would also help the TAG to know 
that the LSID community was giving careful consideration to the points
raised above in favor of the http scheme, even if that only influences 
work down the road.  The benefits to the Life Sciences community could be 
substantial, IMO.  I also think that it would be a great yardstick for the 
draft TAG finding [1] if we could get it to the point where informed 
members of the LSID community (and similar communities that may be 
sceptical) can say:  "that's a fair and convincing analysis".  I don't 
think we're quite there yet.

Anyway, I'm mostly going back to my other TAG assignments which are in 
other areas, but I hope this is of some help in framing the discussion. At 
the very least, I hope Sean recognizes it as about right in capturing the 
spirit of our chat.  Thank you, Sean, for taking the time.

Noah


[1] http://www.w3.org/2001/tag/doc/URNsAndRegistries-50.xml

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Tuesday, 25 July 2006 23:42:21 UTC