Re: [BioRDF] All about the LSID URI/URN from Henry S. Thompson on 2006-07-25 (public-semweb-lifesci@w3.org from July 2006)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Tue, 25 Jul 2006 17:27:07 +0100
To: Sean Martin <sjmm@us.ibm.com>
Cc: public-semweb-lifesci@w3.org
Message-ID: <f5b8xmhfy5g.fsf@erasmus.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sean Martin writes:

>> Well, either your scheme is intended to be dereferenceable, or it
>> isn't.
>> 
>>  If it is, then instances are likely/virtually certain to contain some
>>  kind of named starting point, which needs to be looked up and
>>  resolved to an IP address to start the dereferencing process.  Domain
>>  names and DNS are by far the best available implementation of this
>>  step, with excellent performance, widespread deployment and
>>  considerable flexibility.
>
> As it is a URN, the starting point for dereferencing is urn.arpa. The 
> specification [1] details the use of the DDDS system (RFCs 3401-3405) which 
> uses the existing DNS system (for the very reasons you detail) but 
> maintains a level of abstraction between the authority name in the 
> identifier and the data service location that can provide a copy of what 
> was named, as is proper for URNs.

So, register one of lsids.org, lsids.net, lsids.name or lsids.info,
and use e.g. http://lsids.org/xxx instead of URN:LSID:xxx.  Bingo -- no
new tools required, works in all modern browsers :-).  Implement as
much or as little redirection, caching etc. as you wish in the server
you run at lsids.info:80, just as you would using DDDS.
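
A minimal sketch of that rewriting, assuming the usual LSID layout of
urn:lsid:authority:namespace:object[:revision] and assuming lsids.org is
the domain that got registered.  The function name and the base URI are
hypothetical, not part of any LSID specification:

```python
from urllib.parse import quote

# Hypothetical resolver base; lsids.org is only the example domain from
# this thread, not a service that is known to exist.
RESOLVER_BASE = "http://lsids.org/"


def lsid_to_http(lsid: str, base: str = RESOLVER_BASE) -> str:
    """Rewrite URN:LSID:xxx as <base>xxx, keeping xxx recognisable."""
    prefix = "urn:lsid:"
    # The 'urn' and 'lsid' labels are compared case-insensitively.
    if not lsid.lower().startswith(prefix):
        raise ValueError(f"not an LSID URN: {lsid!r}")
    rest = lsid[len(prefix):]
    if not rest:
        raise ValueError("LSID has nothing after the 'urn:lsid:' prefix")
    # Keep ':' literal so the authority:namespace:object structure stays
    # visible in the path; other reserved characters are percent-encoded.
    return base + quote(rest, safe=":")


print(lsid_to_http("URN:LSID:ncbi.nlm.nih.gov:pubmed:12571434"))
```

Whatever redirection or caching is wanted then lives behind that one
host, and the client needs nothing beyond an ordinary HTTP GET.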

>> >> as well as the only means by which one may 
>> >> retrieve it (the protocol, usually http, https or ftp).
>> 
>> Not so.  The URI RFC [1] makes clear that it is up to protocols to
>> specify what URIs they interpret and how, not the other way around.
>> It is entirely reasonable, and indeed expected, that new protocols may
>> specify interpretations of 'old' URI schemes, including 'http'.
>> 
>> >> The first question to ask yourself here is that when you are
>> >> uniquely naming (in all of space and time!) a file/digital object
>> >> which will be usefully copied far and wide, does it make sense to
>> >> include as an integral part of that name the only protocol by which
>> >> it can ever be accessed and the only place where one can find that
>> >> copy?
>> 
>> I hope the above clarifies that this is not the case for names using
>> the 'http' scheme.  Indeed, new protocols are much more likely to
>> specify an interpretation for 'http' than for almost any other scheme.
>
> Assuming that a new http protocol replaces the existing one, how does this 
> change things? 

Sorry I wasn't clearer.  Any new protocol of _any_ kind can specify
how it handles 'http'-scheme URIs.  I assumed you were worried about
'http'-scheme URIs going "out of date" somehow because 30 years from
now http is dead and we're all using sdtp (super-duper transport
protocol TM, Patent Pending :-).  All I'm saying is that for sdtp to
be a success, it will surely define what to do with an 'http'-scheme URI.

> Surely the name is still tied to a single protocol (HTTP) 
> even if the underlying implementation of that protocol has changed?

No, again, as the URI RFC makes clear, it's _protocols_ which define
what they do with URIs of particular schemes, not the other way around.

> LSIDs are independent of any particular transport protocol and
> indeed already make use of any of the commonly used ones
> simultaneously (ftp, http, SOAP, file:// etc). The thing to remember
> here is that we are not thinking about URIs in the abstract,
> but rather a 'living, breathing system' intended for naming digital
> objects that will be copied/archived far and wide. It was deemed
> important to support as many mechanisms as possible (including
> future ones) to support that copying/archiving process without
> losing track of the unique name.

So all LSID clients have to support all those protocols?  Doesn't
sound like a likely route to wide deployment. . .  Or are you proxying
all requests through a few central servers, who choose what protocols
to use for the initial fetch?  If so, no problem doing that with
'http'-scheme URIs either. . .

>> >> Unfortunately when it comes to URLs there is no way to know
>> >> that what is served one day will be served out the next simply
>> >> by looking at the URL string. There is no social convention or
>> >> technical contract to support the behavior that would be
>> >> required.
>
>> 
>> True for some 'http' URIs, false for others.  The owners of a group of
>> names, whether they use 'http' or not, are responsible for
>> documenting, implementing and enforcing usage conventions.  I
>> absolutely agree that for your purposes you need to take this very
>> seriously, but using 'http' doesn't make this any harder (or, of
>> course, any easier).
>> 
>
> I am not sure that I can agree with you on this point. How does one go 
> about differentiating between one http:// URI and another programmatically 
> for the purposes of knowing what its conventions are? As opposed to using 
> something else which only has one established convention?

See above suggestion wrt http://lsids.org/ -- you own that domain, you
set the conventions/policies/etc.
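
Programmatically, the differentiation could be as simple as looking at
the host.  A rough sketch, assuming lsids.org is the agreed domain; the
set of hosts and the function name are hypothetical:

```python
from urllib.parse import urlsplit

# Hypothetical: the domain(s) whose owner documents and enforces the
# LSID-style conventions (stable names, no reuse, and so on).
LSID_HOSTS = {"lsids.org"}


def follows_lsid_conventions(uri: str) -> bool:
    """True if this 'http' URI sits under a host with the LSID policies."""
    parts = urlsplit(uri)
    # urlsplit lowercases the scheme, and .hostname is lowercased too,
    # so the comparison is effectively case-insensitive.
    return parts.scheme == "http" and (parts.hostname or "") in LSID_HOSTS


print(follows_lsid_conventions("http://lsids.org/ncbi.nlm.nih.gov:pubmed:12571434"))
print(follows_lsid_conventions("http://example.org/some/page"))
```

That is the same amount of work as checking for a "urn:lsid:" prefix.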

> [1] http://www.omg.org/cgi-bin/doc?dtc/04-05-01
>
> --
> Sean Martin
> IBM Corp.
>

- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (GNU/Linux)

iD8DBQFExkZbkjnJixAXWBoRAlWAAKCEBW+ClErSYh6+p1Haxhoi+3n0UgCeK6xP
tiF46Ziales3L9xL2omz+r4=
=LfH/
-----END PGP SIGNATURE-----
Received on Tuesday, 25 July 2006 16:27:30 UTC