- From: Evain, Jean-Pierre <evain@ebu.ch>
- Date: Thu, 10 Mar 2011 14:34:26 +0100
- To: 'Juha Hakala' <juha.hakala@helsinki.fi>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
- CC: Peter Saint-Andre <stpeter@stpeter.im>, "uri@w3.org" <uri@w3.org>, "urn@ietf.org" <urn@ietf.org>
Hello there,

One of the most recent activities in this domain is http://www.w3.org/2008/WebVideo/Fragments/

Cheers,

Jean-Pierre

-----Original Message-----
From: uri-request@w3.org [mailto:uri-request@w3.org] On Behalf Of Juha Hakala
Sent: Thursday, 10 March 2011 13:29
To: "Martin J. Dürst"
Cc: Peter Saint-Andre; uri@w3.org; urn@ietf.org
Subject: Re: [urn] fragment identifiers

Hello Martin; all,

A few comments below.

Martin J. Dürst wrote:
> Hello Peter,
>
> I have cross-posted to the URI list, because I think it's important to
> get input from more experts. People on the URI list, this is about what
> to do (or not to do) about fragment identifiers in URNs, raised in the
> context of an update of RFC 2141.

For the URN community this issue is important because there are initiatives which are eager to use fragment identifiers. I have heard rumours that some are already using them. A typical use case would be very complex data, such as a structured research data set, within which many kinds of data should be separately described, identified and retrieved.

>
> On 2011/03/10 13:30, Peter Saint-Andre wrote:
>> <hat type='individual'/>
>>
>> On 3/9/11 2:11 AM, "Martin J. Dürst" wrote:
>>>
>>> On 2011/03/09 13:51, Peter Saint-Andre wrote:
>
>>> Anyway, from a higher-up view, RFC2141bis is defining the "urn:" URI
>>> scheme, and URI scheme definitions in general are supposed to say
>>> nothing (or just a little in some exceptional cases) on fragment
>>> identifiers. The reason for this is that fragment identifiers are
>>> defined per MIME Media Type, not per URI scheme.
>>>
>>> So if I have something like "urn:foo:bar:baz#here", then the urn spec
>>> only has to say what "urn:foo:bar:baz" is supposed to mean, the meaning
>>> of "here" is defined by whatever format I might get back when resolving
>>> "urn:foo:bar:baz". If I have a browser that resolves (some) urns (I
>>> don't know one, but there should be some), this is what already happens,
>>> and it shouldn't and won't change.
>>> RFC2141bis doesn't have to say
>>> anything for this to work.
>>>
>>> In case RFC2141bis tries to do anything else than the above, that would
>>> be a very bad idea, and should be fixed quickly.
>>
>> Here is what RFC 3986 says:
>>
>> The semantics of a fragment identifier are defined by the set of
>> representations that might result from a retrieval action on the
>> primary resource. The fragment's format and resolution is therefore
>> dependent on the media type [RFC2046] of a potentially retrieved
>> representation, even though such a retrieval is only performed if the
>> URI is dereferenced. If no such representation exists, then the
>> semantics of the fragment are considered unknown and are effectively
>> unconstrained. Fragment identifier semantics are independent of the
>> URI scheme and thus cannot be redefined by scheme specifications.
>>
>> As far as I can see, the semantics of fragment identifiers in URNs would
>> not be defined by media types because URNs are not generally resolved
>> for the purpose of retrieving a representation.
>
> "not generally" and "not" are not the same. Even for http: URIs, it's
> true that they are not always resolved. So in that sense, if I use
> http://never_any_server_here.sw.it.aoyama.ac.jp/one/two/three
> with some fragment identifier (I'm in control of sw.it.aoyama.ac.jp and
> make sure that there never is a server at
> never_any_server_here.sw.it.aoyama.ac.jp), then I'm indeed unconstrained.
>
> On the other hand, for quite a few URNs, it would make a lot of sense to
> resolve them. Let's say I have set up some proxy or use some dedicated
> browser that helps me resolve some URNs. Then the paragraph from RFC
> 3986 that you cite above clearly applies.

Persistent identifiers will be used for multiple purposes, and at the time we assign e.g. a URN to a resource, we have no idea which resolution services will be needed in the (distant) future.
The lifetime of a PID may be centuries; applications and the functionality they offer will change many times during such a period. And eventually even the copyright protection of a document will expire ;-).

Retrieving a representation is one of the key resolution services supplied already. But there does not need to be a 1:1 relation between a URN (or any other persistent identifier) and the URI (URL/URLs) it maps to via a resolution service. For example, consider:

DOI: 10.1016/B978-0-240-81330-1.00007-5

This is a real Digital Object Identifier based on the ISBN of Tomlinson Holman's Sound for film and television (3rd ed.), but please note that this DOI does not identify the entire book, but just a chapter within it. The final section of the DOI suffix (00007-5) signifies the second chapter of the book. Each chapter has its own DOI, and they will most likely be available for purchase as individual files, so the URIs these DOIs resolve to will not have <fragment>s in them.

But if the above "extended ISBN" were expressed as a URN, we might come up with something like:

URN:ISBN:978-0-240-81330-1#00007-5

if this were the way in which identifiers for book chapters were expressed according to the ISBN standard and in the ISBN namespace. This URN would then resolve to the same PDF file as the DOI above, either in the same digital library or in some other digital asset management system.

>> Therefore, in the
>> context of URNs, the semantics of the fragment would be considered
>> unknown and would be effectively unconstrained (at least from the
>> perspective of the 'urn:' URI scheme).
>
> Non sequitur.
>
>> 2141bis seems to imply that the semantics of the fragment identifier
>> could be constrained by the definition of a particular URN namespace
>> (despite the fact that they are not constrained by the 'urn:' URI scheme
>> itself).

Yes; some namespaces / identifier systems will not allow usage of <fragment> since the syntax of the identifier does not support such a thing.
For instance, the example shown above, URN:ISBN:978-0-240-81330-1#00007-5, or the ISBN string ISBN 978-0-240-81330-1#00007-5, is imaginary, since the ISBN standard does not actually support this. DOI does, and one might also construct national bibliography numbers (NBNs), and consequently URNs, which consist of an ISBN and a fragment identifier. Thus the DOI namespace (if one is registered in the future) and the NBN namespace should support <fragment>, if we are to give a free hand to people using these identifiers in the URN context.

> That would make at least some limited sense, if we could sort namespaces
> by whether they (maybe only occasionally) allow resolution, or whether
> they are absolutely and terminally never ever going to be used for
> resolution.

Based on what I have said before, I don't think that resolution is the crucial factor here. And if I am wrong and it is, then any namespace may allow resolution at some point in the future when the requirements of the user community change.

> But the last sentence from the paragraph you cite says:
>
> Fragment identifier semantics are independent of the
> URI scheme and thus cannot be redefined by scheme specifications.
>
> This not only means that the URN spec (which is just the definition of
> the 'urn:' URI scheme) cannot redefine fragment identifier semantics, it
> also seems to imply that scheme specifications (including the URN spec)
> cannot delegate such semantics to some subspaces of the scheme.

Yes.

>
>> I'm not sure what the use cases are here, but perhaps folks on
>> the list could explain a bit more what they mean by reusing an
>> identifier scheme that designates objects of such complexity that it is
>> necessary to reference parts of the objects via fragment identifiers.

I can give one practical example from my own library. Like many other national libraries, we digitise old books. The outcome of the process is a METS container, within which the full text of the book is stored in structured XML (METS/ALTO).
The structure expresses chapters, and some information objects such as images. Each chapter currently has its own URN:NBN, so in addition to being able to provide a persistent link to the title page of the book, such links can also be made to the chapters and other component parts of the book. We believe that some users will find such functionality useful (and they will also be happy when the URNs are still functional many years from now, unlike many URIs that were thought to be cool).

If usage of <fragment> is allowed in RFC2141bis and within the NBN namespace, we might change the current policy and assign just one URN:NBN to the book itself, and then fragment identifiers based on the NBN to the chapters and other component parts of the book. Our URN resolver would be able to map these URN:NBNs to the correct component parts within the METS container (or any other container standard we will rely on in the future).

> I'm looking forward to hear from other people on this list, but
> essentially even if there are very complex objects, there are always
> different ways to identify components than using a '#'.

True - in our case, the National Library of Finland can continue the current policy and assign an NBN to each component part. Nevertheless, it may be a good idea to allow a choice between two different approaches. In some cases, using <fragment> can be more convenient than assigning individual identifiers. Research data sets come to mind; perhaps somebody from that community can describe the requirements?

Best regards,

Juha

>
> Regards, Martin.
>

--
Juha Hakala
Senior advisor, standardisation and IT
The National Library of Finland
P.O.Box 15 (Unioninkatu 36, room 503), FIN-00014 Helsinki University
Email juha.hakala@helsinki.fi, tel +358 50 382 7678
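
As a rough illustration of the distinction discussed in this thread, the Python sketch below separates a reference such as "urn:foo:bar:baz#here" into the URN itself and its fragment, splitting at the first '#' as in the RFC 3986 generic syntax. The names split_fragment, resolve and COMPONENTS are hypothetical and exist only for this example; the URN:ISBN form is the imaginary one from the message, which the ISBN standard does not actually support, and the table entry is a placeholder rather than real resolver data.

```python
from typing import Optional, Tuple


def split_fragment(reference: str) -> Tuple[str, Optional[str]]:
    """Split a URI reference at the first '#' into (primary, fragment)."""
    primary, sep, fragment = reference.partition("#")
    # No '#' at all means "no fragment" (None), which is distinct from
    # an empty fragment such as "urn:foo:bar:baz#" (fragment == "").
    return primary, (fragment if sep else None)


# Hypothetical resolver table: one identifier for the whole book, with
# fragments naming component parts inside a single container.
COMPONENTS = {
    ("URN:ISBN:978-0-240-81330-1", "00007-5"): "chapter file (placeholder)",
}


def resolve(reference: str) -> Optional[str]:
    """Return the component registered for a reference, or None."""
    return COMPONENTS.get(split_fragment(reference))


if __name__ == "__main__":
    print(split_fragment("urn:foo:bar:baz#here"))
    print(resolve("URN:ISBN:978-0-240-81330-1#00007-5"))
```

In this sketch the resolver keys on the pair (URN, fragment), which corresponds to the "one URN:NBN per book plus fragments" policy described above; assigning a separate NBN to each component would instead key on the URN alone.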
Received on Thursday, 10 March 2011 13:38:07 UTC