Pierre-Antoine Champin, Jérôme Euzenat, Alain Mille
mailto:champin@lisi.univ-lyon1.fr
mailto:Jerome.Euzenat@inria.fr
mailto:amille@lisi.univ-lyon1.fr
Uniform Resource Identifiers, or URIs [1,3], were first designed to offer a global and uniform mechanism for identifying network-accessible resources. More recently, the drive to achieve the Semantic Web [2], and more particularly the Resource Description Framework (RDF) [11], has made them a base vocabulary for describing not only network-accessible resources, but any resource.
As a matter of fact, people are used to handling URIs, but mostly one kind of them: Uniform Resource Locators, or URLs [4]. Hence, whenever a resource needs to be identified, a URL is used which corresponds more or less to that resource. It is the ``less'' part that is concerning: it has become a problem for RDF, and may become, in our opinion, a serious obstacle to the Semantic Web.
We will first present our understanding of the notion of resource, which grounds the following discussions. Then we will explain why we think that URLs are often misused when employed as URIs, even though they have some advantages. Finally, we discuss straightforward solutions, based on Uniform Resource Names (URNs) [12], which could be used to keep those advantages without the drawbacks.
As far as we know, no specific definition of the term ``resource'' has ever been given in the literature about the Web or URIs1. We hence take it in its common meaning; for example, from WordNet (http://cogsci.princeton.edu/~wn/):
2: a source of aid or support that may be drawn upon when needed: ``the local library is a valuable resource''
As we pointed out in the introduction, the web initially handled computer retrievable resources, i.e. resources deliverable online. However, the definition above is very general, and depending on the task, any identifiable thing can be considered a resource: a file, a web page, a person, a company, etc. In this section, we identify some properties of resources, which should be kept in mind when trying to locate or identify them.
More generally, different elements of the context can make resources vary. For example, the resource at http://www.w3.org/Icons/w3c_main is an image (namely the W3C logo) formatted as a GIF or a PNG file, depending on the content negotiation between the HTTP server and the web browser.
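To make this context dependence concrete, here is a minimal sketch of content negotiation. It does not contact the W3C server: the AVAILABLE table and the negotiate function are hypothetical, and the matching rule is a strong simplification of what HTTP servers actually do (quality values are ignored, and the only wildcard handled is */*).

```python
# Toy model of content negotiation: one URL, several representations.
# AVAILABLE and negotiate() are illustrative assumptions, not a real server.

URL = "http://www.w3.org/Icons/w3c_main"

# Media types the (hypothetical) server can deliver for URL,
# in the server's own order of preference.
AVAILABLE = ["image/png", "image/gif"]


def negotiate(accept_header: str) -> str:
    """Return the first media type named in the Accept header that is offered.

    Simplification: quality values (q=...) are ignored, and "*/*" selects
    the server's preferred type.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip()
        if media_type == "*/*":
            return AVAILABLE[0]
        if media_type in AVAILABLE:
            return media_type
    raise LookupError("no acceptable representation for " + URL)


# Two clients asking for the same URL obtain different files.
print(negotiate("image/gif, text/html"))
print(negotiate("image/png, image/gif;q=0.8"))
```

The same URL thus yields a GIF file for the first client and a PNG file for the second, which is exactly the variation discussed above.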
The main problem with URLs when used as URIs, i.e. as identifiers, is that they usually do not identify what one would expect them to. This is due to a number of properties that URLs have or may have, as specified in [10]; a (probably not exhaustive) list of those problematic properties follows.
Moreover, [6] allows RDF schemas to assign URIs to abstract things (such as classes or properties), but those URIs are built from the URL of the schema as fragments of it3. For example, the URI of the type property of RDF is http://www.w3.org/1999/02/22-rdf-syntax-ns#type. As a matter of fact, the property itself is not located by that URL: its description is.
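The split between the schema URL and the fragment can be made explicit with Python's standard urllib.parse module. This is only an illustration of the syntax; the variable names are ours.

```python
from urllib.parse import urldefrag

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# urldefrag separates the fragment from the rest of the URI.
# Only the first part is sent to the server when the URI is dereferenced.
url, fragment = urldefrag(RDF_TYPE)

print(url)       # http://www.w3.org/1999/02/22-rdf-syntax-ns
print(fragment)  # type
```

What is retrieved is the schema document at the first part; the fragment merely points inside that document, at the description of the property.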
For example, the homepage of the first author is located at http://www710.univ-lyon1.fr/~champin/. If he quits that university today, that URL will become invalid. If tomorrow somebody named Champin joins the university, she may get the same URL for her homepage.
In the immediate interpretation, a URL identifies the resource retrieved through it. But, as discussed above, there can be several such resources, depending on the context. Hence this interpretation is not appropriate.
The wholistic interpretation of a URL is the set of all possible resources that can be retrieved via this URL in any possible context. This interpretation is classic in model theory, and is a natural extension of the previous one. It has, however, the drawback of not being as intuitive as it seems: in the example of the homepage (section 3.1), the intension would be something like ``the homepage of the university member with login champin''. Another problem arises with URLs that have a side effect beyond the retrieved data: a registration CGI script may always return the same ``thank you'' page, whatever parameters it is given; moreover, a mailto: URL is not designed to retrieve anything.
The process wholistic interpretation of a URL is the set of all possible processes triggered by the URL used in conformance with its scheme, in any possible context. This interpretation addresses the latter drawback of the previous interpretation: even side effects are taken into account.
We are aware that the latter interpretation is quite different from the two more intuitive ones (though we showed that the second one is not as intuitive as it seems). We agree that these interpretations suit many applications; we see them as a kind of metonymy4. But we still believe that several efforts related to the Semantic Web, involving formal logic and ontologies, require robust identifiers.
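The difference between the three interpretations can be sketched on the registration script example. Everything below is a toy model under our own assumptions: run_registration, CONTEXTS and the string labels are hypothetical stand-ins for a real script, its possible invocations, and the resources involved.

```python
# Toy model of the three interpretations of a URL.
# A hypothetical registration script returns the same page whatever its
# parameters, but its side effect (who gets registered) differs.

CONTEXTS = ["name=alice", "name=bob"]  # assumed possible invocations


def run_registration(params: str) -> tuple:
    """Return (side_effect, retrieved_page) for one invocation."""
    side_effect = "register " + params
    page = "thank-you page"
    return side_effect, page


def immediate(context: str) -> str:
    """Immediate interpretation: the resource retrieved in one context."""
    return run_registration(context)[1]


def wholistic() -> frozenset:
    """Wholistic interpretation: all resources retrievable in any context."""
    return frozenset(run_registration(c)[1] for c in CONTEXTS)


def process_wholistic() -> frozenset:
    """Process wholistic interpretation: all triggered processes,
    side effects included, in any context."""
    return frozenset(run_registration(c) for c in CONTEXTS)


print(len(wholistic()))          # the retrieved data alone cannot tell the calls apart
print(len(process_wholistic()))  # the processes can
```

In this model the wholistic interpretation collapses to a single element, whereas the process wholistic interpretation keeps one element per distinct side effect.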
We also want to insist on the fact that none of these interpretations justifies using URLs to identify resources not located by them.
Unlike URLs, URNs (Uniform Resource Names) are designed to persistently identify a given resource. Hence we suggest that they should be used more extensively than they currently are. One problem is how to build these URNs. Another is how to get the URN corresponding to a given URL. In this section we make some suggestions to solve these problems.
Besides the fact that they are commonly used, we see two main reasons which make URLs popular URIs: unicity and retrievability. Hence, a useful URN scheme should have the same advantages, which can be somehow inherited from URLs.
urn:new-urn-scheme:www710.univ-lyon1.fr/~champin/03-2001/myIdentifier
Such a URN scheme nevertheless requires precautions: name subspaces should be permanently assigned, which is not currently the case with something like /~champin, as pointed out in section 3.1. This is why a date has been included in the example above5. Note that a similar method is already used for URLs at the W3C to distinguish different versions of the same document.
This ``advantage'' is often pointed out by people identifying abstract resources with URLs: the retrieved resource can be a description of the identified resource. This is all the more useful since abstract resources can only be described, not retrieved.
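As an illustration, the following sketch builds a URN of the form shown above from a URL-based name subspace, an assignment date, and a local identifier. The function build_urn is hypothetical, and so is the reading of ``03-2001'' as month and year; the scheme name new-urn-scheme is the placeholder used in the example, not a registered namespace. No syntax validation is attempted.

```python
from datetime import date
from urllib.parse import urlsplit


def build_urn(base_url: str, assigned: date, identifier: str,
              scheme: str = "new-urn-scheme") -> str:
    """Derive a URN from a URL-based name subspace (illustrative only).

    base_url   -- URL whose host and path delimit the name subspace
    assigned   -- date at which the subspace was held by the name's author
                  (only month and year are used; this format is an assumption)
    identifier -- local name chosen by the author
    """
    parts = urlsplit(base_url)
    # Keep host and path, drop the URL scheme: uniqueness is inherited
    # from the URL, but no retrieval protocol is implied.
    namespace = parts.netloc + parts.path
    if not namespace.endswith("/"):
        namespace += "/"
    return f"urn:{scheme}:{namespace}{assigned:%m-%Y}/{identifier}"


print(build_urn("http://www710.univ-lyon1.fr/~champin/",
                date(2001, 3, 1), "myIdentifier"))
```

With these inputs the function reproduces the example URN given above. Including the date means that a later holder of the same /~champin subspace would produce different URNs.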
We take it as a drawback as well, because it is inconsistent with all the interpretations presented in section 3.2, including the more intuitive ones. Actually, it leads to a confusion between the resource and its description, which are indeed two distinct resources. Furthermore, there is no way to distinguish an ``identifying'' URL (where the retrieved resource is an instance of the identified resource) from a ``descriptive'' URL (where the retrieved resource describes the identified resource).
In the URN example above, the prefix could be changed to make it a URL, but then any such new URN scheme should define precisely which protocol may be used with the given path and, if a resource is retrieved, what the meaning of that resource is (instance, description, something else...). Depending on that meaning, computer retrievable as well as abstract resources could be described without ambiguity by those kinds of URNs.
That question may come to any user of the Semantic Web when retrieving a web page (or any other resource) through a URL. If we want URNs to be extensively used, the answer to this question must be as easy to get as possible.
Hence, we suggest that there is a need for the service described in [7] as L2Ns, returning a list of URNs corresponding to a given URL. We additionally suggest that such a service return the URN list in increasing order of generality: the first one being the identifier of the data retrieved in the current context (possibly using the URL scheme proposed in [7]), and the last one being the URN of the most generic named resource included in the wholistic interpretation of the URL.
One solution would be to include this service in the retrieval process, either as an RDF description, an XHTML meta-element (<META type="urn" value="...">) or an HTTP header field (Content-Urn=...).
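The HTTP header variant could look as follows on the client side. This is a sketch under several assumptions: Content-Urn is the header name suggested above and is not a standard HTTP field, we assume the usual ``Name: value'' header syntax with one header line per URN, and the two URNs (with the local names homepage-html and homepage) are invented for the example. The header block is parsed with Python's standard email.parser module rather than fetched from a server.

```python
from email.parser import Parser

# Hypothetical response headers: URNs listed in increasing order of generality.
RAW_HEADERS = (
    "Content-Type: text/html\n"
    "Content-Urn: urn:new-urn-scheme:www710.univ-lyon1.fr/~champin/03-2001/homepage-html\n"
    "Content-Urn: urn:new-urn-scheme:www710.univ-lyon1.fr/~champin/03-2001/homepage\n"
    "\n"
)


def urns_from_headers(raw_headers: str) -> list:
    """Return the Content-Urn values in the order the server sent them."""
    message = Parser().parsestr(raw_headers, headersonly=True)
    # get_all preserves header order; an empty list means no URN was given.
    return message.get_all("Content-Urn", [])


urns = urns_from_headers(RAW_HEADERS)
most_specific, most_generic = urns[0], urns[-1]
print(most_specific)
print(most_generic)
```

A client that only cares about the data it actually received would use the first URN; one reasoning about the resource in general would use the last.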
In this note, we discussed the issue of what exactly is identified by URLs when they are employed as Resource Identifiers (URIs). As a matter of fact, we think that they are often misinterpreted when used as such. The interpretation we proposed, though not the most intuitive, seems to be more robust than the more intuitive ones.
We then discussed a way of building URNs that inherit the good properties of URLs (unicity and retrievability) and make it possible to identify any resource, network retrievable or not. We believe that such URN schemes are necessary to achieve the goals of the Semantic Web, since they provide cleaner identifiers than URLs. We finally discussed the necessity of implementing L2Ns services so as to encourage the use of URNs.
This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.48)
The translation was initiated by Pierre-Antoine CHAMPIN on 2001-04-05