- From: Tim Berners-Lee <timbl@w3.org>
- Date: Fri, 1 Feb 2002 17:24:40 -0500
- To: "Peter Pappamikail" <peter@pappamikail.net>, <www-tag@w3.org>
----- Original Message -----
> From: Peter Pappamikail
> To: www-tag@w3.org
> Sent: Sunday, December 16, 2001 5:23 PM
> Subject: Resource discovery: limits of URIs

> I'm flagging up an issue as a private individual rather than in my official capacity of Head of Information Resources Management in the European Parliament, although the issue I address has been considered in my professional work as well as my private research and work on XML implementation issues.

> My concern is the mechanisms available to "translate" information on a uniquely identifiable artefact to an addressable URI. Please accept my apologies in advance if the issue is not appropriate for this list.

Your question makes an assumption that URIs can be divided into names and addresses. This is, I believe, not a useful assumption. In brief, the differences between "names" and "addresses" are many and varied, and social rather than technical. In longer form, see http://www.w3.org/DesignIssues/NameMyth.html for my take on it.

This is an old issue. Not everyone takes my position. There have indeed been many systems designed for names, which almost invariably end up with the same authority:local id system, and then (I simplify) a reconstruction of the entire domain name system in order to be able to look them up.

> As an "ordinary user" I can "identify" or name a particular information artefact, a book, a document, etc. With a URL, I can address it. The URL will give me an address that usually combines details of an originating authority, a content identifier, sometimes a language version and an application format (MIME extension).

No, just an authority and a content identifier (your terms). There are no semantics to the right-hand part, and you certainly should never assume anything about the MIME type of something from its URI. Other http://www.w3.org/DesignIssues/ notes deal with some of this.
> However, with the exception of the language version - that might, depending on the server infrastructure, serve up a version according to my indicated preferences set in the browser - the "discovery" of the full URL cannot be deduced algorithmically from the content identifier. A couple of examples to demonstrate my concern more clearly:
> - "bookmark rot": I mark a set of resources from a particular site, only to find a year later that all the references are rotten as the .htm extension has been replaced by .php throughout the site, although no single item of content has changed;

That is entirely the fault of the site management, and should be brought up at their company's shareholder meeting IMHO. See http://www.w3.org/Provider/Style/URI "Cool URIs don't change".

> - I reference an item found via a WAP service, knowing that a more complete version of the same content is available in HTML on a parallel web site: the 'URLs' however are completely different despite referring to the same artefact;

That is a problem with the WAP architecture. It is not a problem with HTTP, which allows you to distinguish between a generic URI and a specific one (http://www.w3.org/DesignIssues/Generic).

> - I copy a URL in a site, only to discover that the URL is attributed not only dynamically but is session specific and sometimes personalised, and thus un-reusable;

That is also a server problem, unless there is real session-specific information (such as search results) which you may want to bookmark. Cookies, when used instead, prevent this problem.

> - I'm listening to a voice synthesised web page that contains links to resources that are available in audio and text, but the link takes me to the text file via the hypertext link;

The server is not doing format negotiation properly, or not trusting the client to do so. Some sites *do* get this right, for natural languages for example.
> In architectural terms, my concern is that more and more sites, in the absence of any clear mechanisms for resolving addresses from identifiers, have increasingly complex interfaces with proprietary resolution mechanisms that practically render resource discovery impossible, except indirectly.

I agree that is a problem. I would strongly promote content negotiation.

> A user should be able to indicate the minimum information that distinguishes a particular artefact uniquely (I'm not sure the URN does this, because it is still only a URI with a commitment to persistence) and not be bothered with whether it is the most recent version, which languages are available, whether it is in pdf, html, xml, wml, but that the server will resolve this in a context-sensitive manner. The issue will become critical when XPointer starts to be used to identify resource fragments: in fact the XPointer's potential weakness is precisely that the containing document may itself be poorly addressable.

> My "ideal scenario" would be the replacement, in the hyperlink target data, of an URI - pointing as it does to a specific file - by a "UCI" (a "Uniform Content Identifier") that resolves to the specific components:
> - a DNS entry or other service locator;
> - on the server side, to an URI appropriate to the client context, made up of the content identifier 'wrapped' with language, version, format and other context specific data;

The way this is done is that
1. You use an HTTP URI
2. You use content negotiation
3. You use the "Location:" field to point to the specific URI

You can of course embed metadata in a resource to give a more sophisticated map of the resources, generic and otherwise, involved.
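A minimal sketch of those three steps, in Python, may make the flow concrete. Everything named here is an assumption for illustration: the paths, the variant table and the function names are hypothetical, the use of a 302 response is one possible choice (the text above only says that the "Location:" field points to the specific URI), and the Accept matching is simplified (it orders by q-value only and ignores the specificity rules a real server would apply).

```python
# Hypothetical table: one generic URI, several format-specific URIs.
VARIANTS = {
    "/reports/2002/budget": {
        "text/html": "/reports/2002/budget.html",
        "application/pdf": "/reports/2002/budget.pdf",
        "text/vnd.wap.wml": "/reports/2002/budget.wml",
    },
}


def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((fields[0].lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def matches(media_range, media_type):
    """True if a media range such as text/* or */* covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(generic_uri, accept_header):
    """Return (status, headers) pointing the client at a specific URI."""
    variants = VARIANTS.get(generic_uri)
    if variants is None:
        return 404, {}
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type, specific_uri in variants.items():
            if matches(media_range, media_type):
                # Step 3: the Location field names the specific URI.
                return 302, {"Location": specific_uri, "Vary": "Accept"}
    return 406, {}  # nothing acceptable to this client


if __name__ == "__main__":
    print(negotiate("/reports/2002/budget", "application/pdf;q=0.9, text/html"))
    print(negotiate("/reports/2002/budget", "text/vnd.wap.wml"))
```

The link that gets published and bookmarked is only ever the generic URI; the .html, .pdf and .wml URIs can change or multiply without breaking it.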
> If this sort of issue is handled elsewhere, I'd be happy to be pointed the way, but I feel the issue goes beyond the scope of current W3C activity on addressing and is too "instance specific" to be in the realm of RDF or other semantic resource discovery issues: I believe the issue is analogous to HTTP language negotiation, and warrants similar treatment.

I hope you can indeed use content negotiation. It might be worth a look at CC/PP for more precise client profiling.

tim

> Peter
Received on Friday, 1 February 2002 17:24:30 UTC