- From: Joshua Allen <joshuaa@microsoft.com>
- Date: Fri, 26 Jul 2002 00:20:28 -0700
- To: "Tim Bray" <tbray@textuality.com>, <www-tag@w3.org>

First, let's be clear that HTTP and REST have nothing at all to do with resource identity. There is absolutely no need for consensus on what resource is being represented at a given URL, so it is really silly to argue about *what* the identity of that resource is. An http: identifier in practice simply locates a "representation dispenser for a resource". The identity of the resource for which it dispenses representations is immaterial, and a red herring in this discussion, because the http: identifier itself identifies only the *dispenser*. In other words, http: identifiers are used to identify RESTful representation dispensers, without regard to the actual thing being represented.

There are people who say that an HTTP URL identifies an actual resource beyond a simple representation dispenser, because "mumble mumble you can indirectly identify the resource mumble mumble" or "mumble mumble it says so in somebody's master's thesis mumble mumble". But hardly anybody in practice actually believes such silly premises, and the web sure as heck doesn't depend on that interpretation. The web functions quite fine with the status quo, which is that an http: identifier identifies an endpoint that serves up hypermedia. Try this thought experiment: if we say that http: identifiers only identify HTTP-accessible endpoints that function as hypermedia dispensers, what breaks? The answer is that NOTHING breaks.

Now for the second part. The semantic web will presumably talk about many types of entities beyond just hypermedia dispensers. (Again, certain people will argue that HTTP already identifies things beyond hypermedia dispensers, but I think I have shown above why this is pathetically wrong.) The *purpose* of an identifier is to unambiguously identify something. Axioms 1 and 2a of web design [1] make it very clear that a URI should not be context-sensitive and should not require disambiguation.

The people who hate these axioms say:

A) *Everything* depends on context, so those axioms are a paradox.
B) Disambiguation might sometimes be required, so those axioms should be abandoned.
C) If something can be identified indirectly, there is no need for a direct identifier.

However, the existence of ambiguity does not make the pursuit of clarity a worthless exercise. The possibility (even certainty) that stupid people will periodically violate these axioms is by no means an excuse for despair. People who violate these axioms will simply be blathering nonsense that nobody understands. People who wish to communicate on a semantic web will do their best to adhere to these axioms. (I am not sure whether you question the wisdom of these axioms. Please let me know if you would like some practical examples of why axioms 1 and 2a are important. Hopefully you agree that they are fundamental.)

So, assuming we agree that axioms 1 and 2a apply, why is it dangerous to use an http: identifier to identify a beach? Well, for starters, the identifier is now unable to distinguish unambiguously between a particular instance of a hypermedia server and a beach. There are four things that can happen:

1) Most sane people (and all web browsers) will assume that the http: identifier points to a hypermedia server, so you will find in practice that it is difficult to get the population at large to accept YOUR definition of the "thing being identified".

2) You could decide that it is OK for the URI to identify different things depending on context. But that, of course, violates the axioms of web design.

3) You could play games with words and declare that "the hypermedia server IS the beach", or "the hypermedia server is non-existent and ineffable". In other words, you could claim that there is no difference between the hypermedia server and whatever resource you claim to be representing. However, this is also a clear violation of the axioms.

This is appealing in a world where there is only GET, because in HTTP the distinction between representation *dispenser* and *resource* is irrelevant and of no practical import. The idea that the hypermedia server is actually the resource is a silly vanity that we politely ignore, because we only ever use the representation dispenser anyway. But the instant I want to *really* identify something other than a representation dispenser, in a scenario that doesn't involve GET, I should avoid the temptation to overload the well-established, consensual meaning of the http: URL. If I succumb, why stop at saying that the beach and the HTTP endpoint are the same? What's to prevent us from reducing the entire universe to a single word?

And it's very difficult to see how you could ever achieve consensus on the meaning of your identifier. Words are defined by the consensus of the people actively *using* them. People use http: identifiers to identify HTTP-accessible representation dispensers, and they get confused as heck when anyone tries to say that they identify something different. As a practical matter, it is easy to get people to understand that you mean a representation dispenser when you use an http: URL: they can pull it up in a web browser and prove it for themselves. If you decide to identify something that is *not* a representation dispenser (a book, for example), and you use an http: URI, then you are going to have a heck of a time explaining to people that the URI does *not* identify a representation dispenser. And if you decide it can be both, woe betide you.

4) You could take a different approach, and say that the http: part of the URI is not a characteristic of the thing being identified, but rather an unnecessary appendage whose side effect is to make it easy for people to do a GET on the resource. You could say that the resource being identified is not primarily intended to be interacted with via HTTP, but at least GET is now possible.

This is the URI version of Pascal's Wager: "If I don't stick http: on the front, I can never do a GET; but if I *do*, it might confuse people during the life here below, yet I might be able to do a glorious GET one day!"

This approach is the worst yet. For starters, the http: part of the name becomes all but useless for identification. If it doesn't assist identification, it shouldn't be part of the name. Why not fix HTTP and web browsers so that they can handle any URI, even ones that don't begin with http:, and then kill the http: scheme altogether?

Second, it becomes impossible to use a URI to identify a representation dispenser. Maybe that is OK with some people.

Third, you end up with chunks of the global word space owned by the highest bidder. For web sites, it is a *good* thing that document/representation dispenser behavior is owned by the person who pays for the DNS segment. But the entire point of the semantic web is that the words will NOT be owned by the people being talked about. If people are permitted to "own" words at the meta-level, the semantic web is doomed.

Consider a small example. Suppose that Culinary Press establishes identifiers for all of its books. One particular book is identified as:

http://www.culinary.com/books/pub-123

Thousands of book reviews are logged at various sites, and they are almost universally negative. Now, Culinary Press had long ago released a similar book, with the identifier:

http://www.culinary.com/books/pub-101

which was a smashing success. Since that one is now out of print, and pub-123 represents a large revenue potential, they decide to swap the two identifiers. Now when customers search for the new book, they are told that its identifier is actually pub-101, and when they look up reviews of that publication on Google, they find the old book's favorable reviews and are much more likely to buy it.

The point is that people will be less likely to use words that they don't trust, and using a word that is tied to DNS requires a person to trust the administrator of the DNS segment. Again, in HTTP you don't care. In the semantic web, it *does* matter. If words are owned by the highest bidder, people simply won't use them, and adoption will suffer.

An identifier scheme like urn:isbn:11101122 produces names that are not tied to any particular DNS range, and are therefore less likely to be "owned". The word will take on the meaning that is established for it via consensus, which is how words SHOULD form. The URI scheme clearly signals that it is neutral to access method and network segment ownership, so it is more likely to be perceived as trustworthy. If *you* wanted to write a review of a book, and wanted to be sure that the maximum number of people could find your review, which URI scheme would *you* use?

Finally, it's rather short-sighted. What happens tomorrow when there are pervasive ways to GET resource representations asynchronously? What about purely P2P systems like Freenet? It would surely be a shame to have all of your important objects identified with some sad and lonely http: stuck to the front when HTTP is obsolete and nobody does synchronous GET over port 80 anymore.

[1] http://www.w3.org/DesignIssues/Axioms.html

> -----Original Message-----
> From: Tim Bray [mailto:tbray@textuality.com]
> Sent: Thursday, July 25, 2002 4:10 PM
> To: www-tag@w3.org
>
> Joshua Allen wrote:
>
> > In other words, for HTTP this is simply sophistry. If we ever want the
> > web to progress beyond the shackles of synchronous HTTP GET, we need to
> > deal with the problem as a practical matter.
>
> Why?
> -Tim

Received on Friday, 26 July 2002 03:21:04 UTC