- From: Roy T. Fielding <fielding@apache.org>
- Date: Fri, 24 Jan 2003 19:28:15 -0800
- To: Tim Berners-Lee <timbl@w3.org>
- Cc: Sandro Hawke <sandro@w3.org>, www-tag@w3.org
> Ok, here is one hook to a difference in the model you and I have,
> Roy. You point out that the API in libwww basically provides
> the functionality of HTTP, and at the same time gives access
> to FTP and so on. You use this as an illustration of a theory that
> all URIs have the same interface as HTTP, that HTTP
> extends over the web the interface of libwww in a quite generic
> way, while other protocols only support some of the features.
> Hence the ability of HTTP proxies to provide access to FTP and
> Gopher.
>
> Which is logical. However, it does not address the range of all
> URI schemes, and of course as HTTP basically doesn't play with
> the fragid, it doesn't involve that at all.

It only needs to address those schemes for which a representation is a useful and desirable thing, and in those cases it does so. It doesn't play with the fragid because the fragid is not a first-class identifier in the system -- it is impossible to do anything other than GET or name-equivalence on a fragment. AFAIK, that is true within the Semantic Web as well, so I don't know where you are going with this.

> It is a reasonable bit of software design for libwww to generalize
> where generalization can be done, and it is not surprising that
> HTTP, as a later design, "embraces and extends" FTP.
> And HTTP is in fact a good model for the Web, and the category of
> URIs for which this model holds (http, https, ftp, gopher)
> are important, because they form a web of network information
> objects. (I'm happy to call that the Web, and exclude "Web" Services,
> by the way. We can call them "Internet Services" if you like.
> I think this so far is what you call the REST model.)

Information-providing resources, yes, but anything with state has information it can provide.

> But other URIs don't fall into that scheme. mailto: URIs
> identify mailboxes, and to say that you can make an HTTP proxy
> represent a mailbox is a kludge.

That is your interpretation of what mailto identifies, which unfortunately isn't supported by the specification. In any case, that would not be a kludge. HTTP is an interface protocol, not a web page. What interactions can you have with a mailbox that you cannot have with an HTTP resource? None. A mailbox is a subset, and therefore a trivial interface to build via HTTP. That doesn't mean it's a good idea to do so: the HTTP write and append mechanisms are necessarily more abstract, less efficient, and more generically defined than those in SMTP and IMAPv4, but it is possible!

> A web site can have various
> pages which give various sorts of information related to a
> mailbox, but conceptually a mailbox is a delivery point
> not an information object.

Conceptually, a mailbox is both a delivery point and an identifier that is used in various ways by information systems to allow storage and organization of received and sent mail. How is that different from a collaborative weblog? The only real difference, aside from protocol syntax, is how the default access control is defined on mailboxes. It is still a mailbox and mail is still delivered by way of an SMTP interface, neither of which prevents the HTTP interface from correctly interpreting and processing requests from those applications that, for whatever reason, wanted an HTTP proxy to their mailbox.

> You could map HTTP's POST to it but not HTTP's GET.

I would only map those methods for which a use exists. The point is not to make every system HTTP (that would be absurd), but rather to make the information within every system accessible via HTTP. Web browsers took the name literally and, for security reasons, interpret a GET on a mailto URI as a request to open a mail application in composition mode with that mailbox pre-filled in the To: field (and others if you implement according to the RFC).
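
As an editorial illustration of the browser behaviour described above, the following minimal Python sketch maps a mailto URI to the fields a composition window would pre-fill. The function name `compose_fields` and the example addresses are hypothetical, and the parsing is deliberately simplified; it is not a full implementation of the mailto RFC.

```python
from urllib.parse import parse_qs, unquote


def compose_fields(mailto_uri: str) -> dict:
    """Return the header fields a mail composer might pre-fill (sketch only)."""
    scheme, _, rest = mailto_uri.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError("not a mailto URI")

    # Everything before '?' is the addressee; the rest carries header fields.
    address, _, query = rest.partition("?")
    fields = {}
    if address:
        fields["to"] = unquote(address)

    # Assumption: parse_qs applies form-style decoding ('+' becomes a space),
    # which may not match mailto rules exactly. Only the first value of each
    # header is kept, and an addressee in the path wins over a "to" header.
    for name, values in parse_qs(query).items():
        fields.setdefault(name.lower(), values[0])
    return fields


if __name__ == "__main__":
    print(compose_fields("mailto:someone@example.org?subject=URI%20models"))
```

The result is a representation of "initiate mail to", not of the mailbox itself, which is the distinction drawn in the surrounding text.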
Whether or not you agree that those semantics were originally intended, they are a reasonable interpretation of the identifier's mapping to a representation. It isn't the mailbox itself, of course, but that isn't the implementers' fault. They think it literally identifies "initiate mail to".

> Similarly, telnet: URIs are end points for interactive sessions.
> You can connect to one by a Java object in a web page, but
> that doesn't mean they are like web pages any more than
> a flower pressed in a book is a piece of paper.

telnet URIs are initiations of telnet sessions, not complete telnet sessions. I didn't say it was like a web page. HTTP interaction via proxies is obviously not limited to web pages, so why would I have the proxy represent it as a web page? Even if it were limited to web pages, an applet fits within your definition of a web page and is a perfectly valid response to a GET on a telnet URI because GET is requesting that the session be initiated. Likewise, using a mailcap handler to find a helper program and invoke it with instructions to initiate a telnet session is performing a GET on a telnet URI using the generic interface, and it was implemented on all of the Unix-based Web browsers in 1993.

> So that is I think one way in which our formalizations of URIs
> differ.

I cannot emphasize this enough: if the conceptual work formalism cannot accommodate established best practice on the Web, then it is not a valid formalism of the Web. It may be a formalism of something else, perhaps something even better than the Web, but it is not formalizing the Web as we have implemented it.

[...]

> RDF people do not in my experience use a URI to represent
> both the resource and a representation.

Well, I don't.

> (Cwm has, for example, a relationship -- a built-in function --
> log:semantics which relates a resource to what you get from
> retrieving a representation and parsing it, and another,
> log:contents which relates a resource to the bits of any
> representation of it)
>
> If you assumed that is what people are doing, it may be because you
> are mapping their words onto your concepts, not theirs. You maybe
> forget that for me, for example, the car and the picture of the car are
> distinct. It is the confusion between those which causes a problem.

This is the point where I have repeatedly said to you during our meetings that neither I nor the REST model ever confuse those two things. The resource is what is identified. The representations are what the client receives. They are always distinct.

> Now, you don't write RDF so I am not sure how I discuss this with you.
> I've written a lot of http://www.w3.org/DesignIssues/HTTP-URI
> specifically about this and I don't know where to start.

I can read RDF. I don't write it because I am more likely to get the semantics wrong due to syntax error than due to English errors.

> I think you must agree that once my program accesses the web page
> which we will say is a picture of a car, then it has a representation
> of a picture on bits. It has therefore a concept of the picture.
> The picture itself has important properties such as who owns it
> and made it, and what its copyright information is.
> You say that that is information about the representation, but I would
> point out that a picture can have many representations, in JPG PNG
> and GIF at various levels of resolution. They share owner,
> copyright, date of creation, creator, focal length, genre, exposure,
> orientation, and so on, because they are all what I would call
> representations of the same picture, the same conceptual work.

They each independently hold the same relationships with a target. No problem.
Hopefully they will tell you so using metadata.

> This commonality is very strong, and points to the value of
> being able to identify the thing they have in common: the picture.
> And normally, when I want to make a hypertext link to that
> it is to the picture, not to a representation, that I want to make
> the link. So the argument that we are "just talking about
> representations" doesn't fit the bill. It doesn't meet the
> requirements to be able to talk about the picture as a conceptual
> work.
>
> Now, you say the owner of the HTTP URL can declare that it actually
> identifies the car. I say that messes things up. Suppose the owner
> does that -- suppose they mark up the JPEG with a comment field
> indicating that. Now my client program has no ID for the picture.

You are right. By doing so, the authority has specifically said that it is not guaranteeing future representations will be a picture. Perhaps it will replace it with an MPEG movie, or a text description, or maybe it will always be a picture and the authority simply doesn't want you to have the picture's ID. If they DO want you to have that ID, then they would have supplied a link to an ID that did identify a picture.

In fact, the only thing the authority has done is said that you can identify the car with that URI. Basically, they are saying that there is a permanent, N:1 relationship between the URI and the car that is a valid ID both inside and outside the Web discourse.

The reality is that you never did know that the URI was identifying a picture, because the identity mapping for an http URI remains hidden inside the server until they supply some other information out-of-band that might explain the semantics to the person making the link. That is the crux of the issue -- who gets to decide why a link gets broken? Is it the authority or the people who made links based on a mistaken belief that the authority shares their conception of the URI's meaning? "Cool URIs don't change" is an explicit recognition that the naming authority controls that meaning.

> Now here's the rub. When the URI was for the picture, then I
> can indirectly identify the car with it, as "x, where <car.jpg> is a
> picture of x".
> In N3 that looks like "That which has picture car.jpg".
>
> [ has :picture <car.jpg> ].

No you couldn't. You could only say "That which is represented in this picture from <car.jpg>." You only assumed it identified a picture.

> That's cool. It's what we do all the time to identify things for
> example people by SSN. "The car whose picture hangs above your
> mother's fireplace" and stuff. KR systems thrive on it. What doesn't
> work is if we say that <car.jpg> actually is an identifier for the car.
> Because "the picture of the car" doesn't identify the picture - it
> identifies any picture of the car.
>
> [ is :picture of <car.jpg> ]
>
> You can write it but it doesn't work. It's not a bug in RDF. It is a
> fundamental problem with the URI system we assume that you don't
> have an identifier for the conceptual work.

It is completely invalid to assume that any representation on the Web will maintain the same format over time just because the recipient has once observed it and made an assertion about it. Your assertions are therefore entirely dependent on the authority's willingness (or ability) to maintain that mapping as a picture over time.

The only thing the client can validly assert is that, during some range of time T, GET(me, <car.jpg>, T) consistently results in a representation with form=picture and subject=car. You don't have the ability to assume the identity of <car.jpg> is either a picture or the physical car, for the same reason you don't have the ability to assume it is in JPEG format.
You can hope that it is a picture (or hope that the representation will always be a picture), and you can make assertions based on the probability that your hope will hold true for at least as long as that assertion is used, but you cannot make guarantees on behalf of the naming authority. That is indirect identification of the picture, not direct identification, and is one of the more common ways that links are known to become semantically broken on the Web. I wrote about that way back in my MOMspider paper.

That's an interesting question in itself: how does the conceptual work model describe the cause of link rot due to changes in format or content that were not anticipated by the link author?

It would be better to know all of the URIs that are associated with the generation of a representation so that the client could then choose which semantic they wish to capture for future reference. That's what I was hoping RDF could do as metadata within (or linked from) the representation.

> An example you give often is a robot. To an RDF system, a robot
> which can be driven by the control panel at <robot.html>
> can be formally referred to in just the same way
> as [ :controlPanel <robot.html> ]. (That which has control panel
> <robot.html>)
> This works. It needs a time qualifier, but that's okay.

How would you formally describe that POST on a given URI turns the robot to the left?

Actually, I would identify the robot by <robot>, and its representation would include some type of control-form that would target another URI identifying the robot's control panel, which could be as complex or simple as desired by the interface (e.g., it could vary from relative thrust/direction controls to a simple query of what the next target should be). Either way, we can model the flow of information directly from client to robot without any mention of how the interface is implemented.
And the Semantic Web can still use <robot> to unambiguously identify the robot, because the URI is an N:1 relationship with the robot, not with the robot's current representation.

> Let me summarize
>
> - Web software needs to be able to express things about conceptual
>   works. They are a big part of the web system and of our society.
> - When you identify a conceptual work, you can retrieve representations
>   and you can indirectly identify abstract things.
> - If you say that the URI identifies an abstract thing you cannot
>   refer to the conceptual work.

The last is not true. If the URI identifies an abstract thing, then you cannot use the same URI to directly identify the conceptual work. You can, however, discover the relationship between the two and obtain a different URI for that conceptual work, if the provider of the conceptual work is willing to provide that separate URI. This is no different than content negotiation, and the Web does not force an information provider to supply the individual URIs for variants of a negotiated resource.

The benefit of all this, BTW, is that we don't need to special-case the fragment identifier in RDF. All of that complexity is completely unnecessary if we don't assume anything about the target of an http identifier aside from it being accessible via an HTTP interface, which makes perfect sense given that we don't allow the client to assume anything else about an http identifier, regardless of which model is used. Likewise, http as a scheme for xmlns URIs becomes a dead issue.

> I think that you will find that the REST model is not harmed in any way
> by introducing an extra concept of the conceptual work between
> "representations" in and what you used to call the resource.
> I think you will find it has a nice consistency and solidity.

I'm afraid not. You just claimed that telnet and mailto are not implementable as a conceptual work. I know they are under REST and on the Web itself.
How can I not consider it harm to give that up? Furthermore, I can add any URI scheme to the Web and the REST model does not have to be changed to accommodate it, whereas placing scheme-specific semantics on the Web interface means that a new scheme cannot be used until all software is changed to embed the scheme-specific semantics into the client. In my case, all I have to do is provide HTTP proxies, at least until such time as the scheme becomes manifestly useful to implement directly by each client. The direct implementation may provide a richer interface, but an information-window interface like HTTP is sufficient to enable deployment prior to popularity.

I also believe that the fragid indirection actively harms a namespace that is large or hierarchical in nature, and Mark has already given examples of how that harm manifests itself, so it's not as if the conceptual work model doesn't have its own drawbacks.

....Roy
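
The argument above, that a new URI scheme can be deployed behind a generic interface without changing clients, can be sketched in a few lines of Python. This is an editorial illustration under stated assumptions: the `Gateway` class, its `register` and `get` methods, the `telnet_handler` function, and the `hypo:` scheme are all hypothetical names, and a real HTTP proxy would do far more than return a dictionary.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

# A handler turns a URI into some representation (here, a plain dict).
Handler = Callable[[str], dict]


class Gateway:
    """Hypothetical stand-in for a proxy: one GET interface, many schemes."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, scheme: str, handler: Handler) -> None:
        # Adding a scheme changes the gateway only, never the clients.
        self._handlers[scheme.lower()] = handler

    def get(self, uri: str) -> dict:
        scheme = uri.partition(":")[0].lower()
        handler = self._handlers.get(scheme)
        if handler is None:
            raise LookupError(f"no gateway for scheme {scheme!r}")
        return handler(uri)


def telnet_handler(uri: str) -> dict:
    # GET on a telnet URI asks for a session to be initiated, so the
    # representation describes how to start one (23 is the default port).
    parts = urlsplit(uri)
    return {"action": "open-session", "host": parts.hostname,
            "port": parts.port or 23}


if __name__ == "__main__":
    gateway = Gateway()
    gateway.register("telnet", telnet_handler)
    print(gateway.get("telnet://example.org:2323/"))
```

Client code only ever calls `get`; supporting another scheme means one more `register` call on the gateway side, which loosely mirrors deploying a proxy before clients implement the scheme directly.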
Received on Friday, 24 January 2003 22:27:49 UTC