- From: <Patrick.Stickler@nokia.com>
- Date: Wed, 20 Oct 2004 14:42:04 +0300
- To: <timbl@w3.org>
- Cc: <www-tag@w3.org>, <sandro@w3.org>, <Norman.Walsh@Sun.COM>
***------------------------------------------------------------***
*** NOTE: The comments contained herein are my own, and do not ***
*** (necessarily) reflect the official views of Nokia          ***
***------------------------------------------------------------***

> -----Original Message-----
> From: ext Tim Berners-Lee [mailto:timbl@w3.org]
> Sent: 20 October, 2004 04:19
> To: Stickler Patrick (Nokia-TP-MSW/Tampere)
> Cc: www-tag@w3.org; sandro@w3.org; Norman.Walsh@Sun.COM
> Subject: Re: referendum on httpRange-14 (was RE: "information
> resource")
>
> On Oct 19, 2004, at 4:09, <Patrick.Stickler@nokia.com> wrote:
>
> >> -----Original Message-----
> >> From: www-tag-request@w3.org
> >> [mailto:www-tag-request@w3.org] On Behalf Of
> >> ext Tim Berners-Lee
> >> Sent: 18 October, 2004 22:03
> >> To: Sandro Hawke
> >> Cc: www-tag@w3.org; Norman Walsh
> >> Subject: Re: referendum on httpRange-14 (was RE: "information
> >> resource")
> >>
> >> The range of HTTP is not a question of belief; it is a question of
> >> design.
> >> The Web was designed such that the Universal Document Identifiers
> >> identified documents.
> >> This was refined to generalize the word "Document" to the
> >> unfortunately rather information-free "Resource".
> >> The design is still the same.
> >> The web works when person (a) publishes a picture of a dog,
> >> person (b) bookmarks it, and mails the URI to person (c), assuming
> >> that they will see more or less the same picture, not the weight of
> >> the dog.
> >>
> >> That is why, while the dog is closely related to the picture, it is
> >> not what is identified, in the web architecture, by the URI.
> >>
> >> There is a reason.
> >>
> >> Tim
> >
> > Fine. And if the URI used to publish the *picture* of the dog
> > identifies the *picture* of the dog, then one would presume to
> > GET a representation of the *picture* of the dog. No argument
> > there, obviously.
> >
> > Getting the weight of the dog via a URI identifying a picture of
> > the dog would be unexpected (arguably incorrect) behavior per
> > *either* view of this debate. So your example does not argue for
> > or against either view.
> >
> > Also, using a particular URI to identify the *picture* of a dog
> > does *not* preclude someone using some *other* URI to identify the
> > *actual* dog and to publish various representations of that dog via
> > the URI of the actual dog itself; and someone bookmarking the
> > URI of the *actual* dog should derive just as much benefit
> > as someone bookmarking the URI of the *picture* of the dog,
> > even if the representations published via either URI differ
> > (as one would expect, since they identify different things).
>
> No, they would *not* gain as much benefit.
> They would, under this different design, not have any expectation of
> the same information being conveyed to (b) as was conveyed to (a).
> What would happen when (b) dereferences the bookmark? Who knows
> what he will get? Something which is *about* the dog. Could be
> anything. That way the web doesn't work.

I strongly disagree. And your statements directly contradict the AWWW.

It is a best practice that there be some degree of consistency in the
representations provided via a given URI. Per
http://www.w3.org/2001/tag/2004/webarch-20041014/#URI-persistence

[
Good practice: Consistent representation

A URI owner SHOULD provide representations of the identified resource
consistently and predictably.
]

That applies *both* when a URI identifies a picture of a dog *and*
when a URI identifies the dog itself. *All* URIs which offer
consistent, predictable representations will be *equally* beneficial
to users, no matter what they identify.

> The current web relies on people getting the same information from
> reuse of the same URI.

I agree. And there is a best practice to reinforce and promote this.

And nothing pertaining to the practice that I and others employ, of
using http: URIs to identify non-information resources, in any way
conflicts with that.

> The system relies on the URI being associated with information of
> consistent content, not of consistent subject.

I disagree. And I do not see how you have offered any arguments to
substantiate this claim. In fact, this seems contradictory to even
your own position, as illustrated below (see the example of the
speech versus the audio clip versus the transcript).

> You can make new URI schemes for arbitrary objects, but a very
> convenient method is to use identifiers with a ???

Not sure where you were going there, but I hope you were not
suggesting that one either (a) use URIs other than http: URIs to
identify non-information resources or (b) use URIrefs with fragids to
identify non-information resources; both approaches have substantial
practical drawbacks compared to using http: URIs to denote any
arbitrary resource whatsoever.

> > I think it is a major, significant, and beneficial breakthrough
> > in the evolution of the web that the architecture *was* generalized
> > to the more general class of resources -- so that users can
> > name, talk about, and provide access to representations of, any
> > thing whatsoever.
>
> 1. The URI itself was never constrained -- only HTTP URIs.

Hmmm... that's the very crux of this debate. Whether http URIs were
actually originally thus constrained, whether any such constraint was
clear from the specs, whether it is even a reasonable constraint, and
whether significant, non-harmful utility can be obtained by assuming
no such constraint -- that is what this issue is all about.

Even if such a constraint was presumed for the original, intended use
of the http URI scheme, I've yet to see any substantial evidence that
using http URIs to identify non-information resources, such that
representations of those resources are directly accessible via the
web, is harmful or in any way problematic to any existing web
application.

And the existing widespread practice of using http: URIs to identify
non-information resources is evidence that either the specs were not
clear regarding that constraint and/or the utility of abandoning any
such constraint is sufficiently great. I think it is a bit of both.
Yet even if the specs had been clearer, I think folks would still have
used http: URIs to denote arbitrary resources, because the utility is
so great.

In either case, shall we chain ourselves to the historical record, or
continue to move forward where there are proven benefits and (so far)
no proven drawbacks?

> 2. A great way is to write RDF files so you refer to a concept as
> described in a document, a la foo#bar

Great? Perhaps for tightly controlled, monolithic systems where one
has total control over every aspect of every application.
But, with all due respect, I've found that methodology to be highly
constraining, difficult to use for modular knowledge management, and,
most significantly, found that it introduces nontrivial practical
problems with regard to efficient access to representations of such
secondary resources. (And I don't appear to be alone in that
experience.)

Just because it might work great for you does not mean it will work
great for anyone else, much less a majority of users. (And, to be
fair, that argument applies equally well to my own experiences.)

> > To ask a pointed question, Tim, do you believe that the web cannot
> > evolve beneficially in a direction beyond your original design?
>
> Of course I don't believe that. The web is a seething mass of
> flexibility points, designed to allow large chunks to be replaced.
>
> However, to extend it is one thing,

And not even valid *extensions* such as URIQA (or WebDAV) are at all
welcome, are they ;-)

> but to "evolve" it in a way which destroys the basic assumptions of
> the current web may make nice working prototypes, but it is really
> destructive.

Firstly, we're not talking about prototypes here. Perhaps you missed
all those places where I talked about "broadly deployed applications".

Secondly, can you provide actual evidence that such a practice is
destructive? (Neither you nor anyone else has so far.)

Thirdly, the more general, agnostic model allowing any resource to be
identified by an http: (or any) URI does not destroy [all of] the
basic assumptions of the web, but only removes a single assumption,
and one that is in any case of debatable clarity and utility.

So please stop painting this general, agnostic model as somehow
"deviant" or "subversive". It is not. In fact, for many, it reflects
the natural state of the *presently* deployed web. It is not something
that some of us wish the web to become. It is what the web already
*is*, today, and we are very happy with it that way, and do not wish
to see it forcefully reverted to a previous, and less useful, form.

You, personally, may not be particularly happy with how this more
general model is *already* part of the web today, but that does not
change the fact that it is a *reality* of current web applications.

As I've said before, this is really a closed issue that remains open
simply because some folks refuse to accept what already *is* part of
the presently deployed web. Unless you can point to explicit,
unambiguous text in the specs which is clearly violated by particular
practices, and/or can identify and demonstrate actual harm to existing
or even currently envisioned web applications, then you cannot
reasonably argue that the current state of the web, including
applications employing the more general model, should be declared
invalid or incorrect.

> Here we are trying to get the semantic web,

Please don't suggest that I am not, also, trying to get to the
semantic web; much less imply that I am in any way ignorant of what it
means to get to the semantic web.

I am *deploying* the semantic web. I know a lot better than many what
it means to actually *deploy* the semantic web and produce successful,
scalable, manageable, and affordable solutions based on semantic web
technologies. Please do not presume to lecture me about the goals and
visions of the semantic web as if I don't "get it".
It is precisely because I *do* "get it" that I am concerned about
these architectural issues and take the time to go round and round and
round about issues that, on their *technical* merits, should have been
resolved a long time ago.

If every *toaster* is to eventually be on the semantic web, if my PDA
is going to tell the car stereo in my rental car what my music
preferences are, if my mobile phone is going to suggest a nearby
restaurant that I'll probably like because it's lunchtime, taking my
preferences and their menus into account, etc., then the interchange
of knowledge between semantic web agents *must* be efficient and
scalable.

Per your model it is *not*. And that has been *demonstrated*. Forced
indirect access to representations via URIrefs with fragment
identifiers is inefficient and non-scalable. Yes, it can work in some
cases, for some applications, for some data. But it is *not* a
scalable solution for the future of the web and semantic web. If we
are to actually "get to the semantic web", we need to be able to have
direct access to representations of any arbitrary resource, and that
access must be as efficient as possible.

> which really cares about the
> difference between a dog and a picture of a dog, to operate over
> and also to model the HTTP web, which doesn't care about
> dogs at all. The http://.../foo#bar design uses the same flexibility
> point as the hypertext design uses: to take a language, and convert
> local identifiers in documents in that language into global
> identifiers using the document URI and "#".

I've never questioned the coherence of your model. I simply question
its practicality, based on proven scalability problems.

I also do not consider your more restrictive model any less compatible
with the specs than the general model. There is ambiguity there. The
choice must be based on which offers greater benefit. I think the
greater benefit of the general model has been demonstrated.

If I need to access information (a representation) of that dog using
your approach, I am forced to always do so indirectly, via the
identity of some other resource, rather than directly via the identity
of the dog itself. As an engineer, and also from real-world experience
deploying real semantic web applications, I find that unacceptable.

Yes, the semantic web does indeed care about the difference between a
dog and a picture of a dog, *BUT* the semantic web does not care one
bit about the nature of the URIs used to identify those resources! The
semantic web does not in any way *prefer* that URIs are only ever used
to identify information resources whereas URIrefs with fragment
identifiers are used to identify other kinds of resources.

All that matters to the semantic web is that (a) distinct URIs are
used to identify distinct resources and (b) knowledge about resources
is available in some efficient and trustworthy manner. The web is
certainly expected to be a primary means to publish/access knowledge
about resources, and fortunately, semantic web agents can presume that
a given URI is taken to identify the same resource both in RDF
statements and for accessing representations of that resource on the
web.

Thus, insofar as the semantic web is concerned, the hash vs. slash
debate is mostly *irrelevant*, and is only relevant regarding the
efficient interchange of knowledge via the web. The more general model
is simply more efficient, flexible, and scalable than the more
restricted model, both for publication/access and for modular
knowledge management.
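To make the indirection cost concrete, consider a purely hypothetical
pair of exchanges (the URIs and sizes below are invented for
illustration, not taken from any deployed system). Because a fragment
identifier is stripped by the client before the request is sent, a
client wanting a description of a hash-URI term must retrieve the
entire document that defines it:

   GET /vocab HTTP/1.1          (for http://example.org/vocab#Dog;
   Host: example.org             the "#Dog" never reaches the server)
   Accept: application/rdf+xml

   HTTP/1.1 200 OK
   Content-Type: application/rdf+xml
   Content-Length: 2097152      (the *whole* vocabulary document,
                                 however many thousand terms it holds)

Whereas with a slash URI the server can scope its response to just the
resource asked about:

   GET /vocab/Dog HTTP/1.1      (for http://example.org/vocab/Dog)
   Host: example.org
   Accept: application/rdf+xml

   HTTP/1.1 200 OK
   Content-Type: application/rdf+xml
   Content-Length: 1024         (a representation of this one term)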
I've outlined the problems with using URIrefs with fragment
identifiers in numerous ways in numerous forums. I won't repeat them
here. I continue, though, to wait for you or anyone to actually
address those identified problems, or to demonstrate either how the
more general, agnostic approach already in use is in any way worse
than what you advocate, or how your approach is better.

Your comments thus far on these issues seem to merely (a) restate your
view (which I think we all understand) or (b) recast the examples used
in discussions to reflect the presumptions of your view (e.g. someone
says some URI identifies a dog, you say "well, if that URI identifies
a picture of a dog", etc., and the discussion deteriorates from
there). Neither form of response actually addresses the issues and
challenges presented.

> One can certainly design different protocols, in which the URIs
> (without hashes) denote arbitrary objects, and one fetches some sort
> of information about them.
> I know you have been designing such systems -- you described them in
> the RDF face-to-face meeting in Boston. These are a different system:
> similar to HTTP, but you added more methods, and you don't have URIs
> for the documents.

You are blurring two issues. The use of http: URIs to identify any
arbitrary resource is a distinct issue from the HTTP extensions
offered by URIQA. Either provides benefit independent of the other,
though together they do indeed offer a tremendous amount of utility
(IMO a potential, fundamental building block for the semantic web).

But please leave URIQA out of this particular discussion. It is not a
component of this particular issue (httpRange-14).

> But it is a different design to the current web.

No more so than WebDAV would constitute a different design to the
current web. But again, let's leave URIQA out of this particular
discussion.

> You claim utility for it. Maybe it would be useful. But please don't
> call it HTTP.

Firstly, while I may be more vocal about these issues than others,
that does not mean that (a) I am the only one who holds these views,
or that (b) I was among the first to see the utility of using http:
URIs to identify non-information resources.

Secondly, this issue exists because the specs are not clear, and there
are existing practices which reflect both views. If the specs were
clear on this issue (httpRange-14) then it would not be a TAG issue
that has remained unresolved for a very, very long time. The TAG would
simply say "Spec X says Y, so don't do that" and that would be that.

The TAG issue httpRange-14, and the related "hash vs. slash" debate,
exists precisely *because* I, and others, consider it acceptable (and
highly beneficial) to use http: URIs to denote non-information
resources, and also consider such practice a fully valid use of HTTP.
Thus, your statement above is not addressing the issue, but merely
dismissing it.

> But I claim great benefit in designing the semantic web cleanly on
> top of the HTTP web so that the facilities of each support each
> other and become one large consistent system.

And I do not suggest doing anything differently. You are suggesting,
though, that the more general model is not as clean a design for
integrating the web and semantic web, yet you provide no evidence of
why.
And it is my view that the more general model actually provides the
simpler, more balanced, cleaner integration, because it allows both
the web and the semantic web layers to maintain the same exact
agnostic view about what URIs identify, i.e. to share the very same
"range" of resources. Thus, stated particular to HTTP:

Simple, balanced (clean) integration per the general, fully agnostic
model:

* On the web, http: URIs can identify any resource.
* On the semantic web, any URI can identify any resource.
* A given http: URI identifies the same resource on both the web and
  semantic web.
* The web provides for direct access to representations of any
  http: URI identified resource.
* The semantic web provides for making statements about any URI
  identified resource.

Range of Web: any resource
Range of Semantic Web: any resource

Versus the more complex, imbalanced (less clean) integration per the
restrictive model:

* On the web, http: URIs can only identify information resources.
* On the semantic web, any URI can identify any resource.
* A given URI identifies the same resource on both the web and
  semantic web.
* The web provides direct access to representations of only http: URI
  identified information resources.
* The semantic web provides for making statements about any URI
  identified resource.

Range of Web: information resource
Range of Semantic Web: any resource

Now, which model really provides the cleaner, more balanced
integration?

> You ask what utility there is in this rule.
>
> There is great utility in the fact that any person, on seeing a web
> page, can use the URI instead of the content as a shorthand for that
> content.

Really? I think what they *see* is a presentation of a representation
of whatever resource is identified by that URI, which may be a partial
view of that resource. Just because all of the substance of an
information resource *can* be transferred in a message does not mean
that all of the substance of an information resource *must* be
transferred in a message. (This is a point that the AWWW could also
explicitly make clear.)

Furthermore, it would be a mistake on the part of the user to try to
equate the request URI with what might be successfully presented by
their browser per a successful response.

The URI may identify a speech, yet the browser may play an audio
stream of the speech, because there is an MP3 representation, the user
prefers MP3 over HTML, and the user's browser supports MP3 audio
streams -- and the user may then mistake the URI as identifying the
audio stream rather than the speech itself. They email that URI to a
friend and say "listen to this", but the friend's browser doesn't
support MP3 audio streams, so the friend gets a transcript instead,
encoded in HTML, and is subsequently confused about how anyone could
expect them to "listen" to a textual transcript.

Yet the above scenario is entirely possible per your restricted model
of what an http URI can identify, which shows that this is not a
problem inherent in the more general model, but a more fundamental
problem at the very foundation of the web, one also reflected by the
principle of URI opacity. Thus, here is yet another example of how one
cannot ever presume anything about what resource is identified by a
given URI, or about the nature of that resource, solely based on any
arbitrary representation(s).

Your restricted model does *nothing* to avoid that scenario. It is a
challenge that the semantic web must solve.
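To make that scenario concrete, here is a hypothetical pair of
exchanges against one and the same URI (the URI and headers are
invented for illustration). Ordinary HTTP content negotiation is all
it takes to produce the confusion described above:

   GET /speeches/2004-10-20 HTTP/1.1     (URI identifies the speech)
   Host: example.org
   Accept: audio/mpeg, text/html;q=0.5   (first user prefers MP3)

   HTTP/1.1 200 OK
   Content-Type: audio/mpeg              (audio representation served)

   GET /speeches/2004-10-20 HTTP/1.1
   Host: example.org
   Accept: text/html                     (friend's browser: no MP3)

   HTTP/1.1 200 OK
   Content-Type: text/html               (HTML transcript served)

Nothing in either exchange tells the users whether the URI identifies
the speech, the audio clip, or the transcript; that knowledge has to
come from somewhere else (e.g. an RDF description of the resource).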
And restricting http URIs to identifying only information resources
will not help the semantic web one way or another to meet that
challenge. At the end of the day, the creator/owner of *every* URI has
to say what that URI identifies and, ideally, also tell us something
about the nature of the resource in question.

> This is so simple that people often haven't thought about it.
> (And thinking about it leads to the aspects of version, language,
> and content type.)
> This is done in all the hypertext links and bookmarks and billions
> of places where the web is used. Your proposed "evolution" would
> break that.

NO. It would not. And IMO the onus is on you to prove it. And BTW, if
this evolution would break such things, then such things would
*already* have broken, since the web has *already* evolved into the
more general model.

Web links refer to resources, not to representations (unless the
resource referred to is, by coincidence, a representation). And links
referring to non-information resources which consistently resolve to
representations in a predictable manner *ALREADY* work *just* as well
as any other link to any other type of resource.

Here's one for you: consider the following link, which refers to a
property (which is presumed not to be an information resource):

   <a href="http://sw.nokia.com/FN-1/published">Publication Date</a>

Go ahead. Follow the link. Email it to anyone. Have them resolve the
link. Was your experience the same? Did you both get a consistent
representation of that property? I bet you did.

There, I've *proven* that the general approach does not break the web.
(At least, I didn't notice the web crashing from way up here in
Finland. Perhaps I should give it a minute or two... nope, still
nothing...)

> I hope that this is now clear.

Your position is clear. It has been reasonably clear to me for some
time. Yet the *superiority* of your position has not been
demonstrated.

> > The core of your argument seems to be "Because the web was not
> > originally designed to do that, it cannot and should not do that".
>
> No, it is that what you propose is inconsistent with the way the web
> works now.

Please demonstrate how it is inconsistent. E.g.:

Per your restricted model:

* An http: URI always identifies an information resource.
* Links referring to resources identified by http: URIs provide
  access to representations of those resources.
* Resolution of URIs to representations should be consistent.

Successfully traversing the link

   <a href="http://example.com/foo">Blargh</a>

results in being presented with a representation of the resource
identified by the URI <http://example.com/foo>.

Per the general model:

* An http: URI identifies any resource.
* Links referring to resources identified by http: URIs provide
  access to representations of those resources.
* Resolution of URIs to representations should be consistent.

Successfully traversing the link

   <a href="http://example.com/foo">Blargh</a>

results in being presented with a representation of the resource
identified by the URI <http://example.com/foo>.

Now, presuming that <http://example.com/foo> actually does identify an
information resource, exactly *what* breaks given the more general,
agnostic model? In fact, no matter what kind of resource it is, how
does the general model change the way that users use web links to
access information? It doesn't. We've already clarified that users
should not conclude anything about what a given URI actually
identifies based on accessible representations.
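To underscore the point, here is the (hypothetical) exchange a browser
performs when that link is traversed. It is byte-for-byte the same
whether <http://example.com/foo> identifies a document or a dog; the
question of what the URI identifies simply never appears on the wire:

   GET /foo HTTP/1.1
   Host: example.com
   Accept: text/html

   HTTP/1.1 200 OK
   Content-Type: text/html

   <html>...a representation of the resource, whatever it is...</html>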
What the web user is primarily (or only) concerned about is the
consistency of the representations accessible via a given URI. They
don't usually care at all what the URI actually identifies. It's the
automated systems that really care, and it's the automated systems
that will rely on the semantic web machinery to clarify what those
URIs actually identify and what the nature of those identified
resources actually is.

The general model, and the integration of the semantic web with the
web based on that general model, will have no significant impact on
the way users use web links to access information.

> > Yet actual practice and deployed solutions demonstrate that there
> > is clear benefit to the more general model; and there does
> > not appear to be any substantial evidence that applying that
> > more generalized model is harmful or problematic to the actual
> > real-world functioning of the web, or that the narrower, more
> > restricted (original) model is clearly better.
>
> That is because you have not really looked at the implications of
> what you are saying --

Thank you for paying me the compliment of being short-sighted. Forgive
me for not returning the compliment.

> you are assuming, I suspect, that web users will go on
> using URIs as they do, and your software will use them differently,
> and that the two won't bother each other. But I am aiming higher -
> for one consistent design across WWW and SW.

My aim, I assure you, is just as high as (hmmm, perhaps even higher
than) yours. Though I would hope and expect that final resolution of
this issue would not lie in just my or your personal view.

Nevertheless, I consider the more general, agnostic model to provide a
more consistent, seamless integration of the web and semantic web
layers than your restricted model. Allowing for efficient web access
(including *mobile* web access) to representations of any arbitrary
resource, including their descriptions, and employing the semantic web
machinery to talk about and reason about any arbitrary resource,
reflects not only the present but also the future. Restricting direct
access to representations to a particular subclass of resources will
simply hobble the web and semantic web and exclude a significant
amount of utility, much of which is already demonstrated.

> > If you, or anyone, feels that there *is* evidence either showing
> > how the more generalized view is harmful, or how the narrower
> > (original) view is better, then I would love to see it.
>
> Maybe that explanation will help, maybe it won't

I appreciate you making the effort, but it does not. Sorry. You have
not presented any actual evidence that your restricted model is better
than the more general model, or shown how the general model actually
causes any real problems for either systems or users.

> Best Wishes,

Likewise,

Patrick

> Tim BL
>
> > Regards,
> >
> > Patrick
Received on Wednesday, 20 October 2004 11:43:47 UTC