- From: <Patrick.Stickler@nokia.com>
- Date: Tue, 26 Oct 2004 12:00:45 +0300
- To: <timbl@w3.org>
- Cc: <www-tag@w3.org>, <sandro@w3.org>, <Norman.Walsh@Sun.COM>
> -----Original Message-----
> From: ext Tim Berners-Lee [mailto:timbl@w3.org]
> Sent: 25 October, 2004 18:33
> To: Stickler Patrick (Nokia-TP-MSW/Tampere)
> Cc: www-tag@w3.org; sandro@w3.org; Norman.Walsh@Sun.COM
> Subject: Re: referendum on httpRange-14 (was RE: "information resource")
>
> On Oct 20, 2004, at 7:42, <Patrick.Stickler@nokia.com> wrote:
>
> >> -----Original Message-----
> >> From: ext Tim Berners-Lee [mailto:timbl@w3.org]
> >> Sent: 20 October, 2004 04:19
> >>
> >> On Oct 19, 2004, at 4:09, <Patrick.Stickler@nokia.com> wrote:
> >>> [...]
> >>> Also, using a particular URI to identify the *picture* of a dog
> >>> does *not* preclude someone using some *other* URI to identify the
> >>> *actual* dog and to publish various representations of that dog via
> >>> the URI of the actual dog itself; and someone bookmarking the
> >>> URI of the *actual* dog should derive just as much benefit
> >>> as someone bookmarking the URI of the *picture* of the dog,
> >>> even if the representations published via either URI differ
> >>> (as one would expect, since they identify different things).
> >>
> >> No, they would *not* gain as much benefit.
> >> They would, under this different design, not have any expectation of
> >> the same information being conveyed to (b) as was conveyed to (a).
> >> What would happen when (b) dereferences the bookmark? Who knows
> >> what he will get? Something which is *about* the dog. Could be
> >> anything. That way the web doesn't work.
> >
> > I strongly disagree. And your statements directly contradict AWWW.
>
> Precisely.
> The hypothesis you proposed (using a particular URI to identify the
> *picture* of a dog does *not* preclude someone using some *other* URI
> to identify the *actual* dog) led to the conclusion (that the
> representations would not carry consistent content) you strongly
> disagree with.
> The hypothesis fails.

I honestly don't follow you here.
You claimed, I believe, that if someone uses a URI to identify a dog,
and someone else bookmarks a link based on that URI, then they cannot
get consistent behavior from that bookmark -- that they won't ever know
what they might get when they follow that link. It was that specific
claim that I disagree with. If that's not what you were claiming, then
please clarify.

> > It is a best practice that there be some degree of consistency
> > in the representations provided via a given URI.
>
> Absolutely.
>
> > That applies *both* when a URI identifies a picture of
> > a dog *and* when a URI identifies the dog itself.
> >
> > *All* URIs which offer consistent, predictable representations will
> > be *equally* beneficial to users, no matter what they identify.
>
> Now here seems to be the crunch.
> The web architecture relies, we agree I think, on this consistency
> or predictability of representations of a given URI.

I agree with some aspects of that statement, but not all.

The *utility* of the web architecture *does* rely on consistency of
representations. The web *architecture* itself does not *rely* on
consistency of representations. Consistency/predictability of
representations is a measure of the "quality" of the links, rather than
an essential requirement of the web machinery. The HTTP protocol itself
does not care one bit whether the representations accessible via a given
URI are perfectly static or chaotically random. Users care, but that
doesn't mean consistency is a feature of the web architecture proper.

That's why, presumably, consistency of representations is a best
practice rather than a functional requirement. Yet, being a best
practice, that also means that no web client or user can presume that
all links will exhibit consistent behavior; applications must take
potential variability into account, and not take consistent behavior
for granted.
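To make the point concrete, here is a minimal sketch (in Python; the URIs and representations are invented for illustration): the web machinery simply maps a URI to whatever representation the publisher serves, without knowing or caring what the URI identifies, so a bookmark of the dog's URI behaves exactly as consistently as a bookmark of the picture's URI whenever the publisher follows the consistency best practice.

```python
# Hypothetical sketch: the web layer is agnostic about what a URI
# identifies. Two URIs -- one identifying a *picture* of a dog, one
# identifying the *actual* dog -- each with published representations.
REPRESENTATIONS = {
    "http://example.org/images/fido.jpg": "JPEG bytes depicting Fido",
    "http://example.org/dogs/fido": "HTML page describing the dog Fido",
}

def dereference(uri: str) -> str:
    """The 'web layer': return a representation for a URI.

    Nothing here depends on *what* the URI identifies; consistency of
    representations is a publisher's best practice, not a precondition
    of this machinery working.
    """
    return REPRESENTATIONS[uri]

# A bookmark of either URI yields equally consistent behavior, because
# the publisher chose to serve stable representations for both.
bookmark = "http://example.org/dogs/fido"
assert dereference(bookmark) == dereference(bookmark)
```

Nothing in the dereference step would change if the publisher served random bytes instead; only the *quality* of the link would suffer, which is exactly the distinction between a best practice and a functional requirement.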
> The use of the URI in the web is precisely that it is associated
> with that class of representations which could be returned for it.
>
> Because the "class of representations which could be returned"
> is a rather clumsy notion, we define a conceptual thing
> which is related to any valid representation associated with the URI,
> and as the essential property of the class is a similarity in
> information content, we call the thing an Information Resource.
>
> So a URI is a string whose sole use in the web architecture
> is to denote that information resource.

Again, your position is clear. Your model is coherent and logical. I've
said that time and time again. There is no need to continue to explain
your model. I get it. I think most everyone else also gets it. It is
simply not agreed that it is the *best* model to apply for the future
of the web and semantic web.

The issue is whether the design choice to constrain representations to
a particular class of resources is (a) necessary, (b) optimal, or (c)
sufficiently clear and determinable. I believe the evidence strongly
supports an answer of 'no' on all three points.

It is not necessary to the functioning of the web that a representation
be constrained to representing solely some body of information. This
has been demonstrated.

It is not optimal that a representation be constrained to representing
solely some body of information. This has been demonstrated.

It is not clear from any given representation either (a) that the
entire body of information of the resource is included in the
representation, or (b) that there is not additional information (e.g.
links and other markup which convey information) which is not part of
the information resource. Hence, one cannot know from any representation
which bits absolutely are part of the information resource versus part
of the representation, nor which bits might be missing from the
representation; which IMO nullifies any utility that might be had from
any presumed architectural relationship between representations and
information resources, since fidelity cannot be measured, nor full and
complete fidelity relied upon.

Thus, the design constraint you advocate (however clear, coherent, and
useful for your own particular mental processes) has not been
demonstrated to be necessary, or most optimal, or even reliably
determinable in the real context of deployed web applications.

> Now if you say in the semantic web architecture that the same will
> identify a dog, you have a conflict.

Sigh... only if you presume the restricted model...

Your argument above appears to be:

   IF every http URI identifies an information resource
   AND you use an http URI to identify a non-information resource
   THEN you have a conflict

Since this very debate is about the initial premise of that argument,
the argument fails to actually address the issue at hand. IMO, the
initial premise is false, therefore the argument fails.

> >> The current web relies on people getting the same information from
> >> reuse of the same URI.
> >
> > I agree. And there is a best practice to reinforce and promote this.
> >
> > And nothing pertaining to the practice that I and others employ, by
> > using http: URIs to identify non-information resources, in any way
> > conflicts with that.
>
> Well, it does if the semantic web can talk about the web, as the
> semantic web can't be ambiguous about what an identifier identifies
> in the way that one can in English.

Give me one concrete example of any problem introduced by the general,
agnostic model. Just one.
I've provided hard evidence from real-world, deployed applications that
the general, agnostic model works, and that the restricted model has
severe scalability and efficiency problems.

Claims without hard evidence to back them up do not help this debate,
but merely waste time and energy. This issue needs to be decided on the
demonstrated benefit of one model over the other for real-world web and
semantic web applications.

My evidence is on the table. I don't see any evidence of any sort
either supporting the restricted model or reflecting any drawbacks to
the general model. Until there is actual, concrete evidence either in
support of the restricted model or showing real, practical drawbacks to
the general model, I don't see any point in continuing this discussion.

> I want my agent to be able to access a web page, and then use the URI
> to refer to the information resource without having to go and find
> some RDF somewhere to tell it whether in fact it would be mistaken.

Sigh. Tim, there have been numerous examples presented during the
course of this debate showing that the above goal cannot be achieved
reliably, even with the restricted model, even if all http URIs are
constrained to identify only information resources.

I will offer at least one example here. Consider the following five
distinct information resources:

1. A novel

2. A particular edition of the novel, with distinct wording
   discrepancies from other editions

3. A specific publication of a particular edition of the novel,
   containing the full textual content of the edition, along with an
   introduction specific to that particular publication and a glossary
   not part of the original novel

4. An eBook version of the above publication of that edition of the
   novel, modularized by inserting section divisions particular to that
   eBook publication and a table of contents with references to the
   individual sections, and with copyright statements included at the
   end of each section

5.
The initial section of the above eBook publication, containing only
the front matter and table of contents with references to its
subsequent sections, and with a copyright statement at the end.

*ALL* of the above resources are distinct, and each could be identified
by a distinct http URI according to the restricted model.

Now, you encounter some URI, and via that URI you obtain some
representation in the form of an HTML document, and it appears that the
representation concerns the novel in question. Yet which information
resource exactly does that URI actually identify? They are all
concerned with the novel in question. They are all information
resources.

OK, so you guess (applying full human cognitive abilities; let's forget
about automated agents guessing about such things...). How can you know
for sure, since you have no idea about the number of possible
information resources that the received HTML document could be a
representation of (you are not privy to the list of resources above, or
to every publisher's intentions, conceptions, or practices)?

From the URI alone you cannot know *which* actual resource the URI
identifies such that you could make any *reliable* statements about it,
e.g. when it was created, or who the creator was, or how many words it
contains, etc.

Even if you somehow guess correctly about which resource the URI
identifies, how can you know which bits of information conveyed in the
representation are part of the substance of the information resource
and which are part of the representation alone? Even if the URI denotes
the novel alone, the representation of the novel might still contain a
copyright notice which is certainly not part of the novel itself, etc.

Or how do you know which bits of the information resource might be
missing? A representation of e.g. the novel provided via that URI may
still be the modular first section of an eBook publication of that
novel, and thus not contain the entire substance of the novel!
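The ambiguity in the five-resources example can be sketched as follows (in Python; the URIs, resource names, and HTML are invented for illustration): several distinct information resources may each be served as indistinguishable representations, so the mapping from representation back to resource is many-to-one and cannot be inverted.

```python
# Hypothetical sketch: five distinct information resources (the novel,
# an edition, a publication, an eBook, an eBook section) each served
# as the same HTML representation.
html = "<html><body>The novel's text, with front matter...</body></html>"

served = {
    "http://example.org/novel": html,
    "http://example.org/novel/edition2": html,
    "http://example.org/novel/edition2/pub": html,
    "http://example.org/novel/ebook": html,
    "http://example.org/novel/ebook/section1": html,
}

def candidates(representation: str) -> list[str]:
    """Try to invert the mapping: which resources could this
    representation be a representation of? (All five of them --
    the representation underdetermines the referent.)"""
    return [uri for uri, rep in served.items() if rep == representation]

# From the representation alone, the URI's referent is underdetermined.
assert len(candidates(html)) == 5
```

The same dictionary could just as well map all five URIs to *different* representations; either way, nothing recoverable from a representation tells you which of the five resources its URI was minted to identify.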
There is nowhere any requirement or expectation that there be a 1:1
correspondence of information between an information resource and any
one of its possible representations.

--

This example alone should be sufficient to illustrate that you cannot
*reliably* conclude what resource a given URI identifies, or anything
about that resource, based solely on the representations accessible via
that URI.

You can guess. You might guess right. But you cannot ever know for
sure. The web architecture *cannot* provide that for you. I consider
that to be as fundamental and reliable a fact about the nature of the
web as there is.

And it is that fact that makes the semantic web so important, as
without the semantic web, we cannot be clear and sure about what any
particular URI actually identifies, nor about the true nature of the
resource identified.

--

*** HOWEVER *** (and *please* give full consideration to the following)

Even though you cannot know for sure what a given URI identifies, or
know for certain the true nature of the identified resource, you *can*
at least hope (even expect) that the best practice of consistent
representation has been followed, such that you can reliably link to
that resource (whatever it is) and derive consistent benefit from the
web behavior provided by that link, due to the predictability and
consistency of the representations accessible via that link.

You still cannot reliably make any conclusions about the identity or
nature of the actual resource, but you can nevertheless derive real
benefit from the consistent web behavior afforded by its accessible
representations.

If you want to be more precise, and know exactly what the URI
identifies and what the nature of that resource is, then you *have* to
bring the semantic web machinery into play, and you *have* to have a
way to ask precise questions about the resource, specifically in terms
of the URI in question.
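The division of labor being argued for can be sketched as follows (in Python, with triples standing in for RDF statements; the URIs and vocabulary terms are invented): the web layer maps a URI to representations without regard to identity, while the semantic web layer carries the explicit statements that say what the URI actually identifies.

```python
# Hypothetical sketch of the two layers. Web layer: URI ->
# representation, entirely agnostic about what the URI identifies.
web = {
    "http://example.org/dogs/fido": "<html>...a page about Fido...</html>",
}

# Semantic web layer: (subject, predicate, object) statements about
# what each URI identifies and about the nature of that resource.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
triples = [
    ("http://example.org/dogs/fido", RDF_TYPE,
     "http://example.org/terms#Dog"),
    ("http://example.org/dogs/fido",
     "http://example.org/terms#name", "Fido"),
]

def types_of(uri: str) -> list[str]:
    """Ask the semantic web layer a precise question: what kind of
    thing does this URI identify?"""
    return [o for s, p, o in triples if s == uri and p == RDF_TYPE]

# Both layers presume the URI identifies the same resource; only the
# semantic web layer can tell you that resource is a dog, not a page.
assert types_of("http://example.org/dogs/fido") == ["http://example.org/terms#Dog"]
```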
--

The utility of a web link depends on the consistent representation of
the resource accessible via the link URI, not on the identity or nature
of the resource identified by that URI.

The web layer does not (and should not) care about the identity or
nature of the resources identified by particular URIs; only about
consistency of representation/accessibility per those URIs.

It is the semantic web layer that cares about the identity and nature
of the resources identified by particular URIs.

And the clean integration between the web and semantic web is that, for
any given URI, it is presumed at both layers that it identifies the
same resource, so that the semantic web layer can describe those
resources of which representations are provided at the web layer.

The restricted model in fact blurs the distinction between the web and
semantic web layers, by making the web care about one particular class
of resource and not others; in fact, discriminating against all other
classes of resource by relegating them to the more complex, expensive
mechanisms necessary for indirect access.

> I want to be able to model lots and lots of uses of URIs in existing
> technology in RDF. This means importing them wholesale;
> it needs the ability to use a URI as a URI for the web page without
> asking anyone else.

I expect then that you will be disappointed, and your applications
untrustworthy (unless you restrict your applications to data and tools
that you have absolute and complete control over).

Regards,

Patrick
Received on Tuesday, 26 October 2004 09:01:55 UTC