- From: Patrick Stickler <patrick.stickler@nokia.com>
- Date: Wed, 23 Jan 2002 10:26:47 +0200
- To: ext Mark Baker <distobj@acm.org>
Should this discussion be moved back on-list? It's very relevant, and
I'm sure others would be interested.

On 2002-01-23 0:44, "ext Mark Baker" <distobj@acm.org> wrote:

>> Though I probably will need to provide a more lengthy explanation...
>
> Oh my, I was hoping for a couple of sound bites.

Up front, I want to give warning and apology if any of my responses
seem too curt, blunt, or generally insulting. In the interest of
clarity, and to avoid rambling on just to say things in a politically
correct or diplomatic fashion, I've chosen to just say what I think --
and all is said with all due respect and courtesy, honestly.

>
> But I see that you're using the classical view of URI space.

I have stated repeatedly, and explicitly, that I subscribe to the
classical view.

> The W3C
> has been fighting for a long time to change this view, and got at
> least part of the way there with the "uri-clarification" note.

The "clarification" does not clarify much of anything other than what
the classical and contemporary views are. It does not IMO mandate the
contemporary view. It also lacks any clear discussion of what either
view really means for software engineers building URI aware
applications -- apart from a single mention of the word 'formal', the
significance of which must be guessed at.

Furthermore, I find that far more of the people I encounter subscribe
to the classical view, particularly those working on semantic web
applications: software engineers who build web applications that use
URIs extensively, and who want/need a logical and formal taxonomy of
URI classes.

> I see it this way (which is also the way that Tim Berners-Lee,
> Dan Connolly, and Roy Fielding see it - the three people most
> responsible for the Web as we know it);

With all due respect to all three individuals, and many others, I
think that they are missing something fundamentally important.
(Even the gods can be wrong now and then)

My impression (which may be incorrect) is that they and others in
their "camp" subscribe very strongly to a philosophy that is
epitomized by Perl, where "things are what you use them to be", which
I consider to be hacking, not engineering, and completely unsuitable
for many application areas (e.g. eCommerce, authority of knowledge,
digital rights, etc.) even given the chaotic and dynamic nature of
the Web.

Don't get me wrong, I like Perl and use it a lot, and there are many
cases where hacking is appropriate -- but also many cases where it is
not.

I see the "everything is a URI and its meaning is how I use it" view
as just an extension of the Perl scalar datatype view, which again I
find to be poor engineering. It's a useful notion for hacking and for
one-off applications, but not as a principle of software architecture,
particularly where data integrity is important and ambiguity is to be
minimized (e.g. the Semantic Web).

It appears that those who subscribe to the "contemporary" view also
tend to hold this "things are what I use them for" view.

And, BTW, I find the term "contemporary view" to be a *highly*
politically loaded and offensive term equating to "if you don't agree
with the modern, contemporary view, you're behind the times and your
views are passe". I see the "contemporary view" as a transient fad
that will pass, leaving the classical view to continue on its merry
way towards a global semantic web.

> Names are strings that identify something. "Mark Baker" is my
> name. URIs are a subset of all possible names, with a specific
> structure, e.g. "foo://bar-com/baz". Every URI is a name because
> it identifies something, and I can associate meaning with it
> independant of any further interpretation of that URI.

A formal taxonomy of URI Classes with consistent semantics is an
issue of scalability and economy.
Sure, you can enumerate all knowledge about every individual URI, but
if a large portion of the knowledge about many URIs intersects in a
functionally significant way, it is good engineering to capture that
intersection, and that's what URI schemes are for, and also what URI
Classes are for.

Otherwise, let's just use UUIDs which are globally and temporally
unique, and then add whatever semantics we want about them; which
seems to me to represent the distillation of the contemporary view.

After all, now with the DDDS architecture, we can just create the DNS
entries to map any arbitrary string to an IP address -- so why bother
even with 'http:' or other URI schemes? Just create your names using
UUIDs, describe them as you like, and for those that denote web
resources, use DDDS to map them to some address. Eh? After all, a
name is just an opaque identifier that "is" what you say it is.

> For example,
> I can tell you that "http://www.markbaker.ca/James/" identifies my
> son. Nobody need ever invoke a GET on that URI in order to associate
> that name with the meaning I gave it.

And how then does a software application know that it denotes a
non-digital resource and thus, a retrieval error is in fact "correct"
and to be expected rather than an indication of some access problem?!

I never argued that URLs couldn't be (mis)used to denote non-digital
resources, only that a formal taxonomy of URI classes based on
resolution criteria (direct, indirect, none) is extremely useful for
applications -- particularly SW applications which are using URIs to
infer things about the universe.

Furthermore, are the proponents of the contemporary view completely
*blind* to the confusion that exists in the larger masses of web users
regarding 'http:' and other URLs that don't resolve because they don't
denote digital resources (e.g. XML Namespaces, vocabulary terms, etc. etc.)
as well as the confusion between vocabulary URLs and schema URLs, and
the total incompatibility of such an approach with multiple schemas
using the same vocabulary?! It appears so.

> I can do this for more than just markbaker.ca URIs. You and I can
> have a conversation about http://www.ibm.com without invoking GET.

No. You can only have a conversation about the web resource accessible
at http://www.ibm.com, which is neither the URI 'http://www.ibm.com'
nor the company 'IBM Inc.'.

You can achieve this fundamental distinction by using URI schemes
that embody the key semantics:

    http://www.ibm.com      = a web resource
    uri:http://www.ibm.com  = the URI for a web resource
    auth://ibm.com          = a (semantic) web authority/entity

Now, and only now, can we actually discuss these three things in a
clear and consistent manner. E.g. (apologies to IBM and the IETF for
the use of their trademarks in the following examples, as well as to
all persons actually named John Doe ;-)

    <rdf:Description rdf:about="http://www.ibm.com">
      <dc:title>Welcome to IBM</dc:title>
      <dc:creator rdf:resource="auth://john.doe@ibm.com"/>
      <dc:publisher rdf:resource="auth://ibm.com"/>
    </rdf:Description>

    <rdf:Description rdf:about="auth://ibm.com">
      <dc:title>International Business Machines Inc.</dc:title>
    </rdf:Description>

    <rdf:Description rdf:about="auth://john.doe@ibm.com">
      <person:name>John Doe</person:name>
      <person:email rdf:resource="mailto:john.doe@ibm.com"/>
    </rdf:Description>

    <rdf:Description rdf:about="uri:http://www.ibm.com">
      <rdf:type rdf:resource="http:"/>
    </rdf:Description>

    <rdf:Description rdf:about="http:">
      <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URL"/>
    </rdf:Description>

    <rdf:Description rdf:about="uri:auth://ibm.com">
      <rdf:type rdf:resource="auth:"/>
    </rdf:Description>

    <rdf:Description rdf:about="auth:">
      <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-TaxonomyURI/URP/URV"/>
    </rdf:Description>

Otherwise, your knowledge would be highly ambiguous and for all
practical purposes
useless. E.g.

    <rdf:Description rdf:about="http://www.ibm.com">
      <dc:title>Welcome to IBM</dc:title>
      <dc:creator rdf:resource="mailto:john.doe@ibm.com"/>
      <dc:publisher rdf:resource="http://www.ibm.com"/>
      <!-- Is the publisher its own publisher or just the publisher
           of the web page, and is John Doe the creator of the web
           page or of IBM or both? -->
    </rdf:Description>

    <rdf:Description rdf:about="http://www.ibm.com">
      <dc:title>International Business Machines Inc.</dc:title>
      <!-- Is this the title of the web page or IBM, or both? Does
           the web page and/or IBM have two titles? -->
    </rdf:Description>

    <rdf:Description rdf:about="mailto:john.doe@ibm.com">
      <person:name>John Doe</person:name>
      <person:email rdf:resource="mailto:john.doe@ibm.com"/>
      <!-- Is John Doe the name of the email address or a person,
           and does the email address have an email address that is
           itself? -->
    </rdf:Description>

    <rdf:Description rdf:about="http://www.ibm.com">
      <rdf:type rdf:resource="http:"/>
      <!-- Is this the type of the web page or of the URI of the
           web page, or of IBM? -->
    </rdf:Description>

    <rdf:Description rdf:about="http:">
      <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URL"/>
      <!-- fortunately, this is unambiguous, at least in this example -->
    </rdf:Description>

    <rdf:Description rdf:about="auth://ibm.com">
      <rdf:type rdf:resource="auth:"/>
      <!-- Again, is this the type of the URI or of IBM? -->
    </rdf:Description>

    <rdf:Description rdf:about="auth:">
      <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-TaxonomyURI/URP/URV"/>
      <!-- also unambiguous, at least in this example -->
    </rdf:Description>

I assume the numerous ambiguities and circularities in these second
examples are clear, and also clearly demonstrate the critical need
for distinctive URIs.
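
For anyone who prefers to see the ambiguity mechanically, here is a
minimal sketch in Python. It assumes statements are held as plain
(subject, predicate, object) tuples, and the function name
conflicting_values is made up for illustration; it is not part of RDF
or of any library. It only flags subjects that carry more than one
value for a given property, which is a crude signal, since RDF itself
permits repeated properties.

```python
from collections import defaultdict


def conflicting_values(triples, predicate):
    """Return subjects that carry more than one distinct value for predicate."""
    values = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            values[subject].add(obj)
    # Keep only subjects with two or more distinct values.
    return {s: sorted(v) for s, v in values.items() if len(v) > 1}


# One URI used for both the web page and the company (second example set).
AMBIGUOUS = [
    ("http://www.ibm.com", "dc:title", "Welcome to IBM"),
    ("http://www.ibm.com", "dc:title", "International Business Machines Inc."),
]

# Distinct URIs for the web page and the authority (first example set).
DISTINCT = [
    ("http://www.ibm.com", "dc:title", "Welcome to IBM"),
    ("auth://ibm.com", "dc:title", "International Business Machines Inc."),
]

if __name__ == "__main__":
    print(conflicting_values(AMBIGUOUS, "dc:title"))
    print(conflicting_values(DISTINCT, "dc:title"))
```

With the shared URI, the machine is left holding two titles for one
subject and no way to tell which thing each describes; with distinct
URIs the question never arises.
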
Now, playing devil's advocate to my own arguments, I will concede that
one could have different 'http:' URLs to capture the distinctions
provided by my separate URI schemes, but there still remains the
problem that in such a scenario URLs would be used for non-digital,
non-accessible resources, and thus, the fair and reasonable
expectation by both a human and an application that a URL provides
access to a web resource is violated.

Again, how is an application (or person) supposed to know that a
failure to resolve is intended/expected rather than due to some actual
problem accessing a web resource?

> Is "foo://www-markbaker-ca/James/" an URL or an URN? You don't know,

You *could* know, if you said something like

    <rdf:Description rdf:about="foo:">
      <rdfs:subClassOf rdf:resource="voc://ietf.org/URI-Taxonomy/URI/URN"/>
    </rdf:Description>

Now, every SW agent can *know* that every instance of the 'foo:' URI
scheme is in fact a URN, and by the defined qualities of a URN, it
denotes a web resource which is accessible indirectly by that URI, and
can then look for a definition within its operational context for how
such URIs are to be resolved (which protocol or agency).

Of course, since your son actually *isn't* a web resource, that
resolution will fail (unless we move to the future or an alternate
parallel dimension where you can beam folks on demand from wherever
they are ;-)

> but if I say *that* identifies my son, then that's the important
> thing.

It's one of the important things, but not every important thing.

It is essential to keep in mind that the Semantic Web is *not* for
humans! It is for stupid machines that can't think, and need explicit,
well defined, formal symbol systems to do tricks with bits.

*You* may understand that URI to denote your son. And some other human
may be able to discern from its mnemonic characteristics that it
likely denotes a human, but a computer just sees bits.
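
To illustrate what such a scheme-level declaration buys an
application, here is a minimal sketch in Python. The registry contents
are assumptions made for the example: 'http:' is treated as directly
resolvable (URL), 'foo:' as indirectly resolvable (URN, per the
hypothetical declaration above), and 'auth:' as not resolvable at all.
The names Resolution, SCHEME_CLASSES, resolution_of and
failure_is_expected are invented for this sketch and come from no
standard or library; only urlsplit is a real standard library call.

```python
from enum import Enum
from urllib.parse import urlsplit


class Resolution(Enum):
    DIRECT = "direct"      # URL-like: the URI itself gives access
    INDIRECT = "indirect"  # URN-like: access goes through some resolver
    NONE = "none"          # denotes something that cannot be retrieved


# Hypothetical registry; a real agent might build this from
# rdfs:subClassOf statements about each scheme.
SCHEME_CLASSES = {
    "http": Resolution.DIRECT,
    "foo": Resolution.INDIRECT,
    "auth": Resolution.NONE,
}


def resolution_of(uri):
    """Look up the resolution class of a URI from its scheme alone."""
    scheme = urlsplit(uri).scheme.lower()
    return SCHEME_CLASSES.get(scheme)  # None if the scheme is unknown


def failure_is_expected(uri):
    """True only when the scheme is declared as non-resolvable."""
    return resolution_of(uri) is Resolution.NONE


if __name__ == "__main__":
    for uri in ("http://www.ibm.com",
                "foo://www-markbaker-ca/James/",
                "auth://ibm.com"):
        print(uri, resolution_of(uri), failure_is_expected(uri))
```

The point of the sketch is that the decision is made from the scheme
alone, before any retrieval is attempted, so a failed GET on an
'http:' URI can be reported as a real problem while nothing is ever
attempted for an 'auth:' URI.
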

> Now, if after I've asserted that, I define a mapping that says;
>  - replace "foo:" with "http:"
>  - do 's/-/./g' on the authority
>
> Is it an URN (using your definition of URN, not the contemporary
> one) or an URL now?

Obviously, since 'http:' is a URL scheme, you have now created a URL,
but that is a *different* URI from the first.

> Hopefully you see where I'm going with this.

Actually, no, unless you are suggesting that "www-markbaker-ca/James/"
is a globally unique identifier in its own right and that the URI
scheme prefix simply has to do with the method of interpretation, such
that 'http://www-markbaker-ca/James/' and
'foo://www-markbaker-ca/James/' denote the same thing but merely
represent different methods of interaction/access/reference.

I'm going to presume that that is not what you are meaning, as that is
contrary to the very basis of URI uniqueness.

> "Identifiers" are the
> important thing. An identifier is a name or a locator in context.
> In the context of resolving an identifer, it's always a locator.
> In the context of talking about it, it's always a name.

My above examples show that this view is a fallacy. We must be able to
talk about the identifier, as well as what is identified, and a given
identifier can only identify one thing in every context, not different
things in different contexts.

The presently widespread view that e.g. 'http://www.ibm.com' can
denote both a web page and the company, or that
'mailto:john.doe@ibm.com' can denote both a person and an email
address is just dead wrong, and unfortunately, it seems that this is a
common view held by those who subscribe to the "contemporary view".

As I've said before, it may very well be that the *Web* can limp along
with the contemporary view, but the *Semantic Web* cannot.

>> Does that help clarify my understanding of URL, URN, etc.?
>
> Yes, thanks. Very "traditional". 8-)

The "founding fathers" got it right in the first place. The
contemporary view is a false detour.
We need to get back on the main road.

Cheers,

Patrick

--

Patrick Stickler              Phone:  +358 50 483 9453
Senior Research Scientist     Fax:    +358 7180 35409
Nokia Research Center         Email:  patrick.stickler@nokia.com
Received on Wednesday, 23 January 2002 03:25:56 UTC