- From: Tim Bray <tbray@textuality.com>
- Date: Tue, 16 May 2000 01:57:40 -0400 (EDT)
- To: <xml-uri@w3.org>
- Cc: <xml-dev@xml.org>
At 09:45 AM 5/15/00 -0400, Tim Berners-Lee wrote:
>This is a list set up - possible for a short term - to hold the discussion
>of whether XML namespaces should be URIs.

I am very busy and may not be able to keep up with the xml-uri mailing list, although I promise to check in as time allows.

I apologize in advance for the length of this message. We're all busy, but this is important. Here's a summary:

1. URIs are sound in their design, just as TimBL claims
2. Namespaces set out to solve the problem of naming things, no more, and they succeeded
3. It is reasonable to want more from namespace names, and for this reason, the fact that they are syntactically URIs is good, as it leaves open the door for the building of the Semantic Web
4. It is wrong to compromise the basic utility of namespaces by imposing strict URI-ness on them
5. The use of relative URI references as namespace names is wrong and dangerous and should be, at the least, deprecated

I think TimBL's framing of the question, as quoted above, is very apt and cuts to the heart of things. The current live issue in the W3C is much narrower - what to do about relative URI references - but probably can't be solved without some deep thinking about the relationship between namespace names and URIs. Note that the issue, while narrow, is important: first because there are W3C recommendations in the field which are inconsistent on this point, and second because there are other recommendations, notably the DOM, which are hung up pending its resolution.

1. URIs Are Just Fine

To open, it should be said that nobody in this debate (as far as I can tell) has so far challenged the basic soundness of the URL system of resource addressing; for my money, it's one of the shining proofs of the virtues of razor-edge simplicity in the history of technology. Further, and in particular, nobody has challenged the virtue or utility of relative URI references; anybody who does not use them is probably building fragile web sites.

2. Namespaces Are Just Trying to be Names

If I may be pardoned for wordiness, let me quote the first three paragraphs of the namespace spec:

  We envision applications of Extensible Markup Language (XML) where a single XML document may contain elements and attributes (here referred to as a "markup vocabulary") that are defined for and used by multiple software modules. One motivation for this is modularity; if such a markup vocabulary exists which is well-understood and for which there is useful software available, it is better to re-use this markup rather than re-invent it.

  Such documents, containing multiple markup vocabularies, pose problems of recognition and collision. Software modules need to be able to recognize the tags and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element type or attribute name.

  These considerations require that document constructs should have universal names, whose scope extends beyond their containing document. This specification describes a mechanism, XML namespaces, which accomplishes this.

The only problem the namespace spec set out to solve was that of naming. My assertion is simply a statement of verifiable historical fact.

Here is a test case that really crystallizes the problem, for me: suppose I have invented a handy new XML language, TML, for some purpose of my own that is not material here. Suppose TML is to contain some structural elements that are document-centric - for example bulleted lists. Suppose also that I must embed some mathematical formulae. Suppose finally that I want to include a few graphs.

Today, thanks to the good work of the W3C and the simple use of namespaces, this is pretty easy. The HTML, MathML, and SVG vocabularies respectively have well-known namespace names, and there are good and free implementations of software that does useful work with all three vocabularies.
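As an illustration only, dispatch of this kind can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the handler labels and the TML namespace name http://example.org/tml are hypothetical, and the three W3C namespace names are the ones currently published for XHTML, MathML, and SVG. The namespace name is compared as a literal string and is never dereferenced.

```python
import xml.etree.ElementTree as ET

# Namespace names are treated purely as literal strings used as lookup keys.
# The handler labels on the right are hypothetical placeholders.
HANDLERS = {
    "http://www.w3.org/1999/xhtml": "html-renderer",
    "http://www.w3.org/1998/Math/MathML": "math-renderer",
    "http://www.w3.org/2000/svg": "graph-renderer",
}


def namespace_of(element):
    """Return the namespace name of an element, or '' if it has none."""
    tag = element.tag
    # ElementTree stores qualified names as '{namespace-name}local-name'.
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def dispatch(document_text):
    """Pair each element in a known namespace with its handler label."""
    root = ET.fromstring(document_text)
    routed = []
    for element in root.iter():
        handler = HANDLERS.get(namespace_of(element))
        if handler is not None:
            local_name = element.tag.split("}", 1)[1]
            routed.append((local_name, handler))
    return routed


# A hypothetical TML document mixing three well-known vocabularies.
TML_DOC = """\
<doc xmlns="http://example.org/tml">
  <ul xmlns="http://www.w3.org/1999/xhtml"><li>point</li></ul>
  <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>
  <svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>
</doc>
"""

for local_name, handler in dispatch(TML_DOC):
    print(local_name, "->", handler)
```

Because the lookup is an exact string match, a namespace name that differs by even one character (a trailing slash, say) is simply a different name and is not routed anywhere.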
It is thus very easy for me to write code that dispatches to the appropriate software.

3. Should We Want More?

This is a huge step forward, and it works today. Without namespaces it wouldn't work.

Is that enough? Maybe not; the published namespaces for most XML dialects do not support direct retrieval of machine-usable semantics for these dialects. Assuming such specifications exist, and we can all agree that their arrival is a worthwhile goal, making it easy to retrieve them would be a wonderful thing.

For this reason, it is good, I think, that namespace names are URIs, rather than, say, Java package pathnames, because it leaves open the possibility of an automated, machine-readable and machine-usable Web: the Semantic Web. I have occasionally griped that we should have used the Java package naming syntax, and it certainly would have avoided some of the pain we're now in, but I'm not really serious; I really do believe in a future Semantic Web, and URIs are the right way to stitch it together, via, I believe, some sort of packaging mechanism or other way to achieve the necessary and formalized levels of indirection. [Claim: content-negotiation is not enough].

4. Keep Namespaces Working as Intended While Building the Semantic Web

But let us also not discard the great virtue of namespaces, the purpose they were designed to fulfill, that of names for vocabularies. If we decree, now, that namespace names really are URLs, then I argue that the simple design goal of dispatching software to markup based on its universal name is grievously compromised. Here's why:

One of the crucial (and I think good) aspects of the URL is its syntactic opacity. Nothing very meaningful can be said about a resource, at any level, based on its URL, until you retrieve it. This is not just a theological point, but a deep one that has been learned at great cost by anyone who has tried to implement a server, or a browser, or a spider, while ignoring it. As we all know, the same URL can return different resources in successive microseconds; at the same time, there are arbitrarily many different URLs that can, when dereferenced, deliver the same resource.

Given this, if a namespace name is really a URL in all its important respects, then the actual contents of the string aren't important at all; if I want to use it to dispatch to software in the intended way, I'd really have to dispatch based on the contents of the resource that is yielded by dereferencing it.

So for the time being, I think we have to, for the purposes of software dispatching, treat namespace names in the way the namespace spec specifies, namely as literal strings. Any attempt to be smart about this leads down the slippery slope of having to dereference the name and dispatch based on the contents.

This doesn't bother me; I think that the basic URI design is flexible enough that we can, for now, use URLs as names without closing off any significant doors for the development of the Semantic Web.

5. Relative URI References are Lousy Namespace Names

And finally, the pointy end of the question now jabbing the XML community in various tender and embarrassing places: what about relative URI references? If I may quote tediously again from the namespace recommendation:

  The namespace name, to serve its intended purpose, should have the characteristics of uniqueness and persistence.

Relative URI references have many virtues, but they do not include either uniqueness or persistence. Working with them underlines, if it were needed, the point I made above: you really can't tell anything useful by examining a URI as a string; you have to go get the resource.

Thus it is, in my view, a huge bug that the Namespace recommendation doesn't forbid the use of relative URI references. There are only two consistent ways to deal with this bug:

- try to kill it retroactively by deprecating the use of relative URIs as namespace names. In this case "deprecating" covers a spectrum of tactics ranging from warnings at the weak end, through a commitment to avoid ever doing this in the W3C's work, to some attempt to rewrite history and retroactively ban these things.

- say they're OK because namespace names really are URIs, and relative references are well-proven and known to be good practice. The tactics here also occupy a spectrum, ranging at the weak end from canonicalizing away such usages as foo/././././bar, through expanding them by applying the BASE URI (if you happen to know it), to requiring that the resource be retrieved and the dispatching based on it rather than its identifier. For my money only the last of these is consistent.

6. Conclusion

In re-examining TimBL's message to which this is a response, it seems that I've spent little time addressing his points. That's because I disagree with so few. Yes, URIs are a central component of the Web Architecture; there is no other reasonable way to contemplate pulling together the Web of tomorrow; and great caution is to be advised in their use.

TimBL and I are in substantial agreement that vocabularies need to be connected to the web, and the value of so doing will increase as we learn how to package up semantics in more and better declarative forms. There is lots of room for disagreement over the relative value of content-negotiation versus indirection via manifest, but that's just engineering tactics.

There's one key point of difference in play here; I think it's OK to, for the moment, use URIs just as names, in parallel with figuring out how to build the Semantic Web. TimBL sees this as deeply broken.

But in the here and now, those of us who build software for a living really do need cheap, lightweight ways to name markup vocabularies. If we have to dereference them to use them, we can't use them. Please don't take them away from us.

 -Tim
Received on Tuesday, 16 May 2000 09:39:51 UTC