Namespace names and URIs

At 09:45 AM 5/15/00 -0400, Tim Berners-Lee wrote:
>This is a list set up - possible for a short term - to hold the discussion
>of whether XML namespaces should be URIs.   

I am very busy and may not be able to keep up with the xml-uri mailing list,
although I promise to check in as time allows.

I apologize in advance for the length of this message.  We're all busy, but 
this is important.  Here's a summary:

1. URIs are sound in their design, just as TimBL claims
2. Namespaces set out to solve the problem of naming things, no more, and
   they succeeded
3. It is reasonable to want more from namespace names, and for this 
   reason, the fact that they are syntactically URIs is good, as it leaves 
   open the door for the building of the Semantic Web
4. It is wrong to compromise the basic utility of namespaces by imposing
   strict URI-ness on them 
5. The use of relative URI references as namespace names is wrong and
   dangerous and should be, at the least, deprecated

I think TimBL's framing of the question, as quoted above, is very apt and 
cuts to the heart of things.  The current live issue in the W3C is much 
narrower - what to do about relative URI references - but probably can't be 
solved without some deep thinking about the relationship between namespace 
names and URIs.  Note that the issue, while narrow, is important: first 
because there are W3C recommendations in the field which are inconsistent 
on this point, and second because there are other recommendations, notably 
the DOM, which are hung up pending its resolution.

1. URIs Are Just Fine

To open, it should be said that nobody in this debate (as far as I can tell) 
has so far challenged the basic soundness of the URL system of resource 
addressing; for my money, it's one of the shining proofs of the virtues of 
razor-edge simplicity in the history of technology.  Further, and in 
particular, nobody has challenged the virtue or utility of relative URI 
references; anybody who does not use them is probably building fragile web 
sites. 

2. Namespaces Are Just Trying to be Names

If I may be pardoned for wordiness, let me quote the first three paragraphs 
of the namespace spec:

 We envision applications of Extensible Markup Language (XML) where a single 
 XML document may contain elements and attributes (here referred to as a 
 "markup vocabulary") that are defined for and used by multiple software 
 modules. One motivation for this is modularity; if such a markup vocabulary 
 exists which is well-understood and for which there is useful software 
 available, it is better to re-use this markup rather than re-invent it. 

 Such documents, containing multiple markup vocabularies, pose problems of 
 recognition and collision. Software modules need to be able to recognize 
 the tags and attributes which they are designed to process, even in the 
 face of "collisions" occurring when markup intended for some other software 
 package uses the same element type or attribute name. 

 These considerations require that document constructs should have universal 
 names, whose scope extends beyond their containing document. This 
 specification describes a mechanism, XML namespaces, which accomplishes 
 this.

The only problem the namespace spec set out to solve was that of naming.  
My assertion is simply a statement of verifiable historical fact.

Here is a test case that really crystallizes the problem, for me: suppose I 
have invented a handy new XML language, TML, for some purpose of my own that 
is not material here.  Suppose TML is to contain some structural elements 
that are document-centric - for example bulleted lists.  Suppose also that I 
must embed some mathematical formulae.  Suppose finally that I want to 
include a few graphs.

Today, thanks to the good work of the W3C and the simple use of namespaces, 
this is pretty easy.  The HTML, MathML, and SVG vocabularies respectively 
have well-known namespace names, and there are good and free implementations 
of software that does useful work with all three vocabularies.  It is thus 
very easy for me to write code that dispatches to the appropriate software.
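
To make the dispatching concrete, here is a minimal sketch in Python.  It is 
an illustration only, not anybody's real implementation.  The TML namespace 
name (http://example.org/ns/tml) and the handler labels are made up; the 
other three namespace names are the ones given in the current XHTML, MathML, 
and SVG specifications, which this message does not itself quote.

```python
import xml.etree.ElementTree as ET

# Namespace names as published in the XHTML, MathML, and SVG specifications.
XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical handler labels; real code would map to functions or modules.
HANDLERS = {
    XHTML_NS: "html-renderer",
    MATHML_NS: "math-renderer",
    SVG_NS: "graphics-renderer",
}


def namespace_of(tag):
    """Return the namespace name from an ElementTree '{ns}local' tag."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def dispatch(document_text):
    """Pair each child of the root with the handler for its namespace name."""
    root = ET.fromstring(document_text)
    plan = []
    for child in root:
        local_name = child.tag.rsplit("}", 1)[-1]
        handler = HANDLERS.get(namespace_of(child.tag), "unhandled")
        plan.append((local_name, handler))
    return plan


# A hypothetical TML document mixing the three vocabularies.
SAMPLE = """<tml xmlns="http://example.org/ns/tml"
     xmlns:h="http://www.w3.org/1999/xhtml"
     xmlns:m="http://www.w3.org/1998/Math/MathML"
     xmlns:s="http://www.w3.org/2000/svg">
  <h:ul><h:li>a point</h:li></h:ul>
  <m:math><m:mi>x</m:mi></m:math>
  <s:svg/>
  <note>plain TML</note>
</tml>"""

for local_name, handler in dispatch(SAMPLE):
    print(local_name, "->", handler)
```

Note that the lookup is nothing more than a dictionary keyed on the 
namespace name; nothing is retrieved from the network.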

3. Should We Want More?

This is a huge step forward, and it works today.  Without namespaces it 
wouldn't work.  Is that enough?

Maybe not; the published namespaces for most XML dialects do not support 
direct retrieval of machine-usable semantics for these dialects.  Assuming 
such specifications exist, and we can all agree that their arrival is a 
worthwhile goal, making it easy to retrieve them would be a wonderful thing.  
For this reason, it is good, I think, that namespace names are URIs, rather 
than, say, Java package pathnames, because it leaves open the possibility of 
an automated, machine-readable and machine-usable Web: the Semantic Web.

I have occasionally griped that we should have used the Java package naming 
syntax, and it certainly would have avoided some of the pain we're now in, 
but I'm not really serious; I really do believe in a future Semantic Web, 
and URIs are the right way to stitch it together, via, I believe, some sort 
of packaging mechanism or other way to achieve the necessary and formalized 
levels of indirection.  [Claim: content-negotiation is not enough].

4. Keep Namespaces Working as Intended While Building the Semantic Web

But let us also not discard the great virtue of namespaces, the purpose they 
were designed to fulfill, that of names for vocabularies.

If we decree, now, that namespace names really are URLs, then I argue that 
the simple design goal of dispatching software to markup based on its 
universal name is grievously compromised.  Here's why:

One of the crucial (and I think good) aspects of the URL is its syntactic 
opacity.  Nothing very meaningful can be said about a resource, at any level, 
based on its URL, until you retrieve it.  This is not just a theological 
point, but a deep one that has been learned at great cost by anyone who has 
tried to implement a server, or a browser, or a spider, while ignoring it.

As we all know, the same URL can return different resources in successive 
microseconds; at the same time, there are arbitrarily many different URLs 
that can, when dereferenced, deliver the same resource.

Given this, if a namespace name is really a URL in all its important 
respects, then the actual contents of the string aren't important at all; 
if I want to use it to dispatch to software in the intended way, I'd really 
have to dispatch based on the contents of the resource that is yielded by 
dereferencing it.

So for the time being, I think we have to, for the purposes of software 
dispatching, treat namespace names in the way the namespace spec specifies, 
namely as literal strings.  Any attempt to be smart about this leads down 
the slippery slope of having to dereference the name and dispatch based on 
the contents of the resource.
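
As a small illustration of what "literal strings" means in practice, here is 
a sketch in Python.  The helper names and the list of look-alike strings are 
invented for the example.  Each look-alike might well fetch the same 
resource as the XHTML namespace name if treated as a URL, but none of them 
is the same namespace name.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"


def same_namespace(name_a, name_b):
    """Namespace names match only if identical, character for character."""
    return name_a == name_b


# Strings that a URL-aware comparison might treat as equivalent to XHTML_NS.
LOOKALIKES = [
    "http://WWW.W3.ORG/1999/xhtml",      # host in a different case
    "http://www.w3.org:80/1999/xhtml",   # explicit default port
    "http://www.w3.org/1999/./xhtml",    # redundant dot segment
    "http://www.w3.org/1999/xhtml/",     # trailing slash
]


def matching(names, target=XHTML_NS):
    """Return the names that are literally the same string as target."""
    return [name for name in names if same_namespace(name, target)]


print(matching(LOOKALIKES))               # no look-alike matches
print(same_namespace(XHTML_NS, XHTML_NS))
```

The comparison is cheap precisely because it refuses to be smart: no 
normalization, no retrieval, just string equality.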

This doesn't bother me; I think that the basic URI design is flexible enough 
that we can, for now, use URLs as names without closing off any significant 
doors for the development of the Semantic Web.

5. Relative URI References are Lousy Namespace Names

And finally, the pointy end of the question now jabbing the XML community in 
various tender and embarrassing places: what about relative URI references?  
If I may quote tediously again from the namespace recommendation:

 The namespace name, to serve its intended purpose, should have the 
 characteristics of uniqueness and persistence. 

Relative URI references have many virtues; but they do not include either 
uniqueness or persistence.  Working with them underlines, if it were needed, 
the point I made above: you really can't tell anything useful by examining a 
URI as a string; you have to go get the resource.

Thus it is, in my view, a huge bug that the Namespace recommendation doesn't 
forbid the use of relative URI references.  There are only two consistent 
ways to deal with this bug:

- try to kill it retroactively by deprecating the use of relative URIs
  as namespace names.  In this case "deprecating" covers a spectrum of
  tactics ranging from warnings at the weak end, through a commitment to
  avoid ever doing this in the W3C's work, to some attempt to rewrite 
  history and retroactively ban these things.
- say they're OK because namespace names really are URIs, and relative
  references are well-proven and known to be good practice.  The tactics
  here also occupy a spectrum, ranging at the weak end from canonicalizing 
  away such usages as foo/././././bar through expanding them by applying 
  the base URI (if you happen to know it) to requiring that the resource be 
  retrieved and that dispatching be based on it rather than its identifier.  
  For my money only the last of these is consistent.
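
To show why a relative reference lacks uniqueness, here is a sketch in 
Python using the standard urllib.parse.urljoin function.  The relative name 
"schemas/tml" and both base URIs are hypothetical, chosen only for 
illustration.  The same literal name turns into a different absolute URI 
depending on where the document happens to live, while two literally 
different names can collapse into one.

```python
from urllib.parse import urljoin

# A hypothetical relative URI reference used as a namespace name.
RELATIVE_NAME = "schemas/tml"

# Two hypothetical locations holding byte-identical copies of one document.
BASE_A = "http://example.org/docs/report.xml"
BASE_B = "http://example.net/mirror/2000/report.xml"


def absolutize(base_uri, namespace_name):
    """Expand a namespace name against the base URI of its document."""
    return urljoin(base_uri, namespace_name)


name_at_a = absolutize(BASE_A, RELATIVE_NAME)
name_at_b = absolutize(BASE_B, RELATIVE_NAME)

print(name_at_a)  # http://example.org/docs/schemas/tml
print(name_at_b)  # http://example.net/mirror/2000/schemas/tml

# Same literal string, different names once expanded.
print(name_at_a == name_at_b)

# Different literal strings, same name once dot segments are removed.
print("foo/././././bar" == "foo/bar")
print(absolutize(BASE_A, "foo/././././bar") == absolutize(BASE_A, "foo/bar"))
```

Whether two such names "match" therefore depends on which of the tactics 
above one picks, which is exactly the inconsistency at issue.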

6. Conclusion

In re-examining TimBL's message to which this is a response, it seems that 
I've spent little time addressing his points.  That's because I disagree 
with so few.  Yes, URIs are a central component of the Web Architecture; 
there is no other reasonable way to contemplate pulling together the Web of 
tomorrow; and great caution is to be advised in their use.

TimBL and I are in substantial agreement that vocabularies need to be 
connected to the web, and the value of so doing will increase as we learn 
how to package up semantics in more and better declarative forms.  There is 
lots of room for disagreement over the relative value of content-negotiation 
versus indirection via manifest, but that's just engineering tactics.

There's one key point of difference in play here: I think it's OK, for the 
moment, to use URIs just as names, in parallel with figuring out how to 
build the Semantic Web.  TimBL sees this as deeply broken.  

But in the here and now, those of us who build software for a living really 
do need cheap, lightweight ways to name markup vocabularies.  If we have to 
dereference them to use them, we can't use them.  Please don't take them 
away from us. -Tim

Received on Tuesday, 16 May 2000 09:39:51 UTC