xml-namespaces

The three of us, Paul Abrahams, Frank Richards, and Lisa Richards, are
working on a book on XML for Addison Wesley, due out at the end of the year.
We need to include a technical discussion of namespaces in that book.  We
hope you will accept this letter, which we have written jointly, as one of
the comments you've requested from implementors.

We have found some very serious problems with the specification, problems
that make it difficult for us to develop anything like a coherent
explanation of it.  The fundamental difficulty is that the spec provides no
discussion at all of the question of how a qualified name is to be resolved
to an ELEMENT or ATTLIST declaration.  In fact, it appears resolutely to
avoid any mention at all of DTDs, for reasons that are unclear and certainly
unstated.

It appears that a document containing qualified names is unlikely to be
valid in terms of the unmodified 1.0 spec, so the namespace spec must
provide its own interpretation of any such document.  As a simple example,
consider an element that starts with

 <elt xmlns="file:///bar" xmlns:foo="file:///bar">
 
Within the content of this element, the references "gertie" and "foo:gertie"
refer to the same ELEMENT declaration, especially in view of the statement
in the namespace spec that a prefix functions only as a placeholder for a
namespace name.  However, XML does not allow a single element type to be
declared under two names, so a valid document cannot contain both
references.  Defaulting is not essential to this example; we could have
declared two explicit prefixes "foo" and "goo" and then asked the question
about "foo:gertie" and "goo:gertie".
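For comparison, here is how one later namespace-aware processor, Python's
standard xml.etree.ElementTree module, treats the example above.  It reports
each element by its expanded name (the namespace name in braces, followed by
the local part) and drops the prefix entirely, so "gertie" and "foo:gertie"
come out identical.  This only illustrates the prefix-as-placeholder reading;
it is not something the current draft specifies.

```python
import xml.etree.ElementTree as ET

# The element from the example, given two children: one named through the
# default namespace and one through the "foo" prefix.  Both are bound to
# the same namespace name, "file:///bar".
doc = ('<elt xmlns="file:///bar" xmlns:foo="file:///bar">'
       '<gertie/><foo:gertie/></elt>')

root = ET.fromstring(doc)

# ElementTree reports tags as "{namespace-name}local-part"; the prefix used
# in the source text does not survive parsing.
tags = [child.tag for child in root]
print(tags)  # prints ['{file:///bar}gertie', '{file:///bar}gertie']
```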

Once you accept that a document written to conform to the namespace spec
must be interpreted according to its own rules and not those of the XML spec
itself, you gain a certain amount of freedom in designing the namespace
spec.  We believe you can take advantage of this freedom to solve some of
the problems with the spec.  A fundamental constraint that keeps the effort
honest is that a document containing no qualified names or other namespace
constructs must be interpreted the same way under either spec.

In practice, when a document makes use of multiple namespaces, each
namespace will have its own URI that specifies where its DTD is to be found.
The namespace spec currently dances around that issue: it says that it is
not a goal for the system literal in the namespace declaration to be usable
for retrieving a schema, but it says nothing at all about what useful
information, if any, is supposed to be found at the corresponding URI.

So how, then, is the DTD for each namespace to be found?  The obvious, but
wrong, answer is that it's brought into the document's own DTD through a
parameter entity reference.  That's the wrong answer because all the names
in that imported DTD remain unqualified and therefore do not correspond to
the qualified names used in the document itself.

We believe that the correct approach would be to treat a namespace
declaration as importing the DTD found at the URI specified by the system
literal, with all element names in that DTD implicitly prefixed by the
prefix name given in the declaration.  That approach would give a simple
answer to the question we've posed of how an element name can be
connected to the corresponding ELEMENT declaration: you look in the DTD of
the element's namespace (after stripping the prefix, of course).
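A minimal sketch of that lookup, in Python, might look like the following.
It models each DTD as nothing more than the set of element names it
declares, keyed by the URI from the namespace declaration's system literal,
with None standing for the document's own DTD.  The URIs, element names, and
the function resolve_element are all hypothetical; nothing here is taken
from the namespace spec.

```python
# Hypothetical DTDs, each reduced to the set of element names it declares.
# Keys are the URIs from namespace declarations; None stands for the
# document's own DTD.  All names and URIs here are invented.
DTDS = {
    None: {"elt", "chapter"},
    "file:///bar": {"gertie", "harold"},
    "file:///baz": {"gertie", "title"},
}


def resolve_element(qname, bindings, default_uri=None):
    """Return (namespace URI, local name) for an element name.

    bindings maps each prefix declared in scope to its URI; default_uri is
    the URI of the default declaration in scope, or None if there is none.
    """
    if ":" in qname:
        # Strip the prefix and find the namespace it stands for.
        prefix, local = qname.split(":", 1)
        if prefix not in bindings:
            raise KeyError(f"undeclared prefix {prefix!r}")
        uri = bindings[prefix]
    else:
        # Unprefixed names go to the default namespace, if any.
        local, uri = qname, default_uri
    # Look the bare local name up in that namespace's DTD.
    if local not in DTDS.get(uri, set()):
        raise KeyError(f"no ELEMENT declaration for {qname!r}")
    return uri, local


bindings = {"foo": "file:///bar", "goo": "file:///baz"}
print(resolve_element("foo:gertie", bindings, default_uri="file:///bar"))
print(resolve_element("gertie", bindings, default_uri="file:///bar"))
print(resolve_element("goo:gertie", bindings))
```

The first two calls return the same pair, which is the point of the
"gertie" example; "goo:gertie" reaches a different declaration even though
its local name is the same.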

We also would like to reiterate Paul's suggestion to Tim Bray that an
element name whose prefix is empty, i.e., has the form ":foo", be taken to
refer to the document's own DTD, since that DTD is otherwise inaccessible
within the scope of a default declaration.  That requires a small and
straightforward change to production 5 of the syntax.
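The suggested change amounts to one more case when a name is split.  In the
Python sketch below (the names split_element_name, choose_dtd, and OWN_DTD
are hypothetical), an empty prefix selects the document's own DTD even when
a default declaration is in scope, while a name with no colon at all still
goes to the default namespace, or to the document's own DTD when no default
is declared.

```python
OWN_DTD = "#own"  # hypothetical marker for the document's own DTD


def split_element_name(name):
    """Split an element name into (prefix, local part).

    prefix is None when the name has no colon, "" for the ":foo" form,
    and the prefix text otherwise.
    """
    prefix, sep, local = name.partition(":")
    if not sep:
        return None, name
    if not local:
        raise ValueError(f"missing local part in {name!r}")
    return prefix, local


def choose_dtd(name, bindings, default_uri=None):
    """Return (which DTD, local name) under the suggested ":foo" rule."""
    prefix, local = split_element_name(name)
    if prefix == "":
        # Empty prefix: always the document's own DTD, ignoring any default.
        return OWN_DTD, local
    if prefix is None:
        # No colon: the default namespace if declared, else the own DTD.
        return (default_uri if default_uri is not None else OWN_DTD), local
    # Explicit prefix: the namespace bound to it.
    return bindings[prefix], local


print(choose_dtd(":chapter", {}, default_uri="file:///bar"))
print(choose_dtd("chapter", {}, default_uri="file:///bar"))
```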

The ideas above, we believe, provide a complete, effective, and
straightforward treatment of element names.  Attribute names are a more
complicated issue.  None of us were able to understand the explanation of
them in the namespace spec despite our diligent efforts, and we're all
intelligent people who know this topic pretty well.

The trouble with attribute names is that an attribute name has no existence
or definition outside the element to which it is attached by an ATTLIST
declaration.  Even in plain XML there is no necessary connection whatsoever
between the properties of the X attribute attached to E1 elements and the X
attribute attached to E2 elements.  Whatever connection exists is a matter
of convention, albeit a convention that may very well be enforced by one or
more applications (such as associating a particular attribute name with a
color to be used in rendering).

We propose that only unqualified attribute names be permitted and that an
attribute name be treated, without further ado, as being declared by the
ATTLIST declaration associated with the containing element.  Since the
containing element can be resolved to an ELEMENT declaration by the rules
we've already suggested, the problem of finding the applicable
declaration is solved.
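Under this proposal attribute lookup needs no namespace machinery of its
own.  The Python sketch below assumes the containing element has already
been resolved to a (namespace URI, local name) pair by the element lookup
described above, and that namespace declaration attributes (xmlns and
xmlns:prefix) were removed before this point.  The ATTLISTS table and the
function resolve_attribute are hypothetical.

```python
# Hypothetical ATTLIST declarations, keyed by the resolved containing
# element.  Note that "color" on the two gertie elements is declared
# independently, just as X on E1 and X on E2 are in plain XML.
ATTLISTS = {
    ("file:///bar", "gertie"): {"color": "CDATA", "id": "ID"},
    ("file:///baz", "gertie"): {"color": "(red|green)"},
}


def resolve_attribute(element_key, attr_name):
    """Return the declared type of attr_name on the given element.

    Assumes xmlns and xmlns:prefix attributes were handled earlier.
    """
    if ":" in attr_name:
        # Only unqualified attribute names are permitted.
        raise ValueError(f"qualified attribute name {attr_name!r} not permitted")
    attlist = ATTLISTS.get(element_key, {})
    if attr_name not in attlist:
        raise KeyError(f"{attr_name!r} is not declared for {element_key}")
    return attlist[attr_name]


print(resolve_attribute(("file:///bar", "gertie"), "color"))  # CDATA
print(resolve_attribute(("file:///baz", "gertie"), "color"))  # (red|green)
```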

Our reasoning is that qualification of attribute names isn't needed because
every attribute belongs to an element and the element's namespace already
provides the necessary partitioning.  It could be, of course, that within a
namespace the same attribute name is used for different purposes, or a
single name is used in many kinds of elements to signify the same thing.
But since that problem exists *within* a namespace, it does not make sense
to expect namespace qualification to solve it.  

One problem we have not solved (and the namespace spec certainly hasn't) is
this: what belongs in the DTD when element E1 refers to an element E2 in a
different namespace?  The DTD does have to be able to accommodate the
necessary information, so the namespace spec must deal with DTDs as well as
with elements and attributes.

It appears that the authors of the namespace spec had the proposed XML Data
spec in mind when writing the namespace spec, though XML Data is not
mentioned anywhere.  But even if XML Data is adopted (and it would
certainly imply a radical change to XML itself), there is nothing in the
current namespace spec that would answer the basic question we're asking:
how is the connection made between an element or attribute name and its
definition?  (We also feel that if the namespace spec makes any assumptions
about XML Data, those assumptions should be made explicit.)

One other side issue: since a document containing qualified names cannot be
interpreted in the same way as one without them, the version number for such
a document should be something other than "1.0".  Otherwise a processor has
no way of knowing how to resolve names to their corresponding declarations.

Paul Abrahams (author "Unix for the Impatient", Addison Wesley, past
president ACM)
Frank Richards (Reed Technology)
Lisa Richards (Reed Technology)

Received on Friday, 21 August 1998 17:41:34 UTC