This is very much a work-in-progress, something I would have blogged except I don't have a blog. Please bear this in mind when responding -- there's very little here, particularly in the more speculative sections towards the end, which I'm firmly convinced of. So feedback is very much in order.
TAG issues namespaceDocument-8 and abstractComponentRefs-37 were the topic of extended discussion at the last TAG f2f. There is considerable overlap between these two issues, and both are related to Dan Connolly's comment on the recently published Last Call Working Draft of XML Schema: Component Designators. Although a number of prior misunderstandings were identified and overcome in the discussion, more work is needed to make explicit the background assumptions about what problems we're trying to solve and what the space of possible solutions is. This note is an attempt to begin that work.
The recent discussion about whether the xml:id spec. 'changes' the XML namespace by 'adding' a new name to it helped clarify that the minimalist reading of the XML Namespaces REC has achieved dominance in the intellectual marketplace. By "the minimalist reading" I mean the reading on which an XML namespace is primarily a syntactic mechanism for distinguishing one class of uses of a particular simple name from all other uses thereof. This means a namespace is not a finite set of names, nor a more complex structured object as suggested by the (in)famous now-deleted non-normative Appendix A: The Internal Structure of XML Namespaces of version 1.0.
The minimalist reading is the only one consistent with actual usage --
people mint new namespaces by simply using them in an expanded name
or namespace declaration, without thereby incurring any obligation to define
the boundaries of some set. You could say that a namespace springs into life
the first time anyone uses a URI as a namespace name, but on balance I prefer
an understanding which doesn't reify a namespace as such at all. I don't
object to using phrases such as "[some name] in the [some URI] namespace", but
that's just another way of saying "the expanded name < some_URI, some_name >".
On this account it makes sense to ask questions about namespace names, e.g. "What
namespace name will XSLT 2.0 use?" and about expanded names, e.g. "Does XSLT
2.0 change the definition of the element named < http://www.w3.org/Style/1998/Transform, output >?", but
questions about namespaces as such are rarely if ever useful (unless of course
they're understood as questions about namespace names or about
some otherwise-defined set of expanded names with a namespace name in common).
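To make the notion of an expanded name concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree, which reports element names in the form `{namespace name}local name`. The namespace name `http://example.org/ns` and the helper `expanded_name` are assumptions made up for illustration. The point is only that two documents using different prefixes (or none) yield the same < namespace name, local name > pair, and that nothing resembling a "namespace object" is ever consulted.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name, used only for this illustration.
NS = "http://example.org/ns"

# The same expanded name written with a prefix and with a default declaration.
doc_a = f'<a:item xmlns:a="{NS}"/>'
doc_b = f'<item xmlns="{NS}"/>'


def expanded_name(tag):
    """Split ElementTree's '{uri}local' form into a (namespace name, local name) pair."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (uri, local)
    return (None, tag)  # a name with no namespace name


name_a = expanded_name(ET.fromstring(doc_a).tag)
name_b = expanded_name(ET.fromstring(doc_b).tag)

# The prefix is purely syntactic; only the pair matters.
print(name_a == name_b)  # prints True
```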
Taking the argument one step further, it is a necessary consequence of the
position outlined above that it is incoherent to understand e.g.
"Such-and-such a type is defined in the XML Schema namespace" to mean that the
XML Schema namespace contains types (or type definitions). Considering things
carefully, we must understand this sentence as meaning that the XML Schema
language assigns the expanded name < http://www.w3.org/2001/XMLSchema, such-and-such > to some type definition. This perspective actually
works well with our overall understanding of XML Schema: a schema document
for a particular target namespace corresponds to a schema which assigns element declarations, type definitions, etc. to expanded names, all
of which have that target namespace as their namespace name.
So it's languages (or as we used to say, applications, in the SGML sense) which assign expanded names to things. That assignment may be unique and unequivocal, but evidently it is often one-to-many. And of course it's the language which determines what there is to be named, its own little (or large) ontology.
Many languages of course do provide only one thing to be named using a particular namespace name (e.g. XQuery Functions and Operators), and others, although naming more than one sort of thing, constrain their use of names to be unambiguous (e.g. SVG, RDF). In both these cases, just an expanded name is sufficient to identify something, and constructing a URI for something is therefore straightforward.
On the other hand there are many examples of languages where the mapping is one-to-many. The most immediate example is XML itself. The low-level syntax of XML distinguishes two sorts of things which are identified by expanded name: elements and attributes. Since there is no prohibition on using the same expanded name for both an element and an attribute, an expanded name is not sufficient to uniquely identify a named aspect of an XML document (or document type, in the ordinary language sense) -- you need to know what I've been calling the sort as well, i.e. element or attribute. For example, all of the following names:
abbr
cite
code
dir
label
link
object
span
style
title
can be used for either elements or attributes in XHTML 1.0 (transitional)
documents, and at least three of these (abbr, cite and title) survive as ambiguous in XHTML Basic 1.0.
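The need for the sort can be shown with a small sketch. It assumes a hypothetical language at the made-up namespace name `http://example.org/lang` in which an element and a namespace-qualified attribute share one expanded name (an assumption for the sake of the example, not a claim about how XHTML qualifies its attributes). The expanded name alone cannot serve as a key for both things; the pair of sort and expanded name can.

```python
import xml.etree.ElementTree as ET

# Hypothetical language: element and attribute both named < NS, title >.
NS = "http://example.org/lang"
doc = f'<x:title xmlns:x="{NS}" x:title="tooltip">Heading</x:title>'

root = ET.fromstring(doc)
element_name = root.tag               # '{http://example.org/lang}title'
attribute_name = list(root.attrib)[0]  # the same string, but naming an attribute

# Keyed by expanded name alone, the second entry overwrites the first.
by_name = {}
by_name[element_name] = "the title element"
by_name[attribute_name] = "the title attribute"

# Keyed by (sort, expanded name), both things stay distinct.
by_sort_and_name = {
    ("element", element_name): "the title element",
    ("attribute", attribute_name): "the title attribute",
}

print(len(by_name), len(by_sort_and_name))  # prints 1 2
```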
When we expand our scope to XML validation, we suddenly get a much more complex situation, in which there are in principle an unbounded number of things which share a name, distinguishable only by context: we have element declarations (max. one per expanded name), and attribute declarations (max. as many as there are element declarations). For example, there are four distinct attribute definitions called align and five distinct attribute definitions called type in the XHTML transitional DTD. W3C XML Schema not only has a richer set of what it calls "symbol spaces", so that there are seven things whose definitions can be named (it adds types, attribute and element groups, notations and identity-constraints alongside elements and attributes), it also allows elements as well as attributes to be defined in context.
Finally we should note that a language may encompass quite a
range of variation in terms of the things it assigns a particular expanded name
to. There can be variation over time, as new versions of a language are released,
and even alternative variants released at the same time. The HTML P element
has a long and complex history, and even the XHTML p element has three
distinct variants in version 1.0 (strict, transitional and basic), none of
which is exactly the same as the one in version 1.1.
None of this should come as a surprise. Ordinary language uses names in ways which are both ambiguous and context-determined, and whose use changes over time. But its consequences for the Web are more serious, particularly as we consider the use of names for things on the Web intended for automatic processing, where appeal to context for disambiguation may not be straightforward at all. At the very least it is clear that it is no longer trivial to specify an approach to constructing URIs for things which will cover all the cases just discussed.
Broadly speaking there are three ways one could respond to the situation outlined above:
It's important to note that there's an unspoken assumption common to all three of the above views: we're going to construct the URI for some named thing by adding some variety of fragment identifier to the namespace name of its expanded name. There is no space here for the possibility that two distinct languages might use the same expanded name for two evidently distinct things. This is intimately bound up with another assumption with respect to variation, namely that it's possible to tell reliably when a change in something counts as a variation, as opposed to a fundamental change of identity. If I change the named definition of a type by nudging its min or max a bit, that pretty clearly just produces a variant of the same type. But if I change the definition assigned to a name from being an integer to being a date, it's equally clear that that's no longer the same type at all. Those are the easy cases; there will be many which are much harder to call.
I expect that both of these assumptions will want to be recast as Good Practice notes going forward (e.g. "Don't use the same expanded name for two different things of the same sort in different languages under your control"; "As a language evolves, use new expanded names for new things, don't recycle old ones").
Without more detailed examination of real usage scenarios, it's hard to be sure of what general principles to establish here, but on the basis of my limited experience to date it seems likely that something along the following lines is a reasonable starting point.
It's up to the owner of a language, for each of the namespaces involved in that language, to provide a constructive definition of the way in which things which have expanded names can also be named with URIs. I've identified the following guidelines for such definitions:
The position that emerged at the end of the recent TAG f2f is consistent with the above guidelines, but obviously lacking in detail. On balance my preferred approach would look something like this:
URI names are provided for everything defined or declared by name at the top level which has some conceptual identity independent of the details of W3C XML Schema, i.e. elements, attributes and simple and complex types.
The URI name for something of one of the above four sorts is constructed by concatenating the namespace name of its expanded name, a / if that does not already end with one, its sort (i.e. attribute, complexType, element or simpleType), a /# and the local name of its expanded name.
URI names for languages which don't use namespaces are based on a URI designated for the purpose in the language specification, e.g. http://www.w3.org/2002/xmlspec/ for the W3C's 'specprod' language.
It would be the responsibility of language owners to provide retrievable representations of resources at each sort-determined sub-URI of the namespace URI to make this work (but see httpRange-14 below under Outstanding issues).
So for example the URI for the W3C XML Schema's own dateTime type would be
http://www.w3.org/2001/XMLSchema/simpleType/#dateTime
and perhaps, for the DAML+OIL example cited in Dan Connolly's feedback, we would get the following ('perhaps' because there's no namespace involved in the example as published):
http://www.w3.org/TR/2001/NOTE-daml+oil-walkthru-20011218/simpleType/#over12
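The construction rule above can be sketched as a few lines of Python. This is only an illustration of the concatenation as I have described it, not a proposal for normative syntax; the function name `uri_name` and the `SORTS` tuple are made up for the purpose.

```python
# The four sorts for which URI names are provided in this sketch.
SORTS = ("attribute", "complexType", "element", "simpleType")


def uri_name(namespace_name: str, sort: str, local_name: str) -> str:
    """Build a URI name from an expanded name and a sort.

    Concatenates the namespace name, a '/' if the namespace name does not
    already end with one, the sort, '/#' and the local name.
    """
    if sort not in SORTS:
        raise ValueError(f"unknown sort: {sort!r}")
    base = namespace_name if namespace_name.endswith("/") else namespace_name + "/"
    return f"{base}{sort}/#{local_name}"


# The two examples given above.
print(uri_name("http://www.w3.org/2001/XMLSchema", "simpleType", "dateTime"))
print(uri_name("http://www.w3.org/TR/2001/NOTE-daml+oil-walkthru-20011218",
               "simpleType", "over12"))
```

Note that a namespace name which already ends in / does not acquire a second one, so the sort-determined sub-URI is the same either way.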
(My inspiration for this approach is at least in part the IANA structuring of their registry of media types, which gives us e.g.
http://www.iana.org/assignments/media-types/application/mathematica
for application/mathematica (although irritatingly it gives us nothing for e.g. text/html).)
This is by no means a fully-baked story. Some things I know are shaky are