Re: Namespaces, the universe, and everything

On Fri, 20 Jun 1997, Sam Hunting wrote:

> Tim Bray writes:
> 
> > >Perhaps in a future revision process DTDs will transmogrify, and
> > >son-of-8879 will have a much more ambitious world-view as to what
> > >constitutes a markup declaration.  In the meantime, these guys are
> > >convinced that they need namespaces and they need them well-defined
> > >by Q4 '97, and we shouldn't tell them that they can't have them.
> 
> Well, then *this* is the fundamental requirement, then, isn't it?

It certainly appears so -- an essentially external process has deemed a
protean notion yclept "namespace" to be a Good Thing To Have On Hand ASAP. 
The issue, then, is the syntactic machinery to permit this. 

> I buy David Durand's plea for a conservative approach to the namespaces
> issue. 

I'll add my Me Too. Assuming that *some* syntactic device will have to be
invented, I would prefer that it (a) be the least invasive in terms of the
changes it requires in SGML-bis, and (b) not be *limited* to 97q4
"namespaces" in its overt syntactic function.

In fact, the syntactic requirements appear to be relatively orthogonal to
the whole business of why namespaces are so urgent. If it were just a
matter of associating a "wider context" to an element -- typically for
semantic purposes -- the AF solution via attributes *is* an answer. This
can't be the problem. 

Rather, the problem seems to be how to "uniquify" GIs in contexts where
name clashes can't be ruled out. Quite apart from the serious validation
issues raised (Paul Grosso's AAP example comes to mind), there's also the
chance that the app downstream from the parser, i.e. the "semantic
engine", will get confused too. Hey, waitaminnit.  Preventing that is
supposed to be a raison d'etre of SGML... So, what are we *really* talking
about here when we say that "name clashes can't be ruled out"? IOW, *why*
can't they be ruled out? 

It appears that the Canonical Problem is not inclusion of data from
multiple domains. It's such inclusion *on an ad hoc basis*; this is what
forces the need for syntactic distinguishability in the instance. We're
talking about a syntactically explicit Cut'N'Paste mechanism. It may help,
then, to work backwards from a standard SGML answer to this, notations
and (external) entities, and cast this as what happens when you inline
such a notated entity. 

Clearly, the requirement is to preserve the notation information: the data
content of the entity will still need to be in a portable or transferable
form (on General Principles: some day we'll need the external reference
mechanism anyway, and then it will do no good if the actual data content
has to assume different syntactic forms depending on where it gets to be
plunked.) The argument against the AF answer is that the naming attribute,
even when included, isn't distinguishable in the instance: it requires
extra DTD machinery to work.

In my own long-winded way, I've arrived at the point where I believe the
three kinds of (nominally sans-DTD) proposals can be understood in terms
of their different approaches to distinguishability:

1. Lexical - add a character to the set of name characters and use it as a
name-compounder. HTML:A, TEILITE:NOTE, etc.

2. Syntactic - use a PI to stuff the disambiguating information.

3. Structural - use a newfangled marked section as an explicit scoping
device. 
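
To make (1) concrete, here is a minimal Python sketch of what the
downstream application would have to do with a compounded name. The
helper name split_compound_gi is hypothetical, and the use of ":" as the
compounding character is an assumption taken from the HTML:A example
above, not from any agreed syntax.

```python
def split_compound_gi(gi, sep=":"):
    """Split a compounded GI such as 'HTML:A' into (prefix, local name).

    Illustrative only: the separator and this helper are assumptions,
    not part of any proposal's actual text.
    """
    prefix, found, local = gi.partition(sep)
    if not found:
        # No compounding character: the GI stands on its own.
        return None, gi
    return prefix, local


print(split_compound_gi("HTML:A"))
print(split_compound_gi("TEILITE:NOTE"))
print(split_compound_gi("NOTE"))
```

Note that every GI in the instance has to pass through this kind of
"resolution" step before the application can use it.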

I would argue against (1) on the grounds that (a) it is overkill when the
problem context is (or rates to be) essentially ad hoc in its incidence,
(b) it is unnecessarily verbose, if not goofy, when the content might need
to be reusable as an external entity (indeed, why couldn't it have started
out that way?), and (c) it doesn't scale to the situation where multiple
domains/namespaces/whathaveyous might need to be encoded (cf. two or more
AF attributes). 

I have no strong argument against (2). It works like a DTD in absentia, in
that the information has to be separately parsed *and* buffered while the
data content is parsed. That is, it's not syntactically (more accurately,
lexically) explicit in its scope; we need a full parse even to start.

Nevertheless, my push-come-to-shove preference is for something like (3):
(a) the scope of <![ .... ]]> is lexically distinct in a reasonably opaque
fashion (sort of like figuring out the extent of an IGNORE MS), (b) like a
PI, it keeps the "notation" information separate from the actual data
content, and (c) also like a PI, it focuses on a general purpose syntactic
mechanism that can be specialized for the needs of namespaces. 

Currently, only status keywords are allowed between the DSOs. I would
propose a variant on Henry Thompson's proposal with syntax like this:

   <![ (name-group) [ ... ]]>

Assuming PEs aren't dead in the water, hiding the namegroup also becomes
possible (and in extreme cases, the PE might be redefinable to CDATA and
the buck of parsing the content passed to the application, which could
invoke another parser instance ... hey, ad hoc come, ad hoc go.) Indeed,
the entire marked section can also be stuffed into an entity declaration
if need be.
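
As a rough illustration of the "lexically distinct scope" point, here is
a minimal Python sketch that picks out sections of the proposed form
without parsing their content. Everything in it is an assumption made
for illustration: the function name find_scoped_sections is
hypothetical, the name group is taken to be names separated by "|", ","
or "&", and nested marked sections are not handled.

```python
import re

# Matches <![ (name-group) [ content ]]> ; non-nested, illustrative only.
MARKED_SECTION = re.compile(r"<!\[\s*\(([^)]*)\)\s*\[(.*?)\]\]>", re.DOTALL)


def find_scoped_sections(text):
    """Return a list of (names, content) pairs for each scoped section."""
    sections = []
    for match in MARKED_SECTION.finditer(text):
        # Split the name group on whitespace and the assumed connectors.
        names = [n for n in re.split(r"[\s|,&]+", match.group(1)) if n]
        sections.append((names, match.group(2)))
    return sections


sample = "<p>intro</p><![ (HTML) [ <a>link</a> ]]><p>outro</p>"
for names, content in find_scoped_sections(sample):
    print(names, repr(content))
```

The content comes back untouched, so it could be handed as-is to
whatever parser instance is appropriate for the named context.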

> Wouldn't it be possible to enable the ":" to be added to XML names, and
> then enable namespaces themselves at the "application specific
> instructions level"-equivalent in XML, which I would take to be a set of
> processing instructions in the Misc section of the Prolog? 

Name-munging certainly looks like an easy way out. But it smacks too much
of force-fitting a solution whose essential appeal derives from a different
paradigm (C++?). Sure, the programmers will grok it and love it. But it
messes with the content (the need to "resolve" GIs gives them a data
quality beyond their markup function) when what we need is just markup. 

> That way, the Q4 guys are happy, experimentation with namespaces can
> proceed apace, and any failures wouldn't bring XML or SGML down.

Perhaps, but any solution that won't scale and won't work with other
mechanisms such as declarations for external entities and notations rates
to be penny wise and pound foolish. IMHO, of course.


Arjun
 

Received on Friday, 20 June 1997 02:21:39 UTC