Re: productions [NS 1, 10, 11, 17] from David Brownell on 1998-08-17 (xml-names-issues@w3.org from July to September 1998)

From: David Brownell <db@Eng.Sun.COM>
Date: Mon, 17 Aug 1998 09:54:45 -0700
To: James Clark <jjc@jclark.com>
CC: David Brownell <db@argon.Eng.Sun.COM>, xml-names-issues@w3.org
Message-ID: <35D86055.EEDF1CE2@eng.sun.com>

James Clark wrote:
> 
> In my view, the right processing model for the namespace draft is that
> there's a two-stage process:
> 
> (a) first you parse an XML document exactly as per XML 1.0, with the
> exception that every Name is constrained to be a QName or a NCName
> according to where it occurs syntactically

Highly desirable, yes.  I'd prefer to see the changed XML productions
listed (with original rule numbers!) in the Namespace document.  A new
production for "QName" would seem necessary, but it seems simpler to
just modify "NameChar" and continue to use "Name" rather than add a new
"NCName" production.
 
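A minimal sketch of that first-stage check, assuming an ASCII-only
approximation of the name characters (the real XML 1.0 productions
admit many more letters and digits).  The names NCNAME, is_ncname and
split_qname are made up for illustration; they come from no draft.

```python
import re

# ASCII-only approximation of an XML Name with ':' removed from the
# allowed characters.  This is an assumption for illustration only.
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
NCNAME_RE = re.compile(NCNAME)

# A QName is an optional prefix plus a local part, both colon-free.
QNAME_RE = re.compile(rf"(?:(?P<prefix>{NCNAME}):)?(?P<local>{NCNAME})")


def is_ncname(name: str) -> bool:
    """True if the name contains no colon and is otherwise a legal name."""
    return NCNAME_RE.fullmatch(name) is not None


def split_qname(name: str):
    """Return (prefix, local) for a QName, or None if it is not one."""
    match = QNAME_RE.fullmatch(name)
    if match is None:
        return None
    return match.group("prefix"), match.group("local")
```

With this, "html:p" splits into ("html", "p"), a bare "p" has no
prefix, and "a:b:c" is rejected because it has more than one colon.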

> (b) next you take the logical element tree produced by the first stage
> in the standard XML 1.0 way and produce a new tree in which some
> attributes are removed and some element type and attribute names are
> expanded

Basically, yes -- though there's no requirement that a physical tree
be produced (event driven parsers, push or pull, should be fine).

By removing attributes I presume you mean "xmlns*" declarations.  A
standard would be desirable for how the names are "expanded".  On the
XML-dev list there were suggestions that the prefixes be _replaced_
with the URI, separated from the NCName by some non-URI character;
that approach seems practical if the right character can be found.
I'd suggest an ASCII character for simpler programming.
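
One way to picture the replacement approach, as a sketch only: the
separator "^" is an assumption (chosen here because it is not legal
in a URI without escaping), and expand_name and its bindings argument
are hypothetical names.  Unprefixed names are left alone, which
sidesteps the question of defaulting.

```python
SEPARATOR = "^"  # assumed separator; not a character that appears raw in URIs


def expand_name(qname: str, bindings: dict) -> str:
    """Replace a declared prefix with its namespace URI plus SEPARATOR.

    `bindings` maps prefixes to namespace URIs, as collected from the
    "xmlns*" declarations in scope.
    """
    prefix, colon, local = qname.partition(":")
    if not colon:
        # Unprefixed name: returned unchanged in this sketch.
        return qname
    if prefix not in bindings:
        raise KeyError(f"undeclared namespace prefix: {prefix}")
    return bindings[prefix] + SEPARATOR + local
```

For example, with "html" bound to "http://www.w3.org/TR/REC-html40",
the name "html:p" expands to "http://www.w3.org/TR/REC-html40^p".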

I don't really like the definitions of "expansion" in 6.3; there's no
value using XML to represent this info, and there's also no need for
the attribute to track the node at that level.  The valuable info is
those test cases (!) but it'd be better as a simple table showing the
type, "local" name, and associated namespace.

(In fact I find what section 6 says about "traditional" namespaces to be
blatantly incorrect.  Naming has a long and rich tradition, and one of
the fundamentals is that names are unique only within a context.  This
seems to ignore that fundamental definition.  Consider:  one can't know
where "Main Street" is without knowing at least the right city.)


> I think productions are appropriate for a specification of the first
> phase, but I think they lead people down the wrong track if used to
> specify the second phase (witness the confusion in xml-dev) because they
> suggest that it's an operation on the syntactic structure rather than
> the logical structure.
> 
> This also would lead to what seems to be the appropriate error handling
> strategy:
> 
> - Errors in stage (a) are syntax errors and should be treated exactly
> like XML 1.0 syntax errors

Or validation errors; in any case, "just like XML 1.0".


> - Errors in stage (b) are more akin to validity errors and shouldn't
> necessarily be fatal; for example, a non-external entity reading
> processor given a non-standalone document can't give a fatal error for a
> undeclared namespace because it might have been declared in the external
> subset

Expanding the names may itself introduce WF errors, though.  For
example, there's the requirement that only one instance of a given
scoped attribute name appear in an attribute list ... this corresponds
to the "Unique Att Spec" WF constraint in XML (section 3.1).

(Such WF errors might occur even in validated XML, if a given namespace
can be identified in more than one way -- since the validation will be
done against the prefixed strings.)
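
A small sketch of how such a collision could be detected after
expansion.  The function name and the dictionary of prefix bindings
are hypothetical; unprefixed attributes are keyed with no namespace,
which is an assumption of the sketch rather than a reading of the
draft.

```python
def duplicate_expanded_attributes(attr_names, bindings):
    """Return pairs of attribute names that collide once prefixes are expanded.

    `attr_names` is the list of attribute names as written on one
    element; `bindings` maps each prefix in scope to its namespace URI.
    An undeclared prefix raises KeyError.
    """
    seen = {}
    duplicates = []
    for name in attr_names:
        prefix, colon, local = name.partition(":")
        key = (bindings[prefix], local) if colon else (None, local)
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates
```

If prefixes "a" and "b" are both bound to the same URI, then "a:id"
and "b:id" are distinct strings to a validator but the same expanded
name, and the function reports them as a pair.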


> > 3.  There's no text discussing the URI strings which are the values
> >     associated with the various "xmlns" attributes.  A complete URI
> >     syntax is necessary since these names are compared (as strings)
> >     to determine namespace equality.
> 
> The spec needs to say how they are compared (only for the purposes of
> 6.4 as far as I can see).  It could just say you compare the strings
> character for character.  The URN spec (RFC 2141) specifies lexical
> equivalence for URNs:

... except that the namespaces are URIs, not URNs, and RFC 1630 on URIs
talks about additional normalizations needed in some cases.  (I think
the assumption here is that these "xmlns*" attributes be CDATA that's
not normalized by whitespace cleanup.)
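
To make the difference concrete, here is a sketch contrasting the
character-for-character comparison with one possible normalizing
comparison.  The normalization shown (scheme and host compared
without regard to case) is only an example of the kind of rule a URI
specification might call for, not something taken from RFC 1630, and
both function names are made up.

```python
from urllib.parse import urlsplit


def same_namespace_literal(a: str, b: str) -> bool:
    """Compare two namespace names character for character."""
    return a == b


def same_namespace_normalized(a: str, b: str) -> bool:
    """Illustrative looser comparison: ignore case in scheme and host only."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower(), pa.netloc.lower(), pa.path, pa.query, pa.fragment
    ) == (
        pb.scheme.lower(), pb.netloc.lower(), pb.path, pb.query, pb.fragment
    )
```

Under the literal rule "http://Example.ORG/ns" and
"http://example.org/ns" name two different namespaces; under the
looser rule they name one.  The spec has to pick.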


> >     Are fragment identifiers allowed, or disallowed as elsewhere
> >     in the XML specification?
> 
> Fragment identifiers are (as far as I remember) allowed.

Section 4.2.2 of the XML spec points out that they're not part of URIs,
and so "an XML processor may signal an error if a fragment identifier
is given as part of a system identifier".  In short, if any document
uses a fragment ID it'll be rejected by some conformant XML processors.

- Dave
Received on Monday, 17 August 1998 12:57:43 UTC