RE: Possible schema validation issue in 3.0b3

This message is a question concerning the use of entities to emulate
namespace prefixes in a DTD - a technique that is being used by the
normative DTD from the April 7 XML Schema draft. I'm therefore intentionally
posting this to both xml-dev and xmlschema-dev, as it is relevant to both
discussion forums:

Recently one of our customers reported an entity-resolution tech-support
issue in our "XML Spy" product ( http://www.xmlspy.com ), which led to a
very interesting internal discussion regarding the addition of leading
and trailing spaces in the resolution of parameter entities (section 4.4.8
of the XML 1.0 specification - see http://www.w3.org/TR/REC-xml#as-PE ).

The problem is this - section 4.4.8 explicitly says:

    When a parameter-entity reference is recognized in the DTD and
    included, its replacement text is enlarged by the attachment of
    one leading and one following space (#x20) character; the intent
    is to constrain the replacement text of parameter entities to
    contain an integral number of grammatical tokens in the DTD.

To the best of our knowledge, this section has never been corrected by any
erratum, and the annotated XML specs don't say a word about it other than
pointing at the SGML history issues.

Now we've already seen many DTDs - and interestingly the normative XML
Schema DTD from the April 7 draft (see
http://www.w3.org/TR/xmlschema-1/#normative-schemaDTD ) is one of them -
that use parameter entities to make DTDs pseudo-namespace-aware.

The trick most commonly used is to define a prefix entity and a suffix
entity that can be overridden in the internal subset of any document using
the DTD, and then to use the prefix entity when defining each element name
via a further entity. Here is an example from the normative XML Schema DTD:

    <!ENTITY % p ''>
    <!ENTITY % s ''> <!-- if %p is defined (e.g. as foo:) then you must
                          also define %s as the suffix for the appropriate
                          namespace declaration (e.g. :foo) -->
    <!ENTITY % nds 'xmlns%s;'>

    <!-- Define all the element names, with optional prefix -->
    <!ENTITY % schema "%p;schema">
    <!ENTITY % complexType "%p;complexType">
    <!ENTITY % element "%p;element">
    <!ENTITY % unique "%p;unique">
    ...

So by defining %p as 'xsd:' and %s as ':xsd' you can actually validate any
XML Schema that uses xmlns:xsd to refer to the XML Schema namespace using
this DTD, because this will result in all Schema elements being defined as
xsd:schema, xsd:complexType, xsd:element, etc.
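
The intended effect can be illustrated with a toy text substitution. This is
only a sketch and not an XML parser: the function name expand_plain and the
dictionary layout are made up for the illustration, and the declarations are
copied from the excerpt above.

```python
import re

# Entity declarations copied from the DTD excerpt (name -> literal value).
DECLS = {
    "nds": "xmlns%s;",
    "schema": "%p;schema",
    "complexType": "%p;complexType",
    "element": "%p;element",
    "unique": "%p;unique",
}

# Simplified, ASCII-only pattern for a parameter-entity reference (%name;).
PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")


def expand_plain(value, entities):
    """Replace each %name; with its text, adding nothing around it."""
    return PE_REF.sub(lambda m: entities[m.group(1)], value)


# Overrides as they would be given in a document's internal subset.
overrides = {"p": "xsd:", "s": ":xsd"}
names = {name: expand_plain(value, overrides) for name, value in DECLS.items()}

print(names["schema"])
print(names["nds"])
```

Under this unpadded substitution the element names come out as the DTD
authors presumably intended, e.g. xsd:schema, and the namespace declaration
attribute as xmlns:xsd.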

Or so it seems. But this is where section 4.4.8 actually comes into play! If
the XML Schema DTD defines an entity %schema using

    <!ENTITY % schema "%p;schema">

and we assume %p has already been defined as 'xsd:' then section 4.4.8 tells
us that %schema will actually be defined as " xsd: schema", which is
certainly not a valid qualified name and certainly not the behavior intended
by the authors of the normative DTD.
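
The reading of section 4.4.8 described above can be modelled the same way.
The sketch below pads every replacement with one leading and one trailing
space, which is our interpretation under discussion and not a statement of
what a conforming parser must do. The names expand_padded and
looks_like_qname are hypothetical, and the qualified-name check is a rough
ASCII-only approximation.

```python
import re

# Simplified, ASCII-only pattern for a parameter-entity reference (%name;).
PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")

# Rough ASCII-only approximation of a qualified name: [prefix:]local
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
QNAME = re.compile(r"(?:" + NCNAME + r":)?" + NCNAME)


def expand_padded(value, entities):
    """Replace each %name; with its text plus one space on either side."""
    return PE_REF.sub(lambda m: " " + entities[m.group(1)] + " ", value)


def looks_like_qname(text):
    """True if the whole string matches the approximate QName pattern."""
    return QNAME.fullmatch(text) is not None


value = expand_padded("%p;schema", {"p": "xsd:"})
print(repr(value))
print(looks_like_qname(value))
print(looks_like_qname("xsd:schema"))
```

With padding, the declared value contains embedded spaces and no longer
passes even this loose name check, whereas the unpadded xsd:schema does.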

So the real question is: is this use of pseudo-namespace prefixes in a DTD
really XML 1.0 compatible? And how should XML toolmakers interpret section
4.4.8 in the light of such use in new W3C drafts?

Sincerely,

Alexander Falk

P.S. Last time we had a tricky XML specification question, my colleague
wrote "please answer only, if you are absolutely sure" - and we received
only one answer, from Tim Bray, which was right to the point. So I wonder
whether I should be adding the same restriction this time ;)

... Icon Information-Systems 
... ALEXANDER FALK
... President, CEO
... http://www.icon-is.com/falk


Received on Monday, 12 June 2000 14:10:46 UTC