notes on XML Schema and XInclude from C. M. Sperberg-McQueen on 2005-12-08 (www-xml-schema-comments@w3.org from October to December 2005)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: 08 Dec 2005 14:14:58 -0700
To: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
Cc: "Henry S. Thompson" <ht@cogsci.ed.ac.uk>, Michael Kay <mhk@mhk.me.uk>, Dave Remy <dremy@microsoft.com>, Ashok Malhotra <ashok.malhotra@oracle.com>
Message-Id: <1134076497.2767.7.camel@localhost>

A long time ago [1], I took an action to draft a short note about how
a schema processor could be made to handle attributes in the XML
namespace without being non-conforming.

The following text describes the problem as it was reported to us, and
the solutions we saw for it.  If the WG is interested, we can turn
this email into a WG Note and publish it formally; otherwise, it may
suffice to archive this email on a public mailing list so that people
can be referred to it for information on why the XML Schema WG does
not plan to make the attributes of the XML namespace magical.  For
that reason, I am sending this note to the XML Schema comments list.

Some time has passed since the original problem report, and I have
not heard any recent comments on the topic; I would be glad to know 
if the situation has changed since the problem was reported.  Do
the relevant frameworks now provide ways to work around the problem?
Or is this note still relevant?


1 The problem

Conforming XInclude processors insert xml:base attributes at the root
of included material; this causes the output to be labeled invalid if
it is then validated against a schema which did not provide for
xml:base attributes on those elements.

Under these circumstances, XInclude and XML Schema 1.0 are hard to use
together.  What can be done to ease the pain?


2 Background 

This issue was discussed by the XML Schema Working Group at some
length [1], after initial reports of the issue in [2] and [3] and
proposals by Henry Thompson [4] and Michael Kay [5].  (All of these
links are member-only material; their technical content is summarized
here for the benefit of those without access to W3C member-only
material.)  

In [2], Dave Remy of Microsoft suggested that the XML namespace ought
to be treated the same way as the XSD 1.0 document-instance namespace
(which I'll just call XSI).  This would mean two things.  First of
all, attributes in the namespace would not need to be declared, but
would be allowed on any element at all. Second, they would be
validated, wherever they appear, using attribute declarations built in
to the schema processor.  In the case of XSI, these properties are a
consequence of clauses 3 and 4 of validation rule Element Locally
Valid (Element) and related material. Similar clauses could be
introduced for attributes in the XML namespace. 

The suggestion in [2] was motivated by the unexpected rejection by
schema validators of documents produced by XInclude processing.
XInclude sets an xml:base attribute on the root of each inclusion, and
this attribute will be rejected as undeclared if the schema does not
declare it for the element in question and has no matching wildcard.

In [3], Ashok Malhotra of Oracle seconded Remy's suggestion and
pointed to technical discussion in [6] (later moved to [7]), in which
Daniel Cazzulino argues that unless something is done, XInclude and
XML Schema will effectively be unusable together; Cazzulino suggests
that the .NET XmlReaderSettings class be modified to allow a property
requesting that attributes in the XML namespace be ignored for
validation.  (Note that this is not quite treating the XML and XSI
namespaces in the same way, as suggested by Dave Remy.)

In [4], Henry Thompson outlines three possible approaches to this
question.  We could allow xml:* attributes anywhere by default, we
could maintain the status quo, or we could add a mechanism to make it
easier to declare attributes which should be allowed on all elements.

In [5], Michael Kay suggested that the third of Thompson's three
approaches should be followed:

    I tend to think that the right answer is some kind of "global
    attribute use" - a declaration that any element in a particular
    namespace may carry certain attributes, identified either
    specifically by name or generically by namespace.  The XML and XSI
    namespaces shouldn't be treated specially.)

When the XML Schema Working Group discussed the question, members
recalled that the XML spec explicitly suggests that in specific
applications, attributes in the XML namespace should be controlled by
declarations in a schema language: in the given context, the xml:lang
attribute (for example) might be restricted to a small number of
possible values, or have different default values.  Members found this
too powerful a tool to give up, and so were not comfortable with
treating the XML namespace as proposed in [2] or [3].  The proposal in
[5] was attractive but not useful in the context of XML Schema 1.0,
and difficult to reconcile with the compatibility policy in place for
XML Schema 1.1.


3 Accommodating XML-namespace attributes in a conforming processor

There are several mechanisms that can be used to allow the successful
validation of XInclude output using an XML Schema that has not been
tailored for this situation.  All can be implemented in conforming
processors, without any change to either specification.

(1) An infoset-to-infoset transformation process can strip out the
xml:base attributes and the [base uri] properties which depend on
them.

This could be implemented and invoked as a post-processor called by an
XInclude processor.  (The output would be indistinguishable from that
produced by an XInclude processor operating in a "no xml:base
attributes" mode, but because the output is not the output of the
XInclude process but of the post-processor, the XInclude processor
need not be non-conforming as a result.)

It could also be implemented and invoked as a pre-processor called by
an XSD 1.0 validator.  Schema validity assessment is a process which
accepts an information set as input; in the common case, the user
wishes to validate the infoset produced by a conforming XML parser,
but conformance does not require this.  The user can ask that
validation be applied to some other infoset (e.g. the same infoset
after removal of all xml:base attributes) without requiring a
non-conforming schema validator.
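
A minimal sketch of such a stripping step, written in Python against
the standard xml.etree.ElementTree module.  The function name
strip_xml_base and the sample document are assumptions made for
illustration only; ElementTree has no [base uri] property of its own,
so in this model removing the attribute also removes the only record
of the base URI, which is the behavior described under (1).

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced attribute names as "{namespace}local".
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def strip_xml_base(root):
    """Remove every xml:base attribute in the tree; return how many were removed."""
    removed = 0
    for elem in root.iter():
        if XML_BASE in elem.attrib:
            del elem.attrib[XML_BASE]
            removed += 1
    return removed


# Hypothetical XInclude output: the included <chapter> carries xml:base.
doc = ET.fromstring(
    '<book><chapter xml:base="ch1.xml"><title>One</title></chapter></book>'
)
removed = strip_xml_base(doc)
```

The same function could be called on XInclude output before it is
handed on, or on the input to a validator; nothing in it depends on
which side of the boundary it runs.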

(2) An infoset-to-infoset transformation process can strip out the
xml:base attributes while leaving the [base uri] properties set as
they were in the input.

Like solution (1), this could be implemented either as a
post-processor for XInclude or as a pre-processor for an XML Schema
validator.
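
One way to picture solution (2), again as a hedged Python sketch
using xml.etree.ElementTree and urllib.parse.  Because ElementTree
does not expose a [base uri] property, the sketch records the
effective base URI of each element in a side table before removing
the attribute.  The function name strip_keeping_base_uris, the side
table, and the example URIs are assumptions of the sketch, not part
of either specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def strip_keeping_base_uris(root, document_uri):
    """Strip xml:base attributes; return {element: base URI in effect before stripping}."""
    base_uris = {}

    def visit(elem, inherited):
        # Remove the attribute (if any) and resolve it against the inherited base.
        declared = elem.attrib.pop(XML_BASE, None)
        base = urljoin(inherited, declared) if declared is not None else inherited
        base_uris[elem] = base
        for child in elem:
            visit(child, base)

    visit(root, document_uri)
    return base_uris


doc = ET.fromstring(
    '<book><chapter xml:base="parts/ch1.xml"><title>One</title></chapter>'
    "<appendix/></book>"
)
uris = strip_keeping_base_uris(doc, "http://example.org/book.xml")
```

After the call, the tree no longer contains xml:base attributes, but
relative references inside the included chapter can still be resolved
by looking the element up in the returned table.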

(3) A schema-construction option can be provided which augments each
complex type in the schema to allow an xml:base attribute (or,
optionally, which augments only those types associated with elements
which in fact have an xml:base attribute).

The XML Schema 1.0 spec is quite clear that schema components may be
constructed or acquired by a schema validator in any way the
implementors may think of and choose to implement.  Constructing the
components on the basis of schema documents is one very important way,
and it is called out with its own level of conformance.  But there is
no rule in XML Schema 1.0 which forbids a processor from offering a
different method of acquiring components.  In this case, the processor
would provide a method which involves (1) acquiring schema components
by reading schema documents or in some other way, and then (2)
augmenting the complex types which cannot accept an xml:base attribute
as needed.  Implementors will need to exercise care to ensure that
extension and restriction relations between complex types are not
rendered invalid by this augmentation.
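
The two-step method just described can be pictured with a toy model
in Python.  The ComplexType class and its fields are hypothetical and
stand in for whatever component representation a real processor
uses; they are not the API of any existing validator.  The sketch
also simplifies: it ignores the process-contents setting of attribute
wildcards, and it does not check extension and restriction relations
between types, which a real implementation would have to do.

```python
from dataclasses import dataclass, field

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_BASE = (XML_NS, "base")  # (namespace name, local name)


@dataclass
class ComplexType:
    """Hypothetical stand-in for a complex type definition component."""
    name: str
    attribute_uses: set = field(default_factory=set)
    # Namespaces admitted by an attribute wildcard; "##any" admits all.
    wildcard_namespaces: set = field(default_factory=set)


def accepts_xml_base(ctype):
    """True if the type already allows xml:base (simplified check)."""
    return (
        XML_BASE in ctype.attribute_uses
        or "##any" in ctype.wildcard_namespaces
        or XML_NS in ctype.wildcard_namespaces
    )


def augment_for_xml_base(types):
    """Step (2): add xml:base to each type that cannot accept it; return their names."""
    changed = []
    for ctype in types:
        if not accepts_xml_base(ctype):
            ctype.attribute_uses.add(XML_BASE)
            changed.append(ctype.name)
    return changed


# Step (1) is assumed to have produced these components already.
types = [
    ComplexType("para"),
    ComplexType("open", wildcard_namespaces={"##any"}),
    ComplexType("sect", attribute_uses={XML_BASE}),
]
changed = augment_for_xml_base(types)
```

Only the type that could not already accept the attribute is touched;
types with a suitable wildcard or an existing declaration are left
alone.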

Note that one problem identified in [6] and [7] is unaddressed by this
mechanism: if elements declared with simple types appear as the root
of included material and carry xml:base attributes, they will be
invalid.  The schema construction method could be extended to derive
complex types from those simple types and allow the xml:base
attribute, but this might confuse downstream applications which expect
the elements in question to be carrying xsd:integer or some other
simple type in their {type definition} property.

New mechanisms in XML Schema to make it easier to declare attributes
as legal on any element at all (as proposed in [5]) may be desirable
in the long run, but they are not essential in the short run.  Any of
these three approaches can be implemented by a conforming XML Schema
1.0 processor, and some of them also by a conforming XInclude
processor, or by an XML processing environment.



[1] http://www.w3.org/2005/04/15-xmlschema-minutes.html#item05
[2] http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2005Jan/0024.html
[3] http://lists.w3.org/Archives/Member/w3c-xml-schema-wg/2005Mar/0030.html
[4] http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2005Jan/0026.html
[5] http://lists.w3.org/Archives/Member/w3c-xml-schema-wg/2005Apr/0025.html
[6] http://weblogs.asp.net/cazzu/archive/2005/01/10/XsdAndXInclude.aspx
[7] http://clariusconsulting.net/blogs/kzu/archive/2005/01/10/XsdAndXInclude.aspx
Received on Thursday, 8 December 2005 21:15:15 UTC