XInclude, schema validity-assessment, xml:base and xml:lang

[Note that this email is Bcc'd to the member-only
 w3c-xml-schema-ig@w3.org list -- please reply to
 public-xml-core-wg@w3.org and repeat the Bcc to keep both WGs involved]

Position 1: xml:lang and xml:base should be understood as out-of-band
mechanisms for notating aspects of the infoset (the *[language]* and
*[base URI]* properties) which would otherwise be inexpressible.  As
such they are _not_ part of any specific XML application, and are
arguably in the same category as namespace declarations, that is,
using attribute syntax but not really attributes at all.  If so, it
was a mistake to treat them as attributes, and not as declarations, in
deciding how to treat them wrt XML Schema validity assessment.  We
should accordingly at least commit to removing them from the scope of
validity assessment in XML Schema 1.1, and possibly do so for XML
Schema 1.0 as well via an erratum.

I'm tempted to say that, contrary to earlier discussion, this should
be all or nothing.  That is, as proposed above, the justification for
this move is that these are _not_ really attributes.  If that's the
case, they _can't_ be visible to schema validity assessment.  Any
argument that this should just be a default, with people allowed to
take control of their assessment by explicit declarations, is
tantamount to claiming that they really _are_ attributes, and we're
just treating them specially because of the inconvenience of doing
otherwise, particularly wrt XInclude 'output'.  I find such an
argument much less compelling.

Position 2:  xml:base and xml:lang are attributes like any others.  To
make it easier to manage them, we should provide some mechanism
to make it easy to declare 'universal' attributes.

I have mixed feelings about this.  For one thing, we already have such
a mechanism.  If I define the following type:

 <xs:complexType name="nearlyAnyType">
  <xs:complexContent>
   <xs:restriction base="xs:anyType">
    <xs:sequence>
     <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute ref="xml:base"/>
    <xs:attribute ref="xml:lang"/>
    <xs:anyAttribute processContents="lax"/>
   </xs:restriction>
  </xs:complexContent>
 </xs:complexType>

and then use it as the base for all my other type definitions, I get
the desired effect.  It would take only a modest amount of effort
to produce a stylesheet which would transform 99.99% of all conformant
schema documents to use this approach.
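
To illustrate, a type derived from it might look like the sketch
below.  The names paraType and title are made up, and I'm assuming a
schema document with no target namespace which imports the XML
namespace:

```xml
<!-- Hypothetical derived type: paraType and title are illustrative
     names only.  Assumes no target namespace, so nearlyAnyType can
     be referenced without a prefix. -->
<xs:complexType name="paraType">
 <xs:complexContent>
  <xs:restriction base="nearlyAnyType">
   <xs:sequence>
    <xs:element name="title" type="xs:string"/>
   </xs:sequence>
   <!-- The xml:base and xml:lang attribute uses are inherited from
        nearlyAnyType; the attribute wildcard is not. -->
  </xs:restriction>
 </xs:complexContent>
</xs:complexType>
```

Elements of this type accept xml:base and xml:lang with no further
declaration, since attribute uses carry over under restriction.  The
lax attribute wildcard does not carry over, so a derived type which
wants it has to restate it.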

I suppose we could imagine defining xs:anyType in XML Schema 1.1 to
have those two attribute declarations -- that would be a 'no new
syntax' solution, and would be overridable.

Harder to see that as an erratum.

Aha, so _here's_ a lightweight interim solution.

Provide a schema document at http://www.w3.org/2001/XMLSchemaXBL.xsd
as follows:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/2001/XMLSchema">

 <xs:import namespace="http://www.w3.org/XML/1998/namespace"/>

 <xs:redefine schemaLocation="http://www.w3.org/2001/XMLSchema.xsd">
  <xs:complexType name="anyType">
   <xs:complexContent>
    <xs:restriction base="xs:anyType">
     <xs:sequence>
      <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
     </xs:sequence>
     <xs:attribute ref="xml:base"/>
     <xs:attribute ref="xml:lang"/>
     <xs:anyAttribute processContents="lax"/>
    </xs:restriction>
   </complexContent>
  </complexType>
 </xs:redefine>

</xs:schema>

Then if you invoke your schema processor with that schema document
alongside (before, logically) your own, the right thing will happen!

Seems to me that really is a solution we could try to sell to Oracle
and Microsoft, perhaps via a Working Group Note . . .

Once I patched XSV to not ignore all schema documents for the
XMLSchema namespace, this worked.  I've actually put the above schema
document in the advertised place, so other implementors can see if it
works with their processors. . .

Position 3: XML Core and Schema WGs issue a joint WG Note defining a
sort of XInclude-on-steroids, which is an XML application which
does the following:

 1) Run XInclude on its 'input';
 2) Remove xml:base from the resulting infoset;
 3) (Optional) Do schema-validity assessment on the resulting infoset
    with zero or more specified schema documents;
 4) (Optional) Absolutise any relative URIs wrt the appropriate [base
    URI] value, looking for EIIs or AIIs in the resulting
    (possibly PSV) infoset which either
      a) Match a specified XPath
     or
      b) (if (3) was done) match element(*,xs:anyURI) or attribute(*,xs:anyURI)
 5) Serialise the resulting infoset.

Note that (4) "does the right thing" because the XInclude REC requires
not only that xml:base be added, but also that the [base URI]
properties be updated.
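
To make that concrete, here is roughly what an included element looks
like after step (1).  The element names and example.org URIs are
invented for illustration, and the xml:base value is shown as an
absolute URI:

```xml
<!-- Result of including http://example.org/chapters/ch1.xml into
     http://example.org/book.xml (both documents are hypothetical) -->
<book>
 <chapter xml:base="http://example.org/chapters/ch1.xml">
  <!-- The [base URI] here is that of ch1.xml, so this relative
       reference denotes http://example.org/chapters/fig1.png -->
  <figure src="fig1.png"/>
 </chapter>
</book>
```

After step (2) strips the xml:base attribute, the [base URI] property
on chapter still records where it came from, so step (4) can rewrite
src to http://example.org/chapters/fig1.png before serialisation.
Without that rewrite, a consumer of the serialised document would
resolve fig1.png against book.xml instead.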

ht
-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

Received on Monday, 18 April 2005 14:33:47 UTC