XML Schema WG Comments on XInclude from Mary Holstege on 2001-06-19 (www-xml-xinclude-comments@w3.org from June 2001)

From: Mary Holstege <holstege@mathling.com>
Date: Tue, 19 Jun 2001 16:59:47 -0700
To: www-xml-xinclude-comments@w3.org, w3c-xml-core-wg@w3.org
Message-Id: <15151.59251.704000.128392@gargle.gargle.HOWL>
XML Schema Working Group comments on XML Inclusions Last Call Working Draft

We believe that the XInclude specification defines a foundation
specification that has to be harmonized carefully with the other
foundation specifications. The following points outline our concerns
with the specification as it stands:

(1) While the Infoset specification countenances synthetic infosets
    that do not maintain the normal consistency relations of infosets
    created directly by parsing XML, we consider it a poor idea in
    general to take advantage of that laxity, particularly in the case
    of such a foundational specification.

    We believe the XInclude specification must be crystal clear which
    Infoset properties are adjusted, and how, and further that it
    should specify rules so that core invariants are maintained. Since
    a downstream application has no markers in an Infoset that the
    XInclude process has occurred, it is unacceptable to create an
    Infoset that cannot be processed in the normal way. The XInclude
    specification itself highlights several situations that call out
    for special processing. We call on the specification not to
    satisfy itself with highlighting the problems, but to solve
    them. Among these are:

    namespace handling
    base handling
    name collisions on notations and entities
    PSVI properties

	We find the statement that PSVI properties be carried across
	untouched particularly troubling: this decision makes it
	impossible to build reliable type-aware applications in an
	environment where XInclude processing may occur. 

    We respectfully dissent and request that:
    (a) the specification enumerate precisely which infoset properties are
        affected by the inclusion operation, and how they are affected;
    (b) the specification require that infoset consistency be preserved;
    (c) most particularly that PSVI properties be either carried across
        so as to maintain consistency or not be carried across at all.
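
    As an illustration of request (b) for base and namespace handling,
    the sketch below shows what a consistency-preserving inclusion
    result might look like. The file names, the example.org namespaces,
    and the choice of an xml:base attribute to keep the original base
    URI recoverable are assumptions made for illustration; they are not
    taken from the draft.

```xml
<!-- Hypothetical result of including chapters/ch1.xml into book.xml -->
<book xmlns="http://example.org/book">
  <title>Example</title>
  <!-- was: <xi:include href="chapters/ch1.xml"/> -->
  <!-- xml:base keeps relative references resolving against the
       included document rather than against book.xml; the fig
       namespace declaration travels with the included element. -->
  <chapter xml:base="chapters/ch1.xml"
           xmlns:fig="http://example.org/figures">
    <fig:image href="img/one.png"/>
  </chapter>
</book>
```

    Without the xml:base attribute, a downstream application would
    resolve img/one.png relative to book.xml, which is the kind of
    silent inconsistency described above.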

(2) We are deeply concerned that the XInclude specification interferes
    with meaningful type-aware processing. Some of the arguments are
    similar to those of
    http://tigger.uic.edu/~cmsmcq/tech/xml/munging.html. By raising
    an infrastructural process to the same architectural level as an
    application process, an ambiguity arises. Since it is not possible 
    to know whether XInclude will be applied before or after validation 
    it becomes difficult to write schemas (and/or DTDs) that correctly 
    describe instances that use XInclude, requiring the schema to either 
    use an overabundance of disjunctions (xi:include | myElement) throughout
    the schema or "lie" at some point in the processing about the logical 
    structure of the instances. Ubiquitous disjunctions are non-trivial 
    to implement and may substantially harm the logical model of a schema.
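
    To make the disjunction problem concrete, here is a minimal sketch
    of a schema that tolerates unexpanded inclusions. The "my"
    namespace and the section and para elements are hypothetical;
    only the XInclude namespace comes from the specification.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xi="http://www.w3.org/2001/XInclude"
           xmlns:my="http://example.org/my"
           targetNamespace="http://example.org/my"
           elementFormDefault="qualified">

  <xs:import namespace="http://www.w3.org/2001/XInclude"/>

  <xs:element name="para" type="xs:string"/>

  <!-- Logically a section holds paragraphs; the xi:include branch
       exists only because validation may run before inclusion. -->
  <xs:element name="section">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="my:para"/>
        <xs:element ref="xi:include"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

    Every content model that might contain an inclusion needs the same
    extra branch, and the schema can no longer say what the included
    material must be.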

    Some of our members have suggested that replacing the magic
    element with a magic type or a magic wildcard (any) would smooth
    the integration, but we have no consensus or concrete proposals at
    this time.

    In general, there are architectural questions raised by the
    ambiguities inherent in combining, for example, XInclude with
    type-aware XPath. We believe these questions must be carefully
    considered and resolved. We recognize that resolving these
    questions should not fall on the XInclude specification
    alone: they are larger questions.

    We hope to work with the Core WG to help resolve these important
    architectural questions, which we believe must be resolved, and look
    forward to the Processing Model Workshop as a forum for progress on these
    issues. 

(3) We consider it a mistake to erase all record that XInclude
    processing has occurred. This damages the usability of this
    specification for many applications, such as distributed editing,
    document packaging, and so on. Leaving a trace may well be part of a
    solution to (2) above. We do not find the fact that the current
    Infoset specification does not mandate properties recording a trace of
    external entities a reason for XInclude to do likewise for two reasons:
    (1) some feel that that decision for Infoset was not a wise one, and
    (2) XInclude processing, unlike external entity resolution, is not 
    guaranteed to occur before parsing and validation (and indeed that is 
    the point of using an XML syntax for inclusion!). The preponderance of
    the opinion in the Schemas WG was that this is a very important issue that
    must be addressed, although a minority felt it was less crucial.

(4) We wonder why the decision was made to specifically violate the
    RFCs for how fragment identifiers should be interpreted, in favour
    of a mandated interpretation. We do not consider it wise, in
    general, to run counter to the relevant IETF specifications. We do
    not see the rationale of forbidding, say, a schema-specific
    pointing syntax defined at the logical component model level being
    used with XInclude to compose schema documents. We raise this as a
    general architectural question and ask for clarification of the rationale.

(5) The included XML Schema fragment does not quite capture the
    expressed constraints. We suggest that the attribute 'parse'
    should be defined with use='optional' and default='xml' and that the
    anyAttribute be defined with namespace='##other'. Also the DTD
    specifies that the include element must be empty while the schema
    specifies that the include element can have character information
    item children.

    We suggest the schema should be:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xi="http://www.w3.org/2001/XInclude"
           targetNamespace="http://www.w3.org/2001/XInclude">

  <xs:element name="include">
    <xs:complexType>
      <xs:attribute name="href" type="xs:anyURI" use="required" />
      <xs:attribute name="parse" use="optional" default="xml" >
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="xml"/>
            <xs:enumeration value="text"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="encoding" use="optional" type="xs:string" />
      <xs:anyAttribute namespace="##other" />
    </xs:complexType>
  </xs:element>

</xs:schema>
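
    For illustration, the instance below sketches what the suggested
    declaration would accept and reject. The doc wrapper element, the
    file names, and the ex namespace are hypothetical.

```xml
<doc xmlns:xi="http://www.w3.org/2001/XInclude"
     xmlns:ex="http://example.org/ext">

  <!-- parse omitted: the declared default "xml" applies -->
  <xi:include href="chapter1.xml"/>

  <!-- explicit text inclusion with an encoding hint -->
  <xi:include href="notes.txt" parse="text" encoding="UTF-8"/>

  <!-- attribute from another namespace: matched by the ##other
       wildcard, though with the default processContents="strict"
       a declaration for ex:role must also be available -->
  <xi:include href="chapter2.xml" ex:role="appendix"/>

  <!-- would be rejected, since the content model is empty:
       <xi:include href="a.xml">some text</xi:include> -->
</doc>
```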


(6) We are doubtful whether it is appropriate to mandate normalized
    characters in all circumstances. We reiterate our comments on the
    Character Model for the Web:

    "Early uniform normalization appears to have a laudable goal, but
    it is not clear that it is a reliable way, let alone the best way,
    to achieve that goal. It places a heavy burden on
    footprint-constrained software, and (as defined in this document)
    leaves downstream users more or less at the mercy of upstream
    software over which they have no control. We believe serious
    attention should be given to other normalization forms for Unicode
    (e.g. the decomposed normal form) and to other regimes for
    deciding who should normalize when."

    We raise this as a general important architectural question, and suggest
    that if the Character Model specification backs off from requiring
    early normalization, the XInclude specification do likewise.
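
    As a small illustration of what is at stake, the two name elements
    below carry text that renders identically but differs at the
    code point level. The names element and the form attribute are
    invented for this sketch.

```xml
<names>
  <!-- precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE -->
  <name form="composed">caf&#xE9;</name>
  <!-- decomposed form: U+0065 followed by U+0301 COMBINING ACUTE ACCENT -->
  <name form="decomposed">cafe&#x301;</name>
</names>
```

    Under a mandate for early normalization to the composed form, an
    included resource containing the second spelling would not satisfy
    the requirement, whichever upstream tool produced it.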

Respectfully,

the XML Schema WG
Received on Tuesday, 19 June 2001 19:58:37 UTC