This document provides a technical review of the XML Information Set in the W3C Working Draft of 20 December 1999 (which for brevity we refer to as "the Infoset draft" or "the Infoset spec"). This review was prepared by a Task Force of the W3C XML Schema Working Group, and has been approved by the XML Schema WG as an accurate representation of the WG's collective judgement on technical issues relating to XML Schema. This document does not address technical issues not directly related to XML Schema.
The XML Schema WG believes that:
The review follows the structure of the Infoset draft.
In paragraph 5, the use of the term maximal information set appears to imply that the set of information items and properties so designated is in some sense maximal, and that a processor which claims to conform to this spec must not supply information to downstream applications if that information is not defined as part of the maximal information set. This implication appears to be unintended, since it contradicts the section on conformance, which says explicitly "XML Processors may optionally provide additional information not found in the XML Information Set ...". To avoid confusing the reader, the paragraph needs to be revised, perhaps by changing the initial clause to read "For any given XML document, a number of corresponding information sets are defined by this specification: ...".
Since in practice specialized processors of various kinds (XML Schema processors or link processors, to name two) will need to provide information not included in the maximal information set defined here, it is highly desirable that the Infoset spec make clear that what it defines is expected to be simply one body of information (or set of information items) among several, and that other specifications may and should describe explicitly the body of information specialized processors of particular types must supply to their own downstream applications. This is not now sufficiently explicit in the text of the Infoset draft.
Some members of the WG feel that it would be highly desirable if the Infoset spec described briefly but explicitly how other specifications (such as XML Schema) should go about defining the body of information their processors are expected to supply. Such an explicit meta-description would allow the task of elaborating the information set for various specialized purposes to be distributed, help avoid a continuous stream of requests that the information set be extended to handle this or that specialized form of information, and encourage consistent terminology and practice among XML-related specifications.
For example, a first cut at such a meta-description might be:
A definition of a package of information items P should specify:
- a name for P
- the scope and expected use of P
- a set of information items belonging to P; these may be defined from scratch, or may be information items already defined in other packages (Q, R, ...)
For each information item defined from scratch, the definition of P should specify:
- a name for the information item (e.g. "The Document information item")
- a set of properties associated with the information item
- whether the information item is a core information item, or a peripheral information item
For each information item already defined in another package (e.g. Q), the definition of P should specify:
- the existing package Q in which the information item is defined
- the name of the information item
- a set of properties to be added, in P, to those defined in Q
- whether the item is core or peripheral in P
- whether the properties defined in Q are core or peripheral properties
For each property not already defined in another package, the definition of P should specify:
- the name of the property
- a description, in natural-language prose, of the property
- whether the property is core or peripheral (may vary from item to item, if the same property is associated with multiple information items)
- whether the property's value is a singular value, a sequence of values, a set of values, or a bag (family) of values
N.B. This is sample text only. Not all members of the XML Schema WG are happy with the use of the term package in this sample text. The term is intended to mean simply "an identifiable set of information items defined by a particular specification"; the use of a special term is not intended to suggest that individual specifications will or should define more than one such package each.
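As an illustration only, the sample meta-description can be sketched as a small data model. The class and field names below (Package, ItemDef, PropertyDef, Status, Cardinality) are our own assumptions and are not proposed terminology; the [type definition] property and the package names used in the example are likewise hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    CORE = "core"
    PERIPHERAL = "peripheral"


class Cardinality(Enum):
    SINGLE = "singular value"
    SEQUENCE = "sequence of values"
    SET = "set of values"
    BAG = "bag of values"


@dataclass
class PropertyDef:
    name: str                 # e.g. "[parent]"
    description: str          # natural-language prose
    status: Status            # held per item, so it may vary from item to item
    cardinality: Cardinality


@dataclass
class ItemDef:
    name: str                                   # e.g. "Element"
    status: Status                              # core or peripheral in this package
    properties: List[PropertyDef] = field(default_factory=list)
    defined_in: Optional[str] = None            # name of package Q, if the item is reused
    inherited_status: Optional[Status] = None   # status here of the properties defined in Q


@dataclass
class Package:
    name: str
    scope: str
    items: List[ItemDef] = field(default_factory=list)


# A package that defines an item from scratch.
infoset_package = Package(
    name="Infoset",
    scope="information available from any XML processor",
    items=[ItemDef("Element", Status.CORE, [
        PropertyDef("[parent]", "reference to the parent element",
                    Status.PERIPHERAL, Cardinality.SINGLE),
    ])],
)

# A package that reuses that item and adds a (hypothetical) property to it.
schema_package = Package(
    name="Schema",
    scope="information supplied by a schema processor",
    items=[ItemDef("Element", Status.CORE, [
        PropertyDef("[type definition]", "hypothetical example property",
                    Status.CORE, Cardinality.SINGLE),
    ], defined_in="Infoset", inherited_status=Status.CORE)],
)
```

The only design point the sketch is meant to show is that a reused item records the package it comes from, the properties added to it, and the status of the properties it inherits, which are the points listed in the sample text.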
A peripheral property named [parent] should be defined for elements; its value should be a reference to the element's parent element. The prose should make clear that this property is redundant, since it may be constructed by inverting the relation between an element and the elements among its [children]. We understand that many would prefer that the information set be defined so as to avoid gratuitous redundancies. On the other hand, it is obvious that the information set as defined is not intended as a formally minimal specification; it does include redundant information. In this case, the benefit of having a standard term for the property [parent] seems to us to outweigh the drawback of introducing further redundancy. If the Infoset spec does not define this property, XML Schema will have to define it itself, if only to simplify the exposition of various constraints on elements in XML schemas.
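The redundancy of [parent] can be illustrated with a minimal sketch in which the child list of each element stands in for [children]. The sketch uses Python's xml.etree.ElementTree, whose elements expose their children but no parent reference; the helper name build_parent_map and the sample document are our own.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    """Invert the element-to-children relation to obtain child-to-parent."""
    return {child: parent for parent in root.iter() for child in parent}


root = ET.fromstring("<doc><sec><p/></sec><p/></doc>")
parents = build_parent_map(root)

sec = root.find("sec")
inner_p = sec.find("p")
```

Here parents[inner_p] is the sec element and parents[sec] is the document element, while the document element itself has no entry. Every consumer that needs the relation must rebuild such a map, which is the practical argument for giving the property a standard name.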
The definition of the [attributes] property appears to some members of the WG to contradict both the XML Recommendation and the "Namespaces in XML" Recommendation, by stipulating that namespace declarations are not represented as attribute information items. Namespace declarations are clearly attribute value specifications within the meaning of the XML 1.0 Recommendation (production 41), and the Namespaces Recommendation explicitly states that they are in fact attributes.
Since there is clearly disagreement on this question among participants in the XML Activity, both within working groups and among working groups, the XML Schema WG believes the question should be addressed and resolved by the XML Plenary and/or Coordination Group. The topic is on the agenda of the XML Plenary meeting scheduled for 2 February 2000 in Berkeley.
Para 1,
"There is one attribute information item for each attribute (specified or defaulted) for each element in the document instance. Namespace declarations are represented using namespace declaration information items, not attribute information items."
appears self-contradictory to some members of the XML Schema WG. Unless this spec provides an alternative definition for the term attribute, most readers will and should assume that the definition in the XML 1.0 Recommendation applies. But according to that definition (and according to the Namespaces Recommendation), namespace declarations are attributes. Other members of the WG regard the distinction between attributes and namespace declarations as correct. As noted above, we believe this should be dealt with as a coordination issue.
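The two readings can be seen side by side in a short sketch that parses the same start-tag with two parsers from the Python standard library. It is offered only as an illustration of the disagreement, not as evidence for either position; the sample document and its namespace name are made up.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

SAMPLE = '<doc xmlns:p="urn:example:ns" id="d1"/>'

# The DOM view keeps the namespace declaration among the element's attributes.
dom_names = sorted(minidom.parseString(SAMPLE).documentElement.attributes.keys())

# The ElementTree view reports only the ordinary attribute.
et_names = sorted(ET.fromstring(SAMPLE).attrib.keys())
```

With these two parsers, dom_names contains both id and xmlns:p, while et_names contains only id. A specification that relies on the term attribute without saying which view it means will be read differently by users of the two models.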
A (redundant) peripheral property should be defined for attributes, the value of which is a reference to the element within whose [attributes] property the attributes are referenced; the rationale is the same as for the redundant [parent] property we propose for elements: the XML Schema spec, and other specs, will need to refer to this property regularly, and the Infoset spec is the most convenient place to provide a standard term for the relationship. The XML Schema WG takes no position on whether the property should be called [parent] or be given some other name.
It will be a common operation in schema processors to compare the namespace URIs associated with various schema constructs, including not only elements and attributes, but also namespace declarations themselves. Both XPath and XML Schema treat certain attribute values as qualified names, and thus need access to the namespace URI from namespace declarations in a form directly comparable to that provided for qualified names of elements and attributes. It is therefore essential that the namespace URI be provided as core information, to support this common operation. We therefore believe that the property [namespace URI] should be identified as a core property on the Namespace Declaration information item, as well as for element and attribute information items. The [children] property of namespace declarations can then be made peripheral.
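The operation in question can be sketched as follows: a qualified name found in an attribute value is resolved against the namespace declarations in scope, giving a (namespace URI, local name) pair that can be compared directly with the names of elements and attributes. This is a minimal sketch using xml.dom.minidom; the function name resolve_qname, the sample document, and its namespace name are hypothetical, and applying the default namespace to unprefixed names is an assumption that not every specification shares.

```python
from xml.dom import minidom


def resolve_qname(element, qname):
    """Resolve a QName from an attribute value to (namespace URI, local name)."""
    prefix, _, local = qname.rpartition(":")
    decl_name = "xmlns:" + prefix if prefix else "xmlns"
    node = element
    # Walk up through the ancestors until a matching declaration is found.
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        if node.hasAttribute(decl_name):
            return (node.getAttribute(decl_name), local)
        node = node.parentNode
    if prefix:
        raise LookupError("undeclared namespace prefix: " + prefix)
    return (None, local)  # unprefixed name with no default namespace in scope


SAMPLE = (
    '<s:schema xmlns:s="urn:example:schema">'
    '<s:element name="a" type="s:string"/>'
    '</s:schema>'
)
doc = minidom.parseString(SAMPLE)
decl = doc.getElementsByTagName("s:element")[0]
resolved = resolve_qname(decl, decl.getAttribute("type"))
```

In this example resolved is the pair ("urn:example:schema", "string"). The lookup depends entirely on reading the URI out of the declaration, which is why the review asks that [namespace URI] be a core property of the Namespace Declaration information item.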
The terminology used in defining the various properties named [namespace URI] for elements, attributes, and namespace declarations is neither internally consistent, nor consistent with the usage in the Namespaces Recommendation; this inconsistency is a real problem for XML Schema. The Namespaces Recommendation defines the terms local part (which is used consistently in the definitions of the properties called [local name]) and namespace name (which could be used in the definition of the properties named [namespace URI], but is not). Note that it has been suggested that the Namespaces Recommendation's own definition of the term namespace name is unclear and possibly incorrect; it is open to the Infoset specification to comment on this definition or suggest a correction, but the terminology used in the definition of the Infoset must connect somehow with that used in the Namespaces Recommendation.