CR Feedback and Implementation

While implementing the XML Schema Candidate Recommendation (CR) in both the
graphical schema editor of XML Spy 3.5 and the schema-guided XML validation
engine within the XML editor of XML Spy, we have collected the following
feedback within our corporation. It covers both comments on the CR itself and
the implementation feedback requested during the CR phase, and I would like
to contribute it publicly here as the representative of our organization
within the XML Schema WG:


1) the CR should perhaps expressly inform the reader that any schema
document that uses a default namespace (i.e. no prefix) to refer to
"http://www.w3.org/2000/10/XMLSchema" must have a targetNamespace -
otherwise any type="..." or ref="..." cannot be correctly attributed to
either the built-in types of XML Schema or the types that the user defines
in his/her schema
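
To illustrate the ambiguity, here is a minimal sketch of such a schema (the
element and type names are made up for this example). Because the default
namespace is bound to the XML Schema namespace and there is no
targetNamespace, every unprefixed name in type="..." is looked up in the XML
Schema namespace, so the user-defined type, which lives in no namespace,
cannot be reached:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2000/10/XMLSchema">
	<!-- resolves to the built-in string type -->
	<element name="title" type="string"/>
	<!-- meant to reference the user-defined type below, but the
	     unprefixed name is looked up in the XML Schema namespace -->
	<element name="note" type="Comment"/>
	<complexType name="Comment">
		<sequence>
			<element name="text" type="string"/>
		</sequence>
	</complexType>
</schema>
```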


2) consider the question of non-deterministic content models with respect to
local element declarations, e.g. in the following schema:

	<?xml version="1.0" encoding="UTF-8"?>
	<!-- edited with XML Spy v3.5 NT beta 2 build Dec 11 2000 (http://www.xmlspy.com) by Alexander Falk (Altova, Inc.) -->
	<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" elementFormDefault="qualified">
		<xsd:element name="a">
			<xsd:complexType>
				<xsd:sequence>
					<xsd:element name="b" minOccurs="0" maxOccurs="unbounded">
						<xsd:complexType>
							<xsd:sequence>
								<xsd:element name="c" type="xsd:string"/>
								<xsd:element name="d" type="xsd:string" minOccurs="0"/>
							</xsd:sequence>
						</xsd:complexType>
					</xsd:element>
					<xsd:element name="b" minOccurs="0" maxOccurs="unbounded">
						<xsd:complexType>
							<xsd:sequence>
								<xsd:element name="c" type="xsd:integer"/>
								<xsd:element name="e" type="xsd:string" minOccurs="0"/>
							</xsd:sequence>
						</xsd:complexType>
					</xsd:element>
				</xsd:sequence>
			</xsd:complexType>
		</xsd:element>
	</xsd:schema>

which declares two local <b> elements with different content models, making
the validation process entirely non-deterministic for the following example
XML instance document based on this schema:

	<?xml version="1.0" encoding="UTF-8"?>
	<!-- edited with XML Spy v3.5 NT beta 2 build Dec 11 2000 (http://www.xmlspy.com) by Alexander Falk (Altova, Inc.) -->
	<a xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="multilocaldiff.xsd">
		<b>
			<c>0</c>
		</b>
	</a>

Is the <b> element in this example the first or second local element b in
the above schema - or more specifically - should the contents of c be
interpreted as an xsd:string or xsd:integer? Only if the XML instance
document is actually extended a bit:

	<?xml version="1.0" encoding="UTF-8"?>
	<!-- edited with XML Spy v3.5 NT beta 2 build Dec 11 2000 (http://www.xmlspy.com) by Alexander Falk (Altova, Inc.) -->
	<a xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance" xsi:noNamespaceSchemaLocation="multilocaldiff.xsd">
		<b>
			<c>0</c>
			<e/>
		</b>
	</a>

it would theoretically be possible to validate this document against the
schema. However, the "grammar" defined by this schema entirely violates any
LL(1) condition, so allowing such a construct makes it very hard to achieve
compatibility between schema-based validators from different vendors.

I would, therefore, recommend that the use of multiple locally declared
elements with the same name within one parent be either discouraged or
outright forbidden by the Schema specification.


3) the CR allows the chaining of substitutionGroups, which is illustrated in
the following skeleton example (complexTypes and extensions are omitted for
clarity):

	<?xml version="1.0" encoding="UTF-8"?>
	<!-- edited with XML Spy v3.5 NT beta 2 build Dec 11 2000 (http://www.xmlspy.com) by Alexander Falk (Altova, Inc.) -->
	<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema" elementFormDefault="qualified">
		<xsd:element name="a"/>
		<xsd:element name="b" substitutionGroup="a"/>
		<xsd:element name="c" substitutionGroup="b"/>
	</xsd:schema>

which results in a schema where a <b> or <c> element can be substituted for
any <a> and also a <c> can be substituted for any <b>. If, however, a schema
author accidentally modifies the definition of element <a> to read

		<xsd:element name="a" substitutionGroup="c"/>

then we have a cyclic link in the substitutionGroup chain, which should
perhaps be explicitly forbidden by the schema rec.


4) the CR asks for explicit feedback on xsi:null -- our take is that we
would prefer NULL values (e.g. when importing from a database-like system)
to be represented by elements omitted from the instance document (i.e. the
schema needs to have a minOccurs="0" for nullable elements), whereas empty
string contents should be specified by empty elements, thereby eliminating
the need for xsi:null
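
As a sketch of the two representations (the element names are hypothetical,
and the nullable / xsi:null spelling reflects our reading of the CR), a
database column that may be NULL could be handled as in the following
fragments:

```xml
<!-- schema fragment: the xsi:null approach -->
<xsd:element name="shipDate" type="xsd:date" nullable="true"/>
<!-- instance fragment: NULL is signalled explicitly -->
<shipDate xsi:null="true"/>

<!-- schema fragments: the approach we prefer -->
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
<xsd:element name="comment" type="xsd:string" minOccurs="0"/>
<!-- instance fragment: shipDate is omitted (NULL), while comment is
     present but empty (empty string) -->
<order>
	<comment/>
</order>
```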


5) the CR also asks for explicit feedback on xsi:type -- while we do already
support substitutionGroups in XML Spy, we have not yet implemented xsi:type,
because in contrast to a substitution group (which simply allows a global
element to be substituted for another global element) xsi:type requires the
validation engine to keep information about complexTypes in memory at all
times, which is both a performance and memory size issue. Our current
implementation of the validation engine discards all complexType and
simpleType information once the element content model has been built in
memory, which allows for much more efficient processing. Furthermore, the
ability for the XML instance document author to directly access/change the
underlying type of an element introduces an entirely new level of complexity
or risk that is - in our opinion - unwarranted.

I do, therefore, suggest that xsi:type be dropped, because the goal of
xsi:type to support derived types in a schema can as easily be met by the
use of global elements and substitutionGroups, as is outlined in this
modified IPO example excerpt from the "Part 0: Primer" chapter:

	<shipTo>
		<ipo:UK-Address export-code="1">
			<name>Helen Zoe</name>
			<street>47 Eden Street</street>
			<city>Cambridge</city>
			<postcode>12A QZ6</postcode>
		</ipo:UK-Address>
	</shipTo>
	<billTo>
		<ipo:US-Address>
			<name>Robert Smith</name>
			<street>8 Oak Avenue</street>
			<city>Old Town</city>
			<state>AK</state>
			<zip>95819</zip>
		</ipo:US-Address>
	</billTo>

which only requires adding one element layer and modifying the schema so
that both <shipTo> and <billTo> include a global element <Address> which is
the head of a substitutionGroup that can accommodate different derived types:

	<element name="shipTo">
		<complexType>
			<sequence>
				<element ref="ipo:Address"/>
			</sequence>
		</complexType>
	</element>
	<element name="billTo">
		<complexType>
			<sequence>
				<element ref="ipo:Address"/>
			</sequence>
		</complexType>
	</element>
	...
	<element name="Address" type="ipo:Address"/>
	<element name="US-Address" type="ipo:US-Address" substitutionGroup="ipo:Address"/>
	<element name="UK-Address" type="ipo:UK-Address" substitutionGroup="ipo:Address"/>

The benefit of this solution is that it is more easily readable by both
human and machine and eliminates the risk of having the author of the XML
instance document deal with dynamically changing the type of an element at
all.


6) an important difference in interpretation/description has been noted
between "Part 0: Primer" in the second row of table
http://www.w3.org/TR/xmlschema-0/#cardinalityTable and "Part 1: Structures"
in the definition of value-constraints on attributes appearing as children
of <schema> (global attributes) at
http://www.w3.org/TR/2000/CR-xmlschema-1-20001024/#declare-attribute where
it clearly says:

	If there is no value [attribute], then absent, otherwise a pair
consisting of the normalized value (with respect to the {type definition})
of that [attribute] and fixed, if the normalized value of the use
[attribute] <http://www.w3.org/TR/xml-infoset> is fixed, otherwise default.

Contrast this with the second row in the primer table cited above which
states that an attribute with use="required" and value="37" is to be
interpreted as having a fixed value of 37, which contradicts the
specification of Part 1.

However, it should also be noted that in the definition of value-constraints
on attributes appearing as children of complexTypes or attributeGroups at
http://www.w3.org/TR/2000/CR-xmlschema-1-20001024/#declare-attribute (second
table of properties and presentation for the attribute component) it then
says:

	If there is no value [attribute], then absent, otherwise a pair
consisting of the normalized value (with respect to the {type definition})
of that [attribute] and default, if the normalized value of the use
[attribute] <http://www.w3.org/TR/xml-infoset> is default, otherwise fixed.

This is in agreement with the second row of the table in Part 0 mentioned
above. The important question is which one of these is correct, because I
do suppose we don't really want to keep these two cases different.
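
For concreteness, the kind of declaration in question looks like the
following sketch (the attribute name is hypothetical). Read under the first
rule quoted above, the value constraint would be a default of 37, since use
is not "fixed"; read under the second rule, it would be a fixed value of 37,
since use is not "default":

```xml
<!-- use is neither "fixed" nor "default", so the two quoted rules
     disagree on how value="37" is to be interpreted -->
<xsd:attribute name="size" type="xsd:integer" use="required" value="37"/>
```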


7) the CR also requests feedback regarding global schema-level defaults --
while we think they are generally a good thing, we would urge the WG to make
the default for elementFormDefault be "qualified". From our every-day
experience with technical support during the beta phase of our product, we
can safely conclude that most people do NOT understand the concept that
locally declared elements within a schema by default belong to NO namespace,
as opposed to the targetNamespace. When a schema that keeps
elementFormDefault="unqualified" is used for an XML instance document, that
document may not use a default namespace for the schema's targetNamespace
(otherwise there would be no way to access the local elements that are not
part of the targetNamespace!). It is our opinion that common users will
simply not understand why an XML document may need a namespace prefix for
certain elements while others need to be unprefixed. We will, therefore,
automatically create all new XML Schema documents with
elementFormDefault="qualified" and recommend this to all users - and we
would certainly like to suggest to the WG that this be made the default
within the schema spec.
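
The following sketch shows the situation that confuses users (the namespace
URI and element names are invented for this example). The schema leaves
elementFormDefault at "unqualified", so the global element <order> belongs
to the targetNamespace while the local element <item> belongs to no
namespace:

```xml
<!-- schema: elementFormDefault is left at "unqualified" -->
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            targetNamespace="http://www.example.com/order">
	<xsd:element name="order">
		<xsd:complexType>
			<xsd:sequence>
				<xsd:element name="item" type="xsd:string"/>
			</xsd:sequence>
		</xsd:complexType>
	</xsd:element>
</xsd:schema>

<!-- instance: the global element needs a prefix, while the local
     element must stay unprefixed -->
<o:order xmlns:o="http://www.example.com/order">
	<item>widget</item>
</o:order>

<!-- not valid against this schema: the default namespace also
     captures <item>, which is declared in no namespace -->
<order xmlns="http://www.example.com/order">
	<item>widget</item>
</order>
```

With elementFormDefault="qualified" the last form becomes the valid one,
which is what most users appear to expect.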


Sincerely,

Alexander Falk

... Alexander Falk
... President, CEO
... Altova, Inc. - The XML Spy Company

Received on Thursday, 14 December 2000 11:23:53 UTC