root element in schema

Hello,

Recently, I raised an issue here at work regarding global and root elements
in xml-schema.  Our xml-specialist did not have an answer immediately, but
later pointed me to a discussion about the subject:
http://lists.w3.org/Archives/Public/xmlschema-dev/2001Jun/0074.html.

I must say I didn't feel comfortable with some statements made there, and
thought I might add my point of view on the subject.



Mr. Mendelsohn states that someone might want to be able to have two
different elements as a root.  I really don't see how this could be a
necessity to anyone.  The root-element itself enables you to name the
schema that rules the xml-document.  It is perfectly possible to refer to a
BOOKLIST.XSD in a <BOOKLIST> root and refer to a BOOK.XSD in a <BOOK> root.
With proper include-mechanisms in place, there is little extra effort
involved in having these two different schemas, instead of only one that
allows different root-element-types.  So I can't really agree with him
there.

And I cannot agree at all with what is said about "partial
validation".  This goes against everything xsd stands for.  I clearly
recall having read the guidelines saying that "a parser should stop passing
data from the moment it finds an error.  Furthermore, programs receiving an
error-message from a parser should consider all data they already parsed
from the document as non-existent".  This leads me to conclude that "valid
xml" (according to xsd) is (meant to be) an all-or-nothing proposition.
There is no such thing as "partially valid".  And the fact that some
programmer might want to do something like partial validation is not a
good reason to "accept" this line of thinking.  Programmers have been
interpreting standards and guidelines in this fashion ("I will use what
is useful to me and ignore whatever I don't like") for as long as I can
remember (unfortunately).  This attitude has always been, and will always
remain, the main reason why so many efforts toward standardisation prove
useless and simply fail.

Think about it for a moment.  Two organisations (be it two companies, or a
company and the government, or two departments within a company, or
whatever ...) decide to exchange data about, let's say, "customers" in
xml-format.  They agree on a <customer> root-element which holds several
subordinate elements, <custnr> (mandatory), followed by either a
<legalperson> element, or a <naturalperson> element.  The <legalperson>
contains <name> and <legalform> elements, the <naturalperson> contains
<surname>, <firstname> and <initials> elements.  Now, in this example, if
one side sent an xml-document containing only a <firstname> element (and
thus without the customer number), and <firstname> happens to be declared
as a global element, then a validation process based on xsd would not mark
this document as "invalid", even though elements which were clearly
intended and declared to be mandatory (<custnr>, for example) aren't there
at all?  Come on guys, let's be serious for a moment.
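
To make the concern concrete, here is a minimal sketch in Python (standard
library only).  It assumes a hypothetical schema for the customer example in
which every element is declared globally and wired together with ref=; the
schema text and the helper name global_element_names are illustrative
assumptions, not taken from any real exchange format.  Since any globally
declared element can act as the root of a valid document, the helper simply
lists the <xs:element> declarations that sit directly under <xs:schema>.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: every element is declared globally and reused via ref=.
CUSTOMER_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="custnr"/>
        <xs:choice>
          <xs:element ref="legalperson"/>
          <xs:element ref="naturalperson"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="custnr" type="xs:string"/>
  <xs:element name="legalperson">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="name"/>
        <xs:element ref="legalform"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="naturalperson">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="surname"/>
        <xs:element ref="firstname"/>
        <xs:element ref="initials"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="legalform" type="xs:string"/>
  <xs:element name="surname" type="xs:string"/>
  <xs:element name="firstname" type="xs:string"/>
  <xs:element name="initials" type="xs:string"/>
</xs:schema>
"""


def global_element_names(xsd_text):
    """Return the names of element declarations directly under <xs:schema>."""
    schema = ET.fromstring(xsd_text)
    # findall() with a plain tag only matches direct children, i.e. globals.
    return [child.get("name") for child in schema.findall(f"{{{XS}}}element")]


candidates = global_element_names(CUSTOMER_XSD)
print(candidates)  # every name listed here is a possible document root
```

With a schema written this way, <firstname> shows up in the list right next
to <customer>, which is exactly the situation described above.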

It would seem obvious to me that:
a) a receiving party cannot do anything with just the <firstname> element,
it will always need at least the customer number, before it is able to
perform whatever useful processing it could do with this message.
b) a receiving party would therefore expect its "validation process" to
mark this "<firstname>-only" message as "invalid", because it lacks
essential data.  Rightfully so.
c) if the receiving party cannot rely on xsd to do just that, then what
good is xsd to anybody anyway?

I think this little example shows clearly enough that there is indeed a need
for being able to designate some element as being the root in xmlschema.
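
As long as the schema itself cannot express this, the receiving party can
only enforce the agreed root on its own side.  The following is a rough
Python sketch of such a check, assuming the documents use no namespaces;
the function name has_expected_root and the sample messages are
hypothetical.  It would run next to the schema validation, not instead of
it.

```python
import xml.etree.ElementTree as ET


def has_expected_root(xml_text, expected_root):
    """True only if the outermost element carries the agreed name."""
    # Assumes no namespace: ElementTree then reports the bare element name.
    return ET.fromstring(xml_text).tag == expected_root


full_message = (
    "<customer><custnr>42</custnr>"
    "<naturalperson><surname>Peeters</surname><firstname>Jan</firstname>"
    "<initials>J.P.</initials></naturalperson></customer>"
)
firstname_only = "<firstname>Jan</firstname>"

print(has_expected_root(full_message, "customer"))    # agreed root present
print(has_expected_root(firstname_only, "customer"))  # wrong root, reject
```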



Now, how to achieve this?  To do that, we need some information that
enables us to distinguish between an element that is merely "global" and
the element(s) that is (are) actually present (or possibly present) in the
xml described by the schema.  In fact, these "global" elements apparently
serve the purpose of "declaring" the structure of some type of element, not
declaring the (possible) presence of such an element in an xml-document.

Apparently, xsd now has two distinct meanings for the <element>-element:
1) as a declaration of a certain type that can be referred to later in the
schema.
2) as a declaration of the possible occurrence of such an element in an
xml-document.

In my opinion, this is flat out WRONG.  If two distinct sorts of information
are needed (here the "type-declaration" and the "xml-element-declaration"),
then they should have different names, or be recognisable as such in
whatever way is appropriate.  The xsd-syntax apparently does not allow
this.  There is no way to determine unambiguously what "meaning" has to be
assigned to an <element> in a schema.  I feel this is a major design error
in the xsd syntax, which should be removed as soon as possible.

Designers do have a way to avoid this problem (by using <simpleType> and
<complexType> for declarations, and using <element> for actual xml-element
description, assigning them type-information by "type=typeref"), but this
is no solution for someone writing a schema-validation process.  The
authors of schema validation processes cannot rely on the fact that every
schema-author will use this method.
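
For illustration, here is what that designer-side discipline could look
like, again as a small Python sketch using only the standard library.  The
schema is a hypothetical and simplified version of the customer example
(the <legalperson> branch is left out), and the names CustomerType,
NaturalPersonType and top_level_names are my own assumptions.  All
structure lives in named types, and only <customer> is declared globally.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: named types carry the structure, one global element.
TYPED_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="NaturalPersonType">
    <xs:sequence>
      <xs:element name="surname" type="xs:string"/>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="initials" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="custnr" type="xs:string"/>
      <xs:element name="naturalperson" type="NaturalPersonType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="customer" type="CustomerType"/>
</xs:schema>
"""


def top_level_names(xsd_text, kind):
    """Names of the top-level schema components of one kind, e.g. 'element'."""
    schema = ET.fromstring(xsd_text)
    return [c.get("name") for c in schema.findall(f"{{{XS}}}{kind}")]


global_elements = top_level_names(TYPED_XSD, "element")
named_types = top_level_names(TYPED_XSD, "complexType")
print(global_elements)  # the only possible root
print(named_types)      # reusable structure, never a root by itself
```

Here <firstname> is a local declaration inside NaturalPersonType, so it is
no longer a candidate root.  But, as said, this only works when the schema
author chooses to write the schema this way.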

Received on Wednesday, 16 April 2003 08:09:42 UTC