- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 18 Apr 2003 15:09:52 -0400
- To: Erwin.Smout@ksz-bcss.fgov.be
- Cc: xmlschema-dev@w3.org
Erwin Smout writes:
>> It is perfectly possible to refer to a BOOKLIST.XSD
>> in a <BOOKLIST> root and refer to a BOOK.XSD in a <BOOK>
>> root. With proper include-mechanisms in place, there
>> is little extra effort involved in having these two
>> different schemas, instead of only one that allows
>> different root-element-types.
Thank you for your comments. I understand you to be suggesting: let each
schema document declare exactly one root, which is to be honored if that
schema document is referenced explicitly by a schemaLocation in the
instance, but not if it is the target of an <xsd:include> from another
schema. That seems to me to be fragile in a number of dimensions. First
of all, there are many, many situations (such as the typical purchase
order) in which you either can't get a schemaLocation into the instance,
or in which you wouldn't trust it if it were there. That's why it's a
hint. What do we do for all those instances that can't "name" a schema
document?
Furthermore, we've generally declined to have a schema document mean
something different when it's included than when it's referenced in some
other manner. You can wind up with rather tricky scenarios in which the
same schema document is referenced from multiple places (processor command
line, schemaLocation in the instance, <xsd:include>). If the rules for
root depend on which of these ways you find it, then it becomes a
constraint that all processors encounter these in the same order. That
makes it very hard to build streaming processors that work the same way as
those that precompile schemas.
Here's how I think I would design a mechanism to do what I think you want:
* I would add a new boolean property to elementDeclaration to be called
"okAsDocumentRoot", which could be set to "true" on one or more global
element declarations.
* I would add a new attribute to the XML form of an element declaration
allowing <xsd:element name="n" okAsDocumentRoot="true">. This would set
the component property in the obvious manner.
* I would add a new mode of validation:
- In full document mode, it would only be legal to start validation if the
element decl that matched the root element had the boolean set to true
- To meet the need for incremental validation (see below), you would have
an additional validation mode that would ignore the property and allow
validation to proceed from any global element declaration. In other
words, do what we do today.
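
A minimal sketch of the two modes described above, written in Python. The okAsDocumentRoot attribute is the hypothetical one proposed in this message and is not part of XML Schema; the function names and the sample schema are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Sample schema; okAsDocumentRoot is the hypothetical attribute proposed above.
SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="purchaseOrder" okAsDocumentRoot="true"/>
  <xsd:element name="shippingAddress"/>
</xsd:schema>
"""

def global_elements(schema_text):
    """Map each global element name to its (hypothetical) okAsDocumentRoot flag."""
    schema = ET.fromstring(schema_text)
    decls = {}
    # Only direct children of <xsd:schema> are global element declarations.
    for decl in schema.findall(f"{{{XSD_NS}}}element"):
        decls[decl.get("name")] = decl.get("okAsDocumentRoot") in ("true", "1")
    return decls

def may_start_validation(decls, root_name, full_document):
    """Decide whether validation may begin at an element named root_name."""
    if root_name not in decls:
        return False  # no global element declaration matches
    if full_document:
        return decls[root_name]  # full document mode honors the flag
    return True  # incremental mode ignores the flag, as today

DECLS = global_elements(SCHEMA)
```

With this sample schema, full document mode would accept only purchaseOrder as a starting point, while incremental mode would accept either global element.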
Is this worthwhile? I'm not convinced, but I'm not strongly against it
either. It's a new property, a new attribute, and a new validation model.
What it does is to allow you to mark in a schema document the elements
that you intend to be a root and to have that checked. Frankly, most of
the applications I write know exactly what the root is to be: if I'm a
purchasing application, I know perfectly well that the root better be
"purchaseOrder" and I check that very easily. There may indeed be other
examples where the above would be useful, and if there were a groundswell
of support for it, I wouldn't be opposed. As I say, we've heard this
request only occasionally, and I'm not currently convinced it makes the
80/20 cut we've tried for.
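
For comparison, here is roughly what the application-level check mentioned above might look like in Python. The function name check_root is an assumption for illustration, and the sketch ignores namespaces for brevity.

```python
import xml.etree.ElementTree as ET

def check_root(document_text, expected_root="purchaseOrder"):
    """Parse the document and insist on the root the application expects."""
    root = ET.fromstring(document_text)
    if root.tag != expected_root:
        raise ValueError(f"expected <{expected_root}> root, got <{root.tag}>")
    return root
```

The application, not the schema, carries the knowledge of which element is the root here.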
Let me comment briefly on the partial validation question. Here are a few
use cases: let's say you have a purchase order xml format, a fairly
common example, and it includes a subelement named "shippingAddress":
<purchaseOrder>
  ....
  <shippingAddress>
    <street> ... </street>
    <city>...</city>
    <state>..</state>
    <zip>...</zip>
  </shippingAddress>
</purchaseOrder>
You are building a shipping application that prints the address labels for
the items to be shipped. It's important that some outer application
(which may have done a schema validation on the PO or may have used some
other means to make sure that its overall structure is sufficiently
trustworthy) passes just the shipping address element to the shipping
application. That shipping application chooses to use schema validation
on just the shippingAddress element. That's what I mean by partial
validation, and it is important for many such application decomposition
scenarios. Do I really need to separate the address into a different
schema document? There would be lots of them, and it seems to tie my
processing model unnecessarily to the packaging of the documents. If a
book publishers' association wants to publish a vocabulary for describing
books, authors, etc., I don't want them to have to think about the
different fragments of book descriptions or catalog entries that I may
wish to validate in my applications. They should just publish a schema
document to define their namespace and elements, and I should use the ones
I need. Not all applications of XML schema are document-oriented.
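
A sketch of the decomposition in the shipping example, in Python. The outer application pulls out just the shippingAddress element and hands it on; the shipping application would then validate that fragment starting from the global shippingAddress declaration. The function name extract_fragment and the sample address values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Illustrative purchase order; the address values are made up.
PO = """\
<purchaseOrder>
  <shippingAddress>
    <street>1 Example Street</street>
    <city>Springfield</city>
    <state>MA</state>
    <zip>00000</zip>
  </shippingAddress>
</purchaseOrder>
"""

def extract_fragment(document_text, element_name):
    """Return the first descendant named element_name as standalone XML text."""
    root = ET.fromstring(document_text)
    fragment = root.find(f".//{element_name}")
    if fragment is None:
        raise LookupError(f"no <{element_name}> element found")
    fragment.tail = None  # drop whitespace that followed the element in its parent
    return ET.tostring(fragment, encoding="unicode")

ADDRESS = extract_fragment(PO, "shippingAddress")
```

The fragment in ADDRESS is a complete document whose root is shippingAddress, which is exactly the case a single mandatory root per schema would rule out.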
Another very important scenario is taking that entire purchase order and
wrapping it in a soap envelope (namespace decls skipped for brevity):
<soap:envelope>
  <soap:body>
    <po:purchaseOrder>
      ...
    </po:purchaseOrder>
  </soap:body>
</soap:envelope>
Sometimes you want to validate the whole envelope including the purchase
order. Sometimes you don't validate the purchase order until it's been
extracted and handed to some purchasing application. So, sometimes
purchaseOrder is the root, sometimes not.
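
A sketch of that choice in Python. The namespace URIs below are placeholders (the example above omits the real declarations), the element names follow the example as written, and the function name validation_roots is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs; the example above skips the real declarations.
NS = {"soap": "urn:example:soap", "po": "urn:example:po"}

MESSAGE = """\
<soap:envelope xmlns:soap="urn:example:soap" xmlns:po="urn:example:po">
  <soap:body>
    <po:purchaseOrder><po:item>widget</po:item></po:purchaseOrder>
  </soap:body>
</soap:envelope>
"""

def validation_roots(message_text, whole_envelope):
    """Return the elements at which schema validation would begin."""
    envelope = ET.fromstring(message_text)
    if whole_envelope:
        return [envelope]  # validate the envelope and everything inside it
    body = envelope.find("soap:body", NS)
    # Otherwise each payload element is extracted and validated on its own.
    return list(body) if body is not None else []
```

Depending on the flag, the same purchaseOrder is either nested inside the validated tree or is itself the validation root.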
There are also editing scenarios in which an editor gathers the
information for a document out of order. While sooner or later the entire
document may be validated or maybe not, it's very useful to be able to
validate the fragments as they are gathered. Similar scenarios come up in
the design of languages like XML query, which assemble pieces of documents
dynamically. It's nice to be able to discuss the validity of those
fragments in isolation, as well as in the context of an overall document.
So I hope you can see that, while your scenarios involve a very strong
notion of "document" and "root", not all do. The question is whether to
build a special mechanism to model that, and so far we've decided that
it's reasonable on balance to leave such modeling outside of the language.
Again, thank you for your comments, and I'm sure I speak for the Schema
WG in saying that we take to heart your concerns that our current
mechanisms don't exactly fit your needs. Thank you.
Noah
------------------------------------------------------------------
Noah Mendelsohn Voice: 1-617-693-4036
IBM Corporation Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Erwin.Smout@ksz-bcss.fgov.be
Sent by: xmlschema-dev-request@w3.org
04/16/2003 07:30 AM
To: xmlschema-dev@w3.org
cc:
Subject: root element in schema
Hello,
Recently, I raised an issue here at work regarding global and root elements
in xml-schema. Our xml-specialist did not have an answer immediately, but
later pointed me to a discussion about the subject:
http://lists.w3.org/Archives/Public/xmlschema-dev/2001Jun/0074.html.
I must say I didn't feel comfortable with some statements made there, and
thought I might add my point of view on the subject.
Mr. Mendelsohn states that someone might want to be able to have two
different elements as a root. I really don't see how this could be a
necessity to anyone. The root-element itself enables you to name the
schema that rules the xml-document. It is perfectly possible to refer to a
BOOKLIST.XSD in a <BOOKLIST> root and refer to a BOOK.XSD in a <BOOK> root.
With proper include-mechanisms in place, there is little extra effort
involved in having these two different schemas, instead of only one that
allows different root-element-types. So I can't really agree with him
there. And I totally can't agree with what is said about "partial
validation". This goes against everything xsd stands for. I clearly
recall having read the guidelines saying that "a parser should stop passing
data from the moment it finds an error. Furthermore, programs receiving an
error-message from a parser should consider all data they already parsed
from the document as non-existent". This leads me to conclude that "valid
xml" (according to xsd) is (meant to be) an all-or-nothing proposition.
There is no such thing as "partially valid". And the fact that some
programmer might want to do something like partial validation, is not a
good reason to "accept" this line of thinking. Programmers have been
interpreting standards and guidelines in this fashion ("I will use what
comes to good use and ignore whatever I don't like") for as long as I
remember (unfortunately). They have always been and will always stay the
main reason why so many efforts toward standardisation prove useless and
simply fail.
Think about it for a moment. Two organisations (be it two companies, or a
company and the government, or two departments within a company, or
whatever ...) decide to exchange data about, let's say, "customers" in
xml-format. They agree on a <customer> root-element which holds several
subordinate elements, <custnr> (mandatory), followed by either a
<legalperson> element, or a <naturalperson> element. The <legalperson>
contains <name> and <legalform> elements, the <naturalperson> contains
<surname>, <firstname> and <initials> elements. Now, in this example, if
one side sent an xml-form with only a <firstname>-element (and thus without
the customer number), then a validation process based on xsd would not mark
this form as "invalid", even though elements which were clearly intended
and declared to be mandatory (<custnr> e.g.) aren't there at all? Come
on guys, let's be serious for a moment.
It would seem obvious to me that :
a) a receiving party cannot do anything with just the <firstname> element,
it will always need at least the customer number, before it is able to
perform whatever useful processing it could do with this message.
b) a receiving party would therefore expect its "validation process" to
mark this "<firstname>-only" message as "invalid", because it lacks
essential data. Rightfully so.
c) If the receiving party cannot rely on xsd to do just that, then what
good is xsd anyway to anybody ?
I think this little example shows clearly enough that there is indeed a need
for being able to designate some element as being the root in xmlschema.
Now for how to achieve this? To do that, we need some information that
enables us to distinguish between an element that is "global", and which
element(s) is(are) actually present (or possibly present) in the xml
described by the schema. In fact, these "global" elements apparently serve
the purpose of "declaring" the structure of some type of element, not
declaring the (possible) presence of such element in an xml-document.
Apparently, xsd now has two distinct meanings for the <element>-element:
1) as a declaration of a certain type that can be referred to later in the
schema.
2) as a declaration of the possible occurrence of such element in an
xml-document.
In my view, this is flat out WRONG. If two distinct sorts of information
are needed (here the "type-declaration" and the "xml-element-declaration"),
then they should have different names, or be recognisable as such in
whatever way is appropriate. The xsd-syntax apparently does not allow
this. There is no way to determine unambiguously what "meaning" has to be
assigned to an <element> in a schema. I feel this is a major design error
in the xsd syntax, which should be removed as soon as possible.
Designers do have a way to avoid this problem (by using <simpletype> and
<complextype> for declarations, and using <element> for actual xml-element
description, assigning them type-information by "type=typeref"), but this
is no solution for someone writing a schema-validation process. The
authors of schema validation processes cannot rely on the fact that every
schema-author will use this method.
Received on Friday, 18 April 2003 15:18:28 UTC