- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 18 Apr 2003 15:09:52 -0400
- To: Erwin.Smout@ksz-bcss.fgov.be
- Cc: xmlschema-dev@w3.org
Erwin Smout writes:

>> It is perfectly possible to refer to a BOOKLIST.XSD
>> in a <BOOKLIST> root and refer to a BOOK.XSD in a <BOOK>
>> root. With proper include-mechanisms in place, there
>> is little extra effort involved in having these two
>> different schemas, instead of only one that allows
>> different root-element-types.

Thank you for your comments. I understand you to be suggesting: let each schema document declare exactly one root, which is to be honored if that schema document is referenced explicitly by a schemaLocation in the instance, but not if it is the target of an <xsd:include> from another schema. That seems to me to be fragile in a number of dimensions.

First of all, there are many, many situations (such as the typical purchase order) in which you either can't get a schemaLocation into the instance, or in which you wouldn't trust it if it were there. That's why it's a hint. What do we do for all those instances that can't "name" a schema document?

Furthermore, we've generally declined to have a schema document mean something different when it's included than when it's referenced in some other manner. You can wind up with rather tricky scenarios in which the same schema document is referenced from multiple places (processor command line, schemaLocation in the instance, <xsd:include>). If the rules for root depend on which of these ways you find it, then it becomes a constraint that all processors encounter these in the same order. That makes it very hard to build streaming processors that work the same way as those that precompile schemas.

Here's how I think I would design a mechanism to do what I think you want:

* I would add a new boolean property to elementDeclaration to be called "okAsDocumentRoot", which could be set to "true" on one or more global element declarations.

* I would add a new attribute to the XML form of an element declaration allowing <xsd:element name="n" OKAsDocumentRoot="true">.
  This would set the component property in the obvious manner.

* I would add a new mode of validation:

  - In full document mode, it would only be legal to start validation if the element decl that matched the root element had the boolean set to true

  - To meet the need for incremental validation (see below), you would have an additional validation mode that would ignore the property and allow validation to proceed from any global element declaration. In other words, do what we do today.

Is this worthwhile? I'm not convinced, but I'm not strongly against it either. It's a new property, a new attribute, and a new validation model. What it does is to allow you to mark in a schema document the elements that you intend to be a root and to have that checked. Frankly, most of the applications I write know exactly what the root is to be: if I'm a purchasing application, I know perfectly well that the root better be "purchaseOrder" and I check that very easily. There may indeed be other examples where the above would be useful, and if there were a groundswell of support for it, I wouldn't be opposed. As I say, we've heard this request only occasionally, and I'm not currently convinced it makes the 80/20 cut we've tried for.

Let me comment briefly on the partial validation question. Here are a few use cases: let's say you have a purchase order xml format, a fairly common example, and it includes a sub element named "shipping address".

<purchaseOrder>
    ....
    <shippingAddress>
        <street> ... </street>
        <city>...</city>
        <state>..</state>
        <zip>...</zip>
    </shippingAddress>
</purchaseOrder>

You are building a shipping application that prints the address labels for the items to be shipped. It's important that some outer application (which may have done a schema validation on the PO or may have used some other means to make sure that its overall structure is sufficiently trustworthy) passes just the shipping address element to the shipping application.
That shipping application chooses to use schema validation on just the shippingAddress element. That's what I mean by partial validation, and it is important for many such application decomposition scenarios. Do I really need to separate the address into a different schema document? There would be lots of them, and it seems to tie my processing model unnecessarily to the packaging of the documents.

If a book publisher's association wants to publish a vocabulary for describing books, authors, etc., I don't want them to have to think about the different fragments of book descriptions or catalog entries that I may wish to validate in my applications. They should just publish a schema document to define their namespace and elements, and I should use the ones I need. Not all applications of XML schema are document-oriented.

Another very important scenario is taking that entire purchase order and wrapping it in a soap envelope (namespace decls skipped for brevity):

<soap:envelope>
    <soap:body>
        <po:purchaseOrder>
            ...
        </po:purchaseOrder>
    </soap:body>
</soap:envelope>

Sometimes you want to validate the whole envelope including the purchase order. Sometimes you don't validate the purchase order until it's been extracted and handed to some purchasing application. So, sometimes purchaseOrder is the root, sometimes not.

There are also editing scenarios in which an editor gathers the information for a document out of order. While sooner or later the entire document may be validated or maybe not, it's very useful to be able to validate the fragments as they are gathered.

Similar scenarios come up in the design of languages like XML query, which assemble pieces of documents dynamically. It's nice to be able to discuss the validity of those fragments in isolation, as well as in the context of an overall document.

So I hope you can see that, while your scenarios involve a very strong notion of "document" and "root", not all do.
The question is whether to build a special mechanism to model that, and so far we've decided that it's reasonable on balance to leave such modeling outside of the language.

Again, thank you for your comments, and I'm sure I speak for the Schema WG in saying that we take to heart your concerns that our current mechanisms don't exactly fit your needs.

Thank you.

Noah

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------

Erwin.Smout@ksz-bcss.fgov.be
Sent by: xmlschema-dev-request@w3.org
04/16/2003 07:30 AM

To: xmlschema-dev@w3.org
cc:
Subject: root element in schema

Hello,

Recently, I raised an issue here at work regarding global and root elements in xml-schema. Our xml-specialist did not have an answer immediately, but later pointed me to a discussion about the subject : http://lists.w3.org/Archives/Public/xmlschema-dev/2001Jun/0074.html. I must say I didn't feel comfortable with some statements made there, and thought I might add my point of view on the subject.

Mr. Mendelsohn states that someone might want to be able to have two different elements as a root. I really don't see how this could be a necessity to anyone. The root-element itself enables you to name the schema that rules the xml-document. It is perfectly possible to refer to a BOOKLIST.XSD in a <BOOKLIST> root and refer to a BOOK.XSD in a <BOOK> root. With proper include-mechanisms in place, there is little extra effort involved in having these two different schemas, instead of only one that allows different root-element-types. So I can't really agree with him there.

And I totally can't agree with what is said about "partial validation". This goes against everything xsd stands for. I clearly recall having read the guidelines saying that "a parser should stop passing data from the moment it finds an error.
Furthermore, programs receiving an error-message from a parser should consider all data they already parsed from the document as non-existent". This leads me to conclude that "valid xml" (according to xsd) is (meant to be) an all-or-nothing proposition. There is no such thing as "partially valid". And the fact that some programmer might want to do something like partial validation, is not a good reason to "accept" this line of thinking. Programmers have been interpreting standards and guidelines in this fashion ("I will use what comes to good use and ignore whatever I don't like") for as long as I remember (unfortunately). They have always been and will always stay the main reason why so many efforts toward standardisation prove useless and simply fail.

Think about it for a moment. Two organisations (be it two companies, or a company and the government, or two departments within a company, or whatever ...) decide to exchange data about, let's say, "customers" in xml-format. They agree on a <customer> root-element which holds several subordinate elements, <custnr> (mandatory), followed by either a <legalperson> element, or a <naturalperson> element. The <legalperson> contains <name> and <legalform> elements, the <naturalperson> contains <surname>, <firstname> and <initials> elements.

Now, in this example, if one side sent an xml-form with only a <firstname>-element (and thus without the customer number), then a validation process based on xsd would not mark this form as "invalid", even though elements which were clearly intended and declared to be mandatory (<custnr> e.g.), aren't there at all ? Come on guys, let's be serious for a moment.

It would seem obvious to me that :

a) a receiving party cannot do anything with just the <firstname> element, it will always need at least the customer number, before it is able to perform whatever useful processing it could do with this message.
b) a receiving party would therefore expect its "validation process" to mark this "<firstname>-only" message as "invalid", because it lacks essential data. Rightfully so.

c) If the receiving party cannot rely on xsd to do just that, then what good is xsd anyway to anybody ?

I think this little example shows clearly enough that there is indeed a need for being able to designate some element as being the root in xmlschema.

Now for how to achieve this ? To do that, we need some information that enables us to distinguish between an element that is "global", and which element(s) is(are) actually present (or possibly present) in the xml described by the schema. In fact, these "global" elements apparently serve the purpose of "declaring" the structure of some type of element, not declaring the (possible) presence of such element in an xml-document.

Apparently, xsd now has two distinct meanings for the <element>-element :

1) as a declaration of a certain type that can be referred to later in the schema.

2) as a declaration of the possible occurrence of such element in an xml-document.

To my idea, this is flat out WRONG. If two distinct sorts of information are needed (here the "type-declaration" and the "xml-element-declaration"), then they should have different names, or be recognisable as such in whatever way is appropriate. The xsd-syntax apparently does not allow this. There is no way to determine unambiguously what "meaning" has to be assigned to an <element> in a schema. I feel this is a major design error in the xsd syntax, which should be removed as soon as possible.

Designers do have a way to avoid this problem (by using <simpleType> and <complexType> for declarations, and using <element> for actual xml-element description, assigning them type-information by "type=typeref"), but this is no solution for someone writing a schema-validation process. The authors of schema validation processes cannot rely on the fact that every schema-author will use this method.
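
The "<firstname>-only" message in the example above can also be rejected at the application level, in the way the reply describes for an application that knows its root must be "purchaseOrder". What follows is only a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree. The ALLOWED_ROOTS set, the check_root helper and the sample values are hypothetical illustrations, not part of XML Schema or of any schema processor. The sketch checks only the name of the root element; validating the content against a schema would remain a separate step.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the root element names this application is prepared to accept.
ALLOWED_ROOTS = {"customer"}


def check_root(xml_text, allowed_roots=ALLOWED_ROOTS):
    """Return the root element's local name, or raise ValueError if it is not allowed."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{uri}local"; keep only the local part.
    # A stricter check would compare the namespace URI as well.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name not in allowed_roots:
        raise ValueError(f"unexpected root element <{local_name}>")
    return local_name


# Illustrative messages with made-up values, following the <customer> example.
full_message = (
    "<customer><custnr>12345</custnr>"
    "<naturalperson><surname>Doe</surname><firstname>Jane</firstname>"
    "<initials>J.</initials></naturalperson></customer>"
)
fragment_only = "<firstname>Jane</firstname>"

print(check_root(full_message))
try:
    check_root(fragment_only)
except ValueError as err:
    print("rejected:", err)
```

An application that does want to validate a fragment on its own, as in the shipping address scenario, would simply pass a different set of allowed root names.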
Received on Friday, 18 April 2003 15:18:28 UTC