Re: additional constraints validation variant from Paul Kiel on 2002-07-31 (xmlschema-dev@w3.org from August 2002)

From: Paul Kiel <paul@hr-xml.org>
Date: Wed, 31 Jul 2002 14:47:15 -0400 (EDT)
To: "Mark Feblowitz" <mfeblowitz@frictionless.com>, <xmlschema-dev@w3.org>
Cc: "David Connelly (E-mail)" <dconnelly@openapplications.org>, "Duane Krahn (E-mail)" <duane.krahn@irista.com>, "Satish Ramanathan (E-mail)" <Satish.Ramanathan@mro.com>, "Andrew Warren (E-mail)" <awarren@openapplications.org>, "Kurt A Kanaskie (Kurt) (E-mail)" <kkanaskie@lucent.com>, "Mark Feblowitz" <mfeblowitz@frictionless.com>, "Michael Rowell (E-mail)" <mrowell@openapplications.org>
Message-ID: <00e001c238c3$79b46980$6401a8c0@pkiel2>

Mark,

I see you are ahead of the curve as usual.

>>The reasons we rejected it had to do with complexity: first, it's complex to manage multiple schemas and to link to "the right" generated schema. That's not so bad, but can be daunting.

[Paul] A clear issue, I realize, but not a deal breaker. The obvious remedy would be a naming convention of some type.

>>Let's say I have a PurchaseOrder used in 6 different contexts. I generate 6 variants from the one relaxed model plus the 6 sets of separately-specified cardinality constraints. Now I derive an extension to PurchaseOrder. That means I have 6 more generated variants (36 if I'm foolish enough to allow extensions to the first 6). Several challenges arise:

>>First, I must make sure that my derived, extended set also follows the original cardinality constraints. This also means that I must invent a constraint language that mirrors the extensibility of my schema.

>>The big challenge comes when making sure that any user of the generated, extended variants uses the correct one, from the correct set. What if I have two layers of extension? More? In theory, it's possible, but practically speaking, we guessed that getting the cascading schemaLocation hints correct would be a significant challenge.

[Paul] Not sure I understand. Could one not create an XSLT that would take any constraint and apply it to a schema? Applying constraints to extensions would then use the same XSLT. It's limited only by the ability to map an XPath in a Schematron constraint assertion to its schema definition (not easy, as anyone who has created stylesheets for schemas knows, but still doable).

[Paul] If the derived schemas are static with a clear naming convention, could not the problems associated with "finding the right one" be mitigated to some extent?
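
One way to picture such a naming convention is the minimal sketch below. It is an illustration only, not something proposed in this thread: the `<Context><Type>.xsd` pattern, the function name `variant_filenames`, the type name `MyPurchaseOrder`, and every context other than Cancel and Process are assumptions.

```python
from itertools import product

# Cancel and Process come from the thread; the other four verbs are placeholders.
CONTEXTS = ["Cancel", "Process", "Get", "Show", "Sync", "Change"]

def variant_filenames(type_names, contexts=CONTEXTS):
    """Return one predictable schema filename per (context, type) pair."""
    return [f"{context}{type_name}.xsd"
            for type_name, context in product(type_names, contexts)]

# The base noun alone, then the base noun plus one hypothetical extension.
base = variant_filenames(["PurchaseOrder"])
extended = variant_filenames(["PurchaseOrder", "MyPurchaseOrder"])
print(len(base), len(extended))
```

Under this convention each added extension contributes another six files, which is the fan-out Mark describes. The names stay predictable, but the number of schemas to manage still grows with every extension.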

[Paul] Your point on extensions is well taken. I guess what I would need to ask is not how much of a driver extension is, but what kind. My experience is that for the most part people need only simple extensions, and rarely need layers upon layers upon layers of extension. Should the need to simplify the use of many layers of extension be valued above the ability to simplify the processing model? I speak of usability here. Would a simpler processing model with simple extensions meet the 80-20 rule?

>>Another, somewhat unrelated reason for rejecting this approach was that we liked that we could use Schematron for other, non-cardinality-oriented constraints, such as the much-sought-after co-occurrence constraint. With development-time generation, the only things that could be transformed into Schema were the things that were supportable in Schema. Co-occurrence and other similar constraints could not be supported development-time, simply because there is no equivalent to transform to in Schema.

[Paul] Another good point. I don't see how this conflicts, however. The "XSDValidationFlow" - a poor name, I know - uses Schematron constraints in an XML file just as the "InstanceValidationFlow" does. In both cases co-occurrence constraints could be handled in a second validation layer, as they would have to be, since they cannot be represented in Schema, as you state. I see this as a separate issue and not in conflict.

Thanks for your insights Mark,

Paul

----- Original Message -----
From: Mark Feblowitz
To: 'Paul Kiel' ; xmlschema-dev@w3.org
Cc: David Connelly (E-mail) ; Duane Krahn (E-mail) ; Satish Ramanathan (E-mail) ; Andrew Warren (E-mail) ; Kurt A Kanaskie (Kurt) (E-mail) ; Mark Feblowitz ; Michael Rowell (E-mail)
Sent: Wednesday, July 31, 2002 1:52 PM
Subject: RE: additional constraints validation variant

Paul -

Your idea is quite compelling. In fact, it was one of the many we considered (and ultimately abandoned). I liked this approach so much I even mocked it up myself.

For the unfamiliar, the problem could be summarized as follows:

How does one support multiple uses of "the same" complexType, with different minimum cardinalities in each content model?

The problem arises in OAGIS when we want to apply, e.g., the noun "PurchaseOrder" in different contexts: CancelPurchaseOrder requires only minimal, identifying PurchaseOrder content (but could contain any), and ProcessPurchaseOrder requires most of the PurchaseOrder content. In the former, most of the content would be optional (minOccurs="0"); in the latter, most would be required (minOccurs="1").

With complexType derivation by restriction being a non-starter for us, we were forced to come up with an alternative. What we settled on, after months of painstaking exploration, was a "relaxed" model of all-optional content (all element content with minOccurs="0"), with separately specified cardinality constraints layered on via post-validation Schematron processing. This requires two-pass validation: schema-validation and Schematron processing. That extra step, although achievable using standard technologies (schema-validating parser plus XSLT processor), offends some sensibilities and raises efficiency concerns. (Some efficiency concerns will be addressed when XSLT processors facilitate schema-validation plus transformation, which should be very soon.)
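
The second pass can be pictured with the minimal sketch below. It is not Schematron and not the OAGIS implementation: plain Python path lookups stand in for Schematron assertions, the schema-validation pass is omitted because the Python standard library has no XSD validator, and the element names (`Header`, `DocumentId`, `Buyer`, `Line`) and rule sets are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-context rules: path below the noun -> minimum occurrences.
CONSTRAINTS = {
    "CancelPurchaseOrder": {"Header/DocumentId": 1},
    "ProcessPurchaseOrder": {"Header/DocumentId": 1, "Header/Buyer": 1, "Line": 1},
}

def check_cardinality(xml_text, context):
    """Second pass: list every minimum-cardinality rule the instance violates."""
    root = ET.fromstring(xml_text)
    errors = []
    for path, minimum in CONSTRAINTS[context].items():
        found = len(root.findall(path))
        if found < minimum:
            errors.append(
                f"{context}: expected at least {minimum} of {path}, found {found}"
            )
    return errors

# An instance carrying only identifying content.
MINIMAL_PO = (
    "<PurchaseOrder><Header><DocumentId>PO-1</DocumentId></Header></PurchaseOrder>"
)

cancel_errors = check_cardinality(MINIMAL_PO, "CancelPurchaseOrder")
process_errors = check_cardinality(MINIMAL_PO, "ProcessPurchaseOrder")
```

The same instance passes in the Cancel context and fails in the Process context, even though both would validate against the one relaxed schema.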

Paul's suggested approach is a development-time alternative: rather than performing interchange-time constraint checking, he proposes that the same cardinality constraints be used to guide a transformation of the relaxed schema into other, cardinality-constrained schemas.
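
A rough sketch of that development-time transformation follows, written in Python rather than as the XSLT discussed in this thread. It makes a large simplifying assumption: constraints are given as a set of local element names inside one named complexType, which sidesteps the harder problem of mapping constraint XPaths onto schema definitions. The schema content and the function name `constrain` are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# Keep the conventional prefix so QName values such as type="xs:string" stay valid.
ET.register_namespace("xs", XS)

# Hypothetical relaxed model: every child element is optional.
RELAXED_XSD = f"""<xs:schema xmlns:xs="{XS}">
  <xs:complexType name="PurchaseOrder">
    <xs:sequence>
      <xs:element name="DocumentId" type="xs:string" minOccurs="0"/>
      <xs:element name="Buyer" type="xs:string" minOccurs="0"/>
      <xs:element name="Line" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

def constrain(xsd_text, type_name, required):
    """Return a copy of the schema in which the named elements are mandatory."""
    root = ET.fromstring(xsd_text)
    for ctype in root.iter(f"{{{XS}}}complexType"):
        if ctype.get("name") != type_name:
            continue
        for elem in ctype.iter(f"{{{XS}}}element"):
            if elem.get("name") in required:
                elem.set("minOccurs", "1")
    return ET.tostring(root, encoding="unicode")

# Generate a stricter variant for one context.
process_xsd = constrain(RELAXED_XSD, "PurchaseOrder", {"DocumentId", "Buyer"})
```

Each context would get its own generated schema, so an ordinary schema-validating parser could then enforce the cardinalities without a second pass.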

The benefits are obvious: no extra runtime machinery is required - only a schema-validating parser is necessary to check for correct structure, types and cardinalities.

The reasons we rejected it had to do with complexity: first, it's complex to manage multiple schemas and to link to "the right" generated schema. That's not so bad, but can be daunting. But the real difficulty comes in managing the fan-out in the face of further extensions.

Let's say I have a PurchaseOrder used in 6 different contexts. I generate 6 variants from the one relaxed model plus the 6 sets of separately-specified cardinality constraints. Now I derive an extension to PurchaseOrder. That means I have 6 more generated variants (36 if I'm foolish enough to allow extensions to the first 6). Several challenges arise:

First, I must make sure that my derived, extended set also follows the original cardinality constraints. This also means that I must invent a constraint language that mirrors the extensibility of my schema.

The big challenge comes when making sure that any user of the generated, extended variants uses the correct one, from the correct set. What if I have two layers of extension? More? In theory, it's possible, but practically speaking, we guessed that getting the cascading schemaLocation hints correct would be a significant challenge.

Another, somewhat unrelated reason for rejecting this approach was that we liked that we could use Schematron for other, non-cardinality-oriented constraints, such as the much-sought-after co-occurrence constraint. With development-time generation, the only things that could be transformed into Schema were the things that were supportable in Schema. Co-occurrence and other similar constraints could not be supported development-time, simply because there is no equivalent to transform to in Schema.
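
For readers unfamiliar with co-occurrence constraints, the sketch below shows the kind of rule meant here: one element becomes required only when another is present. It is plain Python standing in for a Schematron assertion, and the element names `PaymentTerms` and `Currency` and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

def co_occurrence_errors(xml_text, if_present, then_required):
    """Report a violation when if_present exists but then_required does not."""
    root = ET.fromstring(xml_text)
    if root.find(if_present) is not None and root.find(then_required) is None:
        return [f"{then_required} is required when {if_present} is present"]
    return []

# Hypothetical instances: the first carries terms without a currency.
bad = "<PurchaseOrder><PaymentTerms>Net30</PaymentTerms></PurchaseOrder>"
good = (
    "<PurchaseOrder><PaymentTerms>Net30</PaymentTerms>"
    "<Currency>USD</Currency></PurchaseOrder>"
)
empty = "<PurchaseOrder/>"

bad_errors = co_occurrence_errors(bad, "PaymentTerms", "Currency")
good_errors = co_occurrence_errors(good, "PaymentTerms", "Currency")
empty_errors = co_occurrence_errors(empty, "PaymentTerms", "Currency")
```

A rule like this depends on what else appears in the instance, which a fixed minOccurs value cannot express, so it has no generated-schema equivalent and stays in the post-validation pass.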

I'd be happy to discuss this all further. I'd be even happier if at least this subset of constraints was somehow incorporated into Schema.

Mark

-----Original Message-----
From: Paul Kiel [mailto:paul@hr-xml.org]
Sent: Wednesday, July 31, 2002 12:18 PM
To: xmlschema-dev@w3.org
Cc: Mark Feblowitz
Subject: additional constraints validation variant

Greetings folks,

I have been working with the Schematron "adding additional constraints" issues that are most accurately addressed in the OAGIS 8.0 design. This design neatly satisfies the desire for a single general model that is constrained by context. Nice job folks! (For example, in one of our cases, having a general HR-XML TimeCard with contextual variations such as "DeleteTimeCard", "CreateTimeCard", "UpdateTimeCard" etc.)

The use of Schematron here is perfect. I would like to add a wrinkle: a possible variant of this approach. The links below illustrate two methods of achieving the same goals, both using Schematron to document constraints. However, where these constraints are applied differs.

The first link, "InstanceValidationFlow", shows how one may use a document (in this case an HR-XML TimeCard) in a validation flow. The two-step approach (parser plus XSLT) works well.

http://ns.hr-xml.org/temp/InstanceValidationFlow.gif

The second link, "XSDValidationFlow", shows a flow where validation occurs via a derivation of the schema itself, instead of validating the instance in a second step. This would maintain the goal of a general model with context-specific constraints but without a second-step validation (which is where I get the pushback from my constituents, who otherwise like the use of Schematron).

http://ns.hr-xml.org/temp/XSDValidationFlow.gif

What do you think of this approach? I haven't decided if I like it yet, but I thought enough of it to merit a thread here.

The development of a Constraints2XSD stylesheet would not be simple, but I would think doable - and reusable! [I talked with Mark Feblowitz about this once and he was, I believe, intrigued by it -- Mark, is that the case??] Might anyone out there be interested in collaboratively creating such an animal?

Pluses and Minuses:

Method 1 - InstanceValidationFlow
+ transforming constraints to validating xslt easily replicated (i.e. via schematron skeleton xsl)
- results in many xslts lying around for validation (one for each context)
- requires another validation layer via XSLT (performance)

Method 2 - XSDValidationFlow
+ single validation layer
+ makes most use of parser
- results in many schemas lying around for validation (one for each context)
- xslt for transformation of constraints to xsd not developed (yet!?!)

W. Paul Kiel
HR-XML Consortium

Received on Friday, 2 August 2002 05:54:07 UTC