Schema-induced challenges and OAGIS [was Re: additional constraints validation variant]

Let me know when it goes too far off-topic and I'll stop CCing the list.



[Paul] I updated the subject line to reflect this. 

 

In some cases, though, I fear that the approach may be barely tractable. A difficult part comes in assembling a Noun from shared Components. It's conceivable that a particular Noun might need only one part of one component, another part of another, and so on. (Simple XPath expressions stating the required parts are the easiest to apply and maintain.) Since all components are stored in a single Components.xsd, there could end up being a huge number of versions of the Components file to accommodate all the needed combinations of cardinalities. One could pre-assemble the Noun from its components prior to overlaying the constraints, but now you're doing even more of the parser's job. 
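To make "simple XPath expressions stating the required parts" concrete, a single such requirement is just a one-line Schematron assertion - the names below are invented for illustration, not actual OAGIS components:

    <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
      <sch:pattern name="context-specific requirement">
        <sch:rule context="PurchaseOrder/Header">
          <!-- require one part of the shared Header component in this context only -->
          <sch:assert test="DocumentId">A Header must carry a DocumentId in this context.</sch:assert>
        </sch:rule>
      </sch:pattern>
    </sch:schema>

Stating them one at a time is easy; the combinatorial trouble is pre-baking every combination of such requirements into static variants of Components.xsd.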

 

Or one could carve up the components into separate files. Other standards have done this. This could amount to a significant number of files, with significant runtime overhead of including/importing each component file (variant) for each of the many components used. 

 

In addition to assembly from components, there's also the problem of inheritance: if I have a PurchaseOrder of type Order, my constraints could (and would) apply to both the local content of the PurchaseOrder and the inherited content of Order. Still a mere "logisticality" - you'd have to generate the Order variants and pick the right one for the context. The trick is in getting your preprocessor to correctly derive the right set of Order variants for each of PurchaseOrder, SalesOrder, and so on. Fanout, again.
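To make the inheritance point concrete, a minimal sketch (not the actual OAGIS type hierarchy):

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <!-- relaxed base type: everything optional -->
      <xs:complexType name="OrderType">
        <xs:sequence>
          <xs:element name="Header" minOccurs="0"/>
          <xs:element name="Line" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
      <!-- derived type adds its own optional content on top of what it inherits -->
      <xs:complexType name="PurchaseOrderType">
        <xs:complexContent>
          <xs:extension base="OrderType">
            <xs:sequence>
              <xs:element name="BuyerParty" minOccurs="0"/>
            </xs:sequence>
          </xs:extension>
        </xs:complexContent>
      </xs:complexType>
    </xs:schema>

A context that requires Header and BuyerParty needs a tightened variant of both the base OrderType and the derived PurchaseOrderType - and a different pair again for SalesOrder - hence the fanout.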



[Paul] I see your point more clearly here.  The underlying motivator is of course usability - and it seems that simplifying the processing model heaps a lot of overhead onto development.  A trade-off once again.  Do you still have your previous attempt at this that you mentioned?  I'd love to see what you did.  What I did was a first shot and I had no idea if it would scale.  It does seem there are problems scaling.  



<digression>Also on the usability front, a lesser issue - the need for "pretty pictures" - would have been solved by this solution.  The domain people don't give a hoot about XML Schema, and having a static schema that would generate a nice gif (read: TurboXML or XMLSpy) of their transaction in their context is invaluable when modelling processes.  It is sometimes tough to point to a picture of the lax model and have to explain it.  But this can be overcome in other ways.</digression>



Since we can imagine a scenario where the constraints can be generated and trivially evaluated by a validating parser, it seems as though this is something that Schema could support. The fact that it represents multiple, alternative constraint sets is what makes it more of a challenge to achieve. If only derivation by restriction were usable here. Or if Schema supported an embedded constraint facility. But I repeat myself.



[Choir] Amen!



Only a conflict if your desire was to do away with the added overhead of two-pass validation (which, BTW, I don't see on the XSDValidationFlow diagram). IMHO, a combined architecture that both pre-applies the cardinality constraints and does post-validation checks would be optimal as far as processing efficiency goes, but I would fear that the size and complexity of the solution would be too large.



[Paul] I may say one-step or two-step validation, but that is of course not the reality.  There may be other layers, such as taxonomic validation (validating against a known value set - such as an official taxonomy for an element typed merely as "string" - which is passed in anticipation of a transaction) as well as business process validation, which is reflected in the gifs.  So this brings us to at least four possible validation steps: parser, XSLT constraints, taxonomic, and proprietary application or business logic.  And this does not even address enveloping/messaging issues!  Hence my desire to eliminate a step if possible.



So I guess I cannot dispute your key issue about the scalability of this solution.  I'm having a hard time giving up on it, however <stubborn/>.  I may keep this in my back pocket and fiddle with it during fits of insomnia <pigheaded/>.  I'll make a plea again for any code you have on this.



You all should write a book on your travels - not what you ended up with, but what you tried and WHY you ended up the way you did.



Paul





  ----- Original Message ----- 
  From: Mark Feblowitz 
  To: 'Paul Kiel' ; xmlschema-dev@w3.org 
  Cc: David Connelly (E-mail) ; Duane Krahn (E-mail) ; Satish Ramanathan (E-mail) ; Andrew Warren (E-mail) ; Kurt A Kanaskie (Kurt) (E-mail) ; Michael Rowell (E-mail) 
  Sent: Wednesday, July 31, 2002 6:11 PM
  Subject: RE: additional constraints validation variant


  Schema-dev people. Pardon us while we have an OAGIS-oriented conversation. I'd like to keep y'all in the loop, since this particular issue is one of the stickiest of the Schema-induced challenges we faced as we developed OAGIS on Schema. And challenges to our approach persist. 

   

  You may want to skip down to the discussion of the second validation step.

   

  We'd love to look ahead to some relief, in the form of future Schema Rec evolution. Perhaps you Working Group people can read and comment. 

   

  Let me know when it goes too far off-topic and I'll stop CCing the list.

   

  Read on.

   

  -----Original Message-----
  From: Paul Kiel [mailto:paul@hr-xml.org]
  Sent: Wednesday, July 31, 2002 2:53 PM
  To: Mark Feblowitz; xmlschema-dev@w3.org
  Cc: David Connelly (E-mail); Duane Krahn (E-mail); Satish Ramanathan (E-mail); Andrew Warren (E-mail); Kurt A Kanaskie (Kurt) (E-mail); Mark Feblowitz; Michael Rowell (E-mail)
  Subject: Re: additional constraints validation variant

   

  Mark,

   

  I see you are ahead of the curve as usual.

   

  Gosh.

   

  >>The reasons we rejected it had to do with complexity: first, it's complex to manage multiple schemas and to link to "the right" generated schema. That's not so bad, but can be daunting. 

   

  [Paul] A clear issue, I realize.  But not a deal-breaker.  The obvious suspect would be naming conventions of some type.

   

  True. Not impossible, just complex. In fact, none of the rationale says that your suggestion is impossible. It comes down to a judgment call as to whether it's worth the effort and relative complexity. I liked the idea and still do. But I am not yet convinced that it's a complete fit with the rest of the OAGIS 8 architecture. Perhaps.

   

  In some cases, though, I fear that the approach may be barely tractable. A difficult part comes in assembling a Noun from shared Components. It's conceivable that a particular Noun might need only one part of one component, another part of another, and so on. (Simple XPath expressions stating the required parts are the easiest to apply and maintain.) Since all components are stored in a single Components.xsd, there could end up being a huge number of versions of the Components file to accommodate all the needed combinations of cardinalities. One could pre-assemble the Noun from its components prior to overlaying the constraints, but now you're doing even more of the parser's job. 

   

  Or one could carve up the components into separate files. Other standards have done this. This could amount to a significant number of files, with significant runtime overhead of including/importing each component file (variant) for each of the many components used. 

   

  In addition to assembly from components, there's also the problem of inheritance: if I have a PurchaseOrder of type Order, my constraints could (and would) apply to both the local content of the PurchaseOrder and the inherited content of Order. Still a mere "logisticality" - you'd have to generate the Order variants and pick the right one for the context. The trick is in getting your preprocessor to correctly derive the right set of Order variants for each of PurchaseOrder, SalesOrder, and so on. Fanout, again.



   

  >>Let's say I have a PurchaseOrder used in 6 different contexts. I generate 6 variants from the one relaxed model plus the 6 sets of separately-specified cardinality constraints. Now I derive an extension to PurchaseOrder. That means I have 6 more generated variants (36 if I'm foolish enough to allow extensions to the first 6). Several challenges arise:

   >>First, I must make sure that my derived, extended set also follows the original cardinality constraints. This also means that I must invent a constraint language that mirrors the extensibility of my schema. 

  >>The big challenge comes when making sure that any user of the generated, extended variants uses the correct one, from the correct set. What if I have two layers of extension? More? In theory, it's possible, but practically speaking, we guessed that getting the cascading schemaLocation hints correct would be a significant challenge.

   

  [Paul] Not sure I understand.  Could one not create an XSLT that would take any constraint and apply it to a schema?  So applying constraints to extensions would use the same XSLT.  It's only limited by the ability to map an XPath in a Schematron constraint assertion to its schema definition (not easy, as anyone who has created stylesheets for schemas knows, but still doable). 
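  For illustration, the crudest form of such a stylesheet might be an identity transform that tightens minOccurs wherever a constraint names an element.  The constraints.xml format and the name-only matching below are inventions for this sketch; real constraints would need to be context-sensitive XPaths:

      <xsl:stylesheet version="1.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          xmlns:xs="http://www.w3.org/2001/XMLSchema">

        <!-- identity: copy the relaxed schema through unchanged -->
        <xsl:template match="@*|node()">
          <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
        </xsl:template>

        <!-- tighten any element declaration named in a hypothetical constraints.xml,
             e.g. <required context="ProcessPurchaseOrder"><element name="Line"/></required> -->
        <xsl:template match="xs:element[@name = document('constraints.xml')//element/@name]">
          <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:attribute name="minOccurs">1</xsl:attribute>
            <xsl:apply-templates select="node()"/>
          </xsl:copy>
        </xsl:template>

      </xsl:stylesheet>

  Matching on element name alone, as above, of course glosses over exactly the context- and extension-sensitivity discussed here.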

   

  Again, not a deal-breaker, but difficult. Yes, you could assume extension only. You could come up with an approach whereby a new set of extended XSD files is generated from the base set of constraints plus the sets for the overlays, by progressively transforming according to the base set of constraints/XSLT and then further transforming by an extended set for each overlay.  You'd have to be quite careful that the constraint language has regular and predictable semantics, compatible with your extension facility. Correctly crafting that language and the constraints could get tricky, but not impossible.

   

  [Paul] If the derived schemas are static with a clear naming convention, could not the problems associated with "finding the right one" be mitigated to some extent?

   

  Seems so. The cleverness would not be limited to having a clever storage scheme; it would also require appropriate adjustments to the schemaLocations of all import/include statements.

   

  [Paul] Your point on extensions is well taken.  I guess what I would need to ask is not how much of a driver extension is, but what kind?  My experience is that for the most part people need only simple extensions, and rarely need layers upon layers upon layers of extension.  Should the need to simplify the use of many layers of extension be valued above the ability to simplify the processing model?  I speak of usability here.  Would a simpler processing model with simple extensions meet the 80-20 rule?

   

  Indeed, it's a tradeoff. Limiting vertical extensibility is an option. Trouble is, who's to say how many layers would be enough? Let's say you have the base OAGIS vocabulary. You layer on the HR-XML vocabulary. You're done, but what about your client organizations? Would they be willing to stop there, or would they want to further customize/localize?  OAGI opted for simple, unlimited layering, and also a simple (albeit two-step) validation model. We tried other options, and things were either too arbitrarily limited or simply too messy for the average user (usually both). 

   

  That second validation step, which is arguably duplicative of validating-parser functionality, is what most people seem to push back against hardest. It's likely the apparent "wastefulness" that triggers so much pushback, along with the mere existence of an extra validation step requiring additional infrastructure. Mostly, though, it offends our sensibilities - especially those of us who were trained to cherish every cycle. So far, we've seen no evidence that the second pass is prohibitively expensive in practice. And, like validation, it can be turned off if needed. Still.

   

  Do recall that cardinality checking in validating parsers is not static: regardless of the mechanism, there is active code evaluating cardinality violations. The key question is this: what additional overhead must be devoted to the XSLT processing? If it's done in the same parse (same JVM, if it's Java), it amounts to the additional cost of applying each XPath expression (plus some other overhead). Is that an affordable inefficiency? We need more data.

   

  Since we can imagine a scenario where the constraints can be generated and trivially evaluated by a validating parser, it seems as though this is something that Schema could support. The fact that it represents multiple, alternative constraint sets is what makes it more of a challenge to achieve. If only derivation by restriction were usable here. Or if Schema supported an embedded constraint facility. But I repeat myself.

   



  Development-time transformation is indeed a cool idea, and one that I tried to pursue. It may be worth another try. Just be mindful that it must play well with the remainder of the representation architecture and language features.

   

  >>Another, somewhat unrelated reason for rejecting this approach was that we liked that we could use Schematron for other, non-cardinality-oriented constraints, such as the much-sought-after co-occurrence constraint. With development-time generation, the only things that could be transformed into Schema were the things that were supportable in Schema. Co-occurrence and other similar constraints could not be supported development-time, simply because there is no equivalent to transform to in Schema.

   

  [Paul] Another good point.  I don't see how this conflicts, however.  The "XSDValidationFlow" - a poor name, I know - uses Schematron constraints in an XML file just as the "InstanceValidationFlow" does.  In both cases co-occurrence constraints could be done in a second validation layer, as they would have to be, since they cannot be represented in Schema, as you state.  I see this as a separate issue and not in conflict.

   

  Only a conflict if your desire was to do away with the added overhead of two-pass validation (which, BTW, I don't see on the XSDValidationFlow diagram). IMHO, a combined architecture that both pre-applies the cardinality constraints and does post-validation checks would be optimal as far as processing efficiency goes, but I would fear that the size and complexity of the solution would be too large.



   

  Thanks for your insights Mark,

  Paul

   

   

  And for yours,

   

  Mark

   

  ----- Original Message ----- 


  From: Mark Feblowitz 

  To: 'Paul Kiel' ; xmlschema-dev@w3.org 

  Cc: David Connelly (E-mail) ; Duane Krahn (E-mail) ; Satish Ramanathan (E-mail) ; Andrew Warren (E-mail) ; Kurt A Kanaskie (Kurt) (E-mail) ; Mark Feblowitz ; Michael Rowell (E-mail) 

  Sent: Wednesday, July 31, 2002 1:52 PM

  Subject: RE: additional constraints validation variant

   

  Paul - 

   

  Your idea is quite compelling. In fact, it was one of the many we considered (and ultimately abandoned). I liked this approach so much I even mocked it up myself.

   

  For the unfamiliar, the problem could be summarized as follows: 

   

  How does one support multiple uses of "the same" complexType, with different minimum cardinalities in each content model?

   

  The problem arises in OAGIS when we want to apply, e.g., the noun "PurchaseOrder" in different contexts: CancelPurchaseOrder requires only minimal, identifying PurchaseOrder content (but could contain any), and ProcessPurchaseOrder requires most of the PurchaseOrder content. In the former, most of the content would be optional (minOccurs="0"); in the latter, most would be required (minOccurs="1").
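  Concretely, what one would like is two content models for the "same" type, differing only in minOccurs - something like the following sketch (element names invented for illustration) - and Schema gives no graceful way to derive both from one definition:

      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <!-- cardinalities as CancelPurchaseOrder needs them: identification only -->
        <xs:complexType name="PurchaseOrderForCancel">
          <xs:sequence>
            <xs:element name="DocumentId"/>
            <xs:element name="Line" minOccurs="0" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
        <!-- cardinalities as ProcessPurchaseOrder needs them: most content required -->
        <xs:complexType name="PurchaseOrderForProcess">
          <xs:sequence>
            <xs:element name="DocumentId"/>
            <xs:element name="Line" minOccurs="1" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:schema>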

   

  With complexType derivation by restriction being a non-starter for us, we were forced to come up with an alternative. What we settled on, after months of painstaking exploration, was a "relaxed" model of all-optional content (all element content with minOccurs="0"), with separately specified cardinality constraints layered on via post-validation Schematron processing. This requires two-pass validation: schema-validation and Schematron processing. That extra step, although achievable using standard technologies (schema-validating parser plus XSLT processor), offends some sensibilities and raises efficiency concerns. (Some efficiency concerns will be addressed when XSLT processors facilitate schema-validation plus transformation, which should be very soon).
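  As a flavor of what one of those layered constraints might look like over the relaxed model (the context path and element names here are illustrative, not the actual OAGIS constraint set):

      <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
        <sch:pattern name="ProcessPurchaseOrder cardinalities">
          <sch:rule context="ProcessPurchaseOrder/DataArea/PurchaseOrder">
            <!-- required in this context, even though the relaxed schema says minOccurs="0" -->
            <sch:assert test="count(Line) &gt;= 1">A PurchaseOrder must contain at least one Line here.</sch:assert>
          </sch:rule>
        </sch:pattern>
      </sch:schema>

  The Schematron skeleton XSL turns a file of such patterns into the XSLT applied in the second pass.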

   

  Paul's suggested approach is a development-time alternative: rather than performing interchange-time constraint checking, he proposes that the same cardinality constraints be used to guide a transformation of the relaxed schema into other, cardinality-constrained schemas. 
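  As a hypothetical input format for such a generator (an invented sketch, not a proposal for the real constraint language), the per-context constraints might be as simple as:

      <required context="ProcessPurchaseOrder">
        <!-- each entry asks the generator to raise minOccurs from 0 to 1 in this context -->
        <element name="DocumentId"/>
        <element name="Line"/>
      </required>

  One such file per context, each driving a transform of the one relaxed schema into a context-specific variant.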

   

  The benefits are obvious: no extra runtime machinery is required - only a schema-validating parser is necessary to check for correct structure, types and cardinalities.

   

  The reasons we rejected it had to do with complexity: first, it's complex to manage multiple schemas and to link to "the right" generated schema. That's not so bad, but can be daunting. But the real difficulty comes in managing the fan-out in the face of further extensions. 

   

  Let's say I have a PurchaseOrder used in 6 different contexts. I generate 6 variants from the one relaxed model plus the 6 sets of separately-specified cardinality constraints. Now I derive an extension to PurchaseOrder. That means I have 6 more generated variants (36 if I'm foolish enough to allow extensions to the first 6). Several challenges arise:

   

  First, I must make sure that my derived, extended set also follows the original cardinality constraints. This also means that I must invent a constraint language that mirrors the extensibility of my schema. 

   

  The big challenge comes when making sure that any user of the generated, extended variants uses the correct one, from the correct set. What if I have two layers of extension? More? In theory, it's possible, but practically speaking, we guessed that getting the cascading schemaLocation hints correct would be a significant challenge.

   

  Another, somewhat unrelated reason for rejecting this approach was that we liked that we could use Schematron for other, non-cardinality-oriented constraints, such as the much-sought-after co-occurrence constraint. With development-time generation, the only things that could be transformed into Schema were the things that were supportable in Schema. Co-occurrence and other similar constraints could not be supported development-time, simply because there is no equivalent to transform to in Schema.
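  For reference, a co-occurrence rule of that sort is a one-liner in Schematron but has no Schema construct to transform into (names invented for illustration):

      <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
        <sch:pattern name="co-occurrence example">
          <sch:rule context="Charge">
            <!-- if an Amount is present, a Currency must accompany it -->
            <sch:assert test="not(Amount) or Currency">A Charge with an Amount must also carry a Currency.</sch:assert>
          </sch:rule>
        </sch:pattern>
      </sch:schema>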

   

  I'd be happy to discuss this all further. I'd be even happier if at least this subset of constraints were somehow incorporated into Schema. 

   

   

  Mark

   

   

  -----Original Message-----
  From: Paul Kiel [mailto:paul@hr-xml.org]
  Sent: Wednesday, July 31, 2002 12:18 PM
  To: xmlschema-dev@w3.org
  Cc: Mark Feblowitz
  Subject: additional constraints validation variant

   

  Greetings folks,

   

  I have been working with the Schematron "adding additional constraints" issues that are most directly addressed in the OAGIS 8.0 design.  This design solves quite well the problem of wanting a single general model that is constrained by context.  Nice job, folks!  (For example, in one of our cases, having a general HR-XML TimeCard with contextual variations such as "DeleteTimeCard", "CreateTimeCard", "UpdateTimeCard", etc.)

   

  The use of Schematron here is perfect.  I would like to add a wrinkle - perhaps a variant of this approach.  The links below illustrate two methods of achieving the same goals, both using Schematron to document constraints.  However, where these constraints are applied differs.  

   

  The first link, "InstanceValidationFlow", shows how one may use a document (in this case an HR-XML TimeCard) in a validation flow.  The two-step approach (parser plus XSLT) works well.  

   

  http://ns.hr-xml.org/temp/InstanceValidationFlow.gif

   

  The second link, "XSDValidationFlow", shows a flow where validation occurs via a derivation of the schema itself, instead of validating the instance in a second step.  This would maintain the goal of a general model with context-specific constraints, but without a second validation step (which is where I get the pushback from my constituents who otherwise like the use of Schematron).

   

  http://ns.hr-xml.org/temp/XSDValidationFlow.gif

   

  What do you think of this approach?  I haven't decided if I like it yet, but I thought enough of it to merit a thread here.  

   

  The development of a Constraints2XSD stylesheet would not be simple, but I would think doable - and reusable!  [I talked with Mark Feblowitz about this once and he was, I believe, intrigued by it -- Mark, is that the case??]  Might anyone out there be interested in collaboratively creating such an animal?

   

  Pluses and Minuses:

  Method 1 - InstanceValidationFlow
  + transforming constraints into validating XSLT is easily replicated (e.g. via the Schematron skeleton XSL)
  - results in many XSLTs lying around for validation (one for each context)
  - requires another validation layer via XSLT (performance)

   


  Method 2 - XSDValidationFlow
  + single validation layer
  + makes the most use of the parser
  - results in many schemas lying around for validation (one for each context)
  - XSLT for transformation of constraints to XSD not developed (yet!?!)
   

   

   

   

  W. Paul Kiel
  HR-XML Consortium
