Re: Major problem with schema needs immediate attention. from John Boyer on 2007-11-02 (public-forms@w3.org from November 2007)

From: John Boyer <boyerj@ca.ibm.com>
Date: Thu, 1 Nov 2007 17:11:46 -0700
To: ebruchez@orbeon.com
Cc: "Forms WG (new)" <public-forms@w3.org>, public-forms-request@w3.org
Message-ID: <OFDA5FA200.15ADC839-ON88257386.0082CE70-88257387.00013183@ca.ibm.com>

Hi Erik,

The use case for "strict when possible" is really straightforward. 

Schemas are coming from data architects, who expect that the 
processContents default *defined* in XML Schema 1.0 will be observed.  That 
default is strict.  In other words, when the data architect writes a 
schema, he expects strict unless *he* says otherwise.  He also expects, 
despite classification as optional to implement, that lax validation will 
occur within content once a strict validation failure occurs.

These expectations arise because the server side code that processes 
submitted schema instance results from forms expects to be able to invoke 
the schema validity check in default mode and will reject the input if the 
schema validation fails.  Within minutes of that happening, I will end up 
with a priority 1 crit sit demanding that we fix our xforms processor to 
not send data that fails schema validation, and I will be compelled to 
"fix it" despite any claims of lax validation appearing in the actual 1.1 
specification.  This will do more harm than good for xforms.

No prior version of the spec says what kind of validation is 
performed; they simply reference XML Schema 1.0, which by my read means 
that strict processing is what occurs in the absence of a processContents 
declaration to the contrary in the schema.

In the wild, we have had some implementations that have dropped down to 
lax in obvious cases, like a type lib, that have arisen over time in 
practice.  Sadly, none of these implementation experiences landed in the 
spec, and the feature demand did not percolate up until your LC comment 
(i.e. nobody has taken the action item to fix the problem in years now). 
We have to address the LC comment, which could be done by deferring it or 
by doing something about it.  If we choose other than defer, we have to be 
sure that no big objections are going to happen, and removing the strict 
validation will assuredly cause that to happen.  Maybe fixing it the way 
we have been discussing it will also cause an objection, e.g. from Orbeon. 
In that case, we're pretty much stuck with deferral.  Frankly, I'd rather 
defer and lose official, interoperable support of typelibs over going from 
strict to lax.

FWIW, I do like best the approach Leigh is taking because it's clear there 
are multiple ways to implement and it seems it can be overridden by XSLT 
2.0 style attributes later, so we're not getting boxed in now.  So the 
question would be whether you could live with it... :-)

Best regards,
John M. Boyer, Ph.D.
STSM: Lotus Forms Architect and Researcher
Chair, W3C Forms Working Group
Workplace, Portal and Collaboration Software
IBM Victoria Software Lab
E-Mail: boyerj@ca.ibm.com 

Blog: http://www.ibm.com/developerworks/blogs/page/JohnBoyer





Erik Bruchez <ebruchez@orbeon.com>
Sent by: public-forms-request@w3.org
11/01/2007 04:43 PM
Please respond to: ebruchez@orbeon.com
To: "Forms WG (new)" <public-forms@w3.org>
cc:
Subject: Re: Major problem with schema needs immediate attention.

John,

 > I think it is not lax.

That's a shame ;-)

 > I think it is strict except that type libs are imported in a way
 > that makes them be applied in a lax way.

It is very similar to lax in that it has similar recursive processing to
lax. It seems that the only difference between what you and Leigh are
proposing and lax is in the process of determining whether a type
exists or not.

You say that if the element or attribute is in a namespace of a schema
with top-level element or attribute definitions, then 1) a definition
must exist and 2) it needs to be strictly valid.

lax just says that only if a definition exists, then it needs to be
strictly valid.
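
To make the difference concrete, here is a rough sketch of the two rules
as described above. Everything in it is hypothetical and for illustration
only: the Element class, the dictionary that summarizes which top-level
element names each namespace's schema declares, and the policy names are
not taken from any XForms processor or schema library. Actual validation
against a found declaration is not modelled, and continuing into the
children of a flagged element is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    namespace: str
    name: str
    children: list = field(default_factory=list)


def missing_declarations(element, declared, policy):
    """Return (namespace, name) pairs flagged for lacking a declaration.

    declared maps a namespace to the set of top-level element names its
    schema declares; an empty set models a type library (hypothetical).
    policy is "lax" or "strict-when-possible" (hypothetical labels).
    """
    names = declared.get(element.namespace, set())
    if element.name in names:
        # A declaration exists: both rules validate strictly against it.
        # That validation is not modelled in this sketch.
        return []
    flagged = []
    if policy == "strict-when-possible" and names:
        # The namespace has top-level declarations, so one must exist.
        flagged.append((element.namespace, element.name))
    # No declaration: the element itself is skipped and its children are
    # examined with the same rule (assumed here for flagged elements too).
    for child in element.children:
        flagged.extend(missing_declarations(child, declared, policy))
    return flagged


# Hypothetical data: one schema with structure, one pure type library.
declared = {"urn:po": {"order"}, "urn:types": set()}
doc = Element("urn:po", "draft", [
    Element("urn:types", "note"),
    Element("urn:po", "order"),
])

lax_result = missing_declarations(doc, declared, "lax")
strict_result = missing_declarations(doc, declared, "strict-when-possible")
```

With this data, lax flags nothing, while the other rule flags only the
undeclared "draft" element; the element in the type library namespace
passes under both.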

 > If you say lax validation, and you have a schema for a namespace
 > that contains top-level element or attribute declarations, then
 > undeclared elements in that namespace will not be flagged as errors
 > because the mode is lax.

Right, as said in XSLT 2.0:

   "In practice this means that the element or attribute being
    validated must conform to its declaration if a top-level
    declaration is available. If no such declaration is available, then
    the element or attribute is not validated, but its attributes and
    children are validated, again with lax validation. "

 > This is not desirable.

I am not sure why this is not desirable. That seems to say that lax is
not desirable. But lax is supported in XML Schema 1.0 and in XSLT
2.0. Do people really complain that lax doesn't solve an actual
problem? Maybe they do, I don't know.

It seems likely that there are use cases both ways. For lax, I may
have an incomplete schema for a particular namespace, and want the
elements that I define to be used for validation, but not the ones I
have not yet defined. This is not completely unreasonable I think.

 > If the schema for a namespace contains top-level element or
 > attribute declarations, then that structural validation should
 > apply.  So if instance data contains elements in the namespace that
 > are undeclared by the schema, then a validation error needs to
 > occur.

May occur, if that's what you happen to desire. It really doesn't
*need* to occur.

 > We only want to switch to lax mode when the schema for a namespace
 > contains no top-level element or attribute definitions since in this
 > case one is guaranteed to always fail validation, but such a schema
 > is useful for providing a type lib that declares no structures.

Well again, that's a possibility, it's not the only one.

I am really not convinced that there is a compelling case for a
processing model different from lax at this point (XForms 1.1). lax has
the benefit of being fully defined in XML Schema and supported by XSLT
2.0. We can add "strict" in XForms 1.2/2.0, which will cover most
other useful use cases. Just for the love of reuse I would choose lax
rather than our own processing model if our model doesn't have
substantive benefits over lax.

Maybe we should get the input of external schema heads out there.

Finally, there is the argument from Leigh that existing
implementations seem to follow that proposal. How compelling is that
evidence for XForms implementations that are actually used and under
development?

-Erik

-- 
Orbeon Forms - Web Forms for the Enterprise Done the Right Way
http://www.orbeon.com/
Received on Friday, 2 November 2007 00:13:43 UTC