RE: Major problem with schema needs immediate attention. from Klotz, Leigh on 2007-10-19 (public-forms@w3.org from October 2007)

From: Klotz, Leigh <Leigh.Klotz@xerox.com>
Date: Fri, 19 Oct 2007 13:05:18 -0700
To: "John Boyer" <boyerj@ca.ibm.com>, <ebruchez@orbeon.com>
Cc: "Forms WG (new)" <public-forms@w3.org>, <public-forms-request@w3.org>
Message-ID: <E254B0A7E0268949ABFE5EA97B7D0CF403A26DFB@USA7061MS01.na.xerox.net>
I don't see the need for any changes.  The XML Schema Recommendation doesn't say what interfaces XML Schema validation software must provide.
This issue is assuming that the policy is strict validation and that the presence of even a single type library with no element declarations would invalidate all instances.
That's simply not the case; the application itself (in this case, XForms) is in charge of deciding how to apply XML Schema validation.
Granted, if you use Xerces in Java and just say "go", it will try to validate everything, but others (Saxon-SA, I believe) won't, and they offer more advanced interfaces.
But any issues I see here are about using off-the-shelf (OTS) open source software with insufficient interfaces to implement what's clearly allowed by the XML Schema standard.
 
Take a look at what Noah has to say on this issue:
 
From http://www.schemavalid.com/faq/xml-schema.html#d4

Can schemas validate parts of an instance document?


Yes. XSV <http://www.w3.org/2000/09/webdata/xsv>, for example, will use "strict" mode if every element from the root down is schema-validatable, but "lax" mode if the root node - or any other element which is allowed to appear in some context - cannot itself be schema-validated. 

[Noah Mendelsohn] From xmlschema-1 <http://www.w3.org/TR/xmlschema-1/#validation_outcome> : "With a schema which satisfies the conditions expressed in Errors in Schema Construction and Structure (§7.1) above, the schema-validity of an element information item can be assessed.". It then goes on to say exactly how and against which declarations. Note that it says you can validate an "element", not necessarily the root element of a document. 

Net answer to your question: conforming processors can be written to validate any element you like. Not all processors need provide this service: buy or use processors that validate the information you need validated. By the way, the detailed rules give the processor a choice of validating the element against some particular identified element declaration, some particular identified complex type, or to use the mechanisms of strict, lax etc. to determine what to validate based on what declarations happen to be available. All of this is explained at xmlschema-1 <http://www.w3.org/TR/xmlschema-1/#validation_outcome> . 



________________________________

From: public-forms-request@w3.org [mailto:public-forms-request@w3.org] On Behalf Of John Boyer
Sent: Thursday, October 18, 2007 5:51 PM
To: ebruchez@orbeon.com
Cc: Forms WG (new); public-forms-request@w3.org
Subject: Re: Major problem with schema needs immediate attention.



Hi Erik, 

Regarding validate vs. schema attribute on instance, are you saying that if you have 

A.xsd: schema targetNamespace="A" 
B.xsd: schema targetNamespace="B" 

<instance validation="lax">  <e xmlns="A"/> 

Then both schema A and B will still apply to the instance but both will be applied with lax validation? 

This compared to 

<instance schema="A.xsd"> <e xmlns="A"/> 

To me, the latter is more compelling.  It directly says what schema are applicable, not how to apply the schema.  It does get even more compelling the more instances (and hence schema) become involved. 

Frankly, I do actually think there is also a use for the validation attribute you advocate, which is interesting because it is another data point to suggest that the two are separate things.  Still a third point would be that XSLT's validation attribute is designed much more like our type MIP.  It is actually applicable anywhere in the result tree, so putting it on instance may be the *wrong* choice.  I could easily see schema on instance and validation as a MIP. 

Anyway, regarding 'not invented here' syndrome, I'd have to say the Forms team has done a pretty good job of demonstrating that we don't have the problem.  Proof points would be XPath and XML Schema.  Even though both create a few rough edges for us, they solved many many more problems than they created.  I think the same will be true of things like XSLT2's validation attribute (only that's a much smaller scale).  If we find we need it, it'll get pulled in, but if I had to guess then, as I said above, it would probably be as a MIP. 

That still leaves us with selecting schemas to apply.  To that end, I would say that we should not be so worried about 'not invented here' syndrome that we refuse to adopt new ideas *because* another group didn't think of it first, even when faced with a problem in the same domain. 

John M. Boyer, Ph.D.
STSM: Lotus Forms Architect and Researcher
Chair, W3C Forms Working Group
Workplace, Portal and Collaboration Software
IBM Victoria Software Lab
E-Mail: boyerj@ca.ibm.com  

Blog: http://www.ibm.com/developerworks/blogs/page/JohnBoyer





Erik Bruchez <ebruchez@orbeon.com>
Sent by: public-forms-request@w3.org
10/18/2007 04:08 PM
Please respond to: ebruchez@orbeon.com

To: "Forms WG (new)" <public-forms@w3.org>
cc:
Subject: Re: Major problem with schema needs immediate attention.

John & all,

Sorry for the over-long reply below.

> Michael Sperberg-McQueen gave a talk at XML 2005 in which he did
> indeed describe the appropriateness of deciding to use different
> schema on the same data for different purposes. It is absolutely
> under the control of the application to decide whether and which
> schema to apply to *any* XML, and it has nothing at all to do with
> having a "pseudo-lax" schema method.  What I believe we are talking
> about is how the application called XForms should decide whether and
> which schema to apply to pieces of XML that it manages.  Once the
> schema are selected, the validation is strict.

Agreed, XForms has 100% control over how it decides to apply schema
validation.

However, a design principle I believe is good consists in leveraging
or reusing as much as possible from existing (good) work. This has many
benefits, including less time spent devising new solutions to existing
problems (the XForms WG really doesn't have the bandwidth to lose time
on such things), and consistency across specifications, in this case
between XSLT 2.0 and XForms.

This is the exact same reason I am very much against us defining new
XPath functions to solve problems that already have been solved by
XPath 2.0. I really don't want to duplicate the work or, even worse,
propose solutions that are not better than existing ones but that are
simply different.

So yes, we can devise our own way of applying schemas to instance
data, but if XSLT 2.0 has done something that can work for us, then I
think that we should do our utmost to adopt that. And I believe at the
moment that XSLT 2.0 does solve our issues, so I am pushing things in
that direction.

The bottom line: I would like to make sure we are not having a case of
"not invented here" syndrome.

> For my own part, I presented the method we currently use to work
> around the problem using a "common sense works pretty well but not
> perfectly" approach that does not happen to require extension
> attributes unsupported by the XForms schema.

The reason I used "pseudo-lax" was BTW not to be derogatory, but
because the "lax" validation algorithm does what you propose, except
it recurses down the XML document to test all the elements and
attributes. So your solution is a sort of "one-level lax" validation
mode, if you prefer ;-)

> You've proposed that perhaps we should use attributes like those in
> XSLT 2.0.  Based on what I've seen in your last call comment and in
> a brief look at Section 19 of XSLT2, it doesn't look appropriate for
> XForms 1.1.  It seems like the XSLT2 attributes solve a different
> problem.

I don't think so, and that's what I have been trying to argue!

> 1) One problem is that XML Schema 1.0 does not mandate that
> implementations provide an execution mode other than strict, and I
> know at least one mainstream schema engine that could not support the
> XSLT 2.0 attributes as applied to XForms, which may not even have
> the same semantic as XSLT2 in any case.

Are you saying that it would be an issue to have to require tighter
integration with the schema validator, like XSLT mandates?

If so I can relate to that. For example, at our last f2f, I expressed
surprise at the (very late) realization that we cannot simply use a
stock XPath validator to implement the dependency system!

However, my experience is that you can implement lax validation fairly
easily with some existing validators, in our case MSV. Lax validation,
if not directly supported by your validator, requires that you
obtain from the validator a list of top-level types, and then the
capability of validating a sub-tree according to a type when you find
a matching element or attribute.
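
As a rough illustration of the two capabilities described above (a list
of top-level declarations, and validation of a sub-tree once a match is
found), here is a minimal Python sketch using only the standard library.
The names top_level_elements and lax_walk are made up for this example,
the validate_subtree callback stands in for whatever sub-tree entry
point a real validator offers, and attributes are ignored for brevity.
It is not how MSV or any particular validator exposes this.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def top_level_elements(xsd_text):
    """Return Clark-notation names ({namespace}local) of global element declarations."""
    schema = ET.fromstring(xsd_text)
    target_ns = schema.get("targetNamespace", "")
    names = set()
    for decl in schema.findall(XS + "element"):  # direct children of xs:schema only
        local = decl.get("name")
        names.add("{%s}%s" % (target_ns, local) if target_ns else local)
    return names


def lax_walk(element, declared, validate_subtree):
    """Hand declared subtrees to the validator; look through undeclared elements."""
    if element.tag in declared:
        validate_subtree(element)  # the validator owns this subtree from here down
        return
    for child in element:
        lax_walk(child, declared, validate_subtree)


A_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="A">
  <xs:element name="e" type="xs:string"/>
</xs:schema>"""

doc = ET.fromstring('<wrapper><e xmlns="A">text</e><other/></wrapper>')
declared = top_level_elements(A_XSD)

validated = []  # stand-in for a real validator call: just record what would be checked
lax_walk(doc, declared, validated.append)
print([el.tag for el in validated])  # ['{A}e']
print(doc.tag in declared)           # False: a root-only check would validate nothing
```

The last line shows where a root-only ("one-level") check and a
recursive walk can diverge: the undeclared wrapper root is skipped in
both cases, but only the recursive walk reaches the nested element in
namespace A.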

> 2) The validate attribute seems to hit the wrong problem, the
> modality of the schema engine as opposed to the applicability of
> schema to the instance data.  I have long felt that XForms 1.0 has
> the design flaw that the schema attribute is attached to the model
> and not to the instance element.

> I think this may be left over from the good old days when a model
> only had one instance.  If we did go with a solution for XForms 1.1
> that added markup, I would rather see this:
>
> <instance id="X" schema="X.xsd Y.xsd #inline-schema"> ...

> If you don't include a schema attribute on an instance, then I think no
> schema should be applicable to it.
> The schema attribute on instance would make the selection of schema for
> the instance direct and explicit, and it makes processing most efficient.
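
To make the selection rule quoted above concrete, here is a small Python
sketch of how a processor might read such an attribute. The schema
attribute on instance is only the proposal under discussion here, not
part of XForms 1.0, and the function name schemas_by_instance is
invented for this example.

```python
import xml.etree.ElementTree as ET

XF = "{http://www.w3.org/2002/xforms}"

MODEL = """<model xmlns="http://www.w3.org/2002/xforms">
  <instance id="X" schema="X.xsd Y.xsd #inline-schema"><x xmlns=""/></instance>
  <instance id="Z"><z xmlns=""/></instance>
</model>"""


def schemas_by_instance(model_text):
    """Map each instance id to the schemas named in its (proposed) schema attribute."""
    model = ET.fromstring(model_text)
    plan = {}
    for inst in model.iter(XF + "instance"):
        # A missing attribute yields an empty list: no schema applies.
        plan[inst.get("id")] = inst.get("schema", "").split()
    return plan


print(schemas_by_instance(MODEL))
# {'X': ['X.xsd', 'Y.xsd', '#inline-schema'], 'Z': []}
```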

I agree partly with this.

However it is interesting to note that XSLT 2.0 actually does things
very much the XForms way by allowing you to import a number of schemas
into a stylesheet! Then XSLT defines with attributes how those schemas
apply to the resulting XML documents. We have a very similar situation
in XForms, except our resulting documents are instances. Really, the
parallel is striking to me!

Given what XSLT 2.0 has done, I now think that importing schemas at
the top-level in an XForms model is perfectly acceptable, as long as
we add to instances attributes similar to what was done in XSLT.

Also note that XSLT allows you to also specify a @type attribute, if
you really want to mandate a particular type for a document. This
would do the job of selecting which exact schema definition, from the
list of imported schemas, must apply to the root element of the
instance.
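
For comparison, here is a hedged Python sketch of the XSLT-style
arrangement: schemas imported once on the model, with each instance
saying how validation applies to it. The validation and type attributes
on instance are hypothetical, the default of "lax" is an assumption made
for the example, and validation_plan is an invented name. The mode names
and the rule that the two attributes cannot be combined are borrowed
from XSLT 2.0.

```python
import xml.etree.ElementTree as ET

XF = "{http://www.w3.org/2002/xforms}"
XSLT2_MODES = {"strict", "lax", "preserve", "strip"}  # values XSLT 2.0 defines

MODEL = """<model xmlns="http://www.w3.org/2002/xforms" schema="A.xsd B.xsd">
  <instance id="a" validation="lax"><e xmlns="A"/></instance>
  <instance id="b" type="po:order" xmlns:po="B"><order xmlns="B"/></instance>
  <instance id="c"><data xmlns=""/></instance>
</model>"""


def validation_plan(model_text, default_mode="lax"):
    """Return (imported schemas, per-instance setting) for a model."""
    model = ET.fromstring(model_text)
    imported = model.get("schema", "").split()  # model-level schema list
    plan = {}
    for inst in model.iter(XF + "instance"):
        mode, type_name = inst.get("validation"), inst.get("type")
        if mode is not None and type_name is not None:
            raise ValueError("validation and type are mutually exclusive")
        if mode is not None and mode not in XSLT2_MODES:
            raise ValueError("unknown validation mode: %r" % mode)
        if type_name is not None:
            plan[inst.get("id")] = ("type", type_name)  # QName kept as written
        else:
            plan[inst.get("id")] = ("mode", mode or default_mode)
    return imported, plan


imported, plan = validation_plan(MODEL)
print(imported)  # ['A.xsd', 'B.xsd']
print(plan)      # {'a': ('mode', 'lax'), 'b': ('type', 'po:order'), 'c': ('mode', 'lax')}
```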

In addition, just adding a schema or list of schemas on an instance
does seem less powerful than what XSLT 2.0 allows you to do.

> 3) I also didn't like the validation attribute because I felt that
> adding it to XForms is not really "futureproof".  We have long felt
> that we need to make the schema engine a pluggable component of
> XForms.  The validation attribute and its values are very XML Schema
> centric, i.e.  they configure the processing model of the XML schema
> engine, so the attribute would be useless and possibly even confusing
> when another schema engine is being used.  By comparison, a schema
> attribute on instance is just a schema selector to indicate which
> schema are applicable to the instance, so it is schema engine
> neutral.

This is a good point.

I agree we need to make sure we can be as schema-neutral as possible,
and at least not close doors. Unfortunately, as Leigh suggested during
the last call, this may be something we must work on after 1.1.

We now have schemas imported on the xforms:model element. We can't
really get rid of this feature easily I think. And we have xsi:type
processing taking place.

Still, I can see how such attributes could have a defined meaning only
with certain schema languages. But even with Relax NG, a @type
attribute can have meaning.

> In conclusion, if we could settle on a method a little more like
> what I first proposed, it might help us to provide guidance for
> XForms 1.0 processors today, not just XForms 1.1.

I get this point. I don't dislike your solution, and it is simple, but
I really dislike the fact that it doesn't seem to match something done
in other specs, specifically XSLT 2.0. It is also not just a subset of
what XSLT 2.0 does, i.e. if you then decide that lax validation is
what you want by default (as we do now in Orbeon Forms), then the
outcome may be different from just checking the root element.

> But if a new attribute is required, it seems to be schema and not
> validate that is needed.

Given my blah-blah above, at the moment I don't agree with this last
statement.

-Erik

-- 
Orbeon Forms - Web Forms for the Enterprise Done the Right Way
http://www.orbeon.com/
Received on Friday, 19 October 2007 20:07:34 UTC