Major problem with schema needs immediate attention.

We have only three last call comments for which no resolution has been 
reached.
One of them is a bit hard, so we really need you guys to have a close 
look at the issues in the next few days.

Please see issue 87 
http://htmlwg.mn.aptest.com/cgi-bin/xforms-issues/Model?id=87;user=guest;statetype=1;upostype=-1;changetype=-1;restype=-1

I added some additional comments.

The problem happens because we say that we validate nodes against all 
"applicable" schema declarations, but we do not rigorously define what 
"applicable" means.  Many of us think we have an idea, but the idea I have 
heard expressed does not actually work out very well in the context of the 
XML schema algorithm itself, over which we have no control.  You can get 
all the "applicable" schema declarations, to be sure, but the only way to 
get them is to run validations that would invalidate most forms that 
attempt to use schema.

The usual claim we make is that "all schemas are available to every 
instance" because any instance could use elements from the namespace of 
any schema. Unfortunately, XML Schema makes "strict" the default mode, and 
I see only two things that override it: either the schema itself declares 
a lax or skip mode for a particular element, or the schema engine chooses 
to lax validate an element's content after strict validation has failed 
because no appropriate declaration was found for the element (engines 
*may* do this).

So here is a trivial example:

schema targetnamespace="A"
schema targetnamespace="B"

instance <e xmlns="A"/>
instance <f xmlns="B"/>

If we claim that all schemas apply to all instances, then because schema 
validation is strict, you will find that the instance in namespace A is 
invalid against schema B and the instance in namespace B is invalid 
against schema A. 
Hence, you will not be able to submit any data simply because you decided 
to use two instances in the same form!
In both cases, the errors occur because element declarations cannot be 
found. 
Yet, this type of error cannot be ignored because we do expect that if 
element e must have children A and B only, then you will get an error if 
you put in some element C for which the schema has no declaration.
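
To make the failure concrete, here is a minimal Python sketch of the 
two-schema example. It is a toy model, not a real XML Schema processor: 
each schema is reduced to a target namespace plus the set of global 
element names it declares, and the names SCHEMAS, strictly_valid and 
valid_against_all are invented for illustration. The only thing it 
models is the "no declaration found for the root element" error under 
strict assessment.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a schema: target namespace -> declared global element names.
# (Hypothetical data for illustration; real schemas carry far more.)
SCHEMAS = {
    "A": {"e"},
    "B": {"f"},
}

INSTANCES = ['<e xmlns="A"/>', '<f xmlns="B"/>']


def split_qname(tag):
    """Split ElementTree's '{ns}local' tag form into (namespace, local name)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def strictly_valid(root, target_ns, declared):
    """Strict mode in miniature: the root must have a declaration in this schema."""
    ns, local = split_qname(root.tag)
    return ns == target_ns and local in declared


def valid_against_all(xml_text, schemas):
    """The 'every schema applies to every instance' reading of "applicable"."""
    root = ET.fromstring(xml_text)
    return all(strictly_valid(root, ns, names) for ns, names in schemas.items())


results = [valid_against_all(doc, SCHEMAS) for doc in INSTANCES]
print(results)  # [False, False]
```

Each instance is valid against its own schema, yet both come out invalid 
once every schema is applied to every instance.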

We hit an even simpler case in the field quite some time ago involving 
just one schema:

schema targetnamespace="A"

instance <e xmlns="A"/>
instance <f xmlns=""/>

The second instance is always invalid even though there are really no 
schemas that one might reasonably conclude are "applicable".

So, cards on the table time.

We handle this issue by defining "applicable" as follows:  the namespace 
of the root element of the instance must match the target namespace of the 
schema.  By applying only that one schema, the processor is faster, and it 
also doesn't step on the above landmines.  We took the view that an 
instance of the form <A:a><B:b/></A:a> would correspond to a schema for A 
that *included* the schema for B.  I think the screw case there is the 
SOAP envelope for a web service, but I can't remember for sure and it's 
2:30am right now, so I'll have to get back to you later on that.
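
As a rough illustration of that rule, the sketch below picks the 
applicable schemas for an instance by comparing the namespace of its 
root element with each schema's target namespace. This is only a sketch 
under assumptions, not our actual implementation: the function names are 
made up, schemas are represented by nothing more than their target 
namespace strings, and a schema with no target namespace is assumed to 
be represented by the empty string.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or '' if it has none."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree writes namespaced tags as '{namespace}localname'.
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def applicable_schemas(xml_text, target_namespaces):
    """Keep only the schemas whose target namespace matches the root element.

    Assumption for this sketch: a schema without a target namespace is
    listed as '' in target_namespaces.
    """
    ns = root_namespace(xml_text)
    return [t for t in target_namespaces if t == ns]


# The two-schema example: each instance is checked only against its own schema.
print(applicable_schemas('<e xmlns="A"/>', ["A", "B"]))  # ['A']
print(applicable_schemas('<f xmlns="B"/>', ["A", "B"]))  # ['B']

# The one-schema example: the no-namespace instance has no applicable schema.
print(applicable_schemas('<f xmlns=""/>', ["A"]))  # []

# Mixed content: only the root's namespace selects the schema, so schema A
# would have to include the declarations for B.
mixed = '<A:a xmlns:A="A" xmlns:B="B"><B:b/></A:a>'
print(applicable_schemas(mixed, ["A", "B"]))  # ['A']
```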

Meanwhile, maybe the above solution is satisfactory or maybe it isn't. 
Please take the time to weigh in on this issue in the next day or two so 
we can have the right kind of discussion on the list that will allow us to 
close this on the next telecon.

Thank you,
John M. Boyer, Ph.D.
STSM: Lotus Forms Architect and Researcher
Chair, W3C Forms Working Group
Workplace, Portal and Collaboration Software
IBM Victoria Software Lab
E-Mail: boyerj@ca.ibm.com 

Blog: http://www.ibm.com/developerworks/blogs/page/JohnBoyer

Received on Thursday, 18 October 2007 09:37:55 UTC