Re: the UPA-constraint and Danish word division

> So I followed Michael's advice and raised the issue with the W3C 
> schema WG. I cite a few paragraphs here: 

I'm sure I speak for others in the Schema workgroup when I say that we 
appreciate the care you've taken in documenting this issue and in bringing 
it to the workgroup in a manner that facilitates our consideration of it.
 
> "For document authoring purposes, it is of the greatest importance, 
> that authors feel confident, that the underlying schema actually 
> tells him exactly what he is allowed to – or what possibilities he 
> has. Running a post-editing process to find out that the insertion 
> you made of some element is actually invalid (and you made it 
> because the schema-aware software actually proposed this operation 
> to you!), would possibly weaken your confidence in the schema as 
> being a precise and trust-worthy implementation of the editorial 
> principles, that rules the type of text, you work with. 

Speaking for myself, I do think that schema-driven editing systems are a 
good idea, but I also think that the vision implied by your statement 
above is a bit strong.  As the recommendation itself says regarding the 
purpose of the language [1]:

"Any application that consumes well-formed XML can use the XML Schema: 
Structures formalism to express syntactic, structural and value 
constraints applicable to its document instances. The XML Schema: 
Structures formalism allows a useful level of constraint checking to be 
described and implemented for a wide spectrum of XML applications. 
However, the language defined by this specification does not attempt to 
provide all the facilities that might be needed by any application. Some 
applications may require constraint capabilities not expressible in this 
language, and so may need to perform their own additional validations."

The conundrum is that it's tempting for any given application to say: 
surely that caveat doesn't apply in this case;  I'd like the schema to 
directly capture all of my interesting constraints.  Or stated 
differently, I'd like to write a schema that allows me to validate exactly 
the instance language that I find convenient for expressing my 
information, and to correctly reject all other instances.  A mathematician 
might reasonably say:  I want to define a "prime number" type, as a 
restriction of xsd:integer.  But to validate primality we would presumably 
have to provide a non-declarative, Turing-complete validation language, 
something we felt had other downsides.  In general, there are and probably 
should be limits to the degree of validation one can expect out of the 
schema language.  Presumably you have some wish or expectation that XML 
Schema could capture the constraints on correct hyphenation of Danish 
words, but I doubt you would expect it to capture the rules for their 
spelling, for example.  I do understand that you are trying to capture the 
hyphenation with explicit XML markup, and so it's tempting to expect 
the schema to enforce the rules at just that level.
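
To make the prime-number example concrete, here is a minimal sketch in 
Python of the layering that the quoted caveat describes: a declarative 
check of the kind a schema can express, followed by an additional 
application-level validation.  All of the names (looks_like_xsd_integer, 
is_prime, validate_prime) are hypothetical and belong to no schema 
processor or library; the pattern is only an approximation of the 
xsd:integer lexical form, assumed here for illustration.

```python
import re

# Approximation of the xsd:integer lexical form: optional sign, then digits.
# This is the kind of constraint a declarative schema can state directly.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def looks_like_xsd_integer(text):
    """Schema-style check: is the (whitespace-trimmed) text an integer?"""
    return _INTEGER.fullmatch(text.strip()) is not None


def is_prime(n):
    """Application-level check by trial division; not expressible as a facet."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def validate_prime(text):
    """Run the declarative layer first, then the application's own rule."""
    if not looks_like_xsd_integer(text):
        return "rejected at schema layer"
    if not is_prime(int(text)):
        return "rejected at application layer"
    return "valid"
```

The point of the sketch is only that the second check lives outside the 
schema:  a value such as "8" passes the declarative layer and is rejected 
by the application's own rule.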

So I think the right question isn't:  would there be useful grammars you 
can't write today that you could write if UPA were eliminated?  The answer 
to that is clearly yes.  I think it's also clearly useful at times to build 
schema-driven or certainly schema-aware editing systems.  The question is 
whether the advantages of expressing your application constraints exactly 
in the schema language outweigh the advantages that some (though not all) 
users of schema and implementors of validators derive from the strict 
particle-to-instance matching that UPA provides.  That clearly is a 
question on which members of the XML community and members of the Schema 
workgroup disagree, sometimes passionately.

In any case, having your input to the ongoing discussion of that question 
is indeed very useful and much appreciated.  Thank you!

Noah


[1] http://www.w3.org/TR/2004/PER-xmlschema-1-20040318/#intro-purpose


--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Received on Wednesday, 20 September 2006 12:21:53 UTC