Re: SV: SV: SV: SV: Schema help

From: Jack Lindsey <tuquenukem@hotmail.com>
Date: Sun, 20 Nov 2005 01:12:42 -0500
Message-ID: <BAY102-F301EF3AEFFEA983EB6A491D7500@phx.gbl>
To: noah_mendelsohn@us.ibm.com
Cc: brs@itst.dk, mike@saxonica.com, petexmldev@tech-know-ware.com, xmlschema-dev@w3.org

XSD-Schematron Synergy

I can't believe the Schema WG is even considering using its apparently
meagre resources to address co-occurrence constraints.  I always assumed the
lack of any progress in this area was due to philosophical objections among
your luminaries.

The strength of XSD is that it is easy (using graphical IDEs anyway) to
quickly model extensive data structures.  Doing this in Schematron would be
lengthy and laborious.  But Schematron's rule-based approach is ideal for
specifying co-occurrence constraints, including co-occurrence constraints by
value.  Its rules:

1) can be expressed either positively or negatively,
2) can be applied equally to both elements and attributes, and
3) can create dependencies between schema elements regardless of their
relative positions within a data structure.
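
To make those three properties concrete, here is a minimal sketch of such
rules.  The element and attribute names (Orders, Order, Courier, Invoice,
Payment, Merchant, CardAccount, priority, status, method) are invented for
illustration, and the namespace shown is the ISO Schematron one; Schematron
1.5 schemas use http://www.ascc.net/xml/schematron instead.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="order-co-occurrence">
    <!-- All element and attribute names below are hypothetical. -->
    <sch:rule context="Order">
      <!-- Positive form: assert states what must hold. Here an attribute
           value (priority) makes a child element (Courier) required. -->
      <sch:assert test="not(@priority = 'rush') or Courier">
        A rush Order must contain a Courier element.
      </sch:assert>
      <!-- Negative form: report fires when the unwanted combination occurs. -->
      <sch:report test="@status = 'draft' and Invoice">
        A draft Order must not contain an Invoice.
      </sch:report>
      <!-- Position-independent: a nested attribute depends on an element
           elsewhere in the document, reached with an absolute XPath. -->
      <sch:assert test="not(Payment/@method = 'card') or /Orders/Merchant/CardAccount">
        Card payments require a Merchant/CardAccount in the document.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Each test is an ordinary XPath expression, so the same rule shape covers
elements and attributes alike.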

I can't imagine how you could cobble this functionality on to XSD and retain 
its simplicity and elegance.  What is more, Schematron achieves this through 
XPath expressions and so everything is still within the W3C family.  Except 
that Schematron itself is (still) about to become an ISO standard.  Perhaps 
there is still time to repatriate it?  Together they are very complementary 
- a great tag team.

I agree with Bryan that something better than the appinfo invocation is
required.  I hear that some consider that a potential security flaw (the
GJXDM community?).  Also I think I read recently that there is already a
.Net thing that lets you specify both a schema and a Schematron stylesheet
and executes them in a single phase?
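
For anyone who has not seen it, the appinfo approach embeds Schematron rules
inside the XSD's own annotations, and a separate tool later extracts and
runs them.  A rough sketch, with hypothetical element names and the ISO
Schematron namespace assumed:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- Order, Courier and priority are hypothetical names. -->
  <xs:element name="Order">
    <xs:annotation>
      <xs:appinfo>
        <!-- An XSD validator ignores this content; only a separate
             extraction step turns it into a runnable Schematron schema. -->
        <sch:pattern id="rush-needs-courier">
          <sch:rule context="Order">
            <sch:assert test="not(@priority = 'rush') or Courier">
              A rush Order must contain a Courier element.
            </sch:assert>
          </sch:rule>
        </sch:pattern>
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Courier" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="priority" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because the schema processor never looks inside appinfo, this arrangement
is two validation phases by construction.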

Perhaps some of your resources would be better spent making sure that 
functionality is not duplicated and the W3C family of vocabularies does not 
become more un-orthogonal (?).  Case in point, Bryan's other recent post 
about the validation incompatibility between XSD and XInclude concerning 
xml:* attributes.  There again Microsoft is helping its users over the bump 
in the road with its own improvised solution.  Too much of that kind of 
thing can't be good for the W3C's health!

xsd:restriction Headaches

Harking back to the example at the beginning of this thread, both E-R and 
UML modelers would find it very natural to implement a taxonomy of task 
types using XSD's Substitution Group feature.  A supertype complexType 
called TaskType would define the common task elements.  Then subtype 
complexTypes would be derived by extension to add the specific elements 
required by TaskType1 versus TaskType2, etc.  The TaskType1 and TaskType2
elements would then declare themselves members of the Task element's group.  
That way they could be validly substituted in any situation where Task could 
be used.  So in an instance, instead of:

    <!-- common task elements here -->
        <!-- Task 1 things -->

You get:

    <!-- common task elements here -->
    <!-- Task 1 things -->

The tags retain their semantic value and inheritance cures the bloat.  I 
have depended on this approach to implement multi-level class hierarchies 
and I think it is a great feature of XSD.
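
A minimal sketch of the schema side of this pattern, assuming invented
child elements (Title, DueDate, Reviewer) and a hypothetical TaskList
container; only Task, TaskType and TaskType1 come from the description
above:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Supertype: elements common to every task (child names hypothetical) -->
  <xs:complexType name="TaskType">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="DueDate" type="xs:date" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Subtype derived by extension: adds the Task 1 specifics -->
  <xs:complexType name="TaskType1">
    <xs:complexContent>
      <xs:extension base="TaskType">
        <xs:sequence>
          <xs:element name="Reviewer" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Head of the substitution group -->
  <xs:element name="Task" type="TaskType"/>

  <!-- Member: may appear wherever Task is referenced. Types and elements
       live in separate symbol spaces, so sharing the name is legal. -->
  <xs:element name="TaskType1" type="TaskType1" substitutionGroup="Task"/>

  <!-- Hypothetical container that only ever refers to the head element -->
  <xs:element name="TaskList">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Task" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance can then use the specific tag directly where Task is expected:

```xml
<TaskList>
  <TaskType1>
    <Title>Review draft</Title>
    <Reviewer>Pat</Reviewer>
  </TaskType1>
</TaskList>
```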

However, this kind of inheritance can often result in the availability of 
elements that do not make sense in the context of certain leaf level 
subtypes.  And in general, depending on the stage in a business process, XML 
transactions with very similar content in fact have different rules 
concerning which elements are required, optional or prohibited.  Then 
everyone wants to start applying restrictions and the first instinct is to 
screw everything to the floor by applying XSD's validation features to the 
extreme, especially your DBA folk. This is where the xsd:restriction 
headaches begin.

1) They are a maintenance liability because you have to spell out everything 
you want to allow again, except for attributes which are the opposite 
because by default they are all permitted unless you specifically prohibit
them (I did once convince myself that that made some sense but I've 
forgotten why!).

2) You cannot apply both extensions and restrictions in a single step, 
forcing the creation of intermediate artifacts that are not intended for use 
by anyone and serve only to confuse and sometimes alarm people you thought 
were your closest friends (a slight change in syntax could avoid this by 
allowing both restriction and extension elements within a complexType 
definition - I'm not asking for any change in the underlying rules, just the 
avoidance of the useless intermediate type).

3) Creating partner-specific variants of a standard community schema
involving restrictions against deeply nested data models is hopelessly
impractical.  The resulting schema cuts the original to ribbons and is
barely still recognizable.
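
Points 1 and 2 can be seen in a small sketch.  All names here (TaskType's
children, the note attribute, TaskWithoutDueDate, ReviewTask) are
hypothetical; the aim is a type that drops an optional element and an
attribute from the base and also adds a new element:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Base type (all names hypothetical) -->
  <xs:complexType name="TaskType">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="DueDate" type="xs:date" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="note" type="xs:string"/>
  </xs:complexType>

  <!-- Step 1: the intermediate type nobody is meant to use. The whole
       content model is spelled out again, minus DueDate. The attribute
       works the other way round: it stays unless explicitly prohibited. -->
  <xs:complexType name="TaskWithoutDueDate" abstract="true">
    <xs:complexContent>
      <xs:restriction base="TaskType">
        <xs:sequence>
          <xs:element name="Title" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="note" use="prohibited"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <!-- Step 2: only now can the extension be applied -->
  <xs:complexType name="ReviewTask">
    <xs:complexContent>
      <xs:extension base="TaskWithoutDueDate">
        <xs:sequence>
          <xs:element name="Reviewer" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Every element added to TaskType later has to be repeated by hand in
TaskWithoutDueDate, which is where the maintenance liability comes from.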

This is the point where you say, "Y'know there is no way W3C Schema can 
apply all the validation your programmers will demand, like co-occurrence by
value, so is it really worth doing all this when it will be largely 
duplicated by the receiving programs anyway?"

So what is the answer?  Relaxed common schemas with context-specific rules 
applied by Schematron-generated XSLT stylesheets.  At least that was the 
conclusion OAGIS came to when they switched from DTDs to XSDs, and they 
posted their disenchantment with xsd:restriction in this very forum.  How 
many years ago was that?  Furthermore, how many times have Jeni, Mike and
others ended their responses with words like "...and you may want to
investigate other technologies such as Schematron to achieve what you want"?
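
As a sketch of what context-specific rules over a relaxed common schema
might look like, Schematron phases can group the rules for each stage of a
business process.  The stage names and elements (quote, fulfilment, Order,
Invoice, ShipTo) are hypothetical, and the ISO Schematron namespace is
assumed; the common XSD would simply declare Invoice and ShipTo as
optional.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Each phase activates only the patterns for one process stage.
       All ids and element names are hypothetical. -->
  <sch:phase id="quote">
    <sch:active pattern="no-invoice-yet"/>
  </sch:phase>
  <sch:phase id="fulfilment">
    <sch:active pattern="shipping-required"/>
  </sch:phase>

  <!-- Prohibited at the quote stage -->
  <sch:pattern id="no-invoice-yet">
    <sch:rule context="Order">
      <sch:report test="Invoice">
        An Order at the quote stage must not carry an Invoice.
      </sch:report>
    </sch:rule>
  </sch:pattern>

  <!-- Required at the fulfilment stage -->
  <sch:pattern id="shipping-required">
    <sch:rule context="Order">
      <sch:assert test="ShipTo">
        An Order at the fulfilment stage must include ShipTo.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The phase is chosen when the rules are compiled or run; how it is selected
depends on the Schematron processor in use.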

But there is resistance because the project leaders say, "But Schematron
isn't W3C, and you want *another* validation phase?"   But this is where I 
came in.

I apologize in advance, for I am bound to be stepping on someone's corns 
when I ask, might I be forgiven for suggesting that perhaps you guys on the 
Schema WG just haven't been listening?


Jack Lindsey

>From: noah_mendelsohn@us.ibm.com
>To: Bryan Rasmussen <brs@itst.dk>
>CC: "'Michael Kay'" <mike@saxonica.com>,        
>"'petexmldev@tech-know-ware.com'" <petexmldev@tech-know-ware.com>,        
>"'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>
>Subject: Re: SV: SV: SV: SV: Schema help
>Date: Fri, 18 Nov 2005 09:39:31 -0500
>I  understand, and although I don't speak officially for the workgroup, I
>want to be sure you feel that your suggestions are being heard.  One thing
>that would help, if you have not already done so, would be to mail this
>suggestion to www-xml-schema-comments@w3.org, which is the official
>comments list for the schema specification.  We formally review the
>comments received at that list, and we either open new trackable issues or
>ensure that issues we are already tracking cover them.  Please make clear
>that you are specifically endorsing schematron as a solution, as otherwise
>the WG might just view this as just another request for some form of
>co-constraints, and that's been a tracked request for some time.  I'm also
>copying David Ezell, our WG chair, on this reply.   Thank you very much.
>Noah Mendelsohn
>IBM Corporation
>One Rogers Street
>Cambridge, MA 02142
>Bryan Rasmussen <brs@itst.dk>
>11/18/2005 03:31 AM
>         To:     "'noah_mendelsohn@us.ibm.com'"
>         cc:     "'petexmldev@tech-know-ware.com'"
><petexmldev@tech-know-ware.com>, "'xmlschema-dev@w3.org'"
><xmlschema-dev@w3.org>, "'Michael Kay'" <mike@saxonica.com>
>         Subject:        SV: SV: SV: SV: Schema help
>Well on the subject of co-occurrence constraints I would just like to
>reiterate what I said earlier, with some extension:
>Given that there is likely to be some argument in W3C as to how far such
>constraints should be implemented I doubt they will come out as powerful
>Schematron constraints, furthermore I have a hard time seeing this as
>producing a syntax as nice as Schematron, therefore I would really like to
>see something like:
>1. XML Schema adopts Schematron as an extension language of some sort.
>2. XML Schema puts some thought into how Schematron can be combined with
>Schema to the benefit of both, beyond the normal  method of drop
>in appinfo.
>I have some ideas on #2, but I'm somewhat conflicted about them - what
>makes sense, syntax etc. so I don't really want to just blurt out with it.
>I'd be more interested in hearing what kinds of things other people could
>see connecting the two languages.
>Bryan Rasmussen
>-----Original message-----
>From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com]
>Sent: 17 November 2005 18:51
>To: Bryan Rasmussen
>Cc: 'Michael Kay'; 'petexmldev@tech-know-ware.com';
>Subject: Re: SV: SV: SV: Schema help
>Well, I think there are good reasons from time to time to revisit the
>effectiveness of the W3C process and the compromises embodied therein. I'm
>not convinced that a deep dive on that is the best use of this particular
>mailing list.   I happen to like the working groups I've been on that do
>their work in public (in my case, both the TAG and XMLP) and I'd be happy
>for schema to go the same way.  Then again, I really don't think that's a
>substitute for having people who have 30% of their time committed to
>working on a technology.  There's a lot of detail work and care required
>to revise a specification even if there's agreement on the general ideas.
>The discussions need to involve people who have the knowledge and the time
>commitment to work through interactions with existing features of the
>specification.  In the case of co-constraints, it would seem to me that
>there ought to be a careful look taken at the relationship between the
>existing key/keyref/unique constraint mechanisms and anything new that's
>proposed.  It would be nice to believe that we wouldn't just be sprouting
>new and uncoordinated ways of doing things every few years.
>So, I personally welcome broader input, but what we're really short of are
>the people who can edit the specification text, draft prose, be
>responsible for the details, etc.  Of course, there are also lots of other
>messy issues to consider when you change the working mode of a group
>including anti-trust laws in various jurisdictions, IP issues, etc.  If
>people feel that they have ideas for how the W3C can do these things
>better, I think the right place to go would be to the W3C staff and/or the
>workgroup chairs.  I personally would not be against having the schema WG
>switch to using a public mailing list for its discussions.  I suspect that
>requires a recharter, but in principle I'm fine with it.  I don't think
>that will solve much of our resource problems.  We don't lack for people
>with good ideas, in email or in person.  We're missing the people to do
>the architecture and drafting work that goes into making all the details
>fit together.   It's hard to do that well without meeting F2F from time to
>time.
>Noah Mendelsohn
>IBM Corporation
>One Rogers Street
>Cambridge, MA 02142
>Bryan Rasmussen <brs@itst.dk>
>Sent by: xmlschema-dev-request@w3.org
>11/17/2005 04:59 AM
>         To:     "'Michael Kay'" <mike@saxonica.com>
>         cc:     "'xmlschema-dev@w3.org'" <xmlschema-dev@w3.org>,
>"'petexmldev@tech-know-ware.com'" <petexmldev@tech-know-ware.com>, (bcc:
>Noah Mendelsohn/Cambridge/IBM)
>         Subject:        SV: SV: SV: Schema help
>Damn, an earlier typo in the email address of Pete Cordell added in by me
>was replicated in your email. Just on the off chance this thread goes any
>further I thought I should correct it. I've cc'ed Pete on this mail. Sorry
>for the problem.
>Bryan Rasmussen
>-----Original message-----
>From: Michael Kay [mailto:mike@saxonica.com]
>Sent: 17 November 2005 10:47
>To: noah_mendelsohn@us.ibm.com; Bryan Rasmussen
>Cc: xmlschema-dev@w3.org; ',petexmldev@tech-know-ware.com'
>Subject: RE: SV: SV: Schema help
> > 1) Although most widely used schema validators are fairly
> > slow, one can in
> > fact implement the XML schema rules at quite high speed.  My team is
> > hoping to publish some work in that area in coming months,
> > and I suspect
> > that others in the industry are working in the same
> > direction.  I think
> > it's important to the success of any technology we choose
> > that it be able
> > to meet the performance needs of our customers.
>I would resist this kind of thinking. SQL was successful because it put
>functionality first, and left implementors to devise optimisation
>strategies. Users need a constraint language that is capable of expressing
>arbitrary constraints on the content of a document, and it should be left
>to the implementor to work out which of these constraints can be evaluated
>in streaming mode and which can't.
>SQL today allows the full power of the query language to be used to express
>integrity constraints, and users learn when they need to restrict their
>ambitions to meet performance requirements. 90% of applications aren't
>performance critical anyway.
>There's no point telling users to go and use some other technology to do
>their validation; the other technology isn't going to be fast either.
>Michael Kay

Received on Sunday, 20 November 2005 06:12:56 UTC
