
RE: Comment on XSD 1.1

From: Michael Kay <mike@saxonica.com>
Date: Wed, 13 May 2009 17:01:12 +0100
To: "'Rick Jelliffe'" <rjelliffe@allette.com.au>, <www-xml-schema-comments@w3.org>, <www-tag@w3.org>
Message-ID: <077FDF6E2A724677ABA1B530EBE71042@Sealion>
Personal response:

The goals for XSD 1.1 were relatively modest: they are described here

http://www.w3.org/TR/2003/WD-xmlschema-11-req-20030121/

The specification was not designed to address the known problems with the
complexity of the 1.0 specification, which arise in large measure because of
the sheer variety of requirements it originally set out to meet. Rather, XSD
1.1 was designed as a modest, backwards-compatible increment on XSD 1.0,
intended to remove some of the deficiencies of that specification in terms
of its technical capabilities; in my view, it has succeeded admirably in
doing that. Of course, everyone would have liked it to be done rather
sooner, and I'm sure most of us who were involved feel that it could have
been done better if only there weren't so many different views on each
topic.

You are arguing that it would have been better to do something more radical.
Well, if we were having a debate in 2003 about how to allocate resources,
then you might be right. But it's a false premise to suggest that the two
were alternatives (there's no reason why one can't produce a better version
of language A in parallel with producing a new language B); moreover,
there's no finite pool of resources to be allocated (the people interested
in improving A are probably not the same people as those interested in
designing B). Furthermore, we are no longer in 2003, and XSD 1.1 is
essentially finished, so the only choice the community now needs to make is
to decide whether it is an improvement on XSD 1.0 (that is, a sufficient
improvement to justify the cost of implementing and adopting it). You don't
seem to say anything to suggest that this is not the case, so it's hard to
see how your arguments are relevant to the decisions that the community now
needs to make.

By saying that there are big problems with XSD, you aren't saying anything
new or anything that we don't all know. But the fact is that the software
industry and the user community have an enormous investment in this
technology, and XSD 1.1 needs to be judged on the benefits it offers to this
community, not on the benefits that some hypothetical alternative might have
delivered if we had taken a different direction in 2003.

W3C organized a workshop in 2005 designed to analyze user experience with
XSD 1.0: see

http://www.w3.org/2005/03/xml-schema-user-program

Some hoped that this would provide a springboard to generate the
requirements for a refactoring or layering of the kind you described.
Unfortunately, it failed to do so: although I was not present, my
understanding is that it essentially confirmed that all the requirements
that XSD 1.0 aimed to satisfy were real, and that all the features of XSD
1.0 were needed by someone.


Michael Kay
http://www.saxonica.com/



 
> -----Original Message-----
> From: www-xml-schema-comments-request@w3.org 
> [mailto:www-xml-schema-comments-request@w3.org] On Behalf Of 
> Rick Jelliffe
> Sent: 13 May 2009 16:26
> To: www-xml-schema-comments@w3.org; www-tag@w3.org
> Subject: Comment on XSD 1.1
> 
> I would like to register with the W3C TAG and the W3C XML 
> Schema WG that, on having considered the XSD 1.1 draft, I 
> think it is exactly the wrong direction for the WG and W3C to 
> be taking.  That is, while each individual decision may be 
> well-founded, and each change justifiable and beneficial, the 
> total effect will not help get us out of the mess that XML 
> Schemas has created, but mire us further in it.
> 
> I see this as highly analogous to the situation with the SGML 
> 5-year review at ISO in the early 1990s. Many small solutions 
> to individual problems had been made, and many whizz-bang new
> ideas added, and there were many worthy new things on the cards.
> 
> But the fundamental problem was SGML was too big. The 
> approach was of course to slim it down to XML, and to 
> reintroduce many of the cast-off features and ideas (DTDs, 
> modules) into layers on top of XML (schemas,
> namespaces.)
> 
> (A further parallel may indeed be that a change in forum was 
> necessary in order to get this change: in a certain sense the 
> original developers of SGML were "part of the problem" not 
> "part of the solution."  Not because of malice or ineptitude; 
> quite the reverse. The dynamics, personalities and goals of 
> the working group were only capable of change in the 
> direction of neatness and expansion. Indeed, I know that many 
> on the W3C Schema WG are acutely aware of these issues, but 
> perhaps the stars have never been aligned to address this. 
> Since the W3C TAG itself has such a rich representation from 
> the XML Schema WG, I hope that they may be conduits for 
> fresh thinking from the TAG and not conduits for 
> rationalizations from the Schema WG.)
> 
> Comments on the problem
> ---------------------------
> 
> That XML Schemas is in a crisis and has failed to meet some 
> of its basic goals can be seen from the work on XML Schema 
> Patterns for Databinding, which produced separate Basic and
> Advanced pattern lists. That two such comprehensive lists were
> necessary is a sign of bad layering.
> 
> Indeed, if we consider the original requirements document for 
> XML Schemas, http://www.w3.org/TR/NOTE-xml-schema-req, its 
> shortcomings become more manifest. For example, in the Usage 
> Scenarios, XML Schemas has not been successful for
>  
>  4) Traditional document authoring/editing governed by schema
>     constraints.
>     (DTDs and RELAX NG have made large inroads in this area; for
>     example, OASIS ODF. I note that even for the XML Schemas for
>     ISO OOXML [DIS29500], which had been written to use a very
>     conservative subset of XML Schemas, it turned out that Xerces
>     would not accept schemas allowed by Microsoft's validators,
>     both of which are well-regarded and mature implementations.)
>
>  5) Use schema to help query formulation and optimization.
>     (The current draft has to change its type model to fit XQuery.)
>
>  6) Open and uniform transfer of data between applications,
>     including databases.
>     (See the databinding comments above.)
> 
> Furthermore, even in the online application scenarios 1, 2, 3, 
> and 7, the heavyweight processing that XML Schemas requires 
> and the complexity of its concepts have meant that it is 
> rarely actually used for validation, while it is also so 
> inadequate for databinding. 
> 
> So if it is not congenial for validation, and it is not a 
> success for reliable databinding, is it at least good for 
> documentation?  In fact, the verbosity of XML Schemas makes 
> it utterly unusable for presenting a document's structure to
> humans. In this regard, I note that the recent 
> HTML 5 drafts have reverted to something akin to RELAX NG 
> Compact Syntax (which looks like DTD content models and has a 
> standard mapping to the XML form.) 
> 
> Further if XML Schemas is not useable for documentation, is 
> it useful for generating useful validation messages for 
> humans? The answer is clearly that the messages produced by 
> implementations of XML Schemas are not much use, particularly 
> the obscure structural messages. As someone who has both 
> implemented most of XML Schemas (a converter to Schematron) 
> and who has customized the messages from various schema 
> processors, I don't see how some of the messages can be made 
> human-friendly, since they relate to obscure rules in XML Schemas.
> 
> And if XML Schemas is not good for validation, does it redeem 
> itself by winning over implementers with a good standard?  It 
> is no secret that the XML Schemas Structures standard is the 
> very model of an impenetrable, guru-inducing standard. But, 
> having worked in the W3C XML Schema WG at the time of the first 
> release, and deeply respecting the editors and working group 
> members, I believe this is not a fixable fault with the 
> documentation, but a reflection of the brain-numbing technology.
> 
> I have two personal anecdotes about this. In 2001 I had a 
> contract from Manning Press to write a book on XML Schemas, 
> in particular explaining the standard. After three months of 
> full-time work on this, I abandoned the project and repaid my 
> advances at a loss, because I decided that trying to make a 
> silk purse out of a sow's ear would be either impossible or 
> irresponsible.  The second anecdote is that when making our 
> implementation of XML Schemas (a project initially funded by 
> JSTOR, which is making its leisurely way towards open source) 
> we twice had programmers threaten to resign because working 
> on XML Schemas implementation was too unpleasant. One of 
> these programmers was subsequently headhunted by Microsoft 
> and the other is currently working on his PhD in Computer 
> Science so they are not idiots or defeatists; and we have a 
> history of high retention rates.
> 
> Looking further at the original requirements, we 
> see the following purported design principles:
> 
>    1. more expressive than XML DTDs;
>    2. expressed in XML;
>    3. self-describing;
>    4. usable by a wide variety of applications that employ XML;
>    5. straightforwardly usable on the Internet;
>    6. optimized for interoperability;
>    7. simple enough to implement with modest design and runtime
>       resources;
>    8. coordinated with relevant W3C specs (XML Information Set, Links,
>       Namespaces, Pointers, Style and Syntax, as well as DOM, HTML,
>       and RDF Schema).
> 
> I contend that it is apparent that the changes proposed for XML
> Schemas 1.1 do nothing to address the shortfalls in meeting these
> goals that have been a bugbear since XML Schemas 1.0. In
> particular, it falls short on the following principles:
>      4. see the databinding and related comments above;
>      5. there is nothing straightforward about XSD, and it is too
>         verbose to download;
>      6. see the databinding and related comments above: it is
>         manifestly a disaster for interoperability;
>      7. XSD is manifestly not simple to implement;
>      8. the PSVI (post-schema-validation infoset) represents a
>         fundamental break with the basic relevant XML specs.
>         Indeed, it might be said that XML Schemas are not schemas
>         for documents, but schemas for databases that have an XML
>         serialization. The two are not the same.
> 
> So, allowing for the sake of argument that XML Schemas may be so 
> deficient in these areas and so complex, can it justify 
> itself as allowing very sophisticated document constraints? 
> Clearly the answer is no, certainly for Part 1 Structures. 
> The rival language to XML Schemas (i.e. OASIS/ISO RELAX NG) 
> is far more powerful, and the alternative (which is also a
> complement) for non-grammar/non-datatype constraints and 
> assertions (i.e. ISO Schematron) is far more powerful.
> 
> XML Schemas has a very poor bang-per-buck ratio. There are 
> many significant classes of document structures for which it 
> cannot usefully be applied: for example, SVG and XSLT.  
> Indeed, it may be argued that these kinds of tricky 
> structures are exactly the kinds of structures most calling 
> out for validation.
> 
> Finally, if the language is not very good for structural 
> constraints, is it at least good for document evolution? The 
> answer here again is no. 
> Experience with large schemas has shown that the XML Schemas 
> complex type derivation facilities are quite bogus: the type 
> extension mechanism not only introduces an extra concept, but 
> also causes a fragile-base-class-like problem for maintenance. And 
> the type derivation by restriction mechanism does not 
> simplify declarations. 
> 
> I do have many other specific issues as well, which I won't bore
> readers with: they can be summarized by the comment that XML
> Schemas 1.1 may address the kinds of problems that you might want
> to validate in 1999, but not the kinds of problems found in XML
> as practised in 2009: for example, foreign code lists, and the
> abandonment of large XML documents in favour of either XML-in-ZIP
> or XML-on-filesystem collections of smaller documents linked by
> URLs and other IDs.
> 
> I should acknowledge that there are indeed many successful 
> uses of XML Schemas. I see no evidence that these successful 
> uses are because of any particular excellence in XML Schemas 
> that would not be possible in other schema languages.
> 
> A proposed solution
> ---------------------
> 
> I therefore ask the TAG to instruct, influence or otherwise 
> encourage the XML Schema Working Group to put XSD 1.1 on hold 
> and instead to work on a radical relayering into a two-layer 
> model. Some of the XSD 1.1 changes would make their way into 
> the basic layer, some would make their way into the advanced 
> layer which would be equivalent to the proposed XSD 1.1.
> 
> In concrete terms, I propose this:
> 
> 1) A radically simpler schema language, compatible as much as 
> possible with the current XSD 1.0 syntax, be created. It 
> should have the following properties:
> 
>      i) It should follow ISO RELAX NG in all relevant design 
> decisions, and be trivially translatable to and from RELAX NG.
>      ii) In doing so, it should remove as many as possible of the 
> patterns identified as problematic for databinding.
>      iii) It should have no concept of structural type 
> derivation: no extension or restriction of complex types. It 
> need not support any simple type derivation or facets, though 
> it would support the built-in derived types of XSD.
>      iv) It should have no obscure rules such as UPA (Unique 
> Particle Attribution) that are not required by RELAX NG.
>      v) It should have no constraints or requirements for 
> streamable implementation.
> 
>  2) A secondary layer which adds:
>     
>      i) Complex type derivation
>      ii) UPA, naming, and other obscure rules
>      iii) Features that are problematic for databinding, and the
> constraints needed to allow streaming validation.
> 
> The bottom line is that the new simpler language would not be 
> type-based, nor would it require 1-unambiguous schemas. Both 
> those things, which are currently presented as core to the 
> mechanics of XML Schemas, would become additional assertions 
> to be used or checked by the full language and its processors.
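As an illustration of the distinction drawn above, here is a minimal sketch in Python under stated assumptions: a toy content-model grammar of element names, sequence, choice and star only (no occurrence counts, wildcards or substitution groups), in which XSD's UPA constraint roughly corresponds to the classic 1-unambiguity test on the Glushkov position automaton. `matches` validates a list of child names with Brzozowski derivatives, in the style of James Clark's derivative-based algorithm for RELAX NG validation, and needs no determinism; `is_one_unambiguous` performs the separate check. All class and function names are hypothetical and belong to no real schema library.

```python
from dataclasses import dataclass

# Toy content-model AST (hypothetical names, not a real schema library).
# No occurrence counts, wildcards, or substitution groups are modelled.
@dataclass(frozen=True)
class Sym:   # one child element, e.g. <a>
    name: str

@dataclass(frozen=True)
class Seq:   # (left, right)
    left: object
    right: object

@dataclass(frozen=True)
class Alt:   # (left | right)
    left: object
    right: object

@dataclass(frozen=True)
class Star:  # body*
    body: object

EPS, EMPTY = "eps", "empty"  # EPS accepts only no children; EMPTY accepts nothing

def nullable(n):
    """True if the pattern accepts an empty list of children."""
    if n == EPS or isinstance(n, Star):
        return True
    if isinstance(n, Seq):
        return nullable(n.left) and nullable(n.right)
    if isinstance(n, Alt):
        return nullable(n.left) or nullable(n.right)
    return False  # Sym or EMPTY

def deriv(n, name):
    """Brzozowski derivative: what remains to match after one <name> child."""
    if isinstance(n, Sym):
        return EPS if n.name == name else EMPTY
    if isinstance(n, Alt):
        return Alt(deriv(n.left, name), deriv(n.right, name))
    if isinstance(n, Seq):
        head = Seq(deriv(n.left, name), n.right)
        return Alt(head, deriv(n.right, name)) if nullable(n.left) else head
    if isinstance(n, Star):
        return Seq(deriv(n.body, name), n)
    return EMPTY  # EPS and EMPTY cannot consume a child

def matches(n, children):
    """Derivative-based matching: no determinism requirement on the model."""
    for name in children:
        n = deriv(n, name)
    return nullable(n)

def is_one_unambiguous(n):
    """Glushkov test: no two same-named positions may compete in first/follow."""
    symbol, follow = [], []  # per-position element name and follow set

    def walk(n):  # returns (nullable, first positions, last positions)
        if isinstance(n, Sym):
            symbol.append(n.name)
            follow.append(set())
            p = len(symbol) - 1
            return False, {p}, {p}
        if isinstance(n, Alt):
            n1, f1, l1 = walk(n.left)
            n2, f2, l2 = walk(n.right)
            return n1 or n2, f1 | f2, l1 | l2
        if isinstance(n, Seq):
            n1, f1, l1 = walk(n.left)
            n2, f2, l2 = walk(n.right)
            for p in l1:
                follow[p] |= f2  # anything ending left may be followed by right
            return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)
        if isinstance(n, Star):
            _, f, l = walk(n.body)
            for p in l:
                follow[p] |= f  # loop back to the start of the body
            return True, f, l
        return n == EPS, set(), set()

    _, first, _ = walk(n)

    def clash(positions):
        names = [symbol[p] for p in positions]
        return len(names) != len(set(names))

    return not clash(first) and not any(clash(s) for s in follow)

# (a, b) | (a, c): two <a> particles compete for the first child.
ambiguous = Alt(Seq(Sym("a"), Sym("b")), Seq(Sym("a"), Sym("c")))
print(matches(ambiguous, ["a", "c"]), is_one_unambiguous(ambiguous))  # True False

# Factored form a, (b | c): same children accepted, and deterministic.
factored = Seq(Sym("a"), Alt(Sym("b"), Sym("c")))
print(matches(factored, ["a", "c"]), is_one_unambiguous(factored))  # True True
```

In this toy setting the choice `(a, b) | (a, c)` is perfectly usable by the derivative matcher yet fails the 1-unambiguity check, while the factored `a, (b | c)` accepts the same children and passes both. Treating the check as an optional extra assertion, rather than a precondition of validation, is roughly what the two-layer proposal asks for.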
> 
> There are many details and issues, of course, but I believe 
> this is more straightforward than may be thought. In any 
> case, it is necessary to bring XML Schemas to its full 
> potential for being useful on the web, rather than the 
> hindrance and snare it currently is.  There is a 
> misapprehension, in particular, that RELAX NG cannot be used 
> for databinding; in fact, the Java API for ODF was created by 
> a databinding tool for RELAX NG, so this is hardly
> 
> Cheers
> Rick Jelliffe
> 
> Editor, ISO/IEC 19757-3:2006 Information technology -- 
> Document Schema Definition Language (DSDL) -- Part 3: 
> Rule-based validation -- Schematron
> 
> Invited expert, ISO/IEC SC34 WG1 Schema languages
> Invited expert, ISO/IEC SC34 WG4 Office Open XML
> Formerly Australian delegate, ISO/IEC WG 8 (e.g. SC34)
> Formerly member (for Academia Sinica), W3C XML Schema WG
> Formerly invited expert, W3C I18n SIG
> Formerly invited expert, W3C XML WG (e.g. SIG)
> 
> Author, The XML & SGML Cookbook, Recipes for Structured 
> Information Management,
>    Charles Goldfarb series, Prentice Hall, 1998.
> 
> 
Received on Wednesday, 13 May 2009 16:01:56 GMT
