Suggestions for Schema 1.1

First of all, ideally you would throw away XML Schema and use RELAX NG instead.  I think this would be best for the industry, since I believe that XML Schema is fatally flawed.  But since I gather you won't do this, I would suggest the following:

1.  You need to provide an EBNF grammar for XML Schema.  You need to define the EBNF notation itself (as the IETF RFCs do) and then define the Schema language using that notation.  You also need to provide the grammar in a form a parser generator can consume (lex/yacc, flex/bison, JavaCC, ANTLR; take your pick, although ideally you would provide them all) so that someone can create a parser easily.  I CANNOT STRESS THE IMPORTANCE OF THIS ENOUGH!

2.  Any part of XML Schema you cannot write an EBNF grammar for should be removed.  If you cannot write a grammar for it, then it is certainly too unwieldy a construct to have in the specification.  Likewise, remove any context-sensitive grammar constructs.

3.  Remove 'all'; it is redundant with the other content model constructs (sequence and choice).

4.  Remove all inheritance constructs, for example base/restriction/extension.

5.  Remove all object-oriented constructs.  Remember that most of the code in the world is NOT written in an object-oriented language.

6.  Either remove the name space handling or greatly simplify and fix it.  Name spaces, despite the hype, are actually a very bad construct which greatly complicates the parsing of XML documents.  Also, despite claims to the contrary, programming APIs (e.g. DOM) don't handle name spaces well at all.  It has already been noted that a large amount of software uses the prefix instead of the URI, and also that people attempt to 'dereference' the URIs, an indication that a lot of people don't understand their usage.  XML Schema completely messes up name space processing, what with include/import and target name space.

To the greatest extent possible, name spaces should be decoupled from XML Schema.
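
To make the prefix-versus-URI confusion in point 6 concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module.  The URI "urn:example:orders", the two sample documents and the helper names are made up for this illustration.  The two documents are equivalent as far as name spaces are concerned, yet software that compares raw prefixes will treat them as different.

```python
import re
import xml.etree.ElementTree as ET

# Two documents that differ only in the prefix bound to the same URI.
# The URI is hypothetical and is never dereferenced; it is just a name.
DOC_A = '<a:order xmlns:a="urn:example:orders"><a:item>1</a:item></a:order>'
DOC_B = '<b:order xmlns:b="urn:example:orders"><b:item>1</b:item></b:order>'


def expanded_names(xml_text):
    """Element names as {URI}local, which is what a name space aware parser sees."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


def prefixed_names(xml_text):
    """Element names as prefix:local, scraped from start tags (the common mistake)."""
    return re.findall(r"<([A-Za-z_][\w.-]*:[\w.-]+)", xml_text)


print(expanded_names(DOC_A) == expanded_names(DOC_B))  # same names by URI
print(prefixed_names(DOC_A) == prefixed_names(DOC_B))  # different names by prefix
```

Comparing on the expanded {URI}local form is the only comparison that treats the two documents as the same, and it never needs to fetch anything from the URI.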

7.  Split the specification into levels or parts, where implementations conform to only certain parts.  One part should be a serialization schema providing lowest-common-denominator capability.  This would mean only the primitive types in C plus arrays and structures.  This would make serialization in things like SOAP a whole lot more sensible.  You might have a primitive type part (int, float, double...), an extended type part (date, time...), an aggregate type part (array, struct...), a serialization schema (see above), a data base schema...  These schemas could be layered and build on top of one another.  This would allow each schema to address its appropriate market and not try to solve all problems.
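
As a rough illustration of the redundancy claimed in point 3, the sketch below rewrites an 'all' group as a choice of sequences.  It assumes every child element is required exactly once (optional children are not handled), and the function names and the "(a, b) | (b, a)" notation are invented for this example rather than taken from the specification.

```python
from itertools import permutations


def expand_all(names):
    """Return every ordering of `names`; each ordering is one 'sequence' branch."""
    return [list(order) for order in permutations(names)]


def as_choice_of_sequences(names):
    """Render the expansion in a DTD-like notation: (a, b) | (b, a)."""
    branches = ("(" + ", ".join(order) + ")" for order in permutations(names))
    return " | ".join(branches)


print(as_choice_of_sequences(["a", "b"]))
print(len(expand_all(["a", "b", "c"])))
```

The number of branches is the factorial of the number of children, so the rewrite is only practical by hand for small groups, but it uses nothing beyond sequence and choice.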

Matthew Jones

Received on Wednesday, 24 April 2002 13:25:40 UTC