Re: Comment on XSD 1.1 from Rick Jelliffe on 2009-05-20 (www-tag@w3.org from May 2009)

From: Rick Jelliffe <rjelliffe@allette.com.au>
Date: Wed, 20 May 2009 17:03:13 +1000
To: www-tag@w3.org
CC: www-xml-schema-comments@w3.org
Message-ID: <4A13AB31.9010004@allette.com.au>

Pete Cordell wrote:
> I'm just trying to understand what you're proposing.  We come from 
> different worlds.  In my telecoms background, higher layers add 
> functionality based on the services of the lower layers.  Hence I 
> think we have a terminology mis-match.  Since even within the telecoms 
> world there are terminology confusions (and terms often vary according 
> to the context!) I don't think it's too surprising that in a 
> cross-discipline discussion some terminology needs to be clarified.
Of course.
>
> In the terminology of my world I think what you're proposing is more 
> akin to defining profiles, each of which may, in a deviation from pure 
> profiling principles, add a small bit of additional functionality.
Let's flop to a vertical terminology then: imagine a continuum 
with RELAX NG at the left and XSD at the right.

Moving from the left to the right involves adding more apparatus for 
labelling productions (in terms of higher concepts such as simple types, 
complex types, groups and so on.)  While RELAX NG has the ability to 
express these kinds of groupings, it does not have a mechanism to label 
them nor to make assertions about productions with different labels 
(e.g. type derivation by restriction checks): you can make a production 
that contains only the enumeration of text values, and use this 
production to constrain a particular value, but you never say "I am 
declaring a simple type."*  
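
As a rough illustration of the labelling difference, here is a minimal 
Python sketch. The two schema fragments, the "colour" name and the helper 
functions are all hypothetical, made up for this example. It only shows 
that the same enumeration constraint can be written in both languages, 
with XSD additionally carrying the "simple type" label and its 
derivation by restriction.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical RELAX NG fragment: a named production that enumerates
# text values. Nothing in it declares "this is a simple type".
RNG = f"""
<grammar xmlns="{RNG_NS}">
  <define name="colour">
    <choice>
      <value>red</value>
      <value>green</value>
      <value>blue</value>
    </choice>
  </define>
</grammar>
"""

# Hypothetical XSD fragment: the same constraint, but labelled as a
# simple type derived by restriction from xs:token.
XSD = f"""
<xs:schema xmlns:xs="{XSD_NS}">
  <xs:simpleType name="colour">
    <xs:restriction base="xs:token">
      <xs:enumeration value="red"/>
      <xs:enumeration value="green"/>
      <xs:enumeration value="blue"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""


def rng_allowed_values(source):
    """Collect the text of every RELAX NG <value> pattern."""
    root = ET.fromstring(source.strip())
    return {v.text for v in root.iter(f"{{{RNG_NS}}}value")}


def xsd_allowed_values(source):
    """Collect the value of every XSD enumeration facet."""
    root = ET.fromstring(source.strip())
    return {e.get("value") for e in root.iter(f"{{{XSD_NS}}}enumeration")}


def xsd_type_labels(source):
    """Names of the productions that XSD explicitly labels as simple types."""
    root = ET.fromstring(source.strip())
    return [t.get("name") for t in root.iter(f"{{{XSD_NS}}}simpleType")]


if __name__ == "__main__":
    # Same constraint on the text value in both schemas.
    print(rng_allowed_values(RNG) == xsd_allowed_values(XSD))
    # Only the XSD side carries the "simple type" label.
    print(xsd_type_labels(XSD))
```

Going from the XSD fragment to the RELAX NG one keeps the set of allowed 
values and drops the typing label, which is the "throwing away" step 
described in this message.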

(We may recall that creating XSD involved, to a large extent, an 
exercise of "reconstructing" DTD facilities: looking at what DTD 
parameter entities did, and giving these uses first class status such as 
complexType and attributeGroup. Converting from XSD to RELAX NG largely involves 
throwing away this typing apparatus, while still validating the 
constraints in them.)

Moving from the right to the left involves increasing the power of the 
language, measured as the class of documents that can be 
described and validated.  RELAX NG is more powerful. XSD has its UPA and 
name restrictions, while RELAX NG does not. Indeed, this relaxation of 
constraints is what gives RELAX NG its name, IIRC.

I contend that the move to the right is indeed merely an adding of 
apparatus, and therefore (could be implemented as) exactly what Pete 
would think of as a layer (as far as the grammars go.)  An XSD schema 
can be assembled, checked for UPA and non-RELAX NG schema constraints, 
then the document validated using that schema converted to RELAX NG 
(then additional non-RELAX NG document constraints checked, if any 
exist).  Take UPA violation: this is determined by examining the content 
model, not the document. So if the content model is tested to not 
violate UPA, the instance can be validated even by a parser that is not 
restricted to one-unambiguous content models.
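
To make the point about UPA concrete, here is a minimal Python sketch, 
under heavy assumptions. Content models are encoded as nested tuples 
(a hypothetical encoding invented for this example), and UPA is 
approximated by the classic position-automaton test for one-unambiguous 
expressions. Real XSD UPA also has to deal with wildcards, occurrence 
ranges and substitution groups, none of which are handled here. The 
check looks only at the content model; the separate matcher then 
validates an instance without relying on determinism at all.

```python
from itertools import count


def analyse(model):
    """Build the position automaton of a content model.

    Returns (symbols, nullable, first, last, follow), where positions
    are integers, one per element occurrence in the model.
    """
    symbols = {}  # position -> element name
    follow = {}   # position -> positions that may come next
    ids = count()

    def walk(node):
        kind = node[0]
        if kind == "elem":
            p = next(ids)
            symbols[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "seq":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            for p in l1:
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind == "choice":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind in ("star", "opt"):
            _, f, l = walk(node[1])
            if kind == "star":
                for p in l:
                    follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown node kind: {kind}")

    nullable, first, last = walk(model)
    return symbols, nullable, first, last, follow


def violates_upa(model):
    """Static check on the model alone: True if two competing positions
    carry the same element name (simplified UPA test)."""
    symbols, _, first, _, follow = analyse(model)
    for candidates in [first, *follow.values()]:
        names = [symbols[p] for p in candidates]
        if len(names) != len(set(names)):
            return True
    return False


def matches(model, children):
    """Validate a sequence of child element names by tracking a set of
    positions, so ambiguous models are handled as well."""
    symbols, nullable, first, last, follow = analyse(model)
    if not children:
        return nullable
    current = {p for p in first if symbols[p] == children[0]}
    for name in children[1:]:
        current = {q for p in current for q in follow[p]
                   if symbols[q] == name}
    return bool(current & last)


# (a, b) | (a, c): fine for RELAX NG, rejected by the UPA check.
AMBIGUOUS = ("choice",
             ("seq", ("elem", "a"), ("elem", "b")),
             ("seq", ("elem", "a"), ("elem", "c")))
# a, (b | c): the same documents, written so that it passes the check.
DETERMINISTIC = ("seq", ("elem", "a"),
                 ("choice", ("elem", "b"), ("elem", "c")))

if __name__ == "__main__":
    print(violates_upa(AMBIGUOUS), violates_upa(DETERMINISTIC))
    print(matches(AMBIGUOUS, ["a", "c"]), matches(DETERMINISTIC, ["a", "c"]))
```

In this sketch violates_upa() plays the role of the schema-time layer, 
and matches() stands in for the more permissive validator underneath: 
once the model has passed the static check, the instance check needs no 
knowledge of UPA.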

Moving from the left to the right also involves an augmented set of 
standardized outcomes.

* Not to be confused with the RELAX NG datatype library mechanism.

So I think that XSD can be imagined as a "layer" above RELAX NG, in that

  1) It adds certain content model checks and adds on typing apparatus
  2) It restricts the power
  3) It adds more outcomes

This is not a layer in the way that HTTP is a layer on top of TCP/IP (or 
UDP/IP or whatever), but a layer in the sense that C++ is a layer above 
the C language.

>  
> I think a modular approach would be better.  
It certainly would be. Indeed, that is the premise of ISO DSDL: smaller 
targeted languages that can be combined and built upon. And, indeed, 
that is the premise of my suggestion: that full XSD and the databinding 
patterns profile can both be formulated in terms of layer/modules on top 
of RELAX NG.   But insisting on full modularity in XSD as a big bang 
looks like a fantasy at this time: it is XSD 2.0, while I am concerned 
with XSD 1.n Minimal.

I don't want to obscure my point, which is that we have two real 
terminal points (RELAX NG and XSD 1.0) and a concrete, objective hull 
and kernel of features in the middle (the Databinding patterns 
documents) with a set of features basically of DTDs plus XSD's built-in 
datatypes.   We have lots of evidence about which features of XSD are 
less common and which are more common, even outside the periphery.

We can use this objectively to say "What is the language created by the 
intersection of RELAX NG and XSD that also fits between the hull and 
kernel of the databinding patterns?"  I think this is entirely pragmatic 
and workable, and would avoid the paralysis-by-analysis that has reigned 
so far, and the inability to justify anything apart from growth by the 
XSD WG. More to the point, it would provide a much more reasonable 
target for users and systems that only need a modest XSD, such as for 
databinding. (To the extent that a modular approach would open up to 
innovation and theorizing and stray from these achievable goals, 
modularization would be a bad move.)

Just as WF XML is also conforming SGML (with only the slightest change 
to the SGML standard to accommodate it), I am wanting the stripped-down 
XSD to also work with full XSD systems with only perhaps the slightest 
change to XSD syntax if necessary. (And also to be convertible to RELAX NG.)

And this is why I think XSD 1.1 should be put on hold. Finding this core 
common language would not be a death march task, and it would be better 
for XSD 1.1's changes to be made in the context of any tweaks needed to 
support this core.

Cheers
Rick Jelliffe
Received on Wednesday, 20 May 2009 07:04:06 UTC